Programming Languages and Environments (Linguagens e Ambientes de Programação), 2014/2015 [Computer Engineering - DI/FCT/UNL]

1st Practical Project Statement (OCaml)

Artur Miguel Dias


Dates

  • 27/Mar (21:00) - Release of this statement
  • 11/Apr (21:00) - Deadline (date and time) for submitting the 1st project
  • 15/Apr (21:00) - Deadline (date and time) for late submission. A penalty of one point per day of delay.


    Changelog


    Submission rules

    Playing with XML


    Introduction

    XML (Extensible Markup Language) is a widely used textual data format for storing electronic documents and for representing data structures transferred between Web applications. It can express both rigidly structured data, like tables, and loosely structured data, like office documents.

    XML was initially created to overcome the limitations of HTML. While HTML is a textual data format oriented to the graphical presentation of data using a set of predefined tags (e.g. <p>, <h1>, <table>), XML is a format for representing data in general (regardless of its graphical appearance) and allows users to invent their own tags.

    This LAP project explores the XML format in a basic way, using it as an excuse for practicing the manipulation of n-ary trees. There will be more meaningful uses of XML in later courses of the MIEI.


    The XML format

    An XML document is a text document containing certain syntactic constructions that add structure and meaning to the text. These syntactic constructions use special codes called tags that the user invents to convey intended meanings.

    To illustrate, here is an XML document with some data about a lady called Alice and her children:

    Only two tags are used in this example: 'person' and 'children'. When the tags are well chosen, the XML document becomes self-explanatory. You should be able to answer the following questions about Alice and her children: How old is Alice? How many children does she have? What is the name and age of her children? Does she have grandchildren?

    This Alice document describes an n-ary tree. As a matter of fact, every XML document describes an n-ary tree. The part of the text that describes the entire tree or any subtree is called an XML element.

    Syntactically, each XML element begins with the opening of a tag, e.g. <person>, and ends with the closing of this same tag, </person>. The opening of a tag can optionally include some attributes with their corresponding values, as in <person gender='M'>. Finally, between the opening and the closing of a tag lie the contents of the element, consisting of plain text optionally interspersed with XML sub-elements. The plain text component is traditionally called pcdata. In the Alice example, the contents of the main element consist of: the segment of pcdata "Alice"; then a sub-element describing the children of Alice; finally, the segment of pcdata "35 years old".

    An element without contents, such as <children></children>, can be abbreviated as <children/>.

    Now, let us look at a large XML document, a well-known comedy by William Shakespeare encoded in XML:


    The Xml-Light library

    XML-Light is the name of an OCaml library, developed by Nicolas Cannasse, that supports XML processing in OCaml. It is a free software package, subject to the LGPL license. It is not part of the official distribution of OCaml.

    This library consists of a small number of modules, but we only need to deal with one particular module, called Xml. This module is mainly concerned with the syntax of XML and therefore provides some functions for reading and writing XML documents. The reading functions convert the XML syntax to an OCaml representation, and the writing functions do just the opposite, converting the OCaml representation to XML syntax.

    Representation of XML in OCaml

    The OCaml representation of XML in the module Xml is defined by the following recursive type.
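    A sketch of that type, assuming it matches the Plan B definition given further down. In Xml-Light the three components of Element are believed to be wrapped in a single tuple, as written here; this does not change how values are built or pattern-matched with Element (tag, attrs, contents). The value sample is only an illustration; check the exact definition against Xml.html.

```ocaml
(* Sketch of the Xml module's type; check it against Xml.html *)
type xml =
  | Element of (string * (string * string) list * xml list)
      (* tag name, (attribute, value) pairs, contents *)
  | PCData of string
      (* a segment of plain text (pcdata) *)

(* The element <p lang='en'>Hi</p> written as an xml tree *)
let sample : xml = Element ("p", [ ("lang", "en") ], [ PCData "Hi" ])
```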

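    We will call any value of this type an xml tree, or more simply a tree.

    The type xml is a sum type with two variants: Element (tag, attributes, contents), which represents an XML element by its tag name, its list of (attribute, value) pairs and the list of its sub-trees; and PCData text, which represents a segment of pcdata.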

    To better understand the type xml, here is how the Alice example gets represented:
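    As a hedged illustration, here is a reduced, hypothetical document with the same shape as the Alice example described earlier (pcdata, then a children sub-element, then pcdata), written by hand as an xml tree. The child's name "Bob", his age and his gender attribute are invented, and the type is repeated (assuming the Plan B style definition) so that the sketch compiles on its own.

```ocaml
type xml =
  | Element of (string * (string * string) list * xml list)
  | PCData of string

(* Hypothetical document, NOT the original Alice example:
   <person>Alice<children><person gender='M'>Bob<children/>10 years old</person></children>35 years old</person>
   "Bob", "10 years old" and gender='M' are invented for illustration. *)
let alice_like : xml =
  Element ("person", [],
    [ PCData "Alice";
      Element ("children", [],
        [ Element ("person", [ ("gender", "M") ],
            [ PCData "Bob";
              Element ("children", [], []);  (* <children/>: no contents *)
              PCData "10 years old" ]) ]);
      PCData "35 years old" ])
```

    Note how the abbreviated <children/> becomes an Element whose attribute list and contents list are both empty.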

    The functions of the module Xml

    The functions provided by the Xml module are documented here: Xml.html.

    The relevant functions for our project are these:

    Examples

    Here is a small program written in OCaml that exemplifies the use of the parser and the printer supplied by the module Xml:

    Now, an example of a function that processes an xml tree. This function checks whether there is any pcdata in a given xml tree.
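    A minimal sketch of such a function, under the hypothetical name has_pcdata and assuming the Plan B style definition of xml: a PCData node answers true at once, and an Element answers true exactly when one of its sub-trees does, which List.exists expresses directly.

```ocaml
type xml =
  | Element of (string * (string * string) list * xml list)
  | PCData of string

(* true iff some node of the tree is a PCData leaf *)
let rec has_pcdata (t : xml) : bool =
  match t with
  | PCData _ -> true
  | Element (_, _, contents) -> List.exists has_pcdata contents
```

    For example, has_pcdata (Element ("a", [], [ Element ("b", [], []) ])) evaluates to false, because neither element contains any text.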

    Using the Xml-Light library

    Get the source code of Xml-Light here: xml-light-2.2.zip.

    To compile the library, use the following compound command:

    Eclipse, using the compiler

    Do this inside Eclipse:

    1. Create the MoreXml project in the usual way (don't forget, it is an "OCaml Managed Project"!)
    2. Project > Properties > Project Paths > Add -----> the full path to the directory xml-light
    3. Project > Properties > Ocaml Build Flags > Add -----> xml-light.cma
    4. In the file "MoreXml.ml", the following line must appear before the definition of the functions:
        open Xml ;;
        

    Eclipse, using the interpreter

    Do this inside Eclipse:

    1. Create the MoreXml project in the usual way (don't forget, it is an "OCaml Managed Project"!)
    2. Simply place the following three lines before the definition of the functions:
        #cd "the full path to the directory xml-light"
        #load "xml-light.cma"
        open Xml;;
        

    Plan B for Eclipse

    If you are unable to make any of the previous techniques work, here is a simpler but inferior alternative:

    1. Forget about the Xml-Light library altogether.
    2. Place this definition directly in your program:
        type xml =
              Element of string * (string * string) list * xml list
            | PCData of string
        ;;
        
    3. Here you can obtain two XML documents already converted to the OCaml representation.

    Console

    To develop the project using a simple text editor and the ocaml interpreter running in a console:

    1. Copy the files xml.cmi and xml-light.cma to the directory of your project
        cp xml.cmi xml-light.cma MyProjDir
        
    2. In the file "MoreXml.ml", the following two lines must appear before the definition of the functions:
        #load "xml-light.cma" ;;
        open Xml ;;
        

    The goal of this project

    The module Xml offers an XML parser and an XML printer. However, to help in the task of writing XML processing programs, we feel the need for some more general-purpose XML-oriented functions.

    The goal of this project is to develop a new open module, called "MoreXml", that provides a collection of generic XML processing functions implemented on top of what the module Xml already offers.
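    As a hedged illustration of the kind of generic function that can be built on top of the xml type, here is a preorder collection of element tags. The name all_tags is hypothetical and it is not one of the functions required below; the type is repeated, assuming the Plan B style definition, so the sketch compiles on its own.

```ocaml
type xml =
  | Element of (string * (string * string) list * xml list)
  | PCData of string

(* Hypothetical helper: tags of all elements in preorder,
   each parent listed before the elements in its contents.
   PCData nodes have no tag and contribute nothing. *)
let rec all_tags (t : xml) : string list =
  match t with
  | PCData _ -> []
  | Element (tag, _, contents) -> tag :: List.concat (List.map all_tags contents)
```

    For example, all_tags (Element ("person", [], [ PCData "Alice"; Element ("children", [], []) ])) evaluates to ["person"; "children"]. List.concat (List.map ...) is used instead of List.concat_map to stay compatible with older OCaml releases.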

    The name of the source file must be "MoreXml.ml".


    The open module MoreXml

    Here are the specifications of the functions to implement:
    tag : xml -> string
        (* t      res *)
    
    attributes : xml -> (string * string) list
               (* t      res *)
    
    contents : xml -> xml list
             (* t      res *)
    
    leaf : xml -> bool
         (* t     res *)
    
    size : xml -> int
         (* t     res *)
    
    height : xml -> int
           (* t     res *)
    
    width : xml -> int
          (* t     res *)
    
    changeTag : string -> string -> xml -> xml
              (* g1       g2        t      res *)
    
    select : (xml -> bool) -> xml list -> xml list
           (*  f                tl          res *)
    
    select_d : (xml -> bool) -> xml list -> xml list
             (*  f                tl         res *)
    
    count : (xml -> bool) -> xml list -> int
          (*  f               tl         res *)
    
    count_d : (xml -> bool) -> xml list -> int
            (*  f               tl         res *)
    
    project : string -> xml list -> xml list
            (* tag       tl          res *)
    
    project_d : string -> xml list -> xml list
              (* tag       tl         res *)
    
    transform : (xml -> xml) -> xml list -> xml list
              (*  f               tl          res *)
    
    transform_d : (xml -> xml) -> xml list -> xml list
                (*  f               tl          res *)
    
    shakespeare : xml -> int   *  int * int   *  int    *   int
                (* t     speeches lines speakers minspeech  maxspeech *)
    

    Main rules


    Delivery rules


    Other rules


    Evaluation

    The lecturer responsible for managing and grading this project is Professor Artur Miguel Dias.

    The project grade will be largely determined automatically, using Mooshak. Therefore, it is essential to follow the specification contained in this statement in every detail.

    However, for programs that work at least minimally, there will also be a more subjective assessment of quality, taking into account aspects such as:

    Obviously, it is not mandatory to complete the entire project to obtain a passing grade. But, of course, it is worth working to produce a fairly complete solution of the best possible quality.


    Remarks


    Final

    Good work! We hope you enjoy it.