Linguagens e Ambientes de Programação (2014/2015) [Eng.Inf. - DI/FCT/UNL]

Enunciado do 1º Projeto Prático (OCaml)

Artur Miguel Dias

Datas

27/Mar (21:00) - Divulgação deste enunciado

11/Abr (21:00) - Data e hora limite de entrega do 1º projeto

15/Abr (21:00) - Data e hora limite de entrega com atraso. Um valor de penalização por cada dia de atraso.

Changelog

06/Abr - Foram adicionadas, logo no início do enunciado, as regras de submissão, que envolvem dois concursos do Mooshak: "concurso de teste" e "concurso para submissão final".
03/Abr - Havia uma ocorrência em excesso de "list" no tipo do argumento "f" das funções "transform" e "transform_d".
02/Abr - Nova secção "Eclipse, using the interpreter", com mais uma via a tentar, no caso de você continuar a ter problemas em usar a biblioteca Xml-Light no Eclipse.
01/Abr - Nova secção "Plan B for Eclipse", que interessa a quem não estiver a conseguir usar o plugin de OCaml dentro do Eclipse.
29/Mar - A secção "Installing the Xml-Light library" foi renomeada "Using the Xml-Light library" e agora explica como se usa a biblioteca do XML dentro do Eclipse e na consola.
27/Mar - Possíveis correções a este enunciado serão assinaladas aqui.

Regras de submissão

Playing with XML

Introduction

XML (Extensible Markup Language) is a widely used textual data format for storing electronic documents and for representing data structures transfered between Web applications. It can express both rigidly structured data, like tables, and loosely structured data, like office documents.

XML was initially created to overcome the limitations of HTML. While HTML is a textual data format oriented for the graphical presentation of data using a set of predefined tags (e.g. <p>, <h1>, <table>), XML is a format for representing data in general (regardless of their graphical appearance) and allows the users to invent their own tags.

This LAP project explores the XML format in a basic way, using it as an excuse for practicing the manipulating of n-ary trees. There will more meaningful uses of XML in later courses of the MIEI.

The XML format

An XML document is a text document containing certain syntactic constructions that add structure and meaning to the text. These syntactic constructions use special codes called tags that the user invents to convey intended meanings.

To illustrate, here is an XML document with some data about a lady called Alice and her children:

<person gender='F'>
    Alice
    <children>
        <person gender='M'> Andre <children></children> 5 years old </person>
        <person gender='F'> Maria <children></children> 4 years old </person>
    </children>
    35 years old
</person>

Only two tags are used in this example: 'person' and 'children'. When the tags are well chosen, the XML document becomes self-explanatory. You should be able to answer the following questions about Alice and her children: How old is Alice? How many children does she have? What is the name and age of her children? Does she have grandchildren?

This Alice document describes a n-ary tree. As a matter of fact, every XML document describes a n-ary tree. The part of the text that describes the entire tree or any subtree is called an XML element.

Syntactically, each XML element begins with the opening of a tag, e.g. <person>, and ends with the closing of this same tag, </person>. The opening of a tag can optionally include some attributes with the corresponding values, as in this example <person gender='M'>. Finally, between the opening and closing of a mark, lies the contents of the element, consisting of plain text optionally interspersed with XML sub-elements. The plain text component is traditionally called pcdata. In the Alice example, the contents of the main element consists in: the segment of pcdata "Alice"; then a sub-element describing the children of Alice; finally the segment of pcdata "35 years old".

An element without contents, such as <children></children> can be abbreviated like this <children/>.

Now, let us observe a large XML document - a well known comedy play by William Shakespeare, encoded in XML:

A Midsummer Night's Dream

The Xml-Light library

XML-Light is the name of an OCaml library, developed by Nicolas Cannasse, that supports XML processing in OCaml. It is a free software package, subject to the LGPL license. It is not part of the official distribution of OCaml.

This library consists of a small number of modules, but we only need to deal with a particular module called Xml. This module is mainly concerned with the syntax of XML and therefore provide some functions for reading and writing XML documents. The reading functions convert the XML syntax to an OCaml representation, and the writing functions do just the opposite, converting the OCaml representation to XML syntax.

Representation of XML in OCaml

The OCaml representation of XML in the module Xml is defined by the following recursive type.

type xml =
      Element of string * (string * string) list * xml list
    | PCData of string
;;

Any value of this type, we will call an xml tree, or more simply a tree.

The type xml is a sum type with the following two variants:

Element - represents an XML element, characterized by a tag, a list of attributes with the corresponding values, and a contents list; any value of this variant we will call an element.
PCData - represents a maximal segment of plain text, occurring inside the contents of a XML element; any value of this variant we will call a pcdata.

To better understand the type xml, here is how the Alice example gets represented:

let alice =
    Element ("person", [("gender", "F")], [
        PCData " Alice ";
        Element ("children", [], [
            Element ("person", [("gender","M")], [
                PCData " Andre "; Element ("children", [], []); PCData " 5 years old "
            ]);
            Element ("person", [("gender","F")], [
                PCData " Maria "; Element ("children", [], []); PCData " 4 years old "
            ])
        ]);
        PCData " 35 years old "
    ]) ;;

The functions of the module Xml

The functions provided by the Xml module are documented here: Xml.html.

The relevant functions for our project are these:

tag : xml -> string
pcdata : xml -> string
attribs : xml -> (string * string) list
attrib : xml -> string -> string
children : xml -> xml list
parse_file : string -> xml
parse_in : in_channel -> xml
parse_string : string -> xml
to_string : xml -> string
to_string_fmt : xml -> string

Examples

Here is a small program written in OCaml that exemplifies the use of the parser and the printer supplied by the module Xml:

#load "xml-light.cma" ;; (* Loads the library Xml-Light -- remove this line in Eclipse *)
open Xml ;;              (* Allows direct access to the symbols of the module Xml *)

let t = parse_string "
    <person gender='F'>
        Alice
        <children>
            <person gender='M'> Andre <children/> 5 anos </person>
            <person gender='F'> Maria <children/> 4 anos </person>
        </children>
        35 anos
    </person>"
in
    print_string ("XML formated = " ^ to_string_fmt t ^ "\n")
;;

Now, an example of a function that processes an xml tree. This function checks if there is any pcdata in a given xml tree.

let rec existsPCData t =
    match t with
          Element (g,a,cs) -> lexistsPCData cs
        | PCData _ -> true
and lexistsPCData tl =
    match tl with
          [] -> false
        | t::ts -> existsPCData t || lexistsPCData ts
;;

Using the Xml-Light library

Get the source code of Xml-Light here: xml-light-2.2.zip.

To compile the library, use the following compound command:

unzip xml-light-2.2.zip ; cd xml-light ; make ; make

Eclipse, using the compiler

Do this inside Eclipse:

Create the MoreXml project in the usual way (don't forget, it is an "OCaml Managed Project"!)
Project > Properties > Project Paths > Add -----> the full path to the directory xml-light
Project > Properties > Ocaml Build Flags > Add -----> xml-light.cma
In the file "MoreXml.ml", the following line must appear before the definition of the functions:

Eclipse, using the interpreter

Do this inside Eclipse:

Create the MoreXml project in the usual way (don't forget, it is an "OCaml Managed Project"!)

Simply place the following three lines before the definition of the functions:

#cd "the full path to the directory xml-light"
#load "xml-light.cma"
open Xml;;

Plan B for Eclipse

If are unable to make any of the previous technique work, here is a simpler but inferior alternative:

Forget about the Xml-Light library altogether.

Place this definition directly in your program:

type xml =
      Element of string * (string * string) list * xml list
    | PCData of string
;;

You can obtain here two XML documents alterady converted to the OCaml representation.

Console

To develop the project using a simple text editor and the ocaml interpreter running in a console:

Copy the files xml.cmi and xml-light.cma to the directory of your project
In the file "MoreXml.ml", the following two lines must appear before the definition of the functions:

The goal of this project

The module Xml offers a XML parser and a XML printer. However, to help in the task of writing XML processing programs, we feel the need for some more general purpose XML oriented functions.

The goal of this project is to develop a new open module, called "MoreXml", that would provide a collection of generic XML processing functions implemented on the top of what the module Xml already offers.

The name of the source file must be "MoreXml.ml".

The open module MoreXml

Here are the specifications of the functions to implement:

tag : xml -> string
    (* t      res *)

t

attributes : xml -> (string * string) list
           (* t      res *)

t

contents : xml -> xml list
         (* t      res *)

t

leaf : xml -> bool
     (* t     res *)

t

size : xml -> int
     (* t     res *)

t

height : xml -> int
       (* t     res *)

t

width : xml -> int
      (* t     res *)

t

changeTag : string -> string -> xml -> xml
          (* g1       g2        t      res *)

t

g1

g2

select : (xml -> bool) -> xml list -> xml list
       (*  f                tl          res *)

tl

f

select_d : (xml -> bool) -> xml list -> xml list
         (*  f                tl         res *)

tl

f

count : (xml -> bool) -> xml list -> int
      (*  f               tl         res *)

tl

f

count_d : (xml -> bool) -> xml list -> int
        (*  f               tl         res *)

tl

f

project : string -> xml list -> xml list
        (* tag       tl          res *)

tl

tag

project_d : string -> xml list -> xml list
          (* tag       tl         res *)

tl

tag

transform : (xml -> xml) -> xml list -> xml list
          (*  f               tl          res *)

f

tl

f

The function f is allowed to call the function transform. This is useful if the transformation of the input tree is a depth transformation (i.e. not shallow).

transform_d : (xml -> xml) -> xml list -> xml list
            (*  f               tl          res *)

f

tl

f

transform_d

This function is similar to transform except for the automatic propagation of the transformation to the contents of the elements that generate exceptions.

transform_d supports systematic depth transformations of a list of xml trees. The function f can consider several patterns of interest and specify transformations for the input trees matching those patterns. The non-matching input trees should rise exceptions, for example "Match_failure" exceptions.

The function f is allowed to call the function transform_d. This is useful is the transformation of an input tree matching a pattern of interest requires a depth transformation (i.e. not shallow).

shakespeare : xml -> int   *  int * int   *  int    *   int
            (* t     speeches lines speakers minspeech  maxspeech *)

this play

Total number of SPEECHes.
Total number of LINEs.
Total number of distinct SPEAKERs.
The length of the shortest SPEECH, measured in the number of LINEs.
The length of the longest SPEECH, measured in the number of LINEs.

Regras principais

Produza um ficheiro chamado MoreXml.ml. Nas regras de submissão será explicada a forma de submeter no Mooshak.
O ficheiro "MoreXml.ml" tem de incluir logo nas primeiras linhas, um comentário inicial contendo: o nome e número dos dois alunos que realizaram o projeto; indicação de quais as partes do trabalho que foram feitas e das que não foram feitas (para facilitar uma correção sem enganos); ainda possivelmente alertando para alguns aspetos da implementação que possam ser menos óbvios para o avaliador.
O projeto é para ser realizado por grupos de dois alunos. Um projeto entregue por três ou mais alunos vale zero valores. Poderão ser permitidos grupos de um aluno em circunstâncias especiais que têm de ser bem explicadas no comentário inicial atrás referido.
Na realização deste projeto é proibido usar os mecanismos imperativos que a linguagem OCaml suporta, mas não foram estudados nas aulas.
Mesmo que desenvolva o programa em Windows ou no MacOS, a versão final do seu programa deverá correr no sistema Linux instalado nos laboratórios.
Programe as funções recursivas usando o método indutivo. Também pode usar livremente funções de biblioteca, especialmente as disponíveis no módulo List.
O programa deve ser bem indentado, por forma a ficar bem legível. Além disso, a largura do programa não deve exceder as 80 colunas para poderem ser impressos. Podem haver algumas exceções, muito pontuais.
O não cumprimento das regras anteriores implica penalizações automáticas na nota.

Regras de entrega

Será ativado um concurso do Mooshak, que servirá para submeter os trabalhos. Os detalhes da forma de fazer a submissão serão divulgados nessa altura. Até lá preocupe-se apenas em escrever um bom programa.
Depois do prazo limite ainda se aceitam trabalhos atrasados, mas com penalizações na nota. Mais detalhes nas primeiras linhas deste enunciado.

Outras regras

Apesar de o projeto ser de grupo, cada aluno, a título individual, tem a responsabilidade de responder por todo o projeto. Assim é indispensável que os dois membros de cada grupo programem efetivamente.
Não se proíbe que alunos de turnos práticos diferentes façam grupo. Isso é apenas desaconselhado.
Não há inscrição prévia dos grupos e basta que cada trabalho tenha 2 autores identificados.
A nota máxima do projeto é 20 valores.

Avaliação

O docente responsável pela gestão e pela avaliação deste trabalho é o Professor Artur Miguel Dias.

A nota do projeto será em grande parte determinada por meios automáticos, através do Mooshak. Portanto é essencial respeitar a especificação contida neste enunciado, em todos os seus detalhes.

Mas, relativamente a programas que funcionem minimamente, também haverá uma apreciação mais subjetiva da qualidade, tendo em conta aspetos, tais como:

organização,
clareza e simplicidade das ideias programadas,
bom uso da linguagem,
legibilidade do código,
em alguma medida, eficiência.

Obviamente não é obrigatório fazer o trabalho todo para obter nota positiva. Mas, claro, vale a pena trabalhar para produzir uma solução bastante completa e com a melhor qualidade possível.

Observações

Os grupos são incentivados a discutir entre si aspetos do projeto, inclusivamente no fórum. Mas sempre que chega o momento de escrever código concreto, esse tem de ser um esforço interno a cada grupo (trabalhando de forma independente de todos os outros grupos). A escrita de código exige esforço intelectual, mas só com esforço se consegue evoluir.
O objetivo deste projeto é levar os alunos a praticar. Um aluno que pratique de forma genuína ganha experiência e provavelmente não terá dificuldade em conseguir aprovação nos testes e exames.
Cuidado com as fraudes. Por exemplo, se alguém dum grupo oferecer o projeto resolvido a um elemento de outro grupo, trata-se duma fraude envolvendo dois grupos. Também se um grupo deixa distraidamente a área aberta e se alguém de outro grupo "rouba" o projeto, então também se considera fraude dos dois grupos. Ainda um terceiro caso: se dois grupos se juntam para fazer o projeto conjuntamente e depois o entregam em duplicado, então também se considera fraude. Em suma, cada grupo é responsável pelo seu projeto e não o pode mostrar ou oferecer, direta ou indiretamente, de propósito ou sem querer, o seu código a outro grupo. Note que é muito melhor ter zero num dos três projetos do que ser logo excluído da cadeira por motivo de fraude.

Final

Bom trabalho! Esperamos que goste.