Linguagens e Ambientes de Programação (2022/2023) [Eng.Inf. - DI/FCT/UNL]

Enunciado do 2º Projeto Prático (C)

Artur Miguel Dias

Datas

05/maio (21:00) - Divulgação da versão provisória do enunciado
08/maio (09:00) - Divulgação da versão final do enunciado
14/maio (20:00) - Data e hora limite de entrega do 1º projeto.
16/maio (20:01) - Data e hora limite de entrega com atraso. Um valor de penalização por cada hora de atraso (ou parte duma hora).

Changelog

05/maio - Possíveis correções a este enunciado serão assinaladas aqui.

Submission rules

Starting point

We offer, as a starting point, this single file: Linker.c. You are allowed to change everything in the file.

Let's write a linker

What is a linker

It is a good practice to create computer programs using multiple modules. If each module has its own responsibility, it becomes easier to manage the complexity of the program and improve its long term maintainability.

In general, each module defines various entities - such as global variables and functions - with each entity being identified by a name. There are three kinds of use of names that must be considered:

private names - used to identify internal entities that are not exported by the module.
public names - used to identify internal entities that are exported by the module.
undefined names - used to reference extern entities, hopefully exported by one of the other modules.

When a module is compiled, it gives rise to a distinct object file.

A linker is a special program that takes multiple object files and combines them into a single executable program. The main task of the linker is to resolve name references across object files, that is to bind these names to memory addresses in the data segment and code segment, and perform other necessary tasks to prepare the final executable file. The linker acts as the final step in the software build process, bringing together all the necessary components to create a fully functional executable program.

In the final part of lesson 9, we briefly address the topic of linkers.

Note that the linker performs name resolution with the help of name tables. Each object file comes with its own name table. The name table stores information about the names that are defined inside the object file and about the extern names that are referenced in the object file. The table associates with each name several attributes: the kind, the public status, the address, etc.

In some cases, the linker may not be able to resolve a name reference, either because a definition is missing or because there are multiple definitions of a public name. In these cases, the linker will generate some for of error information. It is the responsibility of the programmer to ensure that all symbol references make sense.

Runtime environment

The final executable program will run in an environment that contains two segments of memory: the data segment and the code segment. The first segment contains the global variables. The second segment contain the code of the functions. The two segments are independent and the addresses of each segment start at 0000.

Only absolute addressing is supported at runtime. This applies to global variables and to code.

The space occupied by a variable is always 4 bytes. When the program runs, the address of the first variable is 0000, the address of the second variable is 0004, etc.

Each instruction occupies a multiple of 4 bytes. So the length of each function is a multiple of 4. The address of the first function is 0000.

Relocation of addresses

Each object files contains a name table at the beginning, followed by the code of the functions defined in the object file.

In each object file, the addresses of all the locally defined names of variables and functions are seen as local addresses and start at 0000.

However, when several object files are consolidated in the final executable file, all these addresses will need to be relocated to global addresses. This is because the variables of all the object files must be placed in a single data segment and the functions of all the object files must be placed in a single code segment.

The format of the input

Our linker program reads from the standard input one sequence of lines representing a succession of object files. The format is described in this section and you will see an example below. It is assumed that the input is valid in terms of the form and meaning. Please, don't waste time validating the input.

The representation of each object file start with a line consisting in "---".

Next appears a name table, consisting in zero or more lines. Each line describes a name used in the object file and has in 5 attributes :

entry number - a non-negative integer that identifies a table entry - the entry numbers are assigned sequentially, starting at 0001. The integer is written using 4 digits.
kind - there are two kinds of names: "V" variable; "F" function.
status - there are three possible status for names: "P" private; "X" public; "U" undefined.
address - the address that the name represents. It is defined when the status is "P" or "X"; undefined otherwise. The address is written using 4 digits.
name - the name of a variable or function. All the names are distinct in a name table.

After the name table, comes the code of all the functions of the object file. The code of each function is represented by a simple sequence of integer values, one integer per line. The non-negative and the negative integers have different roles: The non-negative integer represent normal code that will not be touched by the linker. The negative numbers represent entries of the name table, and linker will replace these negative numbers with absolute addresses. For example, the integer -3 represents the entry 0003 of the object file and will be replaced by an absolute address during the name resolution process.

Example of input file

This example describes a sequence of three object files.

Note that all the names that are defined (status = "P" or status = "X") have an associated local address. The address of the undefined names (status = "U") is filled up with the address 0000.

A told before, all the names must be distinct in each name table. However, is normal to have some repetitions across several object files. For example, the name "global_x" appears in the first two object files, in both cases privately. The name "global_e" also appears in the first two object files, first as an undefined name, and next as an exported definition. There are more situations where repetitions are normal.

Note in the code of each object file the negative values representing entries of the name table.

---
0001 V P 0000 global_x
0002 V P 0004 global_y
0003 V P 0008 global_z
0004 V U 0000 global_e
0005 V U 0000 global_f
0006 V X 0012 global_a
0007 V X 0016 global_b
0008 V X 0020 global_c
0009 F U 0000 g
0010 F X 0000 h
0011 F U 0000 printf
0012 F X 0020 f
0013 F X 0040 main
11001
16535
   -6
14324
15435
11002
   -3
   -9
15435
14355 
11003
12345
   -5
24543
---
0001 V P 0000 global_x
0002 V U 0000 global_a
0003 V X 0004 global_b
0004 V P 0008 global_c
0005 V U 0000 global_y
0006 V X 0012 global_e
0007 V P 0016 global_h
0008 F U 0000 printf
0009 F X 0000 g
0010 F X 0016 h
11004
16535
   -1
14324
11005
15435
22223 
   -1
14324
15435
14355
---
0001 F X 0000 printf
11006
14324
15435

The format of the output

The format of the final executable program is quite similar to a single object file.

The representation of the executable program start with a line consisting in "+++".

Next appears a name table, consisting in zero or more lines. Each line describes a name used in the program and has in 5 attributes :

entry number - a non-negative integer that identifies a table entry - the entry numbers are assigned sequentially, starting at 0001. The integer is written using 4 digits.
kind - there are two kinds of names: "V" variable; "F" function.
status - there are two possible status for names: "A" absolute address; "E" error.
address - the address that the name represents. It is defined when the status is "A". When the status is "E", it is used the address 0000. The address is written using 4 digits.
name - the name of a variable or function.

In simple terms, the name table joins the tables of the various object files and assigns a absolute addresses to each name.

The treatment of private and public names is different:

In the case of the private names, we know that they are used exclusively in the scope of the object files where they are defined - meaning that the entities they represent are not shared with other object files. Therefore these names can appear multiple times in the final name table. For example, if a name if privately defined in two object files, this name will appear twice in the final table. Dealing with the private names should be trivial for the linker. The linker only needs to bind each private name with the absolute address of a variable or function (and change the status of the entry to "A").

In the case of a public name, we know that it represents a single entity that is defined in one object file and shared with the other object files. Therefore, in the final table, this single entity name is allowed to appear only once as public (but can also appears multiple times as private). The linker ensures the uniqueness of the public uses of each name in the final table.

For each public name, the main task of the linker is to perform name resolution, that is to associate the definition of public name in a object file (status "X") with the multiple uses of the same name in other object files (status "U"). The "X" entry and the various "U" entries are consolidated in an single entry in the final table. The linker also binds each public name with the correct absolute address (and change the status of the entry to "A" if all goes well; the status becomes "E" in case of linkage error).

The sequence of names in the final table results from the concatenation in the same order of the names in the various object files. However, the various entries associated to the same public name are consolidated in the entry that appears in the first place; and the other entries are removed to ensure uniqueness. In the example, study the case of the name "global_e".

In the output file, after the table, comes the code of all the functions. The code of all the object files is concatenated in the same order it appears in the object files. Furthermore, all the negative values, representing entries of the name tables, are replaced by the absolute addresses associated to the entries, and these addresses are written using 4 digits.

Example of output file

This is the output file corresponding to the input example:

+++
0001 V A 0000 global_x
0002 V A 0004 global_y
0003 V A 0008 global_z
0004 V A 0012 global_e
0005 V E 0000 global_f
0006 V A 0016 global_a
0007 V E 0000 global_b
0008 V A 0020 global_c
0009 F A 0056 g
0010 F E 0000 h
0011 F A 0100 printf
0012 F A 0020 f
0013 F A 0040 main
0014 V A 0024 global_x
0015 V A 0028 global_c
0016 V E 0000 global_y
0017 V A 0032 global_h
11001
16535
 0016
14324
15435
11002
 0008
 0056
15435
14355 
11003
12345
 0000
24543
11004
16535
 0024
14324
11005
15435
22223 
 0024
14324
15435
14355
11006
14324
15435

Limits

Maximum length of a name = 64
Maximum number of object files in the input file = 10
Maximum number of name entries in each object file = 100
Maximum length of the code in each object file = 100
Minimum number of object files in the input file = 0
Minimum number of name entries in each object file = 0
Minimum length of the code in each object file = 0

Regras principais

Produza um ficheiro chamado Linker.c. Nas regras de submissão, a publicar mais tarde, será explicada a forma de submeter no Mooshak.
O ficheiro Linker.c tem de incluir, logo nas primeiras linhas, um comentário inicial contendo: o nome e número dos alunos que realizaram o projeto; indicação de quais as partes do trabalho que foram feitas e das que não foram feitas (para facilitar uma correção sem enganos); ainda possivelmente alertando para alguns aspetos da implementação que possam ser menos óbvios para o avaliador.
O projeto é para ser realizado por grupos de dois alunos. Um projeto entregue por três ou mais alunos vale zero valores. Poderão ser permitidos grupos de um aluno em circunstâncias absolutamente especiais que terão de ser previamente autorizadas por AMD - mas estão automaticamente autorizados, sem ser preciso perguntar, os alunos Erasmus, mais os alunos com o estatuto formal de trabalhadores estudantes, mais os alunos com o estatuto formal ENEE.
Mesmo que desenvolva o programa em Windows ou no MacOS, a versão final do seu programa deverá correr na versão do sistema Linux instalado nos laboratórios (equivalente à VM fornecida).
O programa deve ser bem indentado, por forma a ficar bem legível. Além disso, a largura do programa não deve exceder as 100 colunas por razões de legibilidade. Podem haver algumas exceções, muito pontuais. Considera-se que um TAB ocupa quatro posições. O programa deve ser bem indentado, por forma a ficar bem legível.

Regras de entrega

Será ativado um concurso do Mooshak, que servirá para submeter os trabalhos. Os detalhes da forma de fazer a submissão serão divulgados nessa altura. Até lá preocupe-se apenas em escrever um bom programa.
Depois do prazo limite ainda se aceitam trabalhos atrasados, mas com penalizações na nota. Mais detalhes nas primeiras linhas deste enunciado.

Outras regras

Apesar de o projeto ser de grupo, cada aluno, a título individual, tem a responsabilidade de responder por todo o projeto. Assim, é indispensável que os dois membros de cada grupo programem efetivamente.
Não se proíbe que alunos de turnos práticos diferentes façam grupo. Isso é apenas desaconselhado.
Não há inscrição prévia dos grupos e basta que cada trabalho tenha 2 autores identificados.
A nota máxima do projeto é 20 valores.

Avaliação

O docente responsável pela gestão e pela avaliação deste trabalho são os Profs Miguel Monteiro e Artur Miguel Dias.

A maior parte da nota (cerca de 80%) será determinada pelo Mooshak, que testa a correção das funcionalidades implementadas.

Na parte mais subjetiva da apreciação da qualidade do código, serão tidos em conta aspetos, tais como:

organização;
clareza e simplicidade das ideias programadas;
bom uso da linguagem, usando o paradigma imperativo, neste caso;
legibilidade do código;

Observações

Os grupos são incentivados a discutir entre si aspetos gerais da realização do projeto (inclusivamente no fórum). Mas sempre que chega o momento de encontrar soluções detalhadas e escrever código concreto, esse tem de ser um esforço interno a cada grupo. A resolução de problemas e a escrita de código exige esforço intelectual, mas só com esforço se consegue evoluir.
O objetivo deste projeto é levar os alunos a praticar. Um aluno que pratique de forma genuína ganha experiência e provavelmente não terá dificuldade em conseguir aprovação no exame final.
Cuidado com as fraudes. Por exemplo, se alguém dum grupo oferecer o projeto resolvido a um elemento de outro grupo, trata-se duma fraude envolvendo dois grupos (já tem acontecido). Também se um grupo deixa distraidamente a área aberta e se alguém de outro grupo "rouba" o projeto, então também se considera fraude dos dois grupos (já tem acontecido). Ainda um terceiro caso: se dois grupos se juntam para fazer o projeto conjuntamente e depois o entregam em duplicado, então também se considera fraude (já tem acontecido). Além destes três exemplos, existem muitas outras situações de fraude. Em suma, cada grupo é responsável pelo seu projeto, tem de produzir código original, e não o pode mostrar ou oferecer, direta ou indiretamente, o seu código a outro grupo. Note que é muito melhor ter zero num dos três projetos do que ser logo excluído da cadeira por motivo de fraude.

Final

Bom trabalho! Esperamos que goste.