Linguagens e Ambientes de Programação (2022/2023) [Eng.Inf. - DI/FCT/UNL]

Enunciado do 2º Projeto Prático (C)

Artur Miguel Dias


Datas

Changelog

  • 05/maio - Possíveis correções a este enunciado serão assinaladas aqui.


    Submission rules

    Starting point

    We offer, as a starting point, this single file: Linker.c. You are allowed to change everything in the file.


    Let's write a linker

    What is a linker

    It is a good practice to create computer programs using multiple modules. If each module has its own responsibility, it becomes easier to manage the complexity of the program and improve its long term maintainability.

    In general, each module defines various entities - such as global variables and functions - with each entity being identified by a name. There are three kinds of use of names that must be considered:

    When a module is compiled, it gives rise to a distinct object file.

    A linker is a special program that takes multiple object files and combines them into a single executable program. The main task of the linker is to resolve name references across object files, that is to bind these names to memory addresses in the data segment and code segment, and perform other necessary tasks to prepare the final executable file. The linker acts as the final step in the software build process, bringing together all the necessary components to create a fully functional executable program.

    In the final part of lesson 9, we briefly address the topic of linkers.

    Note that the linker performs name resolution with the help of name tables. Each object file comes with its own name table. The name table stores information about the names that are defined inside the object file and about the extern names that are referenced in the object file. The table associates with each name several attributes: the kind, the public status, the address, etc.

    In some cases, the linker may not be able to resolve a name reference, either because a definition is missing or because there are multiple definitions of a public name. In these cases, the linker will generate some for of error information. It is the responsibility of the programmer to ensure that all symbol references make sense.

    Runtime environment

    The final executable program will run in an environment that contains two segments of memory: the data segment and the code segment. The first segment contains the global variables. The second segment contain the code of the functions. The two segments are independent and the addresses of each segment start at 0000.

    Only absolute addressing is supported at runtime. This applies to global variables and to code.

    The space occupied by a variable is always 4 bytes. When the program runs, the address of the first variable is 0000, the address of the second variable is 0004, etc.

    Each instruction occupies a multiple of 4 bytes. So the length of each function is a multiple of 4. The address of the first function is 0000.

    Relocation of addresses

    Each object files contains a name table at the beginning, followed by the code of the functions defined in the object file.

    In each object file, the addresses of all the locally defined names of variables and functions are seen as local addresses and start at 0000.

    However, when several object files are consolidated in the final executable file, all these addresses will need to be relocated to global addresses. This is because the variables of all the object files must be placed in a single data segment and the functions of all the object files must be placed in a single code segment.


    The format of the input

    Our linker program reads from the standard input one sequence of lines representing a succession of object files. The format is described in this section and you will see an example below. It is assumed that the input is valid in terms of the form and meaning. Please, don't waste time validating the input.

    The representation of each object file start with a line consisting in "---".

    Next appears a name table, consisting in zero or more lines. Each line describes a name used in the object file and has in 5 attributes :

    After the name table, comes the code of all the functions of the object file. The code of each function is represented by a simple sequence of integer values, one integer per line. The non-negative and the negative integers have different roles: The non-negative integer represent normal code that will not be touched by the linker. The negative numbers represent entries of the name table, and linker will replace these negative numbers with absolute addresses. For example, the integer -3 represents the entry 0003 of the object file and will be replaced by an absolute address during the name resolution process.

    Example of input file

    This example describes a sequence of three object files.

    Note that all the names that are defined (status = "P" or status = "X") have an associated local address. The address of the undefined names (status = "U") is filled up with the address 0000.

    A told before, all the names must be distinct in each name table. However, is normal to have some repetitions across several object files. For example, the name "global_x" appears in the first two object files, in both cases privately. The name "global_e" also appears in the first two object files, first as an undefined name, and next as an exported definition. There are more situations where repetitions are normal.

    Note in the code of each object file the negative values representing entries of the name table.


    The format of the output

    The format of the final executable program is quite similar to a single object file.

    The representation of the executable program start with a line consisting in "+++".

    Next appears a name table, consisting in zero or more lines. Each line describes a name used in the program and has in 5 attributes :

    In simple terms, the name table joins the tables of the various object files and assigns a absolute addresses to each name.

    The treatment of private and public names is different:

    In the case of the private names, we know that they are used exclusively in the scope of the object files where they are defined - meaning that the entities they represent are not shared with other object files. Therefore these names can appear multiple times in the final name table. For example, if a name if privately defined in two object files, this name will appear twice in the final table. Dealing with the private names should be trivial for the linker. The linker only needs to bind each private name with the absolute address of a variable or function (and change the status of the entry to "A").

    In the case of a public name, we know that it represents a single entity that is defined in one object file and shared with the other object files. Therefore, in the final table, this single entity name is allowed to appear only once as public (but can also appears multiple times as private). The linker ensures the uniqueness of the public uses of each name in the final table.

    For each public name, the main task of the linker is to perform name resolution, that is to associate the definition of public name in a object file (status "X") with the multiple uses of the same name in other object files (status "U"). The "X" entry and the various "U" entries are consolidated in an single entry in the final table. The linker also binds each public name with the correct absolute address (and change the status of the entry to "A" if all goes well; the status becomes "E" in case of linkage error).

    The sequence of names in the final table results from the concatenation in the same order of the names in the various object files. However, the various entries associated to the same public name are consolidated in the entry that appears in the first place; and the other entries are removed to ensure uniqueness. In the example, study the case of the name "global_e".

    In the output file, after the table, comes the code of all the functions. The code of all the object files is concatenated in the same order it appears in the object files. Furthermore, all the negative values, representing entries of the name tables, are replaced by the absolute addresses associated to the entries, and these addresses are written using 4 digits.

    Example of output file

    This is the output file corresponding to the input example:

    Limits


    Regras principais


    Regras de entrega


    Outras regras


    Avaliação

    O docente responsável pela gestão e pela avaliação deste trabalho são os Profs Miguel Monteiro e Artur Miguel Dias.

    A maior parte da nota (cerca de 80%) será determinada pelo Mooshak, que testa a correção das funcionalidades implementadas.

    Na parte mais subjetiva da apreciação da qualidade do código, serão tidos em conta aspetos, tais como:


    Observações


    Final

    Bom trabalho! Esperamos que goste.