We offer, as a starting point, this single file: Linker.c. You are allowed to change everything in the file.
It is a good practice to create computer programs using multiple modules. If each module has its own responsibility, it becomes easier to manage the complexity of the program and improve its long term maintainability.
In general, each module defines various entities - such as global variables and functions - with each entity being identified by a name. There are three kinds of use of names that must be considered:
When a module is compiled, it gives rise to a distinct object file.
A linker is a special program that takes multiple object files and combines them into a single executable program. The main task of the linker is to resolve name references across object files, that is to bind these names to memory addresses in the data segment and code segment, and perform other necessary tasks to prepare the final executable file. The linker acts as the final step in the software build process, bringing together all the necessary components to create a fully functional executable program.
In the final part of lesson 9, we briefly address the topic of linkers.
Note that the linker performs name resolution with the help of name tables. Each object file comes with its own name table. The name table stores information about the names that are defined inside the object file and about the extern names that are referenced in the object file. The table associates with each name several attributes: the kind, the public status, the address, etc.
In some cases, the linker may not be able to resolve a name reference, either because a definition is missing or because there are multiple definitions of a public name. In these cases, the linker will generate some for of error information. It is the responsibility of the programmer to ensure that all symbol references make sense.
Only absolute addressing is supported at runtime. This applies to global variables and to code.
The space occupied by a variable is always 4 bytes. When the program runs, the address of the first variable is 0000, the address of the second variable is 0004, etc.
Each instruction occupies a multiple of 4 bytes. So the length of each function is a multiple of 4. The address of the first function is 0000.
In each object file, the addresses of all the locally defined names of variables and functions are seen as local addresses and start at 0000.
However, when several object files are consolidated in the final executable file, all these addresses will need to be relocated to global addresses. This is because the variables of all the object files must be placed in a single data segment and the functions of all the object files must be placed in a single code segment.
The representation of each object file start with a line consisting in "---".
Next appears a name table, consisting in zero or more lines. Each line describes a name used in the object file and has in 5 attributes :
After the name table, comes the code of all the functions of the object file. The code of each function is represented by a simple sequence of integer values, one integer per line. The non-negative and the negative integers have different roles: The non-negative integer represent normal code that will not be touched by the linker. The negative numbers represent entries of the name table, and linker will replace these negative numbers with absolute addresses. For example, the integer -3 represents the entry 0003 of the object file and will be replaced by an absolute address during the name resolution process.
Note that all the names that are defined (status = "P" or status = "X") have an associated local address. The address of the undefined names (status = "U") is filled up with the address 0000.
A told before, all the names must be distinct in each name table. However, is normal to have some repetitions across several object files. For example, the name "global_x" appears in the first two object files, in both cases privately. The name "global_e" also appears in the first two object files, first as an undefined name, and next as an exported definition. There are more situations where repetitions are normal.
Note in the code of each object file the negative values representing entries of the name table.
--- 0001 V P 0000 global_x 0002 V P 0004 global_y 0003 V P 0008 global_z 0004 V U 0000 global_e 0005 V U 0000 global_f 0006 V X 0012 global_a 0007 V X 0016 global_b 0008 V X 0020 global_c 0009 F U 0000 g 0010 F X 0000 h 0011 F U 0000 printf 0012 F X 0020 f 0013 F X 0040 main 11001 16535 -6 14324 15435 11002 -3 -9 15435 14355 11003 12345 -5 24543 --- 0001 V P 0000 global_x 0002 V U 0000 global_a 0003 V X 0004 global_b 0004 V P 0008 global_c 0005 V U 0000 global_y 0006 V X 0012 global_e 0007 V P 0016 global_h 0008 F U 0000 printf 0009 F X 0000 g 0010 F X 0016 h 11004 16535 -1 14324 11005 15435 22223 -1 14324 15435 14355 --- 0001 F X 0000 printf 11006 14324 15435
The representation of the executable program start with a line consisting in "+++".
Next appears a name table, consisting in zero or more lines. Each line describes a name used in the program and has in 5 attributes :
The treatment of private and public names is different:
In the case of the private names, we know that they are used exclusively in the scope of the object files where they are defined - meaning that the entities they represent are not shared with other object files. Therefore these names can appear multiple times in the final name table. For example, if a name if privately defined in two object files, this name will appear twice in the final table. Dealing with the private names should be trivial for the linker. The linker only needs to bind each private name with the absolute address of a variable or function (and change the status of the entry to "A").
In the case of a public name, we know that it represents a single entity that is defined in one object file and shared with the other object files. Therefore, in the final table, this single entity name is allowed to appear only once as public (but can also appears multiple times as private). The linker ensures the uniqueness of the public uses of each name in the final table.
For each public name, the main task of the linker is to perform name resolution, that is to associate the definition of public name in a object file (status "X") with the multiple uses of the same name in other object files (status "U"). The "X" entry and the various "U" entries are consolidated in an single entry in the final table. The linker also binds each public name with the correct absolute address (and change the status of the entry to "A" if all goes well; the status becomes "E" in case of linkage error).
The sequence of names in the final table results from the concatenation in the same order of the names in the various object files. However, the various entries associated to the same public name are consolidated in the entry that appears in the first place; and the other entries are removed to ensure uniqueness. In the example, study the case of the name "global_e".
In the output file, after the table, comes the code of all the functions. The code of all the object files is concatenated in the same order it appears in the object files. Furthermore, all the negative values, representing entries of the name tables, are replaced by the absolute addresses associated to the entries, and these addresses are written using 4 digits.
+++ 0001 V A 0000 global_x 0002 V A 0004 global_y 0003 V A 0008 global_z 0004 V A 0012 global_e 0005 V E 0000 global_f 0006 V A 0016 global_a 0007 V E 0000 global_b 0008 V A 0020 global_c 0009 F A 0056 g 0010 F E 0000 h 0011 F A 0100 printf 0012 F A 0020 f 0013 F A 0040 main 0014 V A 0024 global_x 0015 V A 0028 global_c 0016 V E 0000 global_y 0017 V A 0032 global_h 11001 16535 0016 14324 15435 11002 0008 0056 15435 14355 11003 12345 0000 24543 11004 16535 0024 14324 11005 15435 22223 0024 14324 15435 14355 11006 14324 15435
A maior parte da nota (cerca de 80%) será determinada pelo Mooshak, que testa a correção das funcionalidades implementadas.
Na parte mais subjetiva da apreciação da qualidade do código, serão tidos em conta aspetos, tais como:
Bom trabalho! Esperamos que goste.