Languages and the machine: linking and loading ( linking and loading).

5.2 Linking and Loading

Most applications of any size will have a number of separately compiled or assembled modules. These modules may be generated by different programming languages or they may be present in a library provided as part of the programming language environment or operating system. Each module must provide the information described above, so that they can be linked together for loading and execution.

A linkage editor, or linker, is a software program that combines separately assembled programs (called object modules) into a single program, which is called a load module. The linker resolves all global-external references and relocates addresses in the separate modules. The load module can then be loaded into memory by a loader, which may also need to modify addresses if the program is loaded at a location that differs from the loading origin used by the linker.

A relatively new technique called dynamic link libraries (DLLs), popularized by Microsoft in the Windows operating system, and present in similar forms in other operating systems, postpones the linking of some components until they are actually needed at run time. We will have more to say about dynamic linking later in this section.

5.3.1 LINKING

In combining the separately compiled or assembled modules into a load module, the linker must:

• Resolve address references that are external to modules as it links them.

• Relocate each module by combining them end-to-end as appropriate. During this relocation process many of the addresses in the module must be changed to reflect their new location.

• Specify the starting symbol of the load module.

• If the memory model includes more than one memory segment, the linker must specify the identities and contents of the various segments.

Resolving external references

In resolving address references the linker needs to distinguish local symbol names (used within a single source module) from global symbol names (used in more than one module). This is accomplished by making use of the .global and

.extern pseudo-ops during assembly. The .global pseudo-op instructs the assembler to mark a symbol as being available to other object modules during the linking phase. The .extern pseudo-op identifies a label that is used in one module but is defined in another. A .global is thus used in the module where a symbol is defined (such as where a subroutine is located) and a .extern is used in every other module that refers to it. Note that only address labels can be

global or external: it would be meaningless to mark a .equ symbol as global or external, since .equ is a pseudo-op that is used during the assembly process only, and the assembly process is completed by the time that the linking process begins.

All labels referred to in one program by another, such as subroutine names, will have a line of the form shown below in the source module:

.global symbol1, symbol2, …

All other labels are local, which means the same label can be used in more than one source module without risking confusion since local labels are not used after the assembly process finishes. A module that refers to symbols defined in another module should declare those symbols using the form:

.extern symbol1, symbol2, …

As an example of how .global and .extern are used, consider the two assembly code source modules shown in Figure 5-6. Each module is separately assem-

image

bled into an object module, each with its own symbol table as shown in Figure 5-7. The symbol tables have an additional field that indicates if a symbol is global or external. Program main begins at location 2048, and each instruction is four bytes long, so x and y are at locations 2064 and 2068, respectively. The symbol sub is marked as external as a result of the .extern pseudo-op. As part of the assembly process the assembler includes header information in the module about symbols that are global and external so they can be resolved at link time.

image

Relocation

Notice in Figure 5-6 that the two programs, main and sub, both have the same starting address, 2048. Obviously they cannot both occupy that same memory address. If the two modules are assembled separately there is no way for an assembler to know about the conflicting starting addresses during the assembly phase. In order to resolve this problem, the assembler marks symbols that may have their address changed during linking as relocatable, as shown in the Relocatable fields of the symbol tables shown in Figure 5-7. The idea is that a pro- gram that is assembled at a starting address of 2048 can be loaded at address 3000 instead, for instance, as long as all references to relocatable addresses within the program are increased by 3000 – 2048 = 952. Relocation is performed by the linker so that relocatable addresses are changed by the same amount that the loading origin is changed, but absolute, or non-relocatable addresses (such as the highest possible stack address, which is 231 – 4 for 32-bit words) stays the same regardless of the loading origin.

The assembler is responsible for determining which labels are relocatable when it builds the symbol table. It has no meaning to call an external label relocatable, since the label is defined in another module, so sub has no relocatable entry in the symbol table in Figure 5-7 for program main, but it is marked as relocatable in the subroutine library. The assembler must also identify code in the object module that needs to be modified as a result of relocation. Absolute numbers, such as constants (marked by .equ ,or that appear in memory locations, such as the contents of x and y, which are 105 and 92, respectively) are not relocatable. Memory locations that are positioned relative to a .org statement, such as x and y (not the contents of x and y!) are generally relocatable. References to fixed locations, such as a permanently resident graphics routine that may be hardwired into the machine, are not relocatable. All of the information needed to relocate a module is stored in the relocation dictionary contained in the assembled file, and is therefore available to the linker.

5.3.2 LOADING

The loader is a software program that places the load module into main memory. Conceptually the tasks of the loader are not difficult. It must load the various memory segments with the appropriate values and initialize certain registers such as the stack pointer %sp, and the program counter, %pc, to their initial values.

If there is only one load module executing at any time, then this model works well. In modern operating systems, however, several programs are resident in memory at any time, and there is no way that the assembler or linker can know at which address they will reside. The loader must relocate these modules at load time by adding an offset to all of the relocatable code in a module. This kind of loader is known as a relocating loader. The relocating loader does not simply repeat the job of the linker: the linker has to combine several object modules into a single load module, whereas the loader simply modifies relocatable addresses within a single load module so that several programs can reside in memory simultaneously. A linking loader performs both the linking process and the loading process: it resolves external references, relocates object modules, and loads them into memory.

The linked executable file contains header information describing where it should be loaded, starting addresses, and possibly relocation information, and entry points for any routines that should be made available externally.

An alternative approach that relies on memory management accomplishes relocation by loading a segment base register with the appropriate base to locate the code (or data) at the appropriate place in physical memory. The memory management unit (MMU), adds the contents of this base register to all memory references. As a result, each program can begin execution at address 0 and rely on the MMU to relocate all memory references transparently.

Dynamic link libraries

Returning to dynamic link libraries, the concept has a number of attractive features. Commonly used routines such as memory management or graphics pack- ages need be present at only one place, the DLL library. This results in smaller program sizes because each program does not need to have its own copy of the DLL code, as would otherwise be needed. All programs share the exact same code, even while simultaneously executing.

Furthermore, the DLL can be upgraded with bug fixes or feature enhancements in just one place, and programs that use it need not be recompiled or relinked in a separate step. These same features can also become disadvantages, however, because program behavior may change in unintended ways (such as running out of memory as a result of a larger DLL). The DLL library must be present at all times, and must contain the version expected by each program. Many Windows users have seen the cryptic message, “A file is missing from the dynamic link library.” Complicating the issue in the Windows implementation, there are a number of locations in the file system where DLLs are placed. The more sophisticated user may have little difficulty resolving these problems, but the naive user may be baffled.

A PROGRAMMING EXAMPLE

Consider the problem of adding two 64-bit numbers using the ARC assembly language. We can store the 64-bit numbers in successive words in memory and then separately add the low and high order words. If a carry is generated from adding the low order words, then the carry is added into the high order word of the result. (See problem 5.3 for the generation of the symbol table, and problem 5.4 for the translation of the assembly code in this example to machine code.)

Figure 5-8 shows one possible coding. The 64-bit operands A and B are stored in memory in a high endian format, in which the most significant 32 bits are stored in lower memory addresses than the least significant 32 bits. The program begins by loading the high and low order words of A into %r1 and %r2, respectively, and then loading the high and low order words of B into %r3 and %r4, respectively. Subroutine add_64 is called, which adds A and B and places the high order word of the result in %r5 and the low order word of the result in %r6. The 64-bit result is then stored in C, and the program returns.

Subroutine add_64 starts by adding the low order words. If a carry is not generated, then the high order words are added and the subroutine finishes. If a carry is generated from adding the low order words, then it must be added into the

image

high order word of the result. If a carry is not generated when the high order words are added, then the carry from the low order word of the result is simply added into the high order word of the result and the subroutine finishes. If, how- ever, a carry is generated when the high order words are added, then when the carry from the low order word is added into the high order word, the final state of the condition codes will show that there is no carry out of the high order word, which is incorrect. The condition code for the carry is restored by placing a large number in %r7 and then adding it to itself. The condition codes for n, z, and v may not have correct values at this point, however. A complete solution is not detailed here, but in short, the remaining condition codes can be set to their proper values by repeating the addcc just prior to the %r7 operation, taking into account the fact that the c condition code must still be preserved. ■

Leave a comment

Your email address will not be published. Required fields are marked *