Accessing Data in Memory: Addressing Modes
Up to this point, we have seen four ways of computing the address of a value in memory: (1) a constant value, known at assembly time, (2) the contents of a register, (3) the sum of two registers, and (4) the sum of a register and a constant.
Table 4.1 gives names to these addressing modes, and shows a few others as well. Notice that the syntax of the table differs from that of the ARC. This is a common, unfortunate feature of assembly languages: each one differs from the rest in its syntax conventions. The notation M[x] in the Meaning column assumes memory is an array, M, whose byte index is given by the address computation in brackets. The assortment of addressing modes may seem bewildering, but each has its use:
• Immediate addressing allows a reference to a constant that is known at assembly time.
• Direct addressing is used to access data items whose address is known at assembly time.
• Indirect addressing is used to access a pointer variable whose address is known at compile time. This addressing mode is seldom supported in modern processors because it requires two memory references to access the operand, making it a complicated instruction. Programmers who wish to access data in this form must use two instructions, one to access the pointer and another to access the value to which it refers. This has the beneficial side effect of exposing the complexity of the addressing mode, perhaps discouraging its use.
• Register indirect addressing is used when the address of the operand is not known until run time. Stack operands fit this description, and are accessed by register indirect addressing, often in the form of push and pop instructions that also decrement and increment the register respectively.
• Register indexed, register based, and register based indexed addressing are used to access components of arrays such as the one in Figure 4-14, and components buried beneath the top of the stack, in a data structure known as the stack frame, which is discussed in the next section.
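The M[x] notation can be made concrete with a small Python model. This is only a sketch: Table 4.1 is not reproduced here, so the meaning given for each mode follows the conventional definitions, and the memory contents, register names, and addresses are all hypothetical.

```python
# Toy model of the addressing modes. M maps byte addresses to word values.
# All addresses, register names, and values here are hypothetical.
M = {100: 7, 104: 200, 200: 42, 208: 99, 212: 55}
R = {"r1": 200, "r2": 8}  # a tiny register file


def immediate(k):
    """Operand is the constant itself."""
    return k


def direct(k):
    """Operand is M[K], with K known at assembly time."""
    return M[k]


def indirect(k):
    """Operand is M[M[K]]: note the two memory references."""
    return M[M[k]]


def register_indirect(rn):
    """Operand is M[Rn]."""
    return M[R[rn]]


def register_indexed(rm, rn):
    """Operand is M[Rm + Rn]."""
    return M[R[rm] + R[rn]]


def register_based(rm, x):
    """Operand is M[Rm + X], with X a constant."""
    return M[R[rm] + x]


def register_based_indexed(rm, rn, x):
    """Operand is M[Rm + Rn + X]."""
    return M[R[rm] + R[rn] + x]
```

In this model, indirect(104) first reads the pointer stored at address 104 and then reads the word it points to, which is the pair of memory references that makes the mode costly in hardware.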
Subroutine Linkage and Stacks
A subroutine, sometimes called a function or procedure, is a sequence of instructions that is invoked in a manner that makes it appear to be a single instruction in a high level view. When a program calls a subroutine, control is passed from the program to the subroutine, which executes a sequence of instructions and then returns to the location just past where it was called. There are a number of methods for passing arguments to and from the called routine, referred to as calling conventions. The process of passing arguments between routines is referred to as subroutine linkage.
One calling convention simply places the arguments in registers. The code in Figure 4-15 shows a program that loads two arguments into %r1 and %r2, calls subroutine add_1, and then retrieves the result from %r3. Subroutine add_1 takes its operands from %r1 and %r2, and places the result in %r3 before returning via the jmpl instruction. This method is fast and simple, but it will not work if the number of arguments that are passed between the routines exceeds the number of free registers, or if subroutine calls are deeply nested.
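The register convention can be mimicked in a few lines of Python. This is a sketch of the idea rather than a transcription of Figure 4-15: the register file is a dictionary, and the argument values 53 and 10 are hypothetical.

```python
# Register-based linkage: arguments go in r1 and r2, the result comes back in r3.
regs = {"r1": 0, "r2": 0, "r3": 0}


def add_1():
    """Called routine: reads its operands from r1 and r2, writes r3."""
    regs["r3"] = regs["r1"] + regs["r2"]


# Calling routine: load the arguments, call, then read the result.
regs["r1"] = 53  # hypothetical argument values
regs["r2"] = 10
add_1()
result = regs["r3"]
```

The limitation described above is visible here: every argument and result needs its own entry in the register file, and a nested call that uses the same registers would overwrite them.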
A second calling convention creates a data link area. The address of the data link area is passed in a predetermined register to the called routine. Figure 4-16 shows an example of this method of subroutine linkage. The .dwb pseudo-op in the calling routine sets up a data link area that is three words long, at addresses x, x+4, and x+8. The calling routine loads its two arguments into x and x+4, calls subroutine add_2, and then retrieves the result passed back from add_2 from memory location x+8. The address of data link area x is passed to add_2 in register %r5.
Note that sethi must have a constant for its source operand, and so the assembler recognizes the sethi construct shown for the calling routine and replaces x with its address. The srl that follows the sethi moves the address x into the least significant 22 bits of %r5, since sethi places its operand into the leftmost 22 bits of the target register. An alternative approach to loading the address of x into %r5 would be to use a storage location for the address of x, and then simply apply the ld instruction to load the address into %r5. While the latter approach is simpler, the sethi/srl approach is faster because it avoids a time-consuming memory access.
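The bit movement performed by the sethi/srl pair can be checked with ordinary integer arithmetic. The sketch below assumes a 32-bit register and a shift amount of 10 (32 minus 22); the address used for x is hypothetical and must fit in 22 bits for the trick to work.

```python
MASK32 = 0xFFFFFFFF  # registers are modeled as 32 bits wide


def sethi(imm22):
    """Place a 22-bit constant in the leftmost 22 bits; the low 10 bits are zero."""
    return ((imm22 & 0x3FFFFF) << 10) & MASK32


def srl(value, shift):
    """Shift right logical: zeros are shifted in from the left."""
    return (value & MASK32) >> shift


x = 0x80C          # hypothetical address of the data link area
r5 = sethi(x)      # x now occupies bits 31..10 of r5
r5 = srl(r5, 10)   # move it down into the least significant 22 bits
```

After the two steps, r5 holds the address x itself, obtained without reading memory.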
Subroutine add_2 reads its two operands from the data link area at locations %r5 and %r5 + 4, and places its result in the data link area at location %r5 + 8 before returning. By using a data link area, arbitrarily large blocks of data can be passed between routines without copying more than a single register during subroutine linkage. Recursion can create a burdensome bookkeeping overhead, however, since a routine that calls itself will need several data link areas. Data link areas have the advantage that their size can be unlimited, but also have the disadvantage that the size of the data link area must be known at assembly time.
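A data link area can be modeled as three consecutive words in a memory dictionary. This is a sketch under stated assumptions, not the code of Figure 4-16: the address X and the argument values are hypothetical, and the Python parameter r5 stands in for register %r5.

```python
memory = {}  # byte address -> word value
X = 3000     # hypothetical address of a three-word data link area


def add_2(r5):
    """Called routine: r5 holds the address of the data link area."""
    a = memory[r5]          # first operand at r5
    b = memory[r5 + 4]      # second operand at r5 + 4
    memory[r5 + 8] = a + b  # result stored at r5 + 8


# Calling routine: fill in the arguments, pass only the address, read the result.
memory[X] = 53
memory[X + 4] = 10
add_2(X)
result = memory[X + 8]
```

Only one value, the address X, crosses the call boundary, no matter how large the area is. A routine that calls itself would need a fresh area for each active call, which is the bookkeeping burden noted above.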
A third calling convention uses a stack. The general idea is that the calling routine pushes all of its arguments (or pointers to arguments, if the data objects are large) onto a last-in-first-out stack. The called routine then pops the passed arguments from the stack, and pushes any return values onto the stack. The calling routine then retrieves the return value(s) from the stack and continues execution. A register in the CPU, known as the stack pointer, contains the address of the top of the stack. Many machines have push and pop instructions that automatically decrement and increment the stack pointer as data items are pushed and popped.
An advantage of using a stack is that its size grows and shrinks as needed. This supports arbitrarily deep nesting of procedure calls without having to declare the size of the stack at assembly time. An example of passing arguments using a stack is shown in Figure 4-17. Register %r14 serves as the stack pointer (%sp), which is initialized by the operating system prior to execution of the calling routine. The calling routine places its arguments (%r1 and %r2) onto the stack by decrementing the stack pointer (which moves %sp to the next free word above the stack) and by storing each argument on the new top of the stack. Subroutine add_3 is called, which pops its arguments from the stack, performs an addition operation, and then stores its return value on the top of the stack before returning. The calling routine then retrieves the return value from the top of the stack and continues execution.
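The push and pop steps of the stack convention can be sketched in Python as follows. The initial stack pointer value and the argument values are hypothetical, and the helper names push and pop are illustrative; the ARC code in Figure 4-17 performs the same steps with explicit add and store or load instructions.

```python
WORD = 4
memory = {}            # byte address -> word value
regs = {"sp": 8192}    # hypothetical initial stack pointer (%r14)


def push(value):
    """Decrement the stack pointer, then store at the new top of stack."""
    regs["sp"] -= WORD
    memory[regs["sp"]] = value


def pop():
    """Load from the top of stack, then increment the stack pointer."""
    value = memory[regs["sp"]]
    regs["sp"] += WORD
    return value


def add_3():
    """Called routine: pops two arguments, pushes their sum."""
    b = pop()
    a = pop()
    push(a + b)


initial_sp = regs["sp"]
push(53)          # hypothetical argument values
push(10)
add_3()
result = pop()    # the calling routine retrieves the return value
```

Once the return value has been popped, the stack pointer is back where it started, so the same region of memory is reused by the next call.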
For each of the calling conventions, the call instruction is used, which saves the current PC in %r15. When a subroutine finishes execution, it needs to return to the instruction that follows the call, which is one word (four bytes) past the saved PC. Thus, the statement “jmpl %r15 + 4, %r0” completes the return. If the called routine calls another routine, however, then the value of the PC that was originally saved in %r15 will be overwritten by the nested call, which means that a correct return to the original calling routine through %r15 will no longer be possible. In order to allow nested calls and returns, the current value of %r15 (which is called the link register) should be saved on the stack, along with any other registers that need to be restored after the return.
If a register-based calling convention is used, then the link register should be saved in one of the unused registers before a nested call is made. If a data link area is used, then there should be space reserved within it for the link register. If a stack scheme is used, then the link register should be saved on the stack. For each of the calling conventions, the link register and the local variables in the called routines should be saved before a nested call is made; otherwise, a nested call to the same routine will cause the local variables to be overwritten.
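The effect of a nested call on the link register can be traced with a small simulation. This is a sketch: call and ret model only the behavior described above (call saves the PC in %r15, and the return jumps to %r15 + 4), and every code and stack address is hypothetical.

```python
WORD = 4
memory = {}
regs = {"pc": 0, "r15": 0, "sp": 8192}


def call(target):
    """Model of call: save the address of the call itself in r15, then jump."""
    regs["r15"] = regs["pc"]
    regs["pc"] = target


def ret():
    """Model of 'jmpl %r15 + 4, %r0': resume one word past the call."""
    regs["pc"] = regs["r15"] + WORD


def push(value):
    regs["sp"] -= WORD
    memory[regs["sp"]] = value


def pop():
    value = memory[regs["sp"]]
    regs["sp"] += WORD
    return value


# A main routine at address 2048 calls an outer routine at 3000.
regs["pc"] = 2048
call(3000)
push(regs["r15"])      # outer routine saves the link register first

# The outer routine, now at address 3008, calls an inner routine at 4000.
regs["pc"] = 3008
call(4000)             # this overwrites r15 with 3008
ret()                  # inner routine returns into the outer routine
after_inner = regs["pc"]

regs["r15"] = pop()    # outer routine restores its own return address
ret()                  # and returns to the main routine
after_outer = regs["pc"]
```

Without the push and pop of %r15, the second ret would jump back to address 3012 inside the outer routine rather than to the main routine.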
There are many variations to the basic calling conventions, but the stack-oriented approach to subroutine linkage is probably the most popular. When a stack-based calling convention is used that handles nested subroutine calls, a stack frame is built that contains arguments that are passed to a called routine, the return address for the calling routine, and any local variables. A sample high-level program is shown in Figure 4-18 that illustrates nested function calls. The operation that the program performs is not important, nor is the fact that the C programming language is used; what is important is how the subroutine calls are implemented.
The behavior of the stack for this program is shown in Figure 4-19. The main program calls func_1 with arguments 1 and 2, and then calls func_2 with argument 10 before finishing execution. Function func_1 has two local variables i and j that are used in computing the return value j. Function func_2 has two local variables m and n that are used in creating the arguments to pass through to func_1 before returning m.
The stack pointer (%r14 by convention, which will be referred to as %sp) is initialized before the program starts executing, usually by the operating system. The compiler is responsible for implementing the calling convention, and so the compiler produces code for pushing parameters and the return address onto the stack, reserving room on the stack for local variables, and then reversing the process as routines return from their calls. The stack behavior shown in Figure 4-19 is thus produced as the result of executing compiler-generated code, but the code may just as well have been written directly in assembly language.
As the main program begins execution, the stack pointer points to the top element of the system stack (Figure 4-19a). When the main routine calls func_1 at line 03 of the program shown in Figure 4-18 with arguments 1 and 2, the arguments are pushed onto the stack, as shown in Figure 4-19b. Control is then transferred to func_1 through a call instruction (not shown), and func_1 then saves the return address, which is in %r15 as a result of the call instruction, onto the stack (Figure 4-19c). Stack space is reserved for local variables i and j of func_1 (Figure 4-19d). At this point, we have a complete stack frame for the func_1 call as shown in Figure 4-19d, which is composed of the arguments passed to func_1, the return address to the main routine, and the local variables for func_1.
Just prior to func_1 returning to the calling routine, it releases the stack space for its local variables, retrieves the return address from the stack, releases the stack space for the arguments passed to it, and then pushes its return value onto the stack as shown in Figure 4-19e. Control is then returned to the calling routine through a jmpl instruction, and the calling routine is then responsible for retrieving the returned value from the stack and restoring the stack pointer to its position from before the call (by incrementing it, since pushes decrement it), as shown in Figure 4-19f. Routine func_2 is then executed, and the process of building a stack frame starts all over again as shown in Figure 4-19g. Since func_2 makes a call to func_1 before it returns, there will be stack frames for both func_2 and func_1 on the stack at the same time as shown in Figure 4-19h. The process then unwinds as before, finally resulting in the stack pointer at its original position as shown in Figure 4-19(i-k).
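The life cycle of a single stack frame can be walked through in Python. This is only a sketch of the steps in Figure 4-19a through 4-19f: the body of func_1 is not reproduced here, so the computation on i and j is a hypothetical stand-in, and the initial stack pointer and return address are made-up values.

```python
WORD = 4
memory = {}
regs = {"sp": 8192, "r15": 0}   # hypothetical initial %sp


def push(value):
    regs["sp"] -= WORD
    memory[regs["sp"]] = value


def pop():
    value = memory[regs["sp"]]
    regs["sp"] += WORD
    return value


def func_1():
    push(regs["r15"])            # save the return address (Figure 4-19c)
    regs["sp"] -= 2 * WORD       # reserve space for i and j (Figure 4-19d)
    sp = regs["sp"]
    # Frame, from low to high address: j, i, return address, arg 2, arg 1.
    x = memory[sp + 4 * WORD]    # first argument, register based addressing
    y = memory[sp + 3 * WORD]    # second argument
    memory[sp + WORD] = x                   # i = x      (hypothetical body)
    memory[sp] = memory[sp + WORD] + y      # j = i + y  (hypothetical body)
    result = memory[sp]          # stands in for a register holding j
    regs["sp"] += 2 * WORD       # release the local variables
    regs["r15"] = pop()          # retrieve the return address
    regs["sp"] += 2 * WORD       # release the arguments
    push(result)                 # push the return value (Figure 4-19e)


# Main routine: push the arguments 1 and 2, call, then retrieve the result.
initial_sp = regs["sp"]
push(1)
push(2)
regs["r15"] = 2048               # hypothetical address of the call instruction
func_1()
value = pop()                    # stack pointer restored (Figure 4-19f)
```

The arguments and locals are reached through offsets from the stack pointer rather than by popping, which is how components buried beneath the top of the stack are accessed within a frame.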