ASSEMBLER DETAIL
The assembler (MASM) for the microprocessor can be used in two ways: (1) with models that are unique to a particular assembler, and (2) with full-segment definitions that allow complete control over the assembly process and are common to most assemblers. This section of the text presents both methods and explains how to organize a program’s memory space by using the assembler. It also explains the purpose and use of some of the more important directives used with this assembler. Appendix A provides additional detail about the assembler.
In most cases, the inline assembler found in Visual C++ is used for developing assembly code for use in a C++ program, but there are occasions that require separate assembly modules written with the assembler. This section of the text contrasts, where possible, the inline assembler and the assembler.
Directives
Before the format of an assembly language program is discussed, some details about the directives (pseudo-operations) that control the assembly process must be learned. Some common assembly language directives appear in Table 4–22. Directives indicate how an operand or section of a program is to be processed by the assembler. Some directives generate and store information in the memory; others do not. The DB (define byte) directive stores bytes of data in the memory, whereas the BYTE PTR directive never stores data. The BYTE PTR directive indicates the size of the data referenced by a pointer or index register. Note that most of these directives, including data-definition directives such as DB and DW and the DUP operator, do not function in the inline assembler that is part of Visual C++. If you are using the inline assembler exclusively, you can skip this part of the text. Be aware, however, that complex sections of assembly code are still written using MASM.
Note that by default the assembler accepts only 8086/8088 instructions, unless a program is preceded by the .686 or .686P directive or one of the other microprocessor selection switches. The .686 directive tells the assembler to accept the Pentium Pro instruction set except for the privileged instructions used by protected-mode system software, and the .686P directive tells the assembler to accept those privileged protected-mode instructions as well. Most modern software is written assuming that the microprocessor is a Pentium Pro or newer, so the .686 switch is often used. Windows 95 was the first major operating system to use a 32-bit architecture that conforms to the 80386. Windows XP requires a Pentium-class machine (.586 switch) with at least a 233 MHz microprocessor.
Storing Data in a Memory Segment. The DB (define byte), DW (define word), and DD (define doubleword) directives, first presented in Chapter 1, are most often used with MASM to define and store memory data. If the software uses the numeric coprocessor, the DQ (define quadword) and DT (define ten bytes) directives are also common. These directives label a memory location with a symbolic name and indicate its size.
Example 4–13 shows a memory segment that contains various forms of data definition directives. It also shows the full-segment definition, in which the SEGMENT statement marks the start of the segment and gives its symbolic name. Alternately, as in past examples in this and prior chapters, the SMALL model can be used with the .DATA statement. The last statement in this example contains the ENDS directive, which indicates the end of the segment. The name of the segment (LIST_SEG) can be anything that the programmer desires to call it. This allows a program to contain as many segments as required.
Example 4–13 shows various forms of data storage for bytes at DATA1. More than 1 byte can be defined on a line in binary, hexadecimal, decimal, or ASCII code. The DATA2 label shows how to store various forms of word data. Doublewords are stored at DATA3; they include floating-point, single-precision real numbers.
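To make the byte layout concrete, the following C sketch models how DB, DW, and DD place data in memory. The helper names store_db, store_dw, and store_dd are hypothetical. The only property carried over from the hardware is that the x86 stores the least-significant byte of a word or doubleword at the lowest address (little-endian order), so a hypothetical DW 1234H occupies two bytes holding 34H and then 12H.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of DB: copy the listed bytes in order, return the next free offset. */
size_t store_db(uint8_t *mem, size_t offset, const uint8_t *bytes, size_t count)
{
    for (size_t i = 0; i < count; i++)
        mem[offset + i] = bytes[i];
    return offset + count;
}

/* Hypothetical model of DW: low byte at the lower address, high byte next (little-endian). */
size_t store_dw(uint8_t *mem, size_t offset, uint16_t value)
{
    mem[offset]     = (uint8_t)(value & 0xFF);
    mem[offset + 1] = (uint8_t)(value >> 8);
    return offset + 2;
}

/* Hypothetical model of DD: four bytes, least-significant byte first. */
size_t store_dd(uint8_t *mem, size_t offset, uint32_t value)
{
    for (int i = 0; i < 4; i++)
        mem[offset + i] = (uint8_t)(value >> (8 * i));
    return offset + 4;
}
```

Each helper returns the next free offset, which plays the role of the assembler's location counter as consecutive definitions are laid down.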
Memory is reserved for use in the future by using a question mark (?) as an operand for a DB, DW, or DD directive. When a ? is used in place of a numeric or ASCII value, the assembler sets aside a location and does not initialize it to any specific value. (Actually, the assembler usually stores a zero into locations specified with a ?.) The DUP (duplicate) directive creates an array, as shown in several ways in Example 4–13. A 10 DUP (?) reserves 10 locations of memory, but stores no specific value in any of the 10 locations. If a number appears within the ( ) part of the DUP statement, the assembler initializes the reserved section of memory with the data indicated. For example, the LIST2 DB 10 DUP (2) statement reserves 10 bytes of memory for array LIST2 and initializes each location with 02H.
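The two forms of DUP can be modeled the same way. In this hypothetical C sketch, dup_bytes plays the role of n DUP (value) and dup_uninit plays the role of n DUP (?). Because the assembler usually stores zeros for ?, the sketch writes zeros there; that choice is an assumption, not a guarantee.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Models "n DUP (value)": reserve n bytes at offset and fill every one with value. */
size_t dup_bytes(uint8_t *mem, size_t offset, size_t n, uint8_t value)
{
    memset(mem + offset, value, n);
    return offset + n;              /* next free offset */
}

/* Models "n DUP (?)": space is reserved with no specific value;
   this sketch assumes the usual behavior of storing zeros. */
size_t dup_uninit(uint8_t *mem, size_t offset, size_t n)
{
    return dup_bytes(mem, offset, n, 0);
}
```

With this model, LIST2 DB 10 DUP (2) corresponds to dup_bytes(mem, off, 10, 2).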
The ALIGN directive, used in this example, makes sure that the memory arrays are stored on word boundaries. An ALIGN 2 places data on word boundaries and an ALIGN 4 places them on doubleword boundaries. In the Pentium–Pentium 4, quadword data for double-precision floating-point numbers should use ALIGN 8. It is important that word-sized data are placed at word boundaries and doubleword-sized data are placed at doubleword boundaries. If not, the microprocessor spends additional time accessing these data types. On the 8086, for example, a word stored at an odd-numbered memory location takes twice as long to access as a word stored at an even-numbered memory location, because it requires two bus cycles instead of one. Note that the ALIGN directive cannot be used with memory models because the size of the model determines the data alignment. If all doubleword data are defined first, followed by word-sized and then byte-sized data, the ALIGN statement is not necessary to align data correctly.
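The arithmetic behind ALIGN is rounding the current offset up to the next multiple of the alignment value. The following is a minimal C sketch that assumes the alignment is a power of two (2, 4, or 8, as above). The function names are hypothetical.

```c
#include <stddef.h>

/* Models ALIGN n (n a power of two): round the offset up to the next multiple of n. */
size_t align_up(size_t offset, size_t n)
{
    return (offset + n - 1) & ~(n - 1);
}

/* Number of filler bytes the assembler must skip to reach the aligned offset. */
size_t align_padding(size_t offset, size_t n)
{
    return align_up(offset, n) - offset;
}

/* On a 16-bit data bus, a word at an odd address needs two bus cycles. */
int word_is_aligned(size_t offset)
{
    return (offset & 1) == 0;
}
```

For example, data that would start at offset 0005H moves to 0008H under ALIGN 8, and the three bytes in between are skipped.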
ASSUME, EQU, and ORG. The equate directive (EQU) equates a numeric value, ASCII character, or label to another label. Equates make a program clearer and simplify debugging. Example 4–14 shows several equate statements and a few instructions that show how they function in a program.
The THIS directive always appears as THIS BYTE, THIS WORD, THIS DWORD, or THIS QWORD. In certain cases, data must be referred to as both a byte and a word. The assembler can assign only one type (byte, word, or doubleword) to a label. To assign a byte label to a word, use the software listed in Example 4–15.
This example also illustrates how the ORG (origin) statement changes the starting offset address of the data in the data segment to location 300H. At times, the origin of data or the code must be assigned to an absolute offset address with the ORG statement. The ASSUME statement tells the assembler what names have been chosen for the code, data, extra, and stack segments. Without the ASSUME statement, the assembler assumes nothing and automatically uses a segment override prefix on all instructions that address memory data. The ASSUME statement is only used with full-segment definitions, as described later in this section of the text.
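The effect of ORG together with THIS BYTE can be modeled with a location counter. The sketch below assumes a pair of declarations such as DATA1 EQU THIS BYTE followed by DATA2 DW 1234H after ORG 300H. These names, the stored value, and the 1K-byte segment size are hypothetical and are not taken from Example 4–15. Both labels end up at offset 300H, but reading through the byte label returns only the low byte of the word.

```c
#include <stddef.h>
#include <stdint.h>

#define SEG_SIZE 0x400           /* hypothetical 1K-byte data segment */

typedef struct {
    uint8_t  bytes[SEG_SIZE];
    uint16_t lc;                 /* assembler's location counter (current offset) */
} Segment;

/* ORG origin: move the location counter to an absolute offset. */
void seg_org(Segment *s, uint16_t origin) { s->lc = origin; }

/* label EQU THIS BYTE: a byte-typed label at the current offset; no memory reserved. */
uint16_t seg_this(const Segment *s) { return s->lc; }

/* label DW value: store a little-endian word, advance the counter, return its offset. */
uint16_t seg_dw(Segment *s, uint16_t value)
{
    uint16_t at = s->lc;
    s->bytes[at]     = (uint8_t)value;
    s->bytes[at + 1] = (uint8_t)(value >> 8);
    s->lc = (uint16_t)(s->lc + 2);
    return at;
}

uint8_t read_byte(const Segment *s, uint16_t off) { return s->bytes[off]; }

uint16_t read_word(const Segment *s, uint16_t off)
{
    return (uint16_t)(s->bytes[off] | (s->bytes[off + 1] << 8));
}
```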
PROC and ENDP. The PROC and ENDP directives indicate the start and end of a procedure (subroutine). These directives force structure because the procedure is clearly defined. Note that if structure is to be violated for whatever reason, use the CALLF, CALLN, RETF, and RETN instructions. Both the PROC and ENDP directives require a label to indicate the name of the procedure. The PROC directive, which indicates the start of a procedure, must also be followed with a NEAR or FAR. A NEAR procedure is one that resides in the same code segment as the program. A FAR procedure may reside at any location in the memory system. A NEAR procedure is often considered local, and a FAR procedure is often considered global. The term global denotes a procedure that can be used by any program; local defines a procedure that is only used by the current program. Any labels that are defined within the procedure block are also defined as either local (NEAR) or global (FAR).
Example 4–16 shows a procedure that adds BX, CX, and DX and stores the sum in register AX. Although this procedure is short and may not be particularly useful, it does illustrate how to use the PROC and ENDP directives to delineate the procedure. Note that information about the operation of the procedure should appear as a grouping of comments that show the registers changed by the procedure and the result of the procedure.
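In C terms, such a procedure is a function of three 16-bit inputs with a 16-bit result. The sketch below is hypothetical. Its comments show one possible MOV/ADD sequence, not necessarily the instructions in Example 4–16. Using uint16_t mimics the 16-bit ADD instruction, which discards any carry out of bit 15.

```c
#include <stdint.h>

/* Hypothetical C model of a procedure that sums BX, CX, and DX into AX.
   uint16_t arithmetic wraps modulo 65536, as a 16-bit ADD does. */
uint16_t adds(uint16_t bx, uint16_t cx, uint16_t dx)
{
    uint16_t ax = bx;               /* e.g. MOV AX,BX */
    ax = (uint16_t)(ax + cx);       /* e.g. ADD AX,CX */
    ax = (uint16_t)(ax + dx);       /* e.g. ADD AX,DX */
    return ax;                      /* result returned in AX */
}
```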
If version 6.x of the Microsoft MASM assembler program is available, the PROC directive can also specify registers that the assembler saves and restores automatically. The USES statement indicates which registers are used by the procedure, so that the assembler can automatically save them before your procedure begins and restore them before the procedure ends with the RET instruction. For example, the ADDS PROC USES AX BX CX statement automatically pushes AX, BX, and CX on the stack before the procedure begins and pops them from the stack before the RET instruction executes at the end of the procedure. Example 4–17 illustrates a procedure written using MASM version 6.x that shows the USES statement. Note that the registers in the list are separated by spaces, not commas, and the PUSH and POP instructions are displayed in the procedure listing because it was assembled with the .LISTALL directive. The instructions prefaced with an asterisk (*) are inserted by the assembler and were not typed in the source file. The USES statement appears elsewhere in this text, so if MASM version 5.10 is in use, that code will need to be modified.
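The code that USES inserts follows a fixed pattern: the assembler pushes registers in the order listed and pops them in reverse order. The C sketch below uses hypothetical Stack and Regs types to model that pattern for USES AX BX CX. It also shows why a register that carries a result back to the caller should not appear in the USES list, because the restored value would overwrite the result before RET executes.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t data[16];
    size_t   top;                   /* number of words currently on the stack */
} Stack;

typedef struct { uint16_t ax, bx, cx; } Regs;

static void push(Stack *s, uint16_t v) { s->data[s->top++] = v; }
static uint16_t pop(Stack *s) { return s->data[--s->top]; }

/* Models "PROC USES AX BX CX": saves on entry, restores in reverse order before RET. */
void proc_with_uses(Regs *r, Stack *st)
{
    push(st, r->ax);                /* inserted by assembler: PUSH AX */
    push(st, r->bx);                /* inserted by assembler: PUSH BX */
    push(st, r->cx);                /* inserted by assembler: PUSH CX */

    /* procedure body: free to change AX, BX, and CX */
    r->ax = 0; r->bx = 0xBEEF; r->cx = 0x1111;

    r->cx = pop(st);                /* inserted by assembler: POP CX */
    r->bx = pop(st);                /* inserted by assembler: POP BX */
    r->ax = pop(st);                /* inserted by assembler: POP AX */
}
```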
The assembler uses two basic formats for developing software: One method uses models and the other uses full-segment definitions. Memory models, as presented in this section and briefly in earlier chapters, are unique to the MASM assembler program. The TASM assembler also uses memory models, but they differ somewhat from the MASM models. The full-segment definitions are common to most assemblers, including the Intel assembler, and are often used for software development. The models are easier to use for simple tasks. The full-segment definitions offer better control over the assembly language task and are recommended for complex programs. The model was used in early chapters because it is easier to understand for the beginning programmer. Models are also used with assembly language procedures that are used by high-level languages such as C/C++. Although this text fully develops and uses the memory model definitions for its programming examples, realize that full-segment definitions offer some advantages over memory models, as discussed later in this section.
Models. There are many models available to the MASM assembler, ranging from tiny to huge. Appendix A contains a table that lists all the models available for use with the assembler. To designate a model, use the .MODEL statement followed by the size of the memory system. The TINY model requires that all software and data fit into one 64K-byte memory segment; it is useful for many small programs. The SMALL model requires that only one data segment be used with one code segment for a total of 128K bytes of memory. Other models are available, up to the HUGE model.
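The size rules for the two smallest models reduce to simple comparisons. This hypothetical C sketch checks whether a program with a given amount of code and data fits the TINY or SMALL model as described above. It deliberately ignores the stack and other overhead, which also consume space in a real program.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEGMENT_LIMIT 65536u        /* one 64K-byte segment */

/* TINY: code and data must share a single 64K-byte segment. */
bool fits_tiny(uint32_t code_bytes, uint32_t data_bytes)
{
    return code_bytes + data_bytes <= SEGMENT_LIMIT;
}

/* SMALL: one code segment plus one data segment, each up to 64K bytes (128K total). */
bool fits_small(uint32_t code_bytes, uint32_t data_bytes)
{
    return code_bytes <= SEGMENT_LIMIT && data_bytes <= SEGMENT_LIMIT;
}
```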
Example 4–18 illustrates how the .MODEL statement defines the parameters of a short program that copies the contents of a 100-byte block of memory (LISTA) into a second 100-byte block of memory (LISTB). It also shows how to define the stack, data, and code segments. The .EXIT 0 directive returns to DOS with an error code of 0 (no error). If no parameter is added to .EXIT, it still returns to DOS, but the error code is not defined. Also note that special directives such as @DATA (see Appendix A) are used to identify various segments. If the .STARTUP directive is used (MASM version 6.x), the MOV AX,@DATA followed by MOV DS,AX statements can be eliminated. The .STARTUP directive also eliminates the need to store the starting address next to the END label. Models are important with both Microsoft Visual C++ and Borland C++ development systems if assembly language is included with C++ programs. Both development systems use inline assembly programming for adding assembly language instructions and require an understanding of programming models.
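The copy itself can be modeled in C. The sketch below assumes the program performs a forward string move in which SI addresses LISTA, DI addresses LISTB, and CX counts the bytes remaining, as REP MOVSB would. That instruction choice is an assumption, and the function name copy_block is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LIST_SIZE 100

/* Models a forward byte-by-byte block move: SI indexes the source,
   DI the destination, and CX counts down the bytes still to copy. */
void copy_block(const uint8_t *lista, uint8_t *listb, uint16_t count)
{
    uint16_t si = 0, di = 0, cx = count;
    while (cx != 0) {
        listb[di++] = lista[si++];  /* one byte per iteration, addresses increment */
        cx--;
    }
}
```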
Full-Segment Definitions. Example 4–19 illustrates the same program using full-segment definitions. Full-segment definitions are also used with the Borland and Microsoft C/C++ environments for procedures developed in assembly language. The program in Example 4–19 appears longer than the one pictured in Example 4–18, but it is more structured than the model method of setting up a program. The first segment defined is the STACK_SEG, which is clearly delineated with the SEGMENT and ENDS directives. Within these directives, a DW 100H DUP (?) sets aside 100H words for the stack segment. Because the word STACK appears next to SEGMENT, the assembler and linker mark this segment as the stack, so both the stack segment register (SS) and stack pointer (SP) are loaded automatically when the program is loaded.
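Because the stack grows toward lower addresses, SP starts at the top of the reserved area, which is the stack segment's length in bytes. The following hypothetical C sketch shows that arithmetic for a stack reserved with DW n DUP (?).

```c
#include <stdint.h>

/* Initial SP for a stack segment reserved with "DW n DUP (?)":
   SP starts one past the highest word, i.e. at the segment length in bytes. */
uint16_t initial_sp(uint16_t words)
{
    return (uint16_t)(words * 2u);
}

/* Each PUSH decrements SP by 2 before storing a word. */
uint16_t sp_after_pushes(uint16_t sp, unsigned pushes)
{
    return (uint16_t)(sp - 2u * pushes);
}
```

With n = 100H words, SP starts at 0200H, and three pushes leave it at 01FAH.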
Next, the data are defined in the DATA_SEG. Here, two arrays of data appear as LISTA and LISTB. Each array contains 100 bytes of space for the program. The names of the segments in this program can be changed to any name. Always include the group name ‘DATA’, so that the Microsoft program CodeView can be effectively used to symbolically debug this software. CodeView is a part of the MASM package used to debug software. To access CodeView, type CV, followed by the file name at the DOS command line; if operating from Programmer’s Work Bench, select Debug under the Run menu. If the group name is not placed in a program, CodeView can still be used to debug a program, but the program will not be debugged in symbolic form. Other group names such as ‘STACK’, ‘CODE’, and so forth are listed in Appendix A. You must at least place the word ‘CODE’ next to the code segment SEGMENT statement if you want to view the program symbolically in CodeView.
The CODE_SEG is organized as a far procedure because most software is procedure-oriented. Before the program begins, the code segment contains the ASSUME statement. The ASSUME statement tells the assembler and linker that the name used for the code segment (CS) is CODE_SEG; it also tells the assembler and linker that the data segment is DATA_SEG and the stack segment is STACK_SEG. Notice that the group name ‘CODE’ is used for the code segment for use by CodeView. Other group names appear in Appendix A with the models.
After the program loads both the extra segment register and data segment register with the location of the data segment, it transfers 100 bytes from LISTA to LISTB. Following this is a sequence of two instructions that return control to DOS (the disk operating system). Note that the program loader does not automatically load DS and ES with the address of the data segment; when DOS starts an .EXE program, these registers address the program segment prefix (PSP) instead. They must be loaded with the desired segment addresses in the program.
The last statement in the program is END MAIN. The END statement indicates the end of the program and the location of the first instruction executed. Here, we want the machine to execute the main procedure, so the MAIN label follows the END directive.
In the 80386 and above, an additional directive is found attached to the code segment. The USE16 or USE32 directive tells the assembler to use either the 16- or 32-bit instruction modes for the microprocessor. Software developed for the DOS environment must use the USE16 directive when it is assembled for the 80386 through the Core2, because once a .386 or higher processor directive is in effect, MASM assumes by default that all segments and instruction modes are 32 bits.
A Sample Program
Example 4–20 provides a sample program, using full-segment definitions, that reads a character from the keyboard and displays it on the CRT screen. Although this program is trivial, it illustrates a complete workable program that functions on any personal computer using DOS, from the earliest 8088-based system to the latest Core2-based system. This program also illustrates the use of a few DOS function calls. (Appendix A lists the DOS function calls with their parameters.) The BIOS function calls allow the use of the keyboard, printer, disk drives, and everything else that is available in your computer system.
This example program uses only a code segment because there is no data. A stack segment should appear, but it has been left out because DOS automatically allocates a 128-byte stack for all programs. The only time that the stack is used in this example is for the INT 21H instructions that call a procedure in DOS. Note that when this program is linked, the linker signals that no stack segment is present. This warning may be ignored in this example because the program uses fewer than 128 bytes of stack.
Notice that the entire program is placed into a far procedure called MAIN. It is good programming practice to write all software in procedural form, which allows the program to be used as a procedure at some future time if necessary. It is also fairly important to document register use and any parameters required for the program in the program header, which is a section of comments that appear at the start of the program.
The program uses DOS functions 06H and 4CH. The function number is placed in AH before the INT 21H instruction executes. The 06H function reads the keyboard if DL = 0FFH, or displays the ASCII contents of DL if it is not 0FFH. Upon close examination, the first section of the program moves 06H into AH and 0FFH into DL, so that a key is read from the keyboard. The INT 21H tests the keyboard; if no key is typed, it returns with the zero flag set (the equal condition). The JE instruction tests the equal condition and jumps to MAIN if no key is typed.
When a key is typed, the program continues to the next step, which compares the contents of AL with an @ symbol. Upon return from the INT 21H, the ASCII character of the typed key is found in AL. In this program, if an @ symbol is typed, the program ends. If the @ symbol is not typed, the program continues by displaying the character typed on the keyboard with the next INT 21H instruction.
Before the second INT 21H instruction executes, the program moves the ASCII character from AL into DL so that function 06H displays it on the CRT screen. After the character is displayed, a JMP executes. This causes the program to continue at MAIN, where it repeats reading a key.
If the @ symbol is typed, the program continues at MAIN1, where it executes the DOS function code number 4CH. This causes the program to return to the DOS prompt so that the computer can be used for other tasks.
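The control flow of this program can be simulated in C to check the logic without DOS. The sketch below is hypothetical. Keystrokes come from an array in which NO_KEY stands for "zero flag set, no key typed," and appending to the screen buffer stands for displaying the character in DL. The labels MAIN and MAIN1 appear only in comments.

```c
#include <stddef.h>

#define NO_KEY (-1)                 /* hypothetical marker: INT 21H returned with ZF set */

/* Simulates the loop: read a key, loop back to MAIN if none, exit at '@'
   (MAIN1 and function 4CH), otherwise echo the key. Returns the count echoed. */
size_t echo_until_at(const int *keys, size_t nkeys, char *screen)
{
    size_t shown = 0;
    for (size_t i = 0; i < nkeys; i++) {
        int al = keys[i];           /* AH = 06H, DL = 0FFH: key (if any) in AL */
        if (al == NO_KEY)           /* JE MAIN: no key typed, read again */
            continue;
        if (al == '@')              /* '@' typed: continue at MAIN1 and exit */
            break;
        screen[shown++] = (char)al; /* MOV DL,AL, then INT 21H displays it */
    }
    return shown;
}
```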
More information about the assembler and its application appears in Appendix A and in the next several chapters. Appendix A provides a complete overview of the assembler, linker, and DOS functions. It also provides a list of the BIOS (basic I/O system) functions. The information provided in the following chapters clarifies how to use the assembler for certain tasks at different levels of the text.
Example 4–21 shows the program listed in Example 4–20, except models are used instead of full-segment definitions. Please compare the two programs to determine the differences. Notice how much shorter and cleaner looking the models can make a program.