6.6 Microcomputer Programming Concepts
This section includes the fundamental concepts of microcomputer programming. Typical programming characteristics such as programming languages, microprocessor instruction sets, addressing modes, and instruction formats are discussed.
6.6.1 Microcomputer Programming Languages
Microcomputers are typically programmed using semi-English-language statements (assembly language). In addition to assembly languages, microcomputers use a more understandable, human-oriented language called a "high-level language." No matter what type of language is used to write the programs, microcomputers understand only binary numbers. Therefore, the programs must eventually be translated into their appropriate binary forms. The main ways of accomplishing this are discussed later.
Microcomputer programming languages can typically be divided into three main types:
1. Machine language
2. Assembly language
3. High-level language
A machine language program consists of either binary or hexadecimal op-codes. Programming a microcomputer with either one is relatively difficult, because one must deal only with numbers. The architecture and microprograms of a microprocessor determine all its instructions. These instructions are called the microprocessor’s "instruction set." Programs in assembly and high-level languages are represented by instructions that use English-language-type statements. The programmer finds it more convenient to write programs in assembly or a high-level language than in machine language. However, a translator must be used to convert the assembly or high-level programs into binary machine language so that the microprocessor can execute them. This is shown in Figure 6.30.
An assembler translates a program written in assembly language into a machine language program. A compiler or interpreter, on the other hand, converts a high-level language program such as C or C++ into a machine language program. Assembly or high-level language programs are called "source codes." Machine language programs are known as "object codes." A translator converts source codes to object codes. Next, we discuss the three main types of programming language in more detail.
6.6.2 Machine Language
A microprocessor has a unique set of machine language instructions defined by its manufacturer. No two microprocessors by two different manufacturers have the same machine language instruction set. For example, the Intel 8086 microprocessor uses the code 01D8₁₆ for its addition instruction, whereas the Motorola 68000 uses the code D282₁₆. Therefore, a machine language program for one microcomputer will not usually run on another microcomputer of a different manufacturer.
At the most elementary level, a microprocessor program can be written using its instruction set in binary machine language. As an example, a program written for adding two numbers using the Intel 8086 machine language is
Obviously, the program is very difficult to understand unless the programmer remembers all the 8086 codes, which is impractical. Because one finds it very inconvenient to work with 1’s and 0’s, it is almost impossible to write an error-free program at the first try. Also, it is very tiring for the programmer to enter a machine language program written in binary into the microcomputer’s RAM. For example, the programmer needs a number of binary switches to enter the binary program. This is definitely subject to errors.
To increase the programmer’s efficiency in writing a machine language program, hexadecimal numbers rather than binary numbers are used. The following is the same addition program in hexadecimal, using the Intel 8086 instruction set:
It is easier to detect an error in a hexadecimal program, because each byte contains only two hexadecimal digits. One would enter a hexadecimal program using a hexadecimal keyboard. A keyboard monitor program in ROM, usually offered by the manufacturer, provides interfacing of the hexadecimal keyboard to the microcomputer. This program converts each key actuation into binary machine language in order for the microprocessor to understand the program. However, programming in hexadecimal is not normally used.
6.6.3 Assembly Language
The next programming level is to use the assembly language. Each line in an assembly language program includes four fields:
1. Label field
2. Instruction, mnemonic, or op-code field
3. Operand field
4. Comment field
As an example, a typical program for adding two 16-bit numbers written in 8086 assembly language is
Obviously, programming in assembly language is more convenient than programming in machine language, because each mnemonic gives an idea of the type of operation it is supposed to perform. Therefore, with assembly language, the programmer does not have to find the numerical op-codes from a table of the instruction set, and programming efficiency is significantly improved.
The assembly language program is translated into binary via a program called an "assembler." The assembler program reads each assembly instruction of a program as ASCII characters and translates them into the respective binary op-codes. As an example, consider the HLT instruction for the 8086. Its binary op-code is 1111 0100. An assembler would convert HLT into 1111 0100 as shown in Figure 6.31.
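The translation step can be illustrated with a minimal Python sketch. It maps a few one-byte 8086 mnemonics to their op-codes; the table name `OPCODES` and the function `assemble_one` are hypothetical names chosen for illustration, and a real assembler also handles operands, labels, and directives.

```python
# Hypothetical lookup table for a few one-byte 8086 instructions.
OPCODES = {
    "HLT": 0xF4,  # halt
    "CLC": 0xF8,  # clear carry flag
    "NOP": 0x90,  # no operation
}


def assemble_one(mnemonic: str) -> str:
    """Return the op-code of a no-operand mnemonic as an 8-bit binary string."""
    opcode = OPCODES[mnemonic.strip().upper()]
    bits = format(opcode, "08b")
    return bits[:4] + " " + bits[4:]


print(assemble_one("HLT"))  # 1111 0100
```

The mnemonic arrives as ASCII text and leaves as a bit pattern, which is the essence of what the assembler does for each instruction.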
An advantage of the assembler is address computation. Most programs use addresses within the program as data storage or as targets for jumps or calls. When programming in machine language, these addresses must be calculated by hand. The assembler solves this problem by allowing the programmer to assign a symbol to an address. The programmer may then reference that address elsewhere by using the symbol. The assembler computes the actual address for the programmer and fills it in automatically. One can obtain hands-on experience with a typical assembler for a microprocessor by downloading it from the Internet.
Most assemblers use two passes to assemble a program. This means that they read the input program text twice. The first pass is used to compute the addresses of all labels in the program. In order to find the address of a label, it is necessary to know the total length of all the binary code preceding that label. Unfortunately, however, that address may be needed in that preceding code. Therefore, the first pass computes the addresses of all labels and stores them for the next pass, which generates the actual binary code. Various types of assemblers are available today. We define some of them in the following paragraphs.
- One-Pass Assembler. This assembler goes through the assembly language program once and translates it into a machine language program. This assembler has the problem of defining forward references. This means that a JUMP instruction using an address that appears later in the program must be defined by the programmer after the program is assembled.
- Two-Pass Assembler. This assembler scans the assembly language program twice. In the first pass, this assembler creates a symbol table. A symbol table consists of labels with addresses assigned to them. This way labels can be used for JUMP statements and no address calculation has to be done by the user. On the second pass, the assembler translates the assembly language program into the machine code. The two-pass assembler is more desirable and much easier to use.
- Macroassembler. This type of assembler translates a program written in macrolanguage into the machine language. This assembler lets the programmer define instruction sequences using macros. Note that, by using macros, the programmer can assign a name to an instruction sequence that appears repeatedly in a program. The programmer can thus avoid rewriting an instruction sequence that is required many times in a program. The macroassembler replaces a macroname with the appropriate instruction sequence each time it encounters the macroname.
It is interesting to see the difference between a subroutine and a macroprogram. A specific subroutine occurs once in a program. A subroutine is executed by CALLing it from a main program. The program execution jumps out of the main program and then executes the subroutine. At the end of the subroutine, a RET instruction is used to resume program execution following the CALL SUBROUTINE instruction in the main program. A macro, on the other hand, does not cause the program execution to branch out of the main program. Each time a macro occurs, it is replaced with the appropriate instruction sequence in the main program. Typical advantages of using macros are shorter source programs and better program documentation. A disadvantage is that effects on registers and flags may not be obvious.
Conditional macroassembly is very useful in determining whether or not an instruction sequence is to be included in the assembly depending on a condition that is true or false. If two different programs are to be executed repeatedly based on a condition that can be either true or false, it is convenient to use conditional macros. Based on each condition, a particular program is assembled. Each condition and the appropriate program are typically included within IF and ENDIF pseudo-instructions.
- Cross Assembler. This type of assembler is typically resident in one processor and assembles programs for another processor, the one for which the programs are written. The cross assembler program is written in a high-level language so that it can run on different types of processors that understand the same high-level language.
- Resident Assembler. This type of assembler assembles programs for a processor in which it is resident. The resident assembler may slow down the operation of the processor on which it runs.
- Meta-assembler. This type of assembler can assemble programs for many different types of processors. The programmer usually defines the particular processor being used.
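The two-pass scheme described above can be sketched in Python. This is only an illustration under stated assumptions: the instruction set `TOY_ISA`, its op-code values, and the helper names `parse` and `assemble` are all hypothetical and do not correspond to any real microprocessor.

```python
# Toy instruction set (hypothetical): mnemonic -> (op-code, size in bytes).
TOY_ISA = {"NOP": (0x00, 1), "HLT": (0xFF, 1), "JMP": (0x10, 3)}


def parse(line):
    """Split 'LABEL: MNEMONIC OPERAND ; comment' into its fields."""
    line = line.split(";")[0].strip()  # the comment field is ignored
    label = None
    if ":" in line:
        label, line = line.split(":", 1)
        label = label.strip()
    parts = line.split()
    mnemonic = parts[0] if parts else None
    operand = parts[1] if len(parts) > 1 else None
    return label, mnemonic, operand


def assemble(source, origin=0):
    # Pass 1: build the symbol table by tracking the address counter.
    symbols, address = {}, origin
    for line in source:
        label, mnemonic, _ = parse(line)
        if label:
            symbols[label] = address
        if mnemonic:
            address += TOY_ISA[mnemonic][1]

    # Pass 2: generate code; every label, forward or backward, is now known.
    code = []
    for line in source:
        _, mnemonic, operand = parse(line)
        if not mnemonic:
            continue
        opcode, size = TOY_ISA[mnemonic]
        code.append(opcode)
        if size == 3:
            target = symbols[operand]
            code += [target & 0xFF, target >> 8]  # low byte first
    return symbols, code


program = [
    "START: JMP DONE   ; forward reference",
    "       NOP",
    "DONE:  HLT",
]
symbols, code = assemble(program, origin=0x0200)
print(symbols)
print([format(b, "02X") for b in code])
```

The forward reference to DONE is the case a one-pass assembler cannot resolve on its own: when JMP is encountered, the address of DONE is not yet known, so the first pass exists only to record it.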
As mentioned before, each line of an assembly language program consists of four fields: label, mnemonic or op-code, operand, and comment. The assembler ignores the comment field but translates the other fields. The label field must start with an uppercase alphabetic character. The assembler must know where one field starts and another ends. Most assemblers allow the programmer to use a special symbol or delimiter to indicate the beginning or end of each field. Typical delimiters used are spaces, commas, semicolons, and colons:
- Spaces are used between fields.
- Commas (,) are used between addresses in an operand field.
- A semicolon (;) is used before a comment.
- A colon (:) or no delimiter is used after a label.
To handle numbers, most assemblers consider all numbers as decimal numbers unless specified otherwise. Most assemblers will also allow binary, octal, or hexadecimal numbers. The user must define the type of number system used in some way. This is usually done by using a letter following the number. Typical letters used are
- B for binary
- Q for octal
- H for hexadecimal
Assemblers generally require hexadecimal numbers to start with a digit. A 0 is typically used if the first digit of the hexadecimal number is a letter. This is done to distinguish between numbers and labels. For example, most assemblers will require the number A5H to be represented as 0A5H.
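These number conventions can be sketched as a small Python function. The name `parse_number` is hypothetical, and the rules are the typical ones described above rather than those of any specific assembler.

```python
def parse_number(text: str) -> int:
    """Convert an assembler-style numeric literal to an integer."""
    text = text.strip().upper()
    if not text or not text[0].isdigit():
        # A token that starts with a letter is treated as a label.
        raise ValueError(f"{text!r} looks like a label, not a number")
    bases = {"B": 2, "Q": 8, "H": 16}
    if text[-1] in bases:
        return int(text[:-1], bases[text[-1]])
    return int(text, 10)  # no suffix: decimal by default


print(parse_number("0A5H"))  # 165
print(parse_number("101B"))  # 5
print(parse_number("17Q"))   # 15
print(parse_number("45"))    # 45
```

Calling `parse_number("A5H")` raises `ValueError`, which mirrors why the leading 0 is required: without it, the token cannot be told apart from a label.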
Assemblers use pseudo-instructions or directives to make the formatting of the edited text easier. These pseudo-instructions are not directly translated into machine language instructions. They equate labels to addresses, assign the program to certain areas of memory, or insert titles, page numbers, and so on. To use the assembler directives or pseudo-instructions, the programmer puts them in the op-code field, and, if the pseudo-instructions require an address or data, the programmer places them in the label or data field. Typical pseudo-instructions are ORIGIN (ORG), EQUATE (EQU), DEFINE BYTE (DB), and DEFINE WORD (DW).
ORIGIN (ORG)
The pseudo-instruction ORG lets the programmer place the programs anywhere in memory. Internally, the assembler maintains a program-counter-type register called the "address counter." This counter maintains the address of the next instruction or data to be processed.
An ORG pseudo-instruction is similar in concept to the JUMP instruction. Recall that the JUMP instruction causes the processor to place a new address in the program counter. Similarly, the ORG pseudo-instruction causes the assembler to place a new value in the address counter.
Typical ORG statements are
ORG 7000H
CLC
The 8086 assembler will generate the following code for these statements:
7000 F8
Most assemblers assign a value of zero to the starting address of a program if the programmer does not define this by means of an ORG.
Equate (EQU)
The pseudo-instruction EQU assigns a value in its operand field to an address in its label field. This allows the user to assign a numeric value to a symbolic name. The user can then use the symbolic name in the program instead of its numeric value. This reduces errors.
A typical example of EQU is START EQU 0200H, which assigns the value 0200 in hexadecimal to the label START. Another example is
PORTA EQU 40H
MOV AL, 0FFH
OUT PORTA, AL
In this example, the EQU gives PORTA the value 40 hex, and FF hex is the data to be written into register AL by MOV AL, 0FFH. OUT PORTA, AL then outputs this data FF hex to port 40, which has already been equated to PORTA.
Note that, if a label in the operand field is equated to another label in the label field, then the label in the operand field must be previously defined. For example, the EQU statement
BEGIN EQU START
will generate an error unless START is defined previously with a numeric value.
Define Byte (DB)
The pseudo-instruction DB is usually used to set a memory location to a certain byte value. For example,
START DB 45H
will store the data value 45 hex to the address START.
With some assemblers, the DB pseudo-instruction can be used to generate a table of data as follows:
ORG 7000H
DB 20H, 30H, 40H, 50H
In this case, 20 hex is the first data item, stored at memory location 7000; 30 hex, 40 hex, and 50 hex occupy the next three memory locations. Therefore, the data in memory will look like this:
7000 20
7001 30
7002 40
7003 50
Note that some assemblers use DC.B instead of DB. DC stands for Define Constant.
Define Word (DW)
The pseudo-instruction DW is typically used to assign a 16-bit value to two memory locations. For example,
ORG 7000H
DW 4AC2H
will assign C2 to location 7000 and 4A to location 7001. It is assumed that the assembler will assign the low byte first (C2) and then the high byte (4A).
With some assemblers, the DW pseudo-instruction can be used to generate a table of 16-bit data as follows:
ORG 8000H
DW 5000H, 6000H, 7000H
In this case, the three 16-bit values 5000H, 6000H, and 7000H are assigned to memory locations starting at the address 8000H. That is, with the low byte stored first, the array would look like this:
8000 00
8001 50
8002 00
8003 60
8004 00
8005 70
Note that some assemblers use DC.W instead of DW.
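How ORG, DB, and DW drive the address counter can be sketched in Python. The function `lay_out` and its tuple-based input format are hypothetical simplifications; the low-byte-first ordering for DW follows the assumption stated in the text.

```python
def lay_out(directives):
    """Place DB/DW data in a dict {address: byte}, honouring ORG."""
    memory, counter = {}, 0  # the address counter starts at zero without an ORG
    for name, values in directives:
        if name == "ORG":
            counter = values[0]
        elif name == "DB":
            for v in values:
                memory[counter] = v & 0xFF
                counter += 1
        elif name == "DW":
            for v in values:
                memory[counter] = v & 0xFF             # low byte first
                memory[counter + 1] = (v >> 8) & 0xFF  # then high byte
                counter += 2
        else:
            raise ValueError(f"unknown directive {name}")
    return memory


image = lay_out([
    ("ORG", [0x7000]),
    ("DB", [0x20, 0x30, 0x40, 0x50]),
    ("ORG", [0x8000]),
    ("DW", [0x5000, 0x6000, 0x7000]),
])
for address in sorted(image):
    print(f"{address:04X} {image[address]:02X}")
```

Each printed line pairs an address with the byte stored there, in the same style as an assembler listing.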
Assemblers also use a number of housekeeping pseudo-instructions. Typical housekeeping pseudo-instructions are TITLE, PAGE, END, and LIST. The following are the housekeeping pseudo-instructions that control the assembler operation and its program listing.
TITLE prints the specified heading at the top of each page of the program listing. For example,
TITLE "Square Root Algorithm"
will print the name "Square Root Algorithm" on top of each page.
PAGE skips to the next page of the program listing.
END indicates the end of the assembly language source program.
LIST directs the assembler to print the assembler source program.
In the following, assembly language instruction formats, instruction sets, and addressing modes available with typical microprocessors will be discussed.
Assembly Language Instruction Formats
Depending on the number of addresses specified, we have the following instruction formats:
- Three address
- Two address
- One address
- Zero address
Because all instructions are stored in the main memory, instruction formats are designed in such a way that instructions take less space and have more processing capabilities. It should be emphasized that the microprocessor architecture has considerable influence on a specific instruction format. The following are some important technical points that have to be considered while designing an instruction format:
- The size of an instruction word is chosen in such a way that it facilitates the specification of more operations by a designer. For example, with 4- and 8-bit op-code fields, we can specify 16 and 256 distinct operations respectively.
- Instructions are used to manipulate various data elements such as integers, floating-point numbers, and character strings. In particular, all programs written in a symbolic language such as C are internally stored as characters. Therefore, memory space will not be wasted if the word length of the machine is some integral multiple of the number of bits needed to represent a character. Because all characters are represented using typical 8-bit character codes such as ASCII or EBCDIC, it is desirable to have 8-, 16-, 32-, or 64-bit words for the word length.
- The size of the address field is chosen in such a way that a high resolution is guaranteed. Note that in any microprocessor, the ultimate resolution is a bit. Memory resolution is a function of the instruction length, and in particular, short instructions provide less resolution. For example, in a microcomputer with 32K 16-bit memory words, at least 19 bits are required to access each bit of the word. (This is because 2¹⁵ = 32K and 2⁴ = 16.)
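The 19-bit figure in the last point can be checked with a short Python calculation. The function name `bits_to_address_each_bit` is hypothetical; the sketch assumes every bit of every word is to be individually addressable.

```python
def bits_to_address_each_bit(words: int, word_length: int) -> int:
    """Address bits needed to select any single bit in a word-organized memory."""
    word_bits = (words - 1).bit_length()       # bits to select a word
    bit_bits = (word_length - 1).bit_length()  # bits to select a bit within it
    return word_bits + bit_bits


print(bits_to_address_each_bit(32 * 1024, 16))  # 15 + 4 = 19
```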
The general form of a three-address instruction is shown below:
<op-code> Addr1, Addr2, Addr3
Some typical three-address instructions are
In this specification, all alphabetic characters are assumed to represent memory addresses, and the string that begins with the letter R indicates a register. The third address of this type of instruction is usually referred to as the "destination address." The result of an operation is always assumed to be saved in the destination address.
Typical programs can be written using these three-address instructions. For example, consider the following sequence of three-address instructions
This sequence implements the statement Z = A * B + C * D – E * F. The three-address format is normally used by 32-bit microprocessors in addition to the other formats.
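A three-address sequence for this statement can be simulated in Python. The instruction sequence below is a hypothetical one written for illustration, not necessarily the sequence intended by the text; it follows the stated convention that the third address is the destination, and the operand values are arbitrary.

```python
import operator

OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def run_three_address(program, store):
    """Execute (op, src1, src2, dest) tuples; dest <- src1 op src2."""
    for op, src1, src2, dest in program:
        store[dest] = OPS[op](store[src1], store[src2])
    return store


# Hypothetical sequence for Z = A * B + C * D - E * F.
program = [
    ("MUL", "A", "B", "R1"),    # R1 <- A * B
    ("MUL", "C", "D", "R2"),    # R2 <- C * D
    ("MUL", "E", "F", "R3"),    # R3 <- E * F
    ("ADD", "R1", "R2", "R1"),  # R1 <- R1 + R2
    ("SUB", "R1", "R3", "Z"),   # Z  <- R1 - R3
]
store = {"A": 2, "B": 3, "C": 4, "D": 5, "E": 6, "F": 7}
result = run_three_address(program, store)["Z"]
print(result)  # 2*3 + 4*5 - 6*7 = -16
```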
If we drop the third address from the three-address format, we obtain the two-address format. Its general form is
<op-code> Addr1, Addr2
Some typical two-address instructions are
In this format, the addresses Addr1 and Addr2 respectively represent source and destination addresses. The following sequence of two-address instructions is equivalent to the program using three-address format presented earlier:
This format is predominant in typical general-purpose microprocessors such as the Intel 8086 and the Motorola 68000. Typical 8-bit microprocessors such as the Intel 8085 and the Motorola 6809 are accumulator based. In these microprocessors, the accumulator register is assumed to be the destination for all arithmetic and logic operations. Also, this register always holds one of the source operands. Thus, we only need to specify one address in the instruction, which reduces the instruction length. The one-address format is predominant in 8-bit microprocessors. Some typical one-address instructions are
In this program, T1 and T2 represent the addresses of memory locations used to store temporary results. Instructions that do not require any addresses are called "zero-address instructions." All microprocessors include some zero-address instructions in the instruction set. Typical examples of zero-address instructions are CLC (clear carry) and NOP.
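The accumulator-based, one-address style can be sketched as a small Python simulation. The mnemonics LDA, STA, ADD, SUB, and MUL and the program below are hypothetical, chosen only to show how temporaries such as T1 and T2 become necessary when the accumulator is the implied operand and destination.

```python
def run_accumulator(program, memory):
    """Execute (op, address) pairs on a machine with a single accumulator."""
    acc = 0
    for op, addr in program:
        if op == "LDA":    # accumulator <- memory
            acc = memory[addr]
        elif op == "STA":  # memory <- accumulator
            memory[addr] = acc
        elif op == "ADD":
            acc += memory[addr]
        elif op == "SUB":
            acc -= memory[addr]
        elif op == "MUL":
            acc *= memory[addr]
        else:
            raise ValueError(f"unknown op-code {op}")
    return memory


# Hypothetical one-address program for Z = A * B + C * D - E * F.
program = [
    ("LDA", "E"), ("MUL", "F"), ("STA", "T1"),  # T1 <- E * F
    ("LDA", "C"), ("MUL", "D"), ("STA", "T2"),  # T2 <- C * D
    ("LDA", "A"), ("MUL", "B"),                 # acc <- A * B
    ("ADD", "T2"), ("SUB", "T1"),               # acc <- acc + T2 - T1
    ("STA", "Z"),
]
memory = {"A": 2, "B": 3, "C": 4, "D": 5, "E": 6, "F": 7}
z = run_accumulator(program, memory)["Z"]
print(z)  # -16
```

Each instruction is shorter than in the multi-address formats because only one address is specified, but more instructions are needed to do the same work.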
Typical Assembly Language Instruction Sets
An instruction set of a specific microprocessor consists of all the instructions that it can execute. The capabilities of a microprocessor are determined, to some extent, by the types of instructions it is able to perform. Each microprocessor has a unique instruction set designed by its manufacturer to do a specific task. We discuss some of the instructions that are common to all microprocessors, grouping together instructions that have similar functions. These instructions typically include
- Data Processing Instructions. These operations perform actual data manipulations. The instructions typically include arithmetic/logic operations and increment/decrement and rotate/shift operations. Typical arithmetic instructions include ADD, SUBTRACT, COMPARE, MULTIPLY, and DIVIDE. Note that the SUBTRACT instruction provides the result and also affects the status flags, while the COMPARE instruction performs the subtraction without saving any result and affects the flags based on the result. Typical logic instructions perform traditional Boolean operations such as AND, OR, and EXCLUSIVE-OR. The AND instruction can be used to perform a masking operation. If the bit value in a particular bit position of a word is desired, the word can be logically ANDed with appropriate data to accomplish this. For example, the bit value at bit 2 of an 8-bit number 0100 1Y10 (where the unknown bit value Y is to be determined) can be obtained as follows:
    0100 1Y10
AND 0000 0100
    ---------
    0000 0Y00
If the bit value Y at bit 2 is 1, then the result is nonzero (Flag Z = 0); otherwise, the result is zero (Flag Z = 1). The Z flag can be tested using typical conditional JUMP instructions such as JZ (Jump if Z = 1) or JNZ (Jump if Z = 0) to determine whether Y is 0 or 1. This is called a masking operation. The AND instruction can also be used to determine whether a binary number is odd or even by checking the least significant bit (LSB) of the number (LSB = 0 for even and LSB = 1 for odd). The OR instruction can typically be used to insert a 1 in a particular bit position of a binary number without changing the values of the other bits. For example, a 1 can be inserted using the OR instruction at bit number 3 of the 8-bit binary number 0111 0011 without changing the values of the other bits as follows:
    0111 0011
OR  0000 1000
    ---------
    0111 1011
- Instructions for Controlling Microprocessor Operations. These instructions typically include those that set or reset specific flags and halt or stop the microprocessor.
- Data Movement Instructions. These instructions move data from a register to memory and vice versa, between registers, and between a register and an I/O device.
- Instructions Using Memory Addresses. An instruction in this category typically contains a memory address, which is used to read a data word from memory into a microprocessor register or to write data from a register into a memory location. Many instructions under data processing and movement fall in this category.
- Conditional and Unconditional JUMPS. These instructions typically include one of the following:
1. Unconditional JUMP, which always transfers the memory address specified in the instruction into the program counter.
2. Conditional JUMP, which transfers the address portion of the instruction into the program counter based on the conditions set by one of the status flags in the flag register.
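The masking and bit-insertion operations described under data processing instructions can be tried out in Python, whose `&` and `|` operators behave like the AND and OR instructions. The function names are hypothetical, and the Boolean return values stand in for testing the Z flag.

```python
def bit_is_set(value: int, bit: int) -> bool:
    """Mask with AND: the result is nonzero (Z = 0) only if the bit is 1."""
    return (value & (1 << bit)) != 0


def set_bit(value: int, bit: int) -> int:
    """Insert a 1 with OR, leaving the other bits unchanged."""
    return value | (1 << bit)


def is_odd(value: int) -> bool:
    """Check the least significant bit: 1 for odd, 0 for even."""
    return (value & 1) == 1


print(bit_is_set(0b01001110, 2))              # True  (Y = 1)
print(bit_is_set(0b01001010, 2))              # False (Y = 0)
print(format(set_bit(0b01110011, 3), "08b"))  # 01111011
print(is_odd(0b01110011))                     # True
```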
Typical Assembly Language Addressing Modes
One of the tasks performed by a microprocessor during execution of an instruction is the determination of the operand and destination addresses. The manner in which a microprocessor accomplishes this task is called the "addressing mode." Now, let us present the typical microprocessor addressing modes, relating them to the instruction set of the Motorola 68000.
An instruction is said to have "implied or inherent addressing mode" if it does not have any operand. For example, consider the following instruction: RTS, which means "return from a subroutine to the main program." The RTS instruction is a no-operand instruction. The program counter is implied in the instruction because, although the program counter is not included in the RTS instruction, the return address is loaded in the program counter after its execution.
Whenever an instruction/operand contains data, it is called an "immediate mode" instruction. For example, consider the following 68000 instruction:
ADD #15, D0 ; D0 <- D0 + 15
In this instruction, the symbol # indicates to the assembler that it is an immediate mode instruction. This instruction adds 15 to the contents of register D0 and then stores the result in D0.
An instruction is said to have a register mode if it contains a register as opposed to a memory address. This means that the operand values are held in the microprocessor registers. For example, consider the following 68000 instruction:
ADD D1, D0 ; D0 <- D1 + D0
This ADD instruction is a two-operand instruction. Both operands (source and destination) have register mode. The instruction adds the 16-bit contents of D0 to the 16-bit contents of D1 and stores the 16-bit result in D0.
An instruction is said to have an absolute or direct addressing mode if it contains a memory address in the operand field. For example, consider the 68000 instruction
ADD 3000, D2
This instruction adds the 16-bit contents of memory address 3000 to the 16-bit contents of D2 and stores the 16-bit result in D2. The source operand to this ADD instruction contains 3000 and is in absolute or direct addressing mode.
When an instruction specifies a microprocessor register to hold the address, the resulting addressing mode is known as the "register indirect mode." For example, consider the 68000 instruction:
CLR (A0)
This instruction clears the 16-bit contents of a memory location whose address is in register A0 to zero. The instruction is in register indirect mode.
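The difference between these addressing modes lies in where the operand is found. The Python sketch below models that lookup; the function `fetch_operand`, the mode names, and the register and memory contents are all hypothetical and ignore operand sizes and other 68000 details.

```python
def fetch_operand(mode, value, registers, memory):
    """Return the operand selected by a simplified addressing mode."""
    if mode == "immediate":          # ADD #15, D0: the data is in the instruction
        return value
    if mode == "register":           # ADD D1, D0: the data is in a register
        return registers[value]
    if mode == "absolute":           # ADD 3000, D2: the instruction holds the address
        return memory[value]
    if mode == "register_indirect":  # CLR (A0): a register holds the address
        return memory[registers[value]]
    raise ValueError(f"unknown addressing mode {mode}")


registers = {"D1": 7, "A0": 3000}
memory = {3000: 42}

print(fetch_operand("immediate", 15, registers, memory))            # 15
print(fetch_operand("register", "D1", registers, memory))           # 7
print(fetch_operand("absolute", 3000, registers, memory))           # 42
print(fetch_operand("register_indirect", "A0", registers, memory))  # 42
```

Absolute and register indirect modes reach the same memory location here; they differ only in whether the address comes from the instruction itself or from a register.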
The conditional branch instructions are used to change the order of execution of a program based on the conditions set by the status flags. Some microprocessors use conditional branching using the absolute mode. The op-code verifies a condition set by a particular status flag. If the condition is satisfied, the program counter is changed to the value of the operand address (defined in the instruction). If the condition is not satisfied, the program counter is incremented, and the program is executed in its normal order.
Typical 16-bit microprocessors use conditional branch instructions. Some conditional branch instructions are 16 bits wide. The first byte is the op-code for checking a particular flag. The second byte is an 8-bit offset, which is added to the contents of the program counter if the condition is satisfied to determine the effective address. This offset is considered as a signed binary number with the most significant bit as the sign bit. It means that the offset can vary from -128₁₀ to +127₁₀ (0 being positive). This is called relative mode.
Consider the following 68000 example, which uses the branch not equal (BNE) instruction:
BNE 8
Suppose that the program counter contains 2000 (address of the next instruction to be executed) while executing this BNE instruction. Now, if Z = 0, the microprocessor will load 2000 + 8 = 2008 into the program counter and program execution resumes at address 2008. On the other hand, if Z = 1, the microprocessor continues with the next instruction.
In the last example the program jumped forward, requiring positive offset. An example for branching with negative offset is
BNE -14
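Therefore, to branch backward to 1FF6₁₆, the assembler uses an offset of F2₁₆, the 8-bit two’s-complement representation of -14, following the op-code for BNE. This target implies that the program counter contains 2004₁₆ when the branch is taken, since 2004₁₆ - 14 = 1FF6₁₆.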
An advantage of relative mode is that the destination address is specified relative to the address of the instruction that follows the branch instruction. Since these conditional branch instructions do not contain an absolute address, the program can be placed anywhere in memory and still be executed properly by the microprocessor. A program that can be placed anywhere in memory and still run correctly is called a "relocatable" program. It is good practice to write relocatable programs.
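The offset arithmetic of relative mode can be sketched in Python. The function names are hypothetical, and the program counter value 2004₁₆ used for the backward branch is an assumption chosen so that an offset of -14 lands on 1FF6₁₆.

```python
def encode_offset(offset: int) -> int:
    """Encode a signed offset in -128..127 as an 8-bit two's-complement byte."""
    if not -128 <= offset <= 127:
        raise ValueError("offset does not fit in 8 bits")
    return offset & 0xFF


def branch_target(pc: int, offset_byte: int) -> int:
    """Sign-extend the offset byte and add it to the program counter."""
    signed = offset_byte - 256 if offset_byte & 0x80 else offset_byte
    return pc + signed


print(format(encode_offset(-14), "02X"))           # F2
print(format(branch_target(0x2000, 0x08), "04X"))  # 2008
print(format(branch_target(0x2004, 0xF2), "04X"))  # 1FF6
```

Because only the offset is stored, moving the whole program to a different origin changes neither byte of the branch instruction, which is what makes the code relocatable.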
Subroutine Calls in Assembly Language
It is sometimes desirable to execute a common task many times in a program. Consider the case when the sum of squares of numbers is required several times in a program. One could write a sequence of instructions in the main program for carrying out the sum of squares every time it is required. This is all right for short programs. For long programs, however, it is convenient for the programmer to write a small program known as a "subroutine" for performing the sum of squares, and then call this program each time it is needed in the main program.
Therefore, a subroutine can be defined as a program carrying out a particular function that can be called by another program known as the "main program." The subroutine only needs to be placed once in memory starting at a particular memory location. Each time the main program requires this subroutine, it can branch to it, typically by using a jump to subroutine (JSR) instruction along with its starting address. The subroutine is then executed. At the end of the subroutine, a RETURN instruction takes control back to the main program.
The 68000 includes two subroutine call instructions. Typical examples include JSR 4000 and BSR 24. JSR 4000 is an instruction using absolute mode. In response to the execution of JSR, the 68000 saves (pushes) the current program counter contents (address of the next instruction to be executed) onto the stack. The program counter is then loaded with 4000, the address included in the JSR instruction. The starting address of the subroutine is 4000. The RTS (return from subroutine) instruction at the end of the subroutine reads (pops) the return address, which was saved on the stack before jumping to the subroutine, into the program counter. The program execution thus resumes in the main program. BSR 24 is an instruction using relative mode. This instruction works in the same way as JSR 4000 except that the displacement 24 is added to the current program counter contents to jump to the subroutine.
The stack must always be balanced. This means that a PUSH instruction in a subroutine must be followed by a POP instruction before the RETURN from subroutine instruction so that the stack pointer points to the right return address saved onto the stack. This will ensure returning to the desired location in the main program after execution of the subroutine. If multiple registers are PUSHed in a subroutine, one must POP them in the reverse order before the subroutine RETURN instruction.
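Subroutine linkage and stack balancing can be illustrated with a small Python model. The class `ToyCPU`, its methods, and the addresses used are hypothetical; the model keeps only a program counter and a stack and does not represent actual 68000 behavior in detail.

```python
class ToyCPU:
    """Hypothetical model of subroutine linkage through a stack."""

    def __init__(self, pc):
        self.pc = pc
        self.stack = []

    def jsr(self, target):
        self.stack.append(self.pc)  # push the return address
        self.pc = target            # continue at the subroutine

    def push(self, value):
        self.stack.append(value)

    def pop(self):
        return self.stack.pop()

    def rts(self):
        self.pc = self.stack.pop()  # pop the return address into the PC


cpu = ToyCPU(pc=2006)  # assumed address of the instruction after the JSR
cpu.jsr(4000)

d0, d1 = 11, 22
cpu.push(d0)           # save registers on entry to the subroutine
cpu.push(d1)
d0 = d1 = 0            # the subroutine is free to change the registers
d1 = cpu.pop()         # restore in the reverse order
d0 = cpu.pop()

cpu.rts()
print(cpu.pc, d0, d1)  # 2006 11 22
```

If one of the two POP operations were omitted, `rts` would load a saved register value into the program counter instead of the return address, which is the failure that a balanced stack prevents.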
6.6.4 High-Level Languages
As mentioned before, the programmer’s efficiency with assembly language increases significantly compared to machine language. However, the programmer needs to be well acquainted with the microprocessor’s architecture and its instruction set. Further, the programmer has to provide an op-code for each operation that the microprocessor has to carry out in order to execute a program. As an example, for adding two numbers, the programmer would instruct the microprocessor to load the first number into a register, add the second number to the register, and then store the result in memory. However, the programmer might find it tedious to write all the steps required for a large program. Also, to become a reasonably good assembly language programmer, one needs to have a lot of experience.
High-level language programs composed of English-language-type statements rectify all these deficiencies of machine and assembly language programming. The programmer does not need to be familiar with the internal microprocessor structure or its instruction set. Also, each statement in a high-level language corresponds to a number of assembly or machine language instructions. For example, consider the statement F = A + B written in a high-level language called FORTRAN. This single statement adds the contents of A to B and stores the result in F. This is equivalent to a number of steps in machine or assembly language, as mentioned before. It should be pointed out that the letters A, B, and F do not refer to particular registers within the microprocessor. Rather, they are memory locations.
A number of high-level languages such as C and C++ are widely used these days. Typical microprocessors, namely, the Intel 8086, the Motorola 68000, and others, can be programmed using these high-level languages. A high-level language is a problem-oriented language. The programmer does not have to know the details of the architecture of the microprocessor and its instruction set. Basically, the programmer follows the rules of the particular language being used to solve the problem at hand. A second advantage is that a program written in a particular high-level language can be executed by two different microcomputers, provided they both understand that language. For example, a program written in C for an Intel 8086-based microcomputer will run on a Motorola 68000-based microcomputer because both microprocessors have a compiler to translate the C language into their particular machine language; minor modifications are required for input/output programs.
As mentioned before, like the assembly language program, a high-level language program requires a special program for converting the high-level statements into object codes. This program can be either an interpreter or a compiler. They are usually very large programs compared to assemblers.
An interpreter reads each high-level statement such as F = A + B and directs the microprocessor to perform the operations required to execute the statement. The interpreter converts each statement into machine language codes but does not convert the entire program into machine language codes prior to execution. Hence, it does not generate an object program. Therefore, an interpreter is a program that executes a set of machine language instructions in response to each high-level statement in order to carry out the function. A compiler, however, converts each statement into a set of machine language instructions and also produces an object program that is stored in memory. This program must then be executed by the microprocessor to perform the required task in the high-level program. In summary, an interpreter executes each statement as it proceeds, without generating an object code, whereas a compiler converts a high-level program into an object program that is stored in memory. This program is then executed.
Compilers often produce less efficient machine code than a skilled assembly language programmer, because of the general guidelines that must be followed in designing them. C, C++, and Java provide input/output facilities through their standard libraries. However, the compiled code typically contains many more machine instructions than an equivalent assembly language program. Therefore, the assembled program will typically take up less memory space and execute faster than the compiled C, C++, or Java code. I/O programs written in C are compared with assembly language programs written for the 8086 and the 68000 in Chapters 9 and 10. C is a popular high-level language; the C++ language, based on C, is also very popular; and Java, developed by Sun Microsystems, is gaining wide acceptance.
Therefore, one of the main uses of assembly language is in writing programs for real-time applications. "Real-time" means that the task required by the application must be completed before any other input to the program can occur which will change its operation. Typical programs involving non-real-time applications and extensive mathematical computations may be written in C, C++, or Java. A brief description of these languages is given in the following.
C Language
The C programming language was developed by Dennis Ritchie of Bell Labs in 1972. C has become a very popular language for many engineers and scientists, primarily because it is portable except for I/O; even so, it can be used to write programs requiring I/O operations with minor modifications. This means that a program written in C for the 8086 will run on the 68000 with some modifications related to I/O, as long as C compilers for both microprocessors are available.
C is case sensitive. This means that uppercase letters are different from lowercase letters. Hence Start and start are two different variables. C is a general-purpose programming language and is found in numerous applications as follows:
- Systems Programming. Many operating systems, compilers, and assemblers are written in C. Note that an operating system typically is included with the personal computer when it is purchased. The operating system provides an interface between the user and the hardware by including a set of commands to select and execute the software on the system.
- Computer-Aided Design (CAD) Applications. Many CAD programs are written in C. Typical tasks to be accomplished by a CAD program are logic synthesis and simulation.
- Numerical Computation. C is used to solve mathematical problems such as integration and differentiation.
- Other Applications. These include programs for printers and floppy disk controllers, and digital control algorithms using single-chip microcomputers.
A C program may be viewed as a collection of functions. Execution of a C program always begins with a call to the function named "main." This means that every C program must have a function named main. However, one can give any name to other functions.
A C program typically begins with a line such as #include <stdio.h>. Here, #include is a preprocessor directive for the C language compiler. Such directives give instructions that are carried out before the program is compiled. The directive #include <stdio.h> inserts additional statements in the program. These statements are contained in the file stdio.h, which is supplied with the standard C library. The stdio.h file contains information related to the input/output functions.
The \n in the printf statement of the program is C notation for the newline character. Upon printing, the cursor moves to the left margin on the next line. printf never supplies a newline automatically. Therefore, multiple printf calls may be used to output "I wrote a C-program" on a single line in a few steps. Likewise, the escape sequence \n can be used to print three statements on three different lines.
All variables in C must be declared before use, normally at the start of the function before any executable statements. The compiler provides an error message if one forgets a declaration. A declaration includes a type and a list of variables that have that type. For example, the declaration int a, b; implies that the variables a and b are integers. Next, consider a program to add and subtract two integers a and b, where a = 100 and b = 200.
The %d in the printf statement represents "decimal integer." Note that printf is not part of the C language proper; there is no input or output defined in C itself. printf is a function contained in the standard library of routines that can be accessed by C programs. The values of a and b can be entered via the keyboard by using the scanf function, which allows the programmer to enter data from the keyboard. A typical expression for scanf is scanf("%d %d", &a, &b);
This expression indicates that the two values to be entered via the keyboard are in decimal. These two decimal numbers are to be stored at the addresses of the variables a and b. Note that the symbol & is the address operator.
The C program for adding and subtracting two integers a and b can also be written so that the values are read with scanf.
In summary, writing a working C program involves four steps as follows:
Step 1: Using a text editor, prepare a file containing the C code. This file is called the "source file."
Step 2: Preprocess the code. The preprocessor makes the code ready for compiling. The preprocessor looks through the source file for lines that start with a #. In the previous programming examples, #include <stdio.h> is a preprocessor directive. This directive copies the contents of the standard header file stdio.h into the source code. The header file stdio.h describes typical input/output functions such as scanf() and printf().
Step 3: The compiler translates the preprocessed code into machine code. The output from the compiler is called object code.
Step 4: The linker combines the object file with code from the C libraries. For instance, in the examples shown here, the actual code for the library function printf() is inserted from the standard library into the object code by the linker. The linker generates an executable file. Thus, the linker makes a complete program.
Before writing C programs, the programmer must make sure that the computer runs an operating system for which a C compiler is available, such as UNIX or MS-DOS. Two essential programming tools are required: a text editor and a C compiler. The text editor is a program provided with a computer system to create and modify source files. The C compiler is a program that translates C code into machine code.
C++
C++ is a modified version of the C language. C++ was developed by Bjarne Stroustrup of Bell Labs in the early 1980s. It includes essentially all features of C and also supports object-oriented programming (OOP). A program can be divided into subprograms using OOP. Each subprogram is an independent object with its own instructions and data. Thus, complexity of programming is reduced. It is therefore easier for the programmer to manage larger programs.
All OOP languages, including C++, have three characteristics: encapsulation, polymorphism, and inheritance. Encapsulation is a technique that keeps code and data together in such a way that they are protected from outside interference and misuse. A subprogram thus created is called an "object."
Code, data, or both may be private or public. Private code and/or data may be accessed only by another part of the same object. On the other hand, public code and/or data may be accessed by a program resident outside the object containing them. One of the most important characteristics of C++ is the class. The class declaration is a technique for creating an object. Note that a class consists of data and functions.
Encapsulation is available in C to some extent. For example, when a library function such as printf is used, one uses a black box program. When printf is used, several internal variables are created and initialized that are not accessible to the programmer.
Polymorphism (from a Greek word meaning "several forms") allows one to define a general class of actions. Within a general class, the specific action is determined by the type of data. For example, in C, the absolute value functions abs() and fabs() compute the absolute values of an integer and a floating-point number, respectively. In C++, on the other hand, one absolute value function, abs(), is used for both data types. The type of the data passed to abs() determines which specific version of the function is actually used. Thus, one function name is used for two different data types.
Inheritance is the ability by which one class, called a subclass, obtains the properties of another class, called a superclass. Inheritance is convenient for code reusability. Inheritance supports class hierarchies.
Following are some basic differences between C and C++:
1. In C, one must use void with the prototype for a function with no arguments. For example, in C, the prototype int rand(void); declares a function that returns an integer that is a random number. In C++, the void is optional. Therefore, in C++, the prototype for rand() can be written as int rand();. Of course, int rand(void); is also a valid prototype in C++. This means that both prototypes are allowed in C++.
2. C++ can use the C type of comment mechanism. That is, a comment can start with /* and end with */. C++ can also use a simple line comment that starts with // and stops at the end of the line (C99 and later also accept this form). Typically, C++ uses C-like comments for multiline comments and the C++ comment mechanism for short comments.
3. In C++, local variables can be declared anywhere. In contrast, in traditional C, local variables must be declared at the start of a block before any action statements (C99 and later relax this rule).
4. In C++, all functions must be prototyped. In C, prototypes have traditionally been optional.
Note that a function prototype allows the compiler to check that the function is called with the proper number and types of arguments. It also tells the compiler the type of value that the function is supposed to return. In traditional C, if the function prototype is omitted, the compiler assumes that the function returns an integer. An example of a function prototype is int abs(int n); which declares a function that returns an integer, the absolute value of n.
Java
Developed at Sun Microsystems beginning in 1991 and released in 1995, Java is based on C++ and is a true object-oriented language. That is, nearly everything in a Java program is an object, and every class is derived from a single class called Object.
A Java program must include at least one class. A class includes data type declarations and statements. Every Java standalone program requires a main method. Java only supports class methods and not separate functions. There is no preprocessor in Java. However, there is an import statement, which is similar to the #include preprocessor statement in C. The purpose of the import statement in Java is to instruct the interpreter to load a class that exists in another compilation unit. Java uses the same comment syntax, /* */ and //, as C and C++. In addition, a special comment syntax, /** */, that can precede declarations is used in Java.
Java does not require pointers. In C, a pointer may be substituted for the array name to access array elements. In Java, arrays are created by using the "new" operator, with the size of the array included in the new expression (rather than in the declaration). For example, int[] data = new int[10]; creates an array of 10 integers (the name data and the size 10 are chosen here only for illustration).
Also, every array stores its specified size in a variable named length, so that data.length has the value 10 in this example.
Therefore, in Java, arrays and strings are not subject to the errors or confusion that are common to arrays and strings in C.