11.7.2 Motorola MC68030
The MC68030 is a virtual memory microprocessor based on the MC68020. It is implemented in HCMOS technology and can be operated at clock rates of 16.67 and 33 MHz. The MC68030 contains all features of the MC68020, plus some additional ones. The basic differences between the MC68020 and MC68030 are as follows:
11.7.3 Motorola MC68040 / MC68060
This section presents an overview of the Motorola MC68040 and MC68060 32-bit microprocessors. The MC68040 is Motorola's enhanced 68030, a 32-bit microprocessor implemented in HCMOS technology. Providing a balance between speed, power, and physical device size, the MC68040 integrates an on-chip MC68030-compatible integer unit, an MC68881/MC68882-compatible floating-point unit (FPU), dual independent demand-paged memory management units (MMUs) for instruction and data stream accesses, and independent 4 KB instruction and data caches. A high degree of instruction execution parallelism is achieved through the use of multiple independent execution pipelines, multiple internal buses, and separate physical caches for instruction and data accesses. The MC68040 also includes 32-bit nonmultiplexed external address and data buses.
The MC68060 is a superscalar (two instructions per cycle) 32-bit microprocessor. The 68060, like the Pentium, is designed using a combination of RISC and CISC architectures to obtain high performance. Motorola does not offer an MC68050 microprocessor. The 68060 is fully compatible with the 68040 in the user mode. The 68060 can operate at 50- and 66-MHz clocks with performance much faster than the 68040. A striking feature of the 68060 is its power consumption control. The 68060 is designed using static HCMOS to reduce power during normal operation.
11.7.4 PowerPC Microprocessor
This section provides an overview of the hardware, software, and interfacing features associated with the RISC microprocessor called the PowerPC. The basic features of both 32-bit and 64-bit PowerPC microprocessors are discussed.
Basics of RISC
RISC is an acronym for Reduced Instruction Set Computer. This type of microprocessor emphasizes simplicity and efficiency. RISC designs start with a necessary and sufficient instruction set. The purpose of using RISC architecture is to maximize speed by reducing clock cycles per instruction. Almost all computations can be obtained from a few simple operations. The goal of RISC architecture is to maximize the effective speed of a design by performing infrequent operations in software and frequent functions in hardware, thus obtaining a net performance gain. The following summarizes the typical features of a RISC microprocessor:
1. The RISC microprocessor is designed using hardwired control with little or no microcode. Note that variable-length instruction formats generally require microcode design. All RISC instructions have fixed formats, so microcode design is not necessary.
2. A RISC microprocessor executes most instructions in a single cycle.
3. The instruction set of a RISC microprocessor typically includes only register, load, and store instructions. All instructions involving arithmetic operations use registers, and load and store operations are utilized to access memory.
4. The instructions have a simple fixed format with few addressing modes.
5. A RISC microprocessor has several general-purpose registers and large cache memories.
6. A RISC microprocessor processes several instructions simultaneously and thus includes pipelining.
7. Software can take advantage of more concurrency. For example, jumps occur after execution of the instruction that follows (a delayed branch). This allows fetching of the next instruction during execution of the current instruction.
RISC microprocessors are suitable for embedded applications. Embedded microprocessors or controllers are embedded in the host system. This means that the presence and operation of these controllers are basically hidden from the host system. Typical embedded control applications include office automation systems such as laser printers. Since a laser printer requires a high performance microprocessor with on-chip floating-point hardware, RISC microprocessors such as PowerPC are ideal for these types of applications.
RISC microprocessors are well suited for applications such as image processing, robotics, graphics, and instrumentation. The key features of the RISC microprocessors that make them ideal for these applications are their relatively low level of integration in the chip and instruction pipeline architecture. These characteristics result in low power consumption, fast instruction execution, and fast recognition of interrupts. Typical 32- and 64-bit RISC microprocessors include PowerPC microprocessors.
IBM/Motorola/Apple PowerPC 601
This section provides an overview of the basic features of PowerPC microprocessors. The PowerPC 601 was jointly developed by Apple, IBM, and Motorola. It is available from IBM as the PPC 601 and from Motorola as the MPC 601. The PowerPC 601 is the first implementation of the PowerPC family of Reduced Instruction Set Computer (RISC) microprocessors. There are two types of PowerPC implementations: 32-bit and 64-bit. The PowerPC 601 implements the 32-bit portion of the IBM PowerPC architecture and Motorola 88100 bus control logic. It includes 32-bit effective (logical) addresses, integer data types of 8, 16, and 32 bits, and floating-point data types of 32 and 64 bits. For 64-bit PowerPC implementations, the PowerPC architecture provides 64-bit integer data types, 64-bit addressing, and other features necessary to complete the 64-bit architecture.
The 601 is a pipelined superscalar processor and is capable of executing three instructions per clock cycle. A pipelined processor is one in which the processing of an instruction is broken down into discrete stages, such as decode, execute, and write-back (the result of the operation is written back in the register file).
Because the tasks required to process an instruction are broken into a series of tasks, an instruction does not require the entire resources of an execution unit. For example, after an instruction completes the decode stage, it can pass on to the next stage, and the subsequent instruction can advance into the decode stage. This improves the throughput of the instruction flow. For example, it may take three cycles for an integer instruction to complete, but if there are no stalls in the integer pipeline, a series of integer instructions can have a throughput of one instruction per cycle. Each unit is kept busy in each cycle.
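The throughput argument above can be checked with a little arithmetic. The following Python sketch assumes an idealized pipeline with no stalls in which every stage takes exactly one cycle. The function names and the three-stage default are illustrative choices and are not taken from the 601 documentation.

```python
def pipelined_cycles(num_instructions: int, num_stages: int = 3) -> int:
    """Cycles needed by an ideal pipeline (one cycle per stage, no stalls)."""
    if num_instructions <= 0:
        return 0
    # The first instruction fills the pipeline; every later instruction
    # completes one cycle after its predecessor.
    return num_stages + (num_instructions - 1)


def unpipelined_cycles(num_instructions: int, num_stages: int = 3) -> int:
    """Cycles needed when each instruction must finish before the next starts."""
    return max(num_instructions, 0) * num_stages
```

For a run of 100 integer instructions, pipelined_cycles(100) gives 102 cycles against 300 for the unpipelined case, so the throughput approaches one instruction per cycle as the run gets longer.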
A superscalar processor is one in which multiple pipelines are provided to allow instructions to execute in parallel. The PowerPC 601 includes three execution units: a 32-bit integer unit (IU), a branch processing unit (BPU), and a pipelined floating-point unit (FPU).
The PowerPC 601 contains an on-chip, 32 KB unified cache (combined instruction and data cache) and an on-chip memory management unit (MMU). It has a 64-bit data bus and a 32-bit address bus. The 601 supports single-beat and four-beat burst data transfer for memory accesses. Note that a single-beat transaction indicates data transfer of up to 64 bits. The PowerPC 601 uses memory-mapped I/O. Input/output devices can also be interfaced to the PowerPC 601 by using the I/O controller. The 601 is built using an advanced CMOS process technology and maintains full compatibility with TTL devices.
The PowerPC 601 contains an on-chip real-time clock (RTC). The RTC was normally an I/O device completely outside the CPU in earlier microcomputers. Although the RTC appearing inside the microcomputer chip is common on single-chip microcomputers, this is the first time the RTC is implemented inside a top-of-the-line microprocessor such as the PowerPC. The implication is that modern multitasking operating systems require timekeeping for task switching as well as for keeping the calendar date. The 601 real-time clock (RTC) on-chip hardware provides a measure of real time in terms of time of day and date, with a calendar range of 136.19 years.
To specify the ordering of four bytes (ABCD) within 32 bits, the 601 can use either the ABCD (big-endian) or DCBA (little-endian) ordering. The 601 big- or little-endian mode can be selected by setting the LM bit (bit 28) in the HID0 register. Note that big-endian ordering (ABCD) assigns the lowest address to the highest-order eight bits of the multibyte data. On the other hand, little-endian byte ordering (DCBA) assigns the lowest address to the lowest-order (rightmost) 8 bits of the multibyte data.
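The 136.19-year figure is consistent with a 32-bit count of seconds. The short Python check below assumes that the seconds counter is 32 bits wide and that a year is taken as 365 days. Both are assumptions made here to reproduce the number, and the variable names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumed 365-day year
COUNTER_BITS = 32                       # assumed width of the seconds counter

# Number of distinct second counts, expressed in years
rtc_range_years = (2 ** COUNTER_BITS) / SECONDS_PER_YEAR
print(f"{rtc_range_years:.2f} years")
```

Under these assumptions the result rounds to 136.19 years, matching the calendar range quoted for the 601.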
Note that Motorola 68XXX microprocessors support big-endian byte ordering whereas Intel 80XXX microprocessors support little-endian byte ordering.
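The two byte orderings can be seen by laying out one 32-bit value in both ways. The Python sketch below uses the standard struct module; the sample value 0x0A0B0C0D is an arbitrary choice made here so that the bytes A, B, C, and D are easy to tell apart.

```python
import struct

value = 0x0A0B0C0D  # A = 0x0A (most significant byte) ... D = 0x0D (least significant)

big_endian = struct.pack(">I", value)      # ABCD: lowest address holds 0x0A
little_endian = struct.pack("<I", value)   # DCBA: lowest address holds 0x0D

# Index 0 of each bytes object plays the role of the lowest memory address.
print(big_endian.hex(), little_endian.hex())
```

Reading the same four bytes back with the wrong ordering yields a different integer, which is why the selected mode matters when data are shared between big-endian and little-endian systems.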
PowerPC 601 Registers
PowerPC 601 registers can be accessed depending on the program’s access privilege level (supervisor or user mode). The privilege level is determined by the privilege level (PR) bit in the machine status register (MSR). The supervisor mode of operation is typically used by the operating system, and user mode is used by the application software. The PowerPC 601 programming model contains user- and supervisor-level registers. Some of these are
-
The user-level registers can be accessed by all software with either user or supervisor privileges.
-
The thirty-two 32-bit GPRs (general-purpose registers, GPR0-GPR31) can be used as the data source or destination for all integer instructions. They can also provide data for generating addresses.
-
The thirty-two 64-bit FPRs (floating-point registers, FPR0-FPR31) can be used as data sources and destinations for all floating-point instructions.
-
The floating-point status and control register (FPSCR) is a user control register in the floating-point unit (FPU). It contains floating-point status and control bits such as floating-point exception signal bits, exception summary bits, and exception enable bits.
-
The condition register (CR) is a 32-bit register, divided into eight 4-bit fields, CR0-CR7. These fields reflect the results of certain arithmetic operations and provide mechanisms for testing and branching.
-
The remaining user-level registers are 32-bit special-purpose registers: SPR0, SPR1, SPR4, SPR5, SPR8, and SPR9.
-
SPR0 is known as the MQ register and is used as a register extension to hold the product for the multiplication instructions and the dividend for the divide instructions. The MQ register is also used as an operand of long shift and rotate instructions.
-
SPR1 is called the integer exception register (XER). The XER is a 32-bit register that indicates carries and overflow bits for integer operations. It also contains two fields for load string and compare byte indexed instructions.
-
SPR4 and SPR5 are two 32-bit read-only registers that hold the upper (RTCU) and lower (RTCL) portions of the real-time clock (RTC), respectively. The RTCU register maintains the number of seconds from a time specified by software. The RTCL register maintains the fraction of the current second in nanoseconds.
-
SPR8 is the 32-bit link register (LR). The link register can be used to provide the branch target address and to hold the return address after branch and link instructions.
-
SPR9 represents the 32-bit count register (CTR). The CTR can be used to hold a loop count that can be decremented during execution of certain branch instructions. The CTR can also be used to hold the target address for the branch conditional to count register instruction.
PowerPC 601 Addressing Modes
The effective address (EA) is the 32-bit address computed by the processor when executing a memory access or branch instruction or when fetching the next sequential instruction. Since the PowerPC is based on the RISC architecture, arithmetic and logical instructions do not read or modify memory.
Load and store operations have two types of effective address generation:
i) Register Indirect with Immediate Index Mode
Instructions using this mode contain a signed 16-bit index (d operand in the 32-bit instruction), which is sign extended to 32 bits and added to the contents of a general-purpose register specified by five bits in the 32-bit instruction (rA operand) to generate the effective address. A zero in the rA operand causes a zero to be added to the immediate index (d operand). The option to specify rA or 0 is shown in the instruction descriptions of the 601 user's manual as the notation (rA|0).
An example is lbz rD, d(rA) where rA specifies a general-purpose register (GPR) containing an address, d is the 16-bit immediate index, and rD specifies a general-purpose register as destination. Consider lbz r1, 20(r3). The effective address (EA) is the sum r3 + 20. The byte in memory addressed by the EA is loaded into bits 24 through 31 of register r1. The remaining bits in r1 are cleared to zero. Note that the registers r1 and r3 represent GPR1 and GPR3, respectively.
ii) Register Indirect with Index Mode
Instructions using this addressing mode add the contents of two general-purpose registers (one GPR holds an address and another holds the index). An example is lbzx rD, rA, rB where rD specifies a GPR as destination, rA specifies a GPR as the index, and rB specifies a GPR holding an address. Consider lbzx r1, r4, r6. The effective address (EA) is the sum (r4|0) + (r6). The byte in memory addressed by the EA is loaded into bits 24 through 31 of register r1. The remaining bits in register r1 are cleared to zero.
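The two effective-address calculations for loads and stores can be sketched in Python as follows. This is a simplified model written for illustration only: the register file is a plain list of 32 integers, memory is a dictionary keyed by byte address, and all function names are hypothetical rather than taken from the 601 manual.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(d: int) -> int:
    """Interpret the low 16 bits of d as a signed two's complement number."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def ea_immediate_index(gpr, ra: int, d: int) -> int:
    """Register indirect with immediate index: EA = (rA|0) + sign-extended d."""
    base = 0 if ra == 0 else gpr[ra]      # rA = 0 means the value zero is used
    return (base + sign_extend16(d)) & MASK32


def ea_indexed(gpr, ra: int, rb: int) -> int:
    """Register indirect with index: EA = (rA|0) + (rB)."""
    base = 0 if ra == 0 else gpr[ra]
    return (base + gpr[rb]) & MASK32


def lbz(gpr, memory, rd: int, d: int, ra: int) -> None:
    """Model of lbz rD, d(rA): load one byte and clear the upper 24 bits of rD."""
    ea = ea_immediate_index(gpr, ra, d)
    gpr[rd] = memory[ea] & 0xFF
```

With r3 holding 0x1000, lbz r1, 20(r3) reads the byte at address 0x1014 in this model, and r1 ends up holding that byte in its low-order eight bits (bits 24 through 31 in PowerPC numbering).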
PowerPC 601 conditional and unconditional branch instructions compute the effective address (EA) of the next instruction using various addressing modes. A few of them are described below:
-
Branch Relative: Branch instructions (32 bits wide) using the relative mode generate the address of the next instruction by adding an offset to the current program counter contents. An example is the instruction b start, which unconditionally jumps to the address PC + start.
-
Branch Absolute: Branch instructions using this mode include the address of the next instruction to be executed. For example, the instruction ba begin unconditionally branches to the absolute address "begin" specified in the instruction.
-
Branch to Link Register: Branch instructions using this mode branch to the address computed as the sum of the immediate offset and the address of the current instruction. The instruction address following the instruction is placed into the link register. For example, the instruction bl start unconditionally jumps to the address computed from the current PC contents plus start. The return address is placed in the link register.
-
Branch to Count Register: Instructions using this mode branch to the address contained in the count register. Consider bcctr BO, BI, which means branch conditional to count register. This instruction branches conditionally to the address specified in the count register.
The BI operand specifies the bit in the condition register to be used as the condition of the branch. The BO operand specifies how the branch is affected by or affects the condition or count registers. Numerical values specifying BI and BO can be obtained from the 601 manual.
Note that some instructions combine the link register and count register modes. An example is bcctrl BO, BI. This instruction first performs the same operation as bcctr and then places the instruction address following the instruction into the link register. This instruction is a form of "conditional call" because the return address is saved in the link register.
Typical PowerPC 601 Instructions
The 601 instructions are divided into the following categories:
1. Integer Instructions
2. Floating-point Instructions
3. Load/store Instructions
4. Flow Control Instructions
5. Processor Control Instructions
Integer instructions operate on byte (8-bit), half-word (16-bit), and word (32-bit) operands. Floating-point instructions operate on single-precision and double-precision floating-point operands.
Integer Instructions
The integer instructions include integer arithmetic, integer compare, integer rotate and shift, and integer logical instructions. The integer arithmetic instructions always set the integer exception register (XER) bit CA to reflect the carry out of bit 0 (the most significant bit). Integer instructions with the overflow enable (OE) bit set will cause the XER bits SO (summary overflow: overflow bit set due to exception) and OV (overflow bit set due to instruction execution) to be set to reflect overflow of the 32-bit result. Some examples of integer instructions are provided in the following. Note that rS, rD, rA, and rB in the following examples are 32-bit general-purpose registers (GPRs) of the 601 and SIMM is a 16-bit signed immediate number.
-
addi rD, rA, SIMM performs the following immediate operation: rD ← (rA|0) + SIMM, where (rA|0) can be either (rA) or 0. An example is addi rD, rA, SIMM or addi rD, 0, SIMM.
-
add rD, rA, rB performs rD ← (rA) + (rB).
-
add. rD, rA, rB adds with CR update as follows: rD ← (rA) + (rB). The dot suffix enables the update of the condition register.
-
subf rD, rA, rB performs rD ← (rB) - (rA).
-
subf. rD, rA, rB performs the same operation as subf but updates the condition register.
-
addme rD, rA performs the (add to minus one extended) operation: rD ← (rA) + FFFF FFFFH + CA bit in XER.
-
subfme rD, rA performs the (subtract from minus one extended) operation: rD ← ¬(rA) + FFFF FFFFH + CA bit in XER, where ¬(rA) represents the ones complement of the contents of rA.
-
mulhwu rD, rA, rB performs an unsigned multiplication of two 32-bit numbers in rA and rB. The high-order 32 bits of the 64-bit product are placed in rD.
-
mulhw rD, rA, rB performs the same operation as the mulhwu except that the multiplication is for signed numbers.
-
mullw rD, rA, rB places the low-order 32 bits of the 64-bit product (rA)*(rB) into rD. The low-order 32 bits of the product are the same whether the operands are treated as signed or unsigned integers.
-
mulli rD, rA, SIMM places the low-order 32 bits of the 48-bit product (rA)*SIMM into rD, where SIMM is the 16-bit immediate value. The low-order 32 bits of the product are the same whether the operands are treated as signed or unsigned integers.
-
divw rD, rA, rB divides the 32-bit signed dividend in rA by the 32-bit signed divisor in rB. The 32-bit quotient is placed in rD and the remainder is discarded.
-
divwu rD, rA, rB is the same as the divw instruction except that the division is for unsigned numbers.
-
cmpi crfD, L, rA, SIMM compares 32 bits in rA with the immediate SIMM, treating the operands as signed integers. The result of the comparison is placed in the crfD field (0 for CR0, 1 for CR1, and so on) of the condition register. L = 0 indicates 32-bit operands while L = 1 represents 64-bit operands. For example, cmpi 0, 0, rA, 200 compares 32 bits in register rA with the immediate value 200, and CR0 is affected according to the comparison.
-
xor rA, rS, rB performs an exclusive-or operation between the contents of rS and rB. The result is placed into register rA.
-
extsb rA, rS places bits 24-31 of rS into bits 24-31 of rA. Bit 24 of rS is then sign extended through bits 0-23 of rA.
-
slw rA, rS, rB shifts the contents of rS left by the shift count specified by rB[27-31]. Bits shifted out of position 0 are lost. Zeros are placed in the vacated positions on the right. The 32-bit result is placed into rA.
-
srw rA, rS, rB is similar to slw rA, rS, rB except that the operation is a right shift.
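Several of the integer operations listed above can be modeled with ordinary integer arithmetic. The Python sketch below is illustrative only: registers are represented as unsigned 32-bit values, the function names simply mirror the mnemonics, and condition register and XER updates are deliberately left out. The shift count follows the rB[27-31] description given above, that is, the low-order five bits of rB.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Reinterpret a 32-bit pattern as a signed two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def subf(ra: int, rb: int) -> int:
    """subf rD, rA, rB: rD = (rB) - (rA); note the operand order."""
    return (rb - ra) & MASK32


def mulhwu(ra: int, rb: int) -> int:
    """High-order 32 bits of the unsigned 64-bit product."""
    return ((ra & MASK32) * (rb & MASK32)) >> 32


def mulhw(ra: int, rb: int) -> int:
    """High-order 32 bits of the signed 64-bit product."""
    return ((to_signed32(ra) * to_signed32(rb)) >> 32) & MASK32


def mullw(ra: int, rb: int) -> int:
    """Low-order 32 bits of the product; identical for signed and unsigned."""
    return (ra * rb) & MASK32


def extsb(rs: int) -> int:
    """Sign extend the low-order byte (bits 24-31) through bits 0-23."""
    byte = rs & 0xFF
    return (byte - 0x100 if byte & 0x80 else byte) & MASK32


def slw(rs: int, rb: int) -> int:
    """Shift left by the count in the low five bits of rB; vacated bits are zero."""
    return (rs << (rb & 0x1F)) & MASK32
```

The difference between mulhw and mulhwu shows up with an operand such as FFFF FFFFH, which is -1 when treated as signed and 4,294,967,295 when treated as unsigned; mullw gives the same low-order word in both cases.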
Floating-Point Instructions
Some of the 601 floating-point instructions are provided below:
-
fadd frD, frA, frB adds the contents of the floating-point register frA to the contents of the floating-point register frB. If the most significant bit of the resultant significand is not a one, then the result is normalized. The result is rounded to the specified precision under control of the FPSCR register. The result is then placed in frD.
Note that this fadd instruction requires one cycle in execute stage, assuming normal operations; however, there is an execute stage delay of three cycles if the next instruction is dependent.
The 601 floating-point addition is based on exponent comparison: the significand of the operand with the smaller exponent is shifted right, and its exponent is increased by one for each bit shifted, until the two exponents are equal. The two significands are then added algebraically to form an intermediate sum. If a carry occurs, the sum's significand is shifted right one bit position and the exponent is increased by one.
-
fsub frD, frA, frB performs frD ← frA - frB. Normalization and rounding of the result are performed in the same way as the fadd instruction.
-
fmul frD, frA, frC performs frD ← frA * frC. Normalization and rounding of the result are performed in the same way as the fadd instruction. Floating-point multiplication is based on exponent addition and multiplication of the significands.
-
fdiv frD, frA, frB performs the floating-point division frD ← frA / frB. No remainder is provided. Normalization and rounding of the result are performed in the same way as the fadd instruction.
-
fmsub frD, frA, frC, frB performs frD ← frA * frC - frB. Normalization and rounding of the result are performed in the same way as the fadd instruction.
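The exponent alignment used by fadd, in which the operand with the smaller exponent is shifted right until both exponents match, can be sketched for positive operands as follows. This Python model is a simplification made for illustration: each value is a (significand, exponent) pair representing significand * 2**exponent, the 24-bit significand width is an assumption, bits shifted out are simply truncated, and the rounding and sign handling of the real hardware are not modeled.

```python
SIGNIFICAND_BITS = 24  # assumed width, with the leading 1 stored explicitly


def align_and_add(sig_a: int, exp_a: int, sig_b: int, exp_b: int):
    """Add two positive values given as (significand, exponent) pairs."""
    # Shift the significand that has the smaller exponent right, adding one
    # to its exponent for each bit shifted, until the exponents are equal.
    while exp_a < exp_b:
        sig_a >>= 1
        exp_a += 1
    while exp_b < exp_a:
        sig_b >>= 1
        exp_b += 1
    total = sig_a + sig_b
    exponent = exp_a
    # On a carry out of the significand, shift right once and bump the exponent.
    if total >> SIGNIFICAND_BITS:
        total >>= 1
        exponent += 1
    return total, exponent
```

With 1.0 written as (0x800000, -23), adding 1.0 to itself produces a carry and returns (0x800000, -22), which is 2.0; adding 0.5, written as (0x800000, -24), to 1.0 first aligns the smaller operand and returns (0xC00000, -23), which is 1.5.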
Load/Store Instructions
Some examples of the 601 load and store instructions are
-
lhzx rD, rA, rB loads the half word (16 bits) in memory addressed by the sum (rA|0) + (rB) into bits 16 through 31 of rD. The remaining bits of rD are cleared to zero.
-
sthux rS, rA, rB stores the 16-bit half word from bits 16-31 of register rS in memory addressed by the sum (rA|0) + (rB). The value (rA|0) + (rB) is placed into register rA.
-
lmw rD, d(rA) loads n (where n = 32 - D and D = 0 through 31) consecutive words starting at the memory location addressed by the sum (rA|0) + d into the general-purpose registers rD through r31.
-
stmw rS, d(rA) is similar to lmw except that stmw stores n consecutive words.
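The register range covered by lmw follows directly from n = 32 - D. The Python sketch below models the instruction under simplifying assumptions: the register file is a list of 32 integers, memory is a dictionary mapping word-aligned byte addresses to 32-bit words, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(d: int) -> int:
    """Interpret the low 16 bits of d as a signed two's complement number."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def lmw(gpr, memory_words, rd: int, d: int, ra: int) -> int:
    """Model of lmw rD, d(rA): load registers rD through r31 from memory.

    Returns the number of words loaded, n = 32 - rd.
    """
    base = 0 if ra == 0 else gpr[ra]
    ea = (base + sign_extend16(d)) & MASK32
    for reg in range(rd, 32):
        gpr[reg] = memory_words[ea] & MASK32
        ea = (ea + 4) & MASK32          # consecutive words are 4 bytes apart
    return 32 - rd
```

For example, lmw r29, 8(r1) loads three words into r29, r30, and r31 from three consecutive word addresses starting at (r1) + 8.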
Flow Control Instructions
Flow control instructions include conditional and unconditional branch instructions. An example of one of these instructions is
-
bc (branch conditional) BO, BI, target branches with offset target if the condition bit in CR specified by bit number BI is true (the condition "true" is specified by a value in BO).
For example, bc 12, 0, target means branch with offset target if the condition specified by bit 0 in CR (BI = 0 indicates the result is negative) is true (specified by the value BO = 12 according to the Motorola PowerPC 601 manual).
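How BO and BI together decide a conditional branch can be sketched in Python. The bit meanings used below reflect the general PowerPC BO encoding as understood here (one bit to skip the condition test, one giving the condition value to match, one to skip the count register decrement, and one selecting branch on zero or nonzero count); they are stated as an assumption and should be checked against the 601 manual. The function name and return convention are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def bc_taken(bo: int, bi: int, cr: int, ctr: int):
    """Return (taken, new_ctr) for a conditional branch with operands BO, BI.

    cr is the 32-bit condition register; bit 0 is the most significant bit.
    """
    # Assumed BO layout, from most to least significant of its five bits:
    # skip condition test, condition value, skip CTR decrement,
    # branch when CTR is zero, prediction hint (ignored here).
    if not (bo & 0b00100):
        ctr = (ctr - 1) & MASK32
    ctr_ok = bool(bo & 0b00100) or ((ctr != 0) != bool(bo & 0b00010))
    cr_bit = (cr >> (31 - bi)) & 1
    cond_ok = bool(bo & 0b10000) or (cr_bit == ((bo >> 3) & 1))
    return ctr_ok and cond_ok, ctr
```

With BO = 12 and BI = 0, the branch in this model is taken exactly when bit 0 of CR is set, and the count register is left unchanged. With BO = 16, the model decrements the count register and branches while it is nonzero, which is the loop-closing form of the branch.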
Processor Control Instructions
Processor control instructions are used to read from and write to the machine state register (MSR), condition register (CR), and special-purpose registers (SPRs). Some examples of these instructions are
-
mfcr rD places the contents of the condition register into rD.
-
mtmsr rS places the contents of rS into the MSR. This is a supervisor-level instruction.
-
mfmsr rD places the contents of the MSR into rD. This is a supervisor-level instruction.
PowerPC 601 Exception Model
All 601 exceptions can be described as either precise or imprecise and either synchronous or asynchronous. Asynchronous exceptions are caused by events external to the processor's execution. Synchronous exceptions, on the other hand, are caused by instructions and are handled precisely by the 601. A precise exception is one for which the machine state at the time the exception occurs is known and can be completely restored. That is, the instructions that invoke trap and system call exceptions complete execution before the exception is taken. When exception processing completes, execution resumes at the address of the next instruction.
An example of a maskable asynchronous, precise exception is the external interrupt. When an asynchronous, precise exception such as the external interrupt occurs, the 601 postpones its handling until all instructions and any exceptions associated with those instructions complete execution. System reset and machine check exceptions are two nonmaskable exceptions that are asynchronous and imprecise. These exceptions may not be recoverable or may provide a limited degree of recoverability for diagnostic purposes.
Asynchronous, imprecise exceptions have the highest priority with the synchronous, precise exceptions having the next priority and the asynchronous, precise exceptions the lowest priority.
The 601 exception mechanism allows the processor to change automatically to supervisor state as a result of exceptions. When exceptions occur, information about the state of the processor is saved to certain registers rather than in memory, as is usually done with other processors, in order to achieve high speed. The processor then begins execution at an address (exception vector) predetermined for each exception. The exception handler at the specified vector is then processed with the processor in supervisor mode.
601 System Interface
The pins and signals of the PowerPC 601 include a 32-bit address bus, a 64-bit data bus, and 52 control and information signals. Memory access allows transfer sizes of 8, 16, 24, 32, 40, 48, 56, or 64 bits in one bus clock cycle. Data transfer occurs in either single-beat transactions or four-beat burst transactions. Both memory and I/O accesses can use the same bus transfer protocols. The 601 also has the ability to define memory areas as I/O controller interface areas. The 601 uses the TS pin for memory-mapped accesses and the XATS pin for I/O controller interface accesses.
Summary of PowerPC 601 Features
The PowerPC 601 is a RISC-based superscalar microprocessor. That is, it can execute two or more instructions per cycle. The PowerPC 601 is based on a load/store architecture. This means that all instructions that access memory are either loads or stores, and all operate instructions are from register to register. All instructions, including loads and stores, have a fixed length of 32 bits, and the 601 provides 32-bit integer registers and 64-bit floating-point registers.
The PowerPC 601 includes two primary addressing modes: register plus displacement and register plus register. In addition, the update forms of the 601 load and store instructions perform the load or store operation and also modify the index register by placing the effective address just computed in it. In the PowerPC 601, branch target addresses are normally determined by using program counter relative mode. That is, the branch target address is determined by adding a displacement to the program counter. However, as mentioned before, conditional branches in the 601 may test fields in the condition register and the contents of a special register called the count register (CTR). A single 601 branch instruction can implement a loop-closing branch by decrementing the CTR, testing its value, and branching if it is nonzero.
The PowerPC 601 saves the return address for certain control transfer instructions, such as subroutine calls, in a special-purpose register called the link register. The 601 does this in any branch in which the link (LK) bit is set to one.
The PowerPC 601 utilizes sophisticated pipelines. The 601 uses relatively short independent pipelines with more buffering. The 601 does a lot of computation in each pipe stage. The 601 has a unified (combined) 32 KB cache. That is, instructions and data reside in the same cache in the 601. Finally, the 601 offers high performance by utilizing sophisticated design tricks. For example, the 601 includes powerful instructions such as floating-point multiply-add and update load/store that perform more tasks with fewer instructions.
PowerPC 64-Bit Microprocessors
The PowerPC 620 is a 64-bit implementation of the PowerPC architecture. Like the other family members discussed here (the PowerPC 603e, 750/740, and 604e, which are 32-bit implementations), it is a superscalar processor. This means that it can execute more than one instruction in a cycle. Table 11.14 compares the basic features of the 32-bit PowerPC 601 with the 64-bit PowerPC 620.
Several later versions of the PowerPC are also available: the PowerPC 603e, PowerPC 750/740, and PowerPC 604e. The PowerPC 603e microprocessor is available at speeds of 250, 275, and 300 MHz. The 603e has high performance and low power consumption, which makes it suited for applications found in the embedded system market. The PowerPC 603e is used in the Power Macintosh C500 series, which offers features such as accelerated multimedia, advanced video capture, and publishing. The PowerPC 750/740 is available at speeds up to 266 MHz and uses only 5 watts of power. The unique features offered by this microprocessor are built-in power-saving modes, an on-chip thermal sensor to regulate processor temperature, and a choice of packaging configurations. The PowerPC 604e microprocessor, another member of the PowerPC family, provides speeds of 350 MHz and uses 8.0 watts of power. Like Intel, Motorola used 0.25-micron process technology to achieve this speed. The PowerPC 604e is intended for high-end Macintosh and Mac-compatible systems.
Apple Computer's original G3 (a marketing name used by Apple) utilized the PowerPC 750 in Apple's iMac and Power Macintosh personal computers. A later version of Apple's G3 used Motorola's copper-based PowerPC microprocessor, providing speeds of up to 400 MHz.