THE PENTIUM AND PENTIUM PRO MICROPROCESSORS:PENTIUM MEMORY MANAGEMENT.

PENTIUM MEMORY MANAGEMENT

The memory-management unit within the Pentium is upward-compatible with the 80386 and 80486 microprocessors. Many of the features of these earlier microprocessors are basically unchanged in the Pentium. The main change is in the paging unit and a new system memory- management mode.

Paging Unit

The paging mechanism functions with 4K-byte memory pages or with a new extension available to the Pentium with 4M-byte memory pages. As detailed in Chapters 1 and 17, the size of the paging table structure can become large in a system that contains a large memory. Recall that to fully repage 4G bytes of memory, the microprocessor requires slightly over 4M bytes of memory just for the page tables. In the Pentium, with the new 4M-byte paging feature, this is dramatically reduced to just a single page directory and no page tables. The new 4M-byte page sizes are selected by the PSE bit in control register 0.

The main difference between 4K paging and 4M paging is that in the 4M paging scheme there is no page table entry in the linear address. See Figure 18–10 for the 4M paging system in the Pentium microprocessor. Pay close attention to the way the linear address is used with this scheme. Notice that the leftmost 10 bits of the linear address select an entry in the page directory (just as with 4K pages). Unlike 4K pages, there are no page tables; instead, the page directory addresses a 4M-byte memory page.

Memory-Management Mode

The system memory-management mode (SMM) is on the same level as protected mode, real mode, and virtual mode, but it is provided to function as a manager. The SMM is not intended to be used as an application or a system-level feature. It is intended for high-level system functions such as power management and security, which most Pentiums use during operation, but that are controlled by the operating system.

Access to the SMM is accomplished via a new external hardware interrupt applied to the SMI pin on the Pentium. When the SMM interrupt is activated, the processor begins executing system-level software in an area of memory called the system management RAM, or SMMRAM, called the SMM state dump record. The SMI interrupt disables all other interrupts that are normally handled by user applications and the operating system. A return from the SMM interrupt is accomplished with a new instruction called RSM. RSM returns from the memory-management mode interrupt and returns to the interrupted program at the point of the interruption.

The Pentium and Pentium Pro Microprocessors-0473

The SMM interrupt calls the software, initially stored at memory location 38000H, using CS = 3000H and EIP = 8000H. This initial state can be changed using a jump to any location within the first 1M byte of the memory. An environment similar to real mode memory addressing is entered by the management mode interrupt, but it is different because, instead of being able to address the first 1M of memory, SMM mode allows the Pentium to treat the memory system as a flat, 4G-byte system.

In addition to executing software that begins at location 38000H, the SMM interrupt also stores the state of the Pentium in what is called a dump record. The dump record is stored at memory locations 3FFA8H through 3FFFFH, with an area at locations 3FE00H through 3FEF7H that is reserved by Intel. The dump record allows a Pentium-based system to enter a sleep mode and reactivate at the point of program interruption. This requires that the SMMRAM be powered during the sleep period. Many laptop computers have a separate battery to power the SMMRAM for many hours during sleep mode. Table 18–2 lists the contents of the dump record.

The Halt auto restart and I/O trap restarts are used when the SMM mode is exited by the RSM instruction. These data allow the RSM instruction to return to the halt-state or return to the interrupt I/O instruction. If neither a halt nor an I/O operation is in effect upon entering the SMM mode, the RSM instruction reloads the state of the machine from the state dump and returns to the point of interruption.

The SMM mode can be used by the system before the normal operating system is placed in the memory and executed. It can also be used periodically to manage the system, provided that normal software doesn’t exist at locations 38000H–3FFFFH. If the system relocates the SMRAM before booting the normal operating system, it becomes available for use in addition to the normal system.

The base address of the SMM mode SMRAM is changed by modifying the value in the state dump base address register (locations 3FEF8H through 3F3FBH) after the first memory- management mode interrupt. When the first RSM instruction is executed, returning control back to the interrupted system, the new value from these locations changes the base address of the SMM interrupt for all future uses. For example, if the state dump base address is changed to

The Pentium and Pentium Pro Microprocessors-0474

 

THE PENTIUM AND PENTIUM PRO MICROPROCESSORS:SPECIAL PENTIUM REGISTERS.

SPECIAL PENTIUM REGISTERS

The Pentium is essentially the same microprocessor as the 80386 and 80486, except that some additional features and changes to the control register set have occurred. This section highlights the differences between the 80386 control register structure and the flag register.

Control Registers

Figure 18–8 shows the control register structure for the Pentium microprocessor. Note that a new control register, CR4, has been added to the control register array.

This section of the text only explains the new Pentium components in the control registers.

See Figure 17-14 for a description and illustration of the 80386 control registers. Following is a description of the new control bits and new control register CR4:

CD Cache disable controls the internal cache. If CD = 1, the cache will not fill with new data for cache misses, but it will continue to function for cache hits. If CD = 0, misses will cause the cache to fill with new data.

The Pentium and Pentium Pro Microprocessors-0471

NW Not write-through selects the mode of operation for the data cache. If NW = 1, the data cache is inhibited from cache write-through.

AM Alignment mask enables alignment checking when set. Note that alignment checking only occurs for protected mode operation when the user is at privilege level 3.

WP Write protect protects user-level pages against supervisor-level write operations.

When WP = 1, the supervisor can write to user-level segments.

NE Numeric error enables standard numeric coprocessor error detection. If NE = 1, the FERR pin becomes active for a numeric coprocessor error. If NE = 0, any coprocessor error is ignored.

VME Virtual mode extension enables support for the virtual interrupt flag in protected mode. If VME = 0, virtual interrupt support is disabled.

PVI Protected mode virtual interrupt enables support for the virtual interrupt flag in protected mode.

TSD Time stamp disable controls the RDTSC instruction.

DE Debugging extension enables I/O breakpoint debugging extensions when set.

PSE Page size extension enables 4M-byte memory pages when set.

MCE Machine check enable enables the machine checking interrupt.

The Pentium contains new features that are controlled by CR4 and a few bits in CR0. These new features are explained in later sections of the text.

EFLAG Register

The extended flag (EFLAG) register has been changed in the Pentium microprocessor. Figure 18–9 pictures the contents of the EFLAG register. Note that four new flag bits have been added to this register to control or indicate conditions about some of the new features in the Pentium. Following is a list of the four new flags and the function of each:

ID The identification flag is used to test for the CPUID instruction. If a program can set and clear the ID flag, the processor supports the CPUID instruction.

The Pentium and Pentium Pro Microprocessors-0472

VIP Virtual interrupt pending indicates that a virtual interrupt is pending.

VIF Virtual interrupt is the image of the interrupt flag IF used with VIP.

AC Alignment check indicates the state of the AM bit in control register 0.

Built-In Self-Test (BIST)

The built-in self-test (BIST) is accessed on power-up by placing a logic 1 on INIT while the RESET pin changes from 1 to 0. The BIST tests 70% of the internal structure of the Pentium in approximately 150 μs. Upon completion of the BIST, the Pentium reports the outcome in regis- ter EAX. If EAX = 0, the BIST has passed and the Pentium is ready for operation. If EAX con- tains any other value, the Pentium has malfunctioned and is faulty.

 

THE PENTIUM AND PENTIUM PRO MICROPROCESSORS:INTRODUCTION TO THE PENTIUM MICROPROCESSOR.

THE PENTIUM AND PENTIUM PRO MICROPROCESSORS

INTRODUCTION

The Pentium microprocessor signals an improvement to the architecture found in the 80486 microprocessor. The changes include an improved cache structure, a wider data bus width, a faster numeric coprocessor, a dual integer processor, and branch prediction logic. The cache has been reorganized to form two caches that are each 8K bytes in size, one for caching data, and the other for instructions. The data bus width has been increased from 32 bits to 64 bits. The numeric coprocessor operates at approximately five times faster than the 80486 numeric coprocessor. A dual integer processor often allows two instructions per clock. Finally, the branch prediction logic allows programs that branch to execute more efficiently. Notice that these changes are internal to the Pentium, which makes software upward-compatible from earlier Intel 80X86 microprocessors. A later improvement to the Pentium was the addition of the MMX instructions.

The Pentium Pro is a still faster version of the Pentium. It contains a modified internal architecture that can schedule up to five instructions for execution and an even faster floating- point unit. The Pentium Pro also contains a 256K-byte or 512K-byte level 2 cache in addition to the 16K-byte (8K for data and 8K for instruction) level 1 cache. The Pentium Pro includes error correction circuitry (ECC), as described in Chapter 10, to correct a one-bit error and indicate a two-bit error. Also added are four additional address lines, giving the Pentium Pro access to an astounding 64G bytes of directly addressable memory space.

CHAPTER OBJECTIVES

Upon completion of this chapter, you will be able to:

1. Contrast the Pentium and Pentium Pro with the 80386 and 80486 microprocessors.

2. Describe the organization and interface of the 64-bit-wide Pentium memory system and its variations.

3. Contrast the changes in the memory-management unit and paging unit when compared to the 80386 and 80486 microprocessors.

4. Detail the new instructions found with the Pentium microprocessor.

5. Explain how the superscalar dual integer units improve performance of the Pentium microprocessor.

6. Describe the operation of the branch prediction logic.

7. Detail the improvements in the Pentium Pro when compared with the Pentium.

8. Explain how the dynamic execution architecture of the Pentium Pro functions.

INTRODUCTION TO THE PENTIUM MICROPROCESSOR

Before the Pentium or any other microprocessor can be used in a system, the function of each pin must be understood. This section of the chapter details the operation of each pin, along with the external memory system and I/O structures of the Pentium microprocessor.

Figure 18–1 illustrates the pin-out of the Pentium microprocessor, which is packaged in a huge 237-pin PGA (pin grid array). The Pentium was made available in two versions: the full- blown Pentium and the P24T version called the Pentium Over Drive. The P24T version contains a 32-bit data bus, compatible for insertion into 80486 machines, which contains the P24T socket. The P24T version also comes with a fan built into the unit. The most notable difference in the

The Pentium and Pentium Pro Microprocessors-0463

pin-out of the Pentium, when compared to earlier 80486 microprocessors, is that there are 64 data bus connections instead of 32, which require a larger physical footprint.

As with earlier versions of the Intel family of microprocessors, the early versions of the Pentium require a single +5.0 V power supply for operation. The power supply current averages

3.3 A for the 66 MHz version of the Pentium, and 2.91 A for the 60 MHz version. Because these currents are significant, so are the power dissipations of these microprocessors: 13 W for the 66 MHz version and 11.9 W for the 60 MHz version. The current versions of the Pentium, 90 MHz and above, use a 3.3 V power supply with reduced current consumption. At present, a good heat sink with considerable airflow is required to keep the Pentium cool. The Pentium contains multiple VCC and VSS connections that must all be connected to +5.0 V or +3.3 V and ground for proper operation. Some of the pins are labeled N/C (no connection) and must not be connected.

The latest versions of the Pentium have been improved to reduce the power dissipation. For example, the 233 MHz Pentium requires 3.4 A or current, which is only slightly more than the 3.3 A required by the early 66 MHz version.

Each Pentium output pin is capable of providing 4.0 mA of current at a logic 0 level and 2.0 mA at a logic 1 level. This represents an increase in drive current, compared to the 2.0 mA available on earlier 8086, 8088, and 80286 output pins. Each input pin represents a small load requiring only 15 μA of current. In some systems, except the smallest, these current levels require bus buffers.

The function of each Pentium group of pins follows:

A20

The address A20 mask is an input that is asserted in the real mode to signal the Pentium to perform address wraparound, as in the 8086 microprocessor, for use of the HIMEM.SYS driver.

A31–A3 Address bus connections address any of the 5l2K × 64 memory locations found in the Pentium memory system. Note that A0, A1, and A2 are encoded in the bus enable (BE7–BE0), described elsewhere, to select any or all of the eight bytes in a 64-bit-wide memory location.

ADS

The address data strobe becomes active whenever the Pentium has issued a valid memory or I/O address. This signal is combined with the W>R and M>IO signals to generate the separate read and write signals present in the earlier 8086–80286 microprocessor-based systems.

AHOLD Address hold is an input that causes the Pentium to hold the address and AP signals for the next clock.

APCHK

BE7–BE0

BOFF

Address parity provides even parity for the memory address on all Pentium- initiated memory and I/O transfers. The AP pin must also be driven with even parity information on all inquire cycles in the same clocking period as the EADS signal. Address parity check becomes a logic 0 whenever the Pentium detects an address parity error.

Bank enable signals select the access of a byte, word, doubleword, or quadword of data. These signals are generated internally by the microprocessor from address bits A0, A1, and A2.

The back-off input aborts all outstanding bus cycles and floats the Pentium buses until BOFF is negated. After BOFF is negated, the Pentium restarts all aborted bus cycles in their entirety.

BP3–BP0 The breakpoint pins BP3–BP0 indicate a breakpoint match when the debug registers are programmed to monitor for matches.

PM1–PM0 The performance monitoring pins PM1 and PM0 indicate the settings of the performance monitoring bits in the debug mode control register.

BRDY

The burst ready input signals the Pentium that the external system has applied or extracted data from the data bus connections. This signal is used to insert wait states into the Pentium timing.

BREQ The bus request output indicates that the Pentium has generated a bus request.

BT3–BT0 The branch trace outputs provide bits 2–0 of the branch target linear address and the default operand size on BT3. These outputs become valid during a branch trace special message cycle.

BUSCHK

CACHE

The bus check input allows the system to signal the Pentium that the bus transfer has been unsuccessful.

The cache output indicates that the current Pentium cycle can cache data.

CLK The clock is driven by a clock signal that is at the operating frequency of the Pentium. For example, to operate the Pentium at 66 MHz, apply a 66 MHz clock to this pin.

D63–D0 Data bus connections transfer byte, word, doubleword, and quadword data between the microprocessor and its memory and I/O system.

D>C

Data/control indicates that the data bus contains data for or from memory or I/O when a logic 1. If D>C is a logic 0, the microprocessor is either halted or executing an interrupt acknowledge.

DP7–DP0 Data parity is generated by the Pentium and detects its eight memory banks through these connections.

EADS EWBE FERR FLUSH

FRCMC HIT

HITM

The external address strobe input signals that the address bus contains an address for an inquire cycle.

The external write buffer empty input indicates that a write cycle is pending in the external system.

The floating-point error is comparable to the ERROR line in the 80386 and shows that the internal coprocessor has erred.

The flush cache input causes the cache to flush all write-back lines and invalidate its internal caches. If the FLUSH input is a logic 0 during a reset operation, the Pentium enters its test mode.

The functional redundancy check is sampled during a reset to configure the Pentium in the master (1) or checker mode (0).

Hit shows that the internal cache contains valid data in the inquire mode.

Hit modified shows that the inquire cycle found a modified cache line. This output is used to inhibit other master units from accessing data until the cache line is writ- ten to memory.

HOLD Hold requests a DMA action.

HLDA Hold acknowledge indicates that the Pentium is currently in a hold condition.

IBT Instruction branch taken indicates that the Pentium has taken an instruction branch.

IERR IGNNE

The internal error output shows that the Pentium has detected an internal parity error or functional redundancy error.

The ignore numeric error input causes the Pentium to ignore a numeric coprocessor error.

INIT The initialization input performs a reset without initializing the caches, write-back buffers, and floating-point registers. This may not be used to reset the microprocessor in lieu of RESET after power-up.

INTR The interrupt request is used by external circuitry to request an interrupt.

INV The invalidation input determines the cache line state after an inquiry.

IU The U-pipe instruction complete output shows that the instruction in the U-pipe is complete.

IV The V-pipe instruction complete output shows that the instruction in the V-pipe is complete.

KEN LOCK
M>IO
NA

The cache enable input enables internal caching.

LOCK becomes a logic 0 whenever an instruction is prefixed with the LOCK: pre- fix. This is most often used during DMA accesses.

Memory/IO selects a memory device when a logic 1 or an I/O device when a logic

1. During the I/O operation, the address bus contains a 16-bit I/O address on address connections A15–A3.

Next address indicates that the external memory system is ready to accept a new bus cycle.

NMI The non-maskable interrupt requests a non-maskable interrupt, just as on the earlier versions of the microprocessor.

PCD The page cache disable output shows that the internal page caching is disabled by reflecting the state of the CR3 PCD bit.

PCHK

PEN

The parity check output signals a parity check error for data read from memory

or I/O.

The parity enable input enables the machine check interrupt or exception.

PRDY The probe ready output indicates that the probe mode has been entered for debugging.

PWT The page write-through output shows the state of the PWT bit in CR3. This pin is provided for use with the Intel Debugging Port and causes an interrupt.

RESET Reset initializes the Pentium, causing it to begin executing software at memory location FFFFFFF0H. The Pentium is reset to the real mode and the leftmost 12 address connections remain logic 1s (FFFH) until a far jump or far call is executed. This allows compatibility with earlier microprocessors. See Table 18–1 for the state of the Pentium after a hardware reset.

The Pentium and Pentium Pro Microprocessors-0464

SCYC The split cycle output signals a misaligned LOCKed bus cycle.

SMI SMIACT

The system management interrupt input causes the Pentium to enter the system management mode of operation.

The system management interrupt active output shows that the Pentium is operating in the system management mode.

TCK The testability clock input selects the clocking function in accordance to the IEEE 1149.1 Boundary Scan interface.

TDI The test data input is used to test data clocked into the Pentium with the TCK signal.

TDO The test data output is used to gather test data and instructions shifted out of the Pentium with TCK.

TMS The test mode select input controls the operation of the Pentium in test mode. The test reset input allows the test mode to be reset.

W>R WB>WT

Write/read indicates that the current bus cycle is a write when a logic 1 or a read when a logic 0.

Write-back/write-through selects the operation for the Pentium data cache.

The Memory System

The memory system for the Pentium microprocessor is 4G bytes in size, just as in the 80386DX and 80486 microprocessors. The difference lies in the width of the memory data bus. The Pentium uses a 64-bit data bus to address memory organized in eight banks that each contain 512M bytes of data. See Figure 18–2 for the organization of the Pentium physical memory system.

The Pentium memory system is divided into eight banks where each bank stores byte-wide data with a parity bit. The Pentium, like the 80486, employs internal parity generation and checking logic for the memory system’s data bus information. (Note that most Pentium systems do not use parity checks, because ECC is available.) The 64-bit-wide memory is important to double-precision floating-point data. Recall that a double-precision floating-point number is 64 bits wide. Because of the change to a 64-bit-wide data bus, the Pentium is able to retrieve float- ing-point data with one read cycle, instead of two as in the 80486. This causes the Pentium to function at a higher throughput than an 80486. As with earlier 32-bit Intel microprocessors, the memory system is numbered in bytes, from byte 00000000H to byte FFFFFFFFH.

Memory selection is accomplished with the bank enable signals (BE7–BE0). These separate memory banks allow the Pentium to access any single byte, word, doubleword, or quadword with one memory transfer cycle. As with earlier memory selection logic, eight separate write strobes are generated for writing to the memory system.

A new feature added to the Pentium is its capability to check and generate parity for the address bus (A31–A5) during certain operations. The AP pin provides the system with parity

The Pentium and Pentium Pro Microprocessors-0465

information and the APCHK indicates a bad parity check for the address bus. The Pentium takes no action when an address parity error is detected. The error must be assessed by the system and the system must take appropriate action (an interrupt), if so desired.

Input/Output System

The input/output system of the Pentium is completely compatible with earlier Intel microprocessors. The I/O port number appears on address lines A15–A3 with the bank enable signals used to select the actual memory banks used for the I/O transfer.

Beginning with the 80386 microprocessor, I/O privilege information is added to the TSS segment when the Pentium is operated in the protected mode. Recall that this allows I/O ports to be selectively inhibited. If the blocked I/O location is accessed, the Pentium generates a type 13 interrupt to signal an I/O privilege violation.

System Timing

As with any microprocessor, the system timing signals must be understood in order to interface the microprocessor. This portion of the text details the operation of the Pentium through its timing diagrams and shows how to determine memory access times.

The basic Pentium nonpipelined memory cycle consists of two clocking periods: T1 and T2. See Figure 18–3 for the basic nonpipelined read cycle. Notice from the timing diagram that the 66 MHz Pentium is capable of 33 million memory transfers per second. This assumes that the memory can operate at that speed.

Also notice from the timing diagram that the W>R signal becomes valid if ADS is a logic 0 at the positive edge of the clock (end of T1). This clock must be used to qualify the cycle as a read or a write.

During T1, the microprocessor issues the ADS, W>R, address, and M>IO signals. In order to qualify the W>R signal and generate appropriate MRDC and MWTC signals, we use a flip-flop

The Pentium and Pentium Pro Microprocessors-0466The Pentium and Pentium Pro Microprocessors-0467

to generate the W>R signal. Then a two-line-to-one-line multiplexer generates the memory and I/O control signals. See Figure 18–4 for a circuit that generates the memory and I/O control sig- nals for the Pentium microprocessor.

During T2, the data bus is sampled in synchronization with the end of T2 at the positive transition of the clock pulse. The setup time before the clock is given as 3.8 ns, and the hold time after the clock is given as 2.0 ns. This means that the data window around the clock is 5.8 ns. The address appears on the 8.0 ns maximum after the start of T1. This means that the Pentium micro- processor operating at 66 MHz allows 30.3 ns (two clocking periods), minus the address delay time of 8.0 ns and minus the data setup time of 3.8 ns. Memory access time without any wait states is 30.3 – 8.0 – 3.8, or 18.5 ns. This is enough time to allow access to a SRAM, but not to any DRAM without inserting wait states into the timing. The SRAM is normally found in the form of an external level 2 cache.

Wait states are inserted into the timing by controlling the BRDY input to the Pentium. The BRDY signal must become a logic 0 by the end of T2 or additional T2 states are inserted into the timing. See Figure 18–5 for a read cycle timing diagram that contains wait states for slower

The Pentium and Pentium Pro Microprocessors-0468The Pentium and Pentium Pro Microprocessors-0469

memory. The effect of inserting wait states into the timing is to lengthen the timing, allowing additional time for the memory to access data. In the timing shown, the access time has been lengthened so that standard 60 ns DRAM can be used in a system. Note that this requires the insertion of four wait states of 15.2 ns (one clocking period) each to lengthen the access time to 79.5 ns. This is enough time for the DRAM and any decoder in the system to function.

The BRDY signal is a synchronous signal generated by using the system clock. Figure 18–6 illustrates a circuit that can be used to generate BRDY for inserting any number of wait states into the Pentium timing diagram. You may recall a similar circuit inserting wait states into the timing diagram of the 80386 microprocessor. The ADS signal is delayed between 0 and 7 clocking periods by the 74Fl61 shift register to generate the BRDY signal. The exact number of wait states is selected by the 74F151 eight-line-to-one-line multiplexer. In this example, the multiplexer selects the four-wait output from the shift register.

A more efficient method of reading memory data is via the burst cycle. The burst cycle in the Pentium transfers four 64-bit numbers per burst cycle in five clocking periods. A burst with- out wait states requires that the memory system transfers data every 15.2 ns. If a level 2 cache is in place, this speed is no problem as long as the data are read from the cache. If the cache does not contain the data, then wait states must be inserted, which will reduce the data throughput. See Figure 18–7 for the Pentium burst cycle transfer without wait states. As before, wait states can be inserted to allow more time to the memory system for accesses.

The Pentium and Pentium Pro Microprocessors-0470

Branch Prediction Logic

The Pentium microprocessor uses branch prediction logic to reduce the time required for a branch caused by internal delays. These delays are minimized because when a branch instruction (short or near only) is encountered, the microprocessor begins prefetch instruction at the branch address. The instructions are loaded into the instruction cache, so when the branch occurs, the instructions are present and allow the branch to execute in one clocking period. If for any reason the branch prediction logic errs, the branch requires an extra three clocking periods to execute. In most cases, the branch prediction is correct and no delay ensues.

Cache Structure

The cache in the Pentium has been changed from the one found in the 80486 microprocessor. The Pentium contains two 8K-byte cache memories instead of one as in the 80486. There is an 8K-byte data cache and an 8K-byte instruction cache. The instruction cache stores only instructions, while the data cache stores data used by instructions.

In the 80486 with its unified cache, a program that was data-intensive quickly filled the cache, allowing little room for instructions. This slowed the execution speed of the 80486 micro- processor. In the Pentium, this cannot occur because of the separate instruction cache.

Superscalar Architecture

The Pentium microprocessor is organized with three execution units. One executes floating-point instructions, and the other two (U-pipe and V-pipe) execute integer instructions. This means that it is possible to execute three instructions simultaneously. For example, the FADD ST,ST(2) instruction, MOV EAX,10H instruction, and MOV EBX,12H instruction can all execute simultaneously because none of these instructions depend on each other. The FADD ST,ST(2) instruction is executed by the coprocessor; the MOV EAX,10H is executed by the U-pipe; and the MOV EBX,12H instruction is executed by the V-pipe. Because the floating-point unit is also used for MMX instructions, if available, the Pentium can execute two integers and one MMX instruction simultaneously.

Software should be written to take advantage of this feature by looking at the instructions in a program, and then modifying them when cases are discovered in which dependent instructions can be separated by nondependent instructions. These changes can result in up to a 40% execution speed improvement in some software. Make sure that any new compiler or other application package takes advantage of this new superscalar feature of the Pentium.

 

QUESTIONS AND PROBLEMS ON THE 80386 AND 80486 MICROPROCESSORS.

QUESTIONS AND PROBLEMS

1. The 80386 microprocessor addresses bytes of memory in the protected mode.

2. The 80386 microprocessor addresses bytes of virtual memory through the memory-management unit.

3. Describe the differences between the 80386DX and the 80386SX.

4. Draw the memory map of the 80386 when operated in the

(a) protected mode

(b) real mode

5. How much current is available on various 80386 output pin connections? Compare these currents with the currents available at the output pin connection of an 8086 microprocessor.

6. Describe the 80386 memory system, and explain the purpose and operation of the bank selection signals.

7. Explain the action of a hardware reset on the address bus connections of the 80386.

8. Explain how pipelining lengthens the access time for many memory references in the 80386 microprocessor-based system.

9. Briefly describe how the cache memory system functions.

10. I/O ports in the 80386 start at I/O address ____________ and extend to I/O address.

11. What I/O ports communicate data between the 80386 and its companion 80387 coprocessor?

12. Compare and contrast the memory and I/O connections found on the 80386 with those found in earlier microprocessors.

13. If the 80386 operates at 20 MHz, what clocking frequency is applied to the CLK2 pin?

14. What is the purpose of the BS16 pin on the 80386 microprocessor?

15. Define the purpose of each of the control registers (CR0, CR1, CR2, and CR3) found within the 80386.

16. Define the purpose of each 80386 debug register.

17. The debug registers cause which level of interrupt?

18. What are the test registers?

19. Select an instruction that copies control register 0 into EAX.

20. Describe the purpose of PE in CR0.

21. Form an instruction that accesses data in the FS segment at the location indirectly addressed by

the DI register. The instruction should store the contents of EAX into this memory location.

22. What is scaled index addressing?

23. Is the following instruction legal? MOV AX,[EBX+ECX]

24. Explain how the following instructions calculate the memory address:

(a) ADD [EBX+8*ECX],AL

(b) MOV DATA[EAX+EBX],CX

(c) SUB EAX,DATA

(d) MOV ECX,[EBX]

25. What is the purpose of interrupt type number 7?

26. Which interrupt vector type number is activated for a protection privilege violation?

27. What is a double interrupt fault?

28. If an interrupt occurs in the protected mode, what defines the interrupt vectors?

29. What is a descriptor?

30. What is a selector?

31. How does the selector choose the local descriptor table?

32. What register is used to address the global descriptor table?

33. How many global descriptors can be stored in the GDT?

34. Explain how the 80386 can address a virtual memory space of 64T bytes when the physical memory contains only 4G bytes of memory.

35. What is the difference between a segment descriptor and a system descriptor?

36. What is the task state segment (TSS)?

37. How is the TSS addressed?

38. Describe how the 80386 switches from the real mode to the protected mode.

39. Describe how the 80386 switches from the protected mode to the real mode.

40. What is virtual 8086 mode operation of the 80386 microprocessor?

41. How is the paging directory located by the 80386?

42. How many bytes are found in a page of memory?

43. Explain how linear memory address D0000000H can be assigned to physical memory address C0000000H with the paging unit of the 80386.

44. What are the differences between an 80386 and 80486 microprocessor?

45. What is the purpose of the FLUSH input pin on the 80486 microprocessor?

46. Compare the register set of the 80386 with the 80486 microprocessor.

47. What differences exist in the flags of the 80486 when compared to the 80386 microprocessor?

48. Which pins are used for parity checking on the 80486 microprocessor?

49. The 80486 microprocessor uses parity.

50. The cache inside the 80486 microprocessor is -K bytes.

51. A cache line is filled by reading -bytes from the memory system.

52. What is an 80486 burst?

53. Define the term cache write-through.

54. What is a BIST?

 

SUMMARY OF THE 80386 AND 80486 MICROPROCESSORS.

SUMMARY

1. The 80386 microprocessor is an enhanced version of the 80286 microprocessor and includes a memory-management unit that is enhanced to provide memory paging. The 80386 also includes 32-bit extended registers, and a 32-bit address and data bus. A scaled-down version of the 80386DX with a 16-bit data and 24-bit address bus is available as the 80386SX micro- processor. The 80386EX is a complete AT-style personal computer on a chip.

2. The 80386 has a physical memory size of 40 bytes that can be addressed as a virtual memory with up to 64T bytes. The 80386 memory is 32 bits wide, and it is addressed as bytes, words, or double words.

3. When the 80386 is operated in the pipelined mode, it sends the address of the next instruction or memory data to the memory system prior to completing the execution of the current instruction. This allows the memory system to begin fetching the next instruction or data before the current is completed. This increases access time, thus reducing the speed of the memory.

4. A cache memory system allows data that are frequently read to be accessed in less time because they are stored in high-speed semiconductor memory. If data are written to memory, they are also written to the cache, so the most current data are always present in the cache.

5. The I/O structure of the 80386 is almost identical to the 80286, except that I/O can be inhibited when the 80386 is operated in the protected mode through the I/O bit protection map stored with the TSS.

6. In the 80386 microprocessor, interrupts have been expanded to include additional predefined interrupts in the interrupt vector table. These additional interrupts are used with the memory- management system.

7. The 80386 memory manager is similar to the 80286, except that the physical addresses generated by the MMU are 32 bits wide instead of 24 bits wide. The 80386 MMU is also capable of paging.

8. The 80386 is operated in the real mode (8086 mode) when it is reset. The real mode allows the microprocessor to address data in the first 1M byte of memory. In the protected mode, the 80386 addresses any location in its 4G bytes of physical address space.

9. A descriptor is a series of eight bytes that specifies how a code or data segment is used by the 80386. The descriptor is selected by a selector that is stored in one of the segment registers. Descriptors are used only in the protected mode.

10. Memory management is accomplished through a series of descriptors, stored in descriptor tables. To facilitate memory management, the 80386 uses three descriptor tables: the global descriptor table (GDT), the local descriptor table (LDT), and the interrupt descriptor table (IDT). The GDT and LDT each hold up to 8192 descriptors; the IDT holds up to 256 descriptors. The GDT and LDT describe code and data segments as well as tasks. The IDT describes the 256 different interrupt levels through interrupt gate descriptors.

11. The TSS (task state segment) contains information about the current task and the previous task. Appended to the end of the TSS is an I/O bit protection map that inhibits selected I/O port addresses.

12. The memory paging mechanism allows any 4K-byte physical memory page to be mapped to any 4K-byte linear memory page. For example, memory location 00A00000H can be assigned memory location A0000000H through the paging mechanism. A page directory and page tables are used to assign any physical address to any linear address. The paging mechanism can be used in the protected mode or the virtual mode.

13. The 80486 microprocessor is an improved version of the 80386 microprocessor that contains an 8K-byte cache and an 80387 arithmetic coprocessor; it executes many instructions in one clocking period.

14. The 80486 microprocessor executes a few new instructions that control the internal cache memory and allow addition (XADD) and comparison (CMPXCHG) with an exchange and a byte swap (BSWAP) operation. Other than these few additional instructions, the 80486 is 100% upward-compatible with the 80386 and 80387.

15. A new feature found in the 80486 is the BIST (built-in self-test) that tests the microprocessor, coprocessor, and cache at reset time. If the 80486 passes the test, EAX contains a zero.

 

THE 80386 AND 80486 MICROPROCESSORS:THE MEMORY PAGING MECHANISM.

THE MEMORY PAGING MECHANISM

The paging mechanism allows any linear (logical) address, as it is generated by a program, to be placed into any physical memory page, as generated by the paging mechanism. A linear memory page is a page that is addressed with a selector and an offset in either the real or protected mode.

A physical memory page is a page that exists at some actual physical memory location. For example, linear memory location 20000H could be mapped into physical memory location 30000H, or any other location, with the paging unit. This means that an instruction that accesses location 20000H actually accesses location 30000H.

Each 80386 memory page is 4K bytes long. Paging allows the system software to be placed at any physical address with the paging mechanism. Three components are used in page address translation: the page directory, the page table, and the actual physical memory page. Note that EEM386.EXE, the extended memory manager, uses the paging mechanism to simulate expanded memory in extended memory and to generate upper memory blocks between system ROMs.

The Page Directory

The page directory contains the location of up to 1024 page translation tables. Each page translation table translates a logic address into a physical address. The page directory is stored in the memory and accessed by the page descriptor address register (CR3) (see Figure 17–14). Control register CR3 holds the base address of the page directory, which starts at any 4K-byte boundary in the memory system. The MOV CR3,reg instruction is used to initialize CR3 for paging. In a virtual 8086 mode system, each 8086 DOS partition would have its own page directory.

The page directory contains up to 1024 entries, which are each four bytes long. The page directory itself occupies one 4K-byte memory page. Each entry in the page directory (see Figure 17–26) translates the leftmost 10 bits of the memory address. This 10-bit portion of the linear address is used to locate different page tables for different page table entries. The page table address (A32–A12), stored in a page directory entry, accesses a 4K-byte-long page translation table. To completely translate any linear address into any physical address requires 1024 page tables that are each 4K bytes long, plus the page table directory, which is also 4K bytes long. This translation scheme requires up to 4M plus 4K bytes of memory for a full address translation. Only the largest operating systems support this size address translation. Many commonly found operating systems translate only the first 16M bytes of the memory system if paging is enabled. This includes programs such as Windows. This translation requires four entries in the page directory (16 bytes) and four complete page tables (16K bytes).

The page table directory entry control bits, as illustrated in Figure 17–26, each perform the following functions:

D Dirty is undefined for page table directory entries by the 80386 microprocessor and is provided for use by the operating system.

A Accessed is set to a logic 1 whenever the microprocessor accesses the page directory entry.

R/W and Read/write and user/supervisor are both used in the protection scheme, as

U/S

listed in Table 17–2. Both bits combine to develop paging priority level protec- tion for level 3, the lowest user level.

P Present, if a logic 1, indicates that the entry can be used in address translation.

If P = 0, the entry cannot be used for translation. A not present entry can be used for other purposes, such as indicating that the page is currently stored on the disk. If P = 0, the remaining bits of the entry can be used to indicate the location of the page on the disk memory system.

The 80186, 80188, and 80286 Microprocessors-0448

The 80186, 80188, and 80286 Microprocessors-0449

The Page Table

The page table contains 1024 physical page addresses, accessed to translate a linear address into a physical address. Each page table translates a 4M section of the linear memory into 4M of physical memory. The format for the page table entry is the same as for the page directory entry (refer to Figure 17–26). The main difference is that the page directory entry contains the physical address of a page table, while the page table entry contains the physical address of a 4K-byte physical page of memory. The other difference is the D (dirty bit), which has no function in the page directory entry, but indicates that a page has been written to in a page table entry.

Figure 17–27 illustrates the paging mechanism in the 80386 microprocessor. Here, the linear address 00C03FFCH, as generated by a program, is converted to physical address XXXXXF-

The 80186, 80188, and 80286 Microprocessors-0450

FCH, as translated by the paging mechanism. (Note: XXXXX is any 4K-byte physical page address.) The paging mechanism functions in the following manner:

1. The 4K-byte long page directory is stored as the physical address located by CR3. This address is often called the root address. One page directory exists in a system at a time. In the 8086 virtual mode, each task has its own page directory, allowing different areas of phys- ical memory to be assigned to different 8086 virtual tasks.

2. The upper 10 bits of the linear address (bits 31–22), as determined by the descriptors described earlier in this chapter or by a real address, are applied to the paging mechanism to select an entry in the page directory. This maps the page directory entry to the leftmost 10 bits of the linear address.

3. The page table is addressed by the entry stored in the page directory. This allows up to 4K page tables in a fully populated and translated system.

4. An entry in the page table is addressed by the next 10 bits of the linear address (bits 21–12).

5. The page table entry contains the actual physical address of the 4K-byte memory page.

6. The rightmost 12 bits of the linear address (bits 11–0) select a location in the memory page.

The paging mechanism allows the physical memory to be assigned to any linear address through the paging mechanism. For example, suppose that linear address 20000000H is selected by a program, but this memory location does not exist in the physical memory system. The 4K- byte linear page is referenced as locations 20000000H–20000FFFH by the program. Because this section of physical memory does not exist, the operating system might assign an existing physical memory page such as 12000000H–12000FFFH to this linear address range.

In the address translation process, the leftmost 10 bits of the linear address select page directory entry 200H located at offset address 800H in the page directory. This page directory entry contains the address of the page table for linear addresses 20000000H–203FFFFFH. Linear address bits (21–12) select an entry in this page table that corresponds to a 4K-byte memory page. For linear addresses 20000000H–20000FFFH, the first entry (entry 0) in the page table is selected. This first entry contains the physical address of the actual memory page, or 12000000H–12000FFFH in this example.

Take, for example, a typical DOS-based computer system. The memory map for the sys- tem appears in Figure 17–28. Note from the map that there are unused areas of memory, which can be paged to a different location, giving a DOS real mode application program more memory. The normal DOS memory system begins at location 00000H and extends to location 9FFFFH, which is 640K bytes of memory. Above location 9FFFFH, we find sections devoted to video cards, disk cards, and the system BIOS ROM. In this example, an area of memory just above 9FFFFH is unused (A0000–AFFFFH). This section of the memory could be used by DOS, so that the total application-memory area is 704K instead of 640K. Be careful when using A0000H–AFFFFH for additional RAM because the video card uses this area for bit-mapped graphics in mode 12H and 13H.

This section of memory can be used by mapping it into extended memory at locations 1002000H–11FFFFH. Software to accomplish this translation and initialize the page table directory, and page tables required to set up memory, are illustrated in Example 17–4. Note that this procedure initializes the page table directory, a page table, and loads CR3. It does not switch to protected mode and it does enable paging. Note that paging functions in real mode memory operation.

The 80186, 80188, and 80286 Microprocessors-0451The 80186, 80188, and 80286 Microprocessors-0452The 80186, 80188, and 80286 Microprocessors-0453

 

THE 80386 AND 80486 MICROPROCESSORS:VIRTUAL 8086 MODE.

VIRTUAL 8086 MODE

One special mode of operation not discussed thus far is the virtual 8086 mode. This special mode is designed so that multiple 8086 real-mode software applications can execute at one time. The PC operates in this mode for DOS applications using the DOS emulator cmd.exe (the command prompt). Figure 17–25 illustrates two 8086 applications mapped into the 80386 using the virtual mode. The operating system allows multiple applications to execute, usually done through a technique called time-slicing. The operating system allocates a set amount of time to each task. For example, if three tasks are executing, the operating system can allocate 1 ms to each task. This means that after each millisecond, a task switch occurs to the next task. In this manner, all tasks receive a portion of the microprocessor’s execution time, resulting in a system that appears to execute more than one task at a time. The task times can be adjusted to give any task any percentage of the microprocessor’s execution time.

The 80186, 80188, and 80286 Microprocessors-0447

A system that can use this technique is a print spooler. The print spooler can function in one DOS partition and be accessed 10% of the time. This allows the system to print using the print spooler, but it doesn’t detract from the system because it uses only 10% of the system time.

The main difference between 80386 protected mode operation and the virtual 8086 mode is the way the segment registers are interpreted by the microprocessor. In the virtual 8086 mode, the segment registers are used as they are in the real mode: as a segment address and an offset address capable of accessing a 1M-byte memory space from locations 00000H–FFFFFH. Access to many virtual 8086 mode systems is made possible by the paging unit that is explained in the next section. Through paging, the program still accesses memory below the 1M-byte boundary, yet the microprocessor can access a physical memory space at any location in the 4G-byte range of the memory system.

Virtual 8086 mode is entered by changing the VM bit in the EFLAG register to a logic 1. This mode is entered via an IRET instruction if the privilege level is 00. This bit cannot be set in any other manner. An attempt to access a memory address above the 1M-byte boundary will cause a type 13 interrupt to occur.

The virtual 8086 mode can be used to share one microprocessor with many users by partitioning the memory so that each user has its own DOS partition. User 1 can be allocated memory locations 00100000H–01FFFFFH, user 2 can be allocated locations 0020000H–01FFFFFFH, and so forth. The system software located at memory locations 00000000H–000FFFFFH can then share the microprocessor between users by switching from one to another to execute soft- ware. In this manner, one microprocessor is shared by many users.

 

THE 80386 AND 80486 MICROPROCESSORS:MOVING TO PROTECTED MODE.

MOVING TO PROTECTED MODE

In order to change the operation of the 80386 from the real mode to the protected mode, several steps must be followed. Real mode operation is accessed after a hardware reset or by changing the PE bit to a logic 0 in CR0. Protected mode is accessed by placing a logic 1 into the PE bit of CR0; before this is done, however, some other things must be initialized. The following steps accomplish the switch from the real mode to the protected mode:

1. Initialize the interrupt descriptor table so that it contains valid interrupt gates for at least the first 32 interrupt type numbers. The IDT may (and often does) contain up to 256 eight-byte interrupt gates defining all 256 interrupt types.

2. Initialize the global descriptor table (GDT) so that it contains a null descriptor at descriptor 0 and valid descriptors for at least one code, one stack, and one data segment.

3. Switch to protected mode by setting the PE bit in CR0.

4. Perform an intersegment (far) JMP to flush the internal instruction queue.

5. Load all the data selectors (segment registers) with their initial selector values.

6. The 80386 is now operating in the protected mode, using the segment descriptors that are defined in GDT and IDT.

Figure 17–24 shows the protected system memory map set up by following steps 1–6. The software for this task is listed in Example 17–1. This system contains one data segment descriptor and one code segment descriptor with each segment set to 4G bytes in length. This is the sim- plest protected mode system possible (called the flat model): loading all the segment registers, except code, with the same data segment descriptor from the GDT. The privilege level is initial- ized to 00, the highest level. This system is most often used where one user has access to the microprocessor and requires the entire memory space. This program is designed for use in a sys- tem that does not use DOS or does not shell from Windows to DOS. Later in this section, we show how to go to protected mode in a DOS environment. (Please note that the software in Example 17–1 is designed for a stand-alone system such as the 80386EX embedded micro- processor, and not for use in the PC.)

Example 17–1 does not store any interrupt vectors into the interrupt descriptor table, because none are used in the example. If interrupt vectors are used, then software must be included to load the addresses of the interrupt service procedures into the IDT. The software must be generated as two separate parts that are then connected together and burned on a ROM. The first part is written as shown in the real mode and the second part (see comment in listing) with the assembler set to generate protected mode code using the 32-bit flat model. This software will not function on a personal computer because it is written to function on an embedded sys- tem. The code must be converted to a binary file using EXE2BIN after it is assembled and before burning to a ROM.

The 80186, 80188, and 80286 Microprocessors-0436The 80186, 80188, and 80286 Microprocessors-0437

In more complex systems (very unlikely to appear in embedded systems), the steps required to initialize the system in the protected mode are more involved. For complex systems that are often multiuser systems, the registers are loaded by using the task state segment (TSS). The steps required to place the 80386 into protected mode operation for a more complex system using a task switch follow:

1. Initialize the interrupt descriptor table so that it refers to valid interrupt descriptors with at least 32 descriptors in the IDT.

2. Initialize the global descriptor table so that it contains a task state segment (TSS) descriptor, and the initial code and data segments required for the initial task.

3. Initialize the task register (TR) so that it points to a TSS.

4. Switch to protected mode by using an intersegment (far) jump to flush the internal instruction queue. This loads the TR with the current TSS selector and initial task.

5. The 80386 is now operating in the protected mode under control of the first task.

Example 17–2 illustrates the software required to initialize the system and switch to protected mode by using a task switch. The initial system task operates at the highest level of protection (00) and controls the entire operating environment for the 80386. In many cases, it is used to boot (load) software that allows many users to access the system in a multiuser environment. As with Example 17–2 this software will not function on a personal computer and is designed to function only on an embedded system.

The 80186, 80188, and 80286 Microprocessors-0438The 80186, 80188, and 80286 Microprocessors-0439

Neither Example 17–1 nor Example 17–2 is written to function in the personal computer environment. The personal computer environment requires the use of either the VCPI (virtual control program interface) driver provided by the HIMEM.SYS driver in DOS or the DPMI (DOS protected mode interface) driver provided by Windows when shelling to DOS. Example 17–3 shows how to switch to protected mode using DPMI and then display the contents of any area of memory. This includes memory in the extended memory area or anywhere else. This DOS application allows the contents of any memory location to be displayed in hexadecimal for- mat on the monitor, including locations above the first 1M byte of the memory system.

The 80186, 80188, and 80286 Microprocessors-0440The 80186, 80188, and 80286 Microprocessors-0441The 80186, 80188, and 80286 Microprocessors-0442The 80186, 80188, and 80286 Microprocessors-0443The 80186, 80188, and 80286 Microprocessors-0444The 80186, 80188, and 80286 Microprocessors-0445The 80186, 80188, and 80286 Microprocessors-0446

You might notice that the DOS INT 21H function call must be treated differently when operating in the protected mode. The procedure that calls a DOS INT 21H is at the end of Example 17–3. Because this is extremely long and time consuming, we have tended to move away from using the DOS interrupts from a Windows application. The best way to develop software for Windows is through the use of C/C++ with the inclusion of assembly language procedures for arduous tasks

 

THE 80386 AND 80486 MICROPROCESSORS:80386 MEMORY MANAGEMENT.

80386 MEMORY MANAGEMENT

The memory-management unit (MMU) within the 80386 is similar to the MMU inside the 80286, except that the 80386 contains a paging unit not found in the 80286. The MMU performs the task of converting linear addresses, as they appear as outputs from a program, into physical addresses that access a physical memory location located anywhere within the memory system. The 80386 uses the paging mechanism to allocate any physical address to any logical address. Therefore, even though the program is accessing memory location A0000H with an instruction, the actual physical address could be memory location 100000H, or any other location if paging is enabled. This feature allows virtually any software, written to operate at any memory location, to function in an 80386 because any linear location can become any physical location. Earlier Intel microprocessors did not have this flexibility. Paging is used with DOS to relocate 80386 and 80486 memory at addresses above FFFFFH and into spaces between ROMs at locations D0000–DFFFFH and other areas as they are available. The area between ROMs is often referred to as upper memory; the area above FFFFFH is referred to as extended memory.

Descriptors and Selectors

Before the memory paging unit is discussed, we examine the descriptor and selector for the 80386 microprocessor. The 80386 uses descriptors in much the same fashion as the 80286. In both microprocessors, a descriptor is a series of eight bytes that describes and locates a memory segment. A selector (segment register) is used to index a descriptor from a table of descriptors. The main difference between the 80286 and 80386 is that the latter has two additional selectors (FS and GS) and the most significant two bytes of the descriptor are defined for the 80386. Another difference is that 80386 descriptors use a 32-bit base address and a 20-bit limit, instead of the 24-bit base address and a 16-bit limit found on the 80286.

The 80286 addresses a 16M-byte memory space with its 24-bit base address and has a segment length limit of 64K bytes, due to the 16-bit limit. The 80386 addresses a 4G-byte memory space with its 32-bit base address and has a segment length limit of 1M byte or 4G bytes, due to a 20-bit limit that is used in two different ways. The 20-bit limit can access a segment with a length of 1M byte if the granularity bit (G) = 0. If G = 1, the 20-bit limit allows a segment length of 4G bytes.

The granularity bit is found in the 80386 descriptor. If G = 0, the number stored in the limit is interpreted directly as a limit, allowing it to contain any limit between 00000H and FFFFFH for a segment size up to 1M byte. If G = 1, the number stored in the limit is interpreted as 00000XXXH–FFFFFXXXH, where the XXX is any value between 000H and FFFH. This allows the limit of the segment to range between 0 bytes to 4G bytes in steps of 4K bytes. A limit of 00001H indicates that the limit is 4K bytes when G = 1 and 1 byte when G = 0. An example is a segment that begins at physical address 10000000H. If the limit is 00001H and G = 0, this seg- ment begins at 10000000H and ends at 10000001H. If G = 1 with the same limit (00001H), the segment begins at location 10000000H and ends at location 10001FFFH.

Figure 17–16 shows how the 80386 addresses a memory segment in the protected mode using a selector and a descriptor. Note that this is identical to the way that a segment is addressed by the 80286. The difference is the size of the segment accessed by the 80386. The selector uses its leftmost 13 bits to select a descriptor from a descriptor table. The TI bit indicates either the local (TI = 1) or global (TI = 0) descriptor table. The rightmost two bits of the selector define the requested privilege level of the access.

Because the selector uses a 13-bit code to access a descriptor, there are at most 8192 descriptors in each table—local or global. Because each segment (in an 80386) can be 4G bytes in length, 16,384 segments can be accessed at a time with the two descriptor tables. This allows the 80386 to access a virtual memory size of 64T bytes. Of course, only 4G bytes of memory actually exist in the memory system (1T byte = 1024G bytes). If a program requires more than

The 80186, 80188, and 80286 Microprocessors-0428

4G bytes of memory at a time, it can be swapped between the memory system and a disk drive or other form of large volume storage.

The 80386 uses descriptor tables for both global (GDT) and local (LDT) descriptors. A third descriptor table appears for interrupt (IDT) descriptors or gates. The first six bytes of the descriptor are the same as in the 80286, which allows 80286 software to be upward compatible with the 80386. (An 80286 descriptor used 00H for its most significant two bytes.) See Figure 17–17 for the 80286 and 80386 descriptor. The base address is 32 bits in the 80386, the limit is 20 bits, and a G bit selects the limit multiplier (1 or 4K times). The fields in the descriptor for the 80386 are defined as follows:

Limit (L19–L0) Defines the starting 32-bit address of the segment within the 4G-byte physical address space of the 80386 microprocessor. Defines the limit of the segment in units of bytes if the G bit = 0, or in units of 4K bytes if

G = 1. This allows a segment to be of any length from 1 byte to 1M byte if G = 0, and from 4K bytes to 4G bytes if G = 1. Recall that the limit indicates the last byte in a segment.

Access Rights Determines privilege level and other information about the segment. This byte varies with different types of descriptors and is elaborated with each descriptor type.

G The granularity bit selects a multiplier of 1 or 4K times for the limit field. If G = 0, the multiplier is 1; if G = 1, the multiplier is 4K.

D Selects the default instruction mode. If D = 0, the registers and memory pointers are 16 bits wide, as in the 80286; if D = 1, they are 32 bits wide, as in the 80386. This bit determines whether prefixes are required for

32-bit data and index registers. If D = 0, then a prefix is required to access 32-bit registers and to use 32-bit pointers. If D = 1, then a prefix is required to access 16-bit registers and 16-bit pointers. The USE16 and USE32 directives appended to the SEGMENT statement in assembly language control the setting of the D bit. In the real mode, it is always

The 80186, 80188, and 80286 Microprocessors-0429The 80186, 80188, and 80286 Microprocessors-0430

assumed that the registers are 16 bits wide, so any instruction that references a 32-bit register or pointer must be prefixed. The current version of DOS assumes D = 0, and most Windows programs assume D = 1.

AVL This bit is available to the operating system to use in any way that it sees fit. It often indicates that the segment described by the descriptor is avail- able.

Descriptors appear in two forms in the 80386 microprocessor: the segment descriptor and the system descriptor. The segment descriptor defines data, stack, and code segments; the system descriptor defines information about the system’s tables, tasks, and gates.

Segment Descriptors. Figure 17–18 shows the segment descriptor. This descriptor fits the gen- eral form, as dictated in Figure 17–17, but the access rights bits are defined to indicate how the data, stack, or code segment described by the descriptor functions. Bit position 4 of the access rights byte determines whether the descriptor is a data or code segment descriptor (S = 1) or a system segment descriptor (S = 0). Note that the labels used for these bits may vary in different versions of Intel literature, but they perform the same tasks.

Following is a description of the access rights bits and their function in the segment descriptor:

P Present is a logic 1 to indicate that the segment is present. If P = 0 and the segment is accessed through the descriptor, a type 11 interrupt occurs. This interrupt indicates that a segment was accessed that is not present in the system.

DPL Descriptor privilege level sets the privilege level of the descriptor; 00 has the highest privilege and 11 has the lowest. This is used to protect access to segments. If a segment is accessed with a privilege level that is lower (higher in number) than the DPL, a privilege violation interrupt occurs. Privilege levels are used in multiuser systems to prevent access to an area of the system memory.

S Segment indicates a data or code segment descriptor (S = 1), or a system segment descriptor (S = 0).

E Executable selects a data (stack) segment (E = 0) or a code segment (E = 1). E also defines the function of the next two bits (X and RW).

W If E = 0, then X indicates the direction of expansion for the data segment. If X = 0, the segment expands upward, as in a data segment; if X = 1, the segment expands downward as in a stack segment. If E = 1, then X indicates whether the privilege level of the code segment is ignored (X = 0) or observed (X = 1).

RW If E = 0, then the read/write bit (RW) indicates that the data segment may be writ- ten (RW = 1) or not written (RW = 0). If E = 1, then RW indicates that the code segment may be read (RW = 1) or not read (RW = 0).

A Accessed is set each time that the microprocessor accesses the segment. It is some- times used by the operating system to keep track of which segments have been accessed.

The 80186, 80188, and 80286 Microprocessors-0431

System Descriptor. The system descriptor is illustrated in Figure 17–19. There are 16 possible system descriptor types (see Table 17–1 for the different descriptor types), but not all are used in the 80386 microprocessor. Some of these types are defined for the 80286 so that the 80286 soft- ware is compatible with the 80386. Some of the types are new and unique to the 80386; some have yet to be defined and are reserved for future Intel products.

Descriptor Tables

The descriptor tables define all the segments used in the 80386 when it operates in the protected mode. There are three types of descriptor tables: the global descriptor table (GDT), the local descriptor table (LDT), and the interrupt descriptor table (IDT). The registers used by the 80386 to address these three tables are called the global descriptor table register (GDTR), the local descriptor table register (LDTR), and the interrupt descriptor table register (IDTR). These registers are loaded with the LGDT, LLDT, and LIDT instructions, respectively.

The descriptor table is a variable-length array of data, with each entry holding an 8-byte long descriptor. The local and global descriptor tables hold up to 8192 entries each, and the interrupt descriptor table holds up to 256 entries. A descriptor is indexed from either the local or global descriptor table by the selector that appears in a segment register. Figure 17–20 shows a segment register and the selector that it holds in the protected mode. The leftmost 13 bits index a descriptor; the TI bit selects either the local (TI = 1) or global (TI = 1) descriptor table; and the RPL bits indicate the requested privilege level.

Whenever a new selector is placed into one of the segment registers, the 80386 accesses one of the descriptor tables and automatically loads the descriptor into a program-invisible cache portion of the segment register. As long as the selector remains the same in the segment register, no additional accesses are required to the descriptor table. The operation of fetching a new descriptor

The 80186, 80188, and 80286 Microprocessors-0432

from the descriptor table is program-invisible because the microprocessor automatically accomplishes this each time that the segment register contents are changed in the protected mode.

Figure 17–21 shows how a sample global descriptor table (GDT), which is stored at memory address 00010000H, is accessed through the segment register and its selector. This table contains four entries. The first is a null (0) descriptor. Descriptor 0 must always be a null descriptor. The other entries address various segments in the 80386 protected mode memory system. In this illustration, the data segment register contains 0008H. This means that the selector is indexing descriptor location 1 in the global descriptor table (TI = 0), with a requested privilege level of 00. Descriptor 1 is located eight bytes above the base descriptor table address, beginning at location 00010008H. The descriptor located in this memory location accesses a base address of 00200000H and a limit of 100H. This means that this descriptor addresses memory locations 00200000H–00200100H. Because this is the DS (data segment) register, the data segment is located at these locations in the memory system. If data are accessed outside of these boundaries, an interrupt occurs.

The local descriptor table (LDT) is accessed in the same manner as the global descriptor table (GDT). The only difference in access is that the TI bit is cleared for a global access and set for a local access. Another difference exists if the local and global descriptor table registers are examined. The global descriptor table register (GDTR) contains the base address of the global

The 80186, 80188, and 80286 Microprocessors-0433The 80186, 80188, and 80286 Microprocessors-0434

descriptor table and the limit. The local descriptor table register (LDTR) contains only a selector, and it is 16 bits wide. The contents of the LDTR addresses a type 0010 system descriptor that contains the base address and limit of the LDT. This scheme allows one global table for all tasks, but allows many local tables, one or more for each task, if necessary. Global descriptors describe memory for the system, while local descriptors describe memory for applications or tasks.

Like the GDT, the interrupt descriptor table (IDT) is addressed by storing the base address and limit in the interrupt descriptor table register (IDTR). The main difference between the GDT and IDT is that the IDT contains only interrupt gates. The GDT and LDT contain segment and system descriptors, but never contain interrupt gates.

Figure 17–22 shows the gate descriptor, a special form of the system descriptor described earlier. (Refer to Table 17–1 for the different gate descriptor types.) Notice that the gate descriptor contains a 32-bit offset address, a word count, and a selector. The 32-bit offset address points to the location of the interrupt service procedure or other procedure. The word count indicates how many words are transferred from the caller’s stack to the stack of the procedure accessed by a call gate. This feature of transferring data from the caller’s stack is useful for implementing high-level languages such as C/C++. Note that the word count field is not used with an interrupt gate. The selector is used to indicate the location of task state segment (TSS) in the GDT or LDT if it is a local procedure.

When a gate is accessed, the contents of the selector are loaded into the task register (TR), causing a task switch. The acceptance of the gate depends on the privilege and priority levels. A return instruction (RET) ends a call gate procedure and a return from interrupt instruction (IRET) ends an interrupt gate procedure. Tasks are usually accessed with a CALL or an INT instruction, where the call instruction addresses a call gate in the descriptor table and the interrupt addresses an interrupt descriptor.

The difference between real mode interrupts and protected mode interrupts is that the interrupt vector table is an IDT in the protected mode. The IDT still contains up to 256 interrupt levels, but each level is accessed through an interrupt gate instead of an interrupt vector. Thus, interrupt type number 2 (INT 2) is located at IDT descriptor number 2 at 16 locations above the base address of the IDT. This also means that the first IK byte of memory no longer contains interrupt vectors, as it did in the real mode. The IDT can be located at any location in the memory system.

The Task State Segment (TSS)

The task state segment (TSS) descriptor contains information about the location, size, and privilege level of the task state segment, just like any other descriptor. The difference is that the TSS described by the TSS descriptor does not contain data or code. It contains the state of the task and linkage so tasks can be nested (one task can call a second, which can call a third, and so forth). The TSS descriptor is addressed by the task register (TR). The contents of the TR are changed by the LTR instruction. Whenever the protected mode program executes a JMP or CALL instruction, the contents of TR are also changed. The LTR instruction is used to initially access a task during system initialization. After initialization, the CALL or JUMP instructions normally switch tasks. In most cases, we use the CALL instruction to initiate a new task.

The TSS is illustrated in Figure 17–23. As can be seen, the TSS is quite a formidable data structure, containing many different types of information. The first word of the TSS is 

labeled back-link. This is the selector that is used, on a return (RET or IRET), to link back to the prior TSS by loading the back-link selector into the TR. The following word must contain a 0. The second through the seventh doublewords contain the ESP and ESS values for privilege levels 0–2. These are required in case the current task is interrupted so these privilege level

The 80186, 80188, and 80286 Microprocessors-0435

(PL) stacks can be addressed. The eighth word (offset 1CH) contains the contents of CR3, which stores the base address of the prior state’s page directory register. This must be restored if paging is in effect. The contents of the next 17 doublewords are loaded into the registers indicated. Whenever a task is accessed, the entire state of the machine (all of the registers) is stored in these memory locations and then reloaded from the same locations in the new TSS. The last word (offset 66H) contains the I/O permission bit map base address.

The I/O permission bit map allows the TSS to block I/O operations to inhibited I/O port addresses via an I/O permission denial interrupt. The permission denial interrupt is type number 13, the general protection fault interrupt. The I/O permission bit map base address is the offset address from the start of the TSS. This allows the same permission map to be used by many TSSs.

Each I/O permission bit map is 64K bits long (8K bytes), beginning at the offset address indicated by the I/O permission bit map base address. The first byte of the I/O permission bit map contains I/O permission for I/O ports 0000H–0007H. The rightmost bit contains the per- mission for port number 0000H. The leftmost bit contains the permission for port number 0007H. This sequence continues for the very last port address (FFFFH) stored in the leftmost bit of the last byte of the I/O permission bit map. A logic 0 placed in an I/O permission bit map bit enables the I/O port address, while a logic 1 inhibits or blocks the I/O port address. At present, only Windows NT, Windows 2000, and Windows XP uses the I/O permission scheme to disable I/O ports dependent on the application or the user.

To review the operation of a task switch, which requires only 17 μs to execute on an 80386 microprocessor, we list the following steps:

1. The gate contains the address of the procedure or location jumped to by the task switch. It also contains the selector number of the TSS descriptor and the number of words transferred from the caller to the user stack area for parameter passing.

2. The selector is loaded into TR from the gate. (This step is accomplished by a CALL or JMP that refers to a valid TSS descriptor.)

3. The TR selects the TSS.

4. The current state is saved in the current TSS and the new TSS is accessed with the state of the new task (all the registers) loaded into the microprocessor. The current state is saved at the TSS selector currently found in the TR. Once the current state is saved, a new value (by the JMP or CALL) for the TSS selector is loaded into TR and the new state is loaded from the new TSS.

The return from a task is accomplished by the following steps:

1. The current state of the microprocessor is saved in the current TSS.

2. The back-link selector is loaded to the TR to access the prior TSS so that the prior state of the machine can be returned to and be restored to the microprocessor. The return for a called TSS is accomplished by the IRET instruction.

 

 

THE 80386 AND 80486 MICROPROCESSORS:INTRODUCTION TO THE 80386 MICROPROCESSOR.

THE 80386 AND 80486 MICROPROCESSORS

INTRODUCTION

The 80386 microprocessor is a full 32-bit version of the earlier 8086/80286 16-bit microprocessors, and represents a major advancement in the architecture—a switch from a 16-bit architecture to a 32-bit architecture. Along with this larger word size are many improvements and additional features. The 80386 microprocessor features multitasking, memory management, virtual memory (with or without paging), software protection, and a large memory system. All soft- ware written for the early 8086/8088 and the 80286 are upward-compatible to the 80386 micro- processor. The amount of memory addressable by the 80386 is increased from the 1M byte found in the 8086/8088 and the 16M bytes found in the 80286, to 4G bytes in the 80386. The 80386 can switch between protected mode and real mode without resetting the microprocessor. Switching from protected mode to real mode was a problem on the 80286 microprocessor because it required a hardware reset.

The 80486 microprocessor is an enhanced version of the 80386 microprocessor that exe- cutes many of its instructions in one clocking period. The 80486 microprocessor also contains an 8K-byte cache memory and an improved 80387 numeric coprocessor. (Note that the 80486DX4 contains a 16K-byte cache.) When the 80486 is operated at the same clock frequency as an 80386, it performs with about a 50% speed improvement. In Chapter 18, the Pentium and Pentium Pro are detailed. These microprocessors both contain a 16K cache memory, and perform at better than twice the speed of the 80486 microprocessor. The Pentium and Pentium Pro also contain improved numeric coprocessors that operate five times faster than the 80486 numeric coprocessor. Chapter 19 deals with additional improvements in the Pentium II–Core2 microprocessors.

CHAPTER OBJECTIVES

Upon completion of this chapter, you will be able to:

1. Contrast the 80386 and 80486 microprocessors with earlier Intel microprocessors.

2. Describe the operation of the 80386 and 80486 memory-management unit and paging unit.

3. Switch between protected mode and real mode.

4. Define the operation of additional 80386/80486 instructions and addressing modes.

5. Explain the operation of a cache memory system.

6. Detail the interrupt structure and direct memory access structure of the 80386/80486.

7. Contrast the 80486 with the 80386 microprocessor.

8. Explain the operation of the 80486 cache memory.

INTRODUCTION TO THE 80386 MICROPROCESSOR

Before the 80386 or any other microprocessor can be used in a system, the function of each pin must be understood. This section of the chapter details the operation of each pin, along with the external memory system and I/O structures of the 80386.

Figure 17–1 illustrates the pin-out of the 80386DX microprocessor. The 80386DX is pack- aged in a 132-pin PGA (pin grid array). Two versions of the 80386 are commonly available: the 80386DX, which is illustrated and described in this chapter and the 80386SX, which is a reduced bus version of the 80386. A new version of the 80386—the 80386EX—incorporates the AT bus system, dynamic RAM controller, programmable chip selection logic, 26 address pins, 16 data pins, and 24 I/O pins. Figure 17–2 illustrates the 80386EX embedded PC.

The 80386DX addresses 4G bytes of memory through its 32-bit data bus and 32-bit address. The 80386SX, more like the 80286, addresses 16M bytes of memory with its 24-bit address bus via its 16-bit data bus. The 80386SX was developed after the 80386DX for applications that didn’t require the full 32-bit bus version. The 80386SX was found in many early personal computers that used the same basic motherboard design as the 80286. At the time that the 80386SX was popular, most applications, including Windows 3.11, required fewer than 16M bytes of memory, so the 80386SX is a popular and a less costly version of the 80386 micro- processor. Even though the 80486 has become a less expensive upgrade path for newer systems, the 80386 still can be used for many applications. For example, the 80386EX does not appear in computer systems, but it is becoming very popular in embedded applications.

The 80186, 80188, and 80286 Microprocessors-0413The 80186, 80188, and 80286 Microprocessors-0414

As with earlier versions of the Intel family of microprocessors, the 80386 requires a single

+5.0 V power supply for operation. The power supply current averages 550 mA for the 25 MHz version of the 80386, 500 mA for the 20 MHz version, and 450 mA for the 16 MHz version. Also available is a 33 MHz version that requires 600 mA of power supply current. The power supply current for the 80386EX is 320 mA when operated at 33 MHz. Note that during some modes of normal operation, power supply current can surge to over 1.0 A. This means that the power sup- ply and power distribution network must be capable of supplying these current surges. This device contains multiple VCC and VSS connections that must all be connected to +5.0 V and grounded for proper operation. Some of the pins are labeled N/C (no connection) and must not be connected. Additional versions of the 80386SX and 80386EX are available with a +3.3 V power supply. They are often found in portable notebook or laptop computers and are usually packaged in a surface mount device.

Each 80386 output pin is capable of providing 4.0 mA (address and data connections) or

5.0 mA (other connections). This represents an increase in drive current compared to the 2.0 mA available on earlier 8086, 8088, and 80286 output pins. The output current available on most 80386EX output pins is 8.0 mA. Each input pin represents a small load, requiring only ±10 μA of current. In some systems, except the smallest, these current levels require bus buffers.

The function of each 80386DX group of pins follows:

A31–A2 Address bus connections address any of the 1G × 32 (4G bytes) memory locations found in the 80386 memory system. Note that A0 and A1 are encoded in the bus enable (BE3 – BE0) to select any or all of the four bytes in a

clip_image01532-bit-wide memory location. Also note that because the 80386SX contains a 16-bit data bus in place of the 32-bit data bus found on the 80386DX, A1 is pre- sent on the 80386SX, and the bank selection signals are replaced with BHE and BLE. The BHE signal enables the upper data bus half; the BLE signal enables the lower data bus half.

clip_image016clip_image017D31–D0 Data bus connections transfer data between the microprocessor and its memory and I/O system. Note that the 80386SX contains D15–D0.

BE3 BE0

clip_image018M>IO

clip_image019clip_image020W>R ADS

Bank enable signals select the access of a byte, word, or doubleword of data. These signals are generated internally by the microprocessor from address bits A1 and A0. On the 80386SX, these pins are replaced by BHE, BLE, and A1. Memory/IO selects a memory device when a logic 1 or an I/O device when a logic 0. During the I/O operation, the address bus contains a 16-bit I/O address on address connections A15–A2.

clip_image021clip_image019[1]clip_image022Write>Read indicates that the current bus cycle is a write when a logic 1 or a read when a logic 0.

clip_image023The address data strobe becomes active whenever the 80386 has issued a valid memory or I/O address. This signal is combined with the W>R signal to gener- ate the separate read and write signals present in the earlier 8086–80286 micro- processor-based systems.

RESET Reset initializes the 80386, causing it to begin executing software at memory location FFFFFFF0H. The 80386 is reset to the real mode, and the leftmost 12 address connections remain logic 1s (FFFH) until a far jump or far call is exe- cuted. This allows compatibility with earlier microprocessors.

clip_image024CLK2 Clock times 2 is driven by a clock signal that is twice the operating frequency of the 80386. For example, to operate the 80386 at 16 MHz, apply a 32 MHz clock to this pin.

clip_image025clip_image026READY LOCK D>C

clip_image027BS16

clip_image028NA

Ready controls the number of wait states inserted into the timing to lengthen memory accesses.

Lock becomes a logic 0 whenever an instruction is prefixed with the LOCK: prefix. This is used most often during DMA accesses.

Data/control indicates that the data bus contains data for or from memory or I/O when a logic 1. If D>C is a logic 0, the microprocessor is halted or executes an interrupt acknowledge.

clip_image029clip_image030clip_image027[1]Bus size 16 selects either a 32-bit data bus (BS16 = 1) or a 16-bit data bus (BS16 = 0). In most cases, if an 80386DX is operated on a 16-bit data bus, we use the 80386SX that has a 16-bit data bus. On the 80386EX, the BS8 pin selects an 8-bit data bus.

Next address causes the 80386 to output the address of the next instruction or data in the current bus cycle. This pin is often used for pipelining the address.

HOLD Hold requests a DMA action.

clip_image031HLDA Hold acknowledge indicates that the 80386 is currently in a hold condition.

clip_image032PEREQ BUSY

The coprocessor request asks the 80386 to relinquish control and is a direct connection to the 80387 arithmetic coprocessor.

Busy is an input used by the WAIT or FWAIT instruction that waits for the coprocessor to become not busy. This is also a direct connection to the 80387 from the 80386.

clip_image033ERROR

Error indicates to the microprocessor that an error is detected by the coprocessor.

INTR An interrupt request is used by external circuitry to request an interrupt.

NMI A non-maskable interrupt requests a non-maskable interrupt as it did on the earlier versions of the microprocessor.

The Memory System

The physical memory system of the 80386DX is 4G bytes in size and is addressed as such. If vir- tual addressing is used, 64T bytes are mapped into the 4G bytes of physical space by the mem- ory management unit and descriptors. (Note that virtual addressing allows a program to be larger than 4G bytes if a method of swapping with a large hard disk drive exists.) Figure 17–3 shows the organization of the 80386DX physical memory system.

The memory is divided into four 8-bit wide memory banks, each containing up to 1G bytes of memory. This 32-bit-wide memory organization allows bytes, words, or doublewords of memory data to be accessed directly. The 80386DX transfers up to a 32-bit-wide number in a single memory cycle, whereas the early 8088 requires four cycles to accomplish the same trans- fer, and the 80286 and 80386SX require two cycles. Today, the data width is important, espe- cially with single-precision floating-point numbers that are 32 bits wide. High-level software normally uses floating-point numbers for data storage, so 32-bit memory locations speed the execution of high-level software when it is written to take advantage of this wider memory.

Each memory byte is numbered in hexadecimal as they were in prior versions of the fam- ily. The difference is that the 80386DX uses a 32-bit-wide memory address, with memory bytes numbered from location 00000000H to FFFFFFFFH.

clip_image034The two memory banks in the 8086, 80286, and 80386SX system are accessed via BLE (A0 on the 8086 and 80286) and BHE. In the 80386DX, the memory banks are accessed via four bank

clip_image035clip_image035[1]enable signals, BE3 – BE0. This arrangement allows a single byte to be accessed when one bank enable signal is activated by the microprocessor. It also allows a word to be addressed when two

The 80186, 80188, and 80286 Microprocessors-0415

bank enable signals are activated. In most cases, a word is addressed in banks 0 and 1, or in banks 2 and 3. Memory location 00000000H is in bank 0, location 00000001H is in bank 1, location 00000002H is in bank 2, and location 00000003H is in bank 3. The 80386DX does not contain address connections A0 and A1 because these have been encoded as the bank enable signals. Likewise, the 80386SX does not contain the A0 address pin because it is encoded in the BLE and BHE signals. The 80386EX addresses data either in two banks for a 16-bit-wide memory system if BS8 = 1 or as an 8-bit system if BS8 = 0.

Buffered System. Figure 17–4 shows the 80386DX connected to buffers that increase fan-out from its address, data, and control connections. This microprocessor is operated at 25 MHz using a 50 MHz clock input signal that is generated by an integrated oscillator module. Oscillator modules are usually used to provide a clock in modern microprocessor-based equipment. The HLDA signal is used to enable all buffers in a system that uses direct memory access. Otherwise, the buffer enable pins are connected to ground in a non-DMA system.

Pipelines and Caches. The cache memory is a buffer that allows the 80386 to function more efficiently with lower DRAM speeds. A pipeline is a special way of handling memory accesses so the memory has additional time to access data. A 16 MHz 80386 allows memory devices with access times of 50 ns or less to operate at full speed. Obviously, there are few DRAMs currently available with these access times. In fact, the fastest DRAMs currently in use have an access time of 40 ns or longer. This means that some technique must be found to interface these memory devices, which are slower than required by the microprocessor. Three techniques are available: interleaved memory, caching, and a pipeline.

The pipeline is the preferred means of interfacing memory because the 80386 micro- processor supports pipelined memory accesses. Pipelining allows memory an extra clocking period to access data. The extra clock extends the access time from 50 ns to 81 ns on an 80386 operating with a 16 MHz clock. The pipe, as it is often called, is set up by the microprocessor. When an instruction is fetched from memory, the microprocessor often has extra time before the next instruction is fetched. During this extra time, the address of the next instruction is sent out from the address bus ahead of time. This extra time (one clock period) is used to allow additional access time to slower memory components.

Not all memory references can take advantage of the pipe, which means that some memory cycles are not pipelined. These nonpipelined memory cycles request one wait state if the normal pipeline cycle requires no wait states. Overall, a pipe is a cost-saving feature that reduces the access time required by the memory system in low-speed systems.

Not all systems can take advantage of the pipe. Those systems typically operate at 20, 25, or 33 MHz. In these higher-speed systems, another technique must be used to increase the memory system speed. The cache memory system improves overall performance of the mem- ory systems for data that are accessed more than once. Note that the 80486 contains an internal cache called a level 1 cache and the 80386 can only contain an external cache called a level 2 cache.

A cache is a high-speed memory (SRAM) system that is placed between the microproces- sor and the DRAM memory system. Cache memory devices are usually static RAM memory components with access times of less than 10 ns. In many cases, we see level 2 cache memory systems with sizes between 32K and 1M byte. The size of the cache memory is determined more by the application than by the microprocessor. If a program is small and refers to little memory data, a small cache is beneficial. If a program is large and references large blocks of memory, the largest cache size possible is recommended. In many cases, a 64K-byte cache improves speed sufficiently, but the maximum benefit is often derived from a 256K-byte cache. It has been found that increasing the cache size much beyond 256K provides little benefit to the operating speed of the system that contains an 80386 microprocessor.

The 80186, 80188, and 80286 Microprocessors-0416

Interleaved Memory Systems. An interleaved memory system is another method of improving the speed of a system. Its only disadvantage is that it costs considerably more memory because of its structure. Interleaved memory systems are present in some systems, so memory access times can be lengthened without the need for wait states. In some systems, an interleaved memory may still require wait states, but may reduce their number. An interleaved memory system requires two or more complete sets of address buses and a controller that provides addresses for each bus. Systems that employ two complete buses are called a two-way interleave; systems that use four complete buses are called a four-way interleave.

An interleaved memory is divided into two or four parts. For example, if an interleaved memory system is developed for the 80386SX microprocessor, one section contains the 16-bit addresses 000000H–000001H, 000004H–000005H, and so on; the other section contains addresses 000002–000003, 000006H–000007H, and so forth. While the microprocessor accesses locations 000000H–000001H, the interleave control logic generates the address strobe signal for locations 000002H–000003H. This selects and accesses the word at location 000002H–000003H, while the microprocessor processes the word at location 000000H–000001H. This process alternates memory sections, thus increasing the performance of the memory system.

Interleaving increases the amount of access time provided to the memory because the address is generated to select the memory before the microprocessor accesses it. This is because the microprocessor pipelines memory addresses, sending the next address out before the data are read from the last address.

The problem with interleaving, although not major, is that the memory addresses must be accessed so that each section is alternately addressed. This does not always happen as a program executes. Under normal program execution, the microprocessor alternately addresses memory approximately 93% of the time. For the remaining 7%, the microprocessor addresses data in the same memory section, which means that in these 7% of the memory accesses, the memory sys- tem must cause wait states because of the reduced access time. The access time is reduced because the memory must wait until the previous data are transferred before it can obtain its address. This leaves the memory with less access time; therefore, a wait state is required for accesses in the same memory bank.

See Figure 17–5 for the timing diagram of the address as it appears at the microprocessor address pins. This timing diagram shows how the next address is output before the current data are accessed. It also shows how access time is increased by using interleaved memory addresses for each section of memory compared to a non-interleaved access, which requires a wait state.

clip_image045Figure 17–6 pictures the interleave controller. Admittedly, this is a complex logic circuit, which needs some explanation. First, if the SEL input (used to select this section of the memory) is inactive (logic 0), then the WAIT signal is a logic 1. Also, both ALE0 and ALE1, used to strobe the address to the memory sections, are both logic 1s, causing the latches connected to them to become transparent.

clip_image046As soon as the SEL input becomes a logic 1, this circuit begins to function. The A1 input is used to determine which latch (U2B or U5A) becomes a logic 0, selecting a section of the mem- ory. Also the ALE pin that becomes a logic 0 is compared with the previous state of the ALE pins. If the same section of memory is accessed a second time, the WAIT signal becomes a logic 0, requesting a wait state.

Figure 17–7 illustrates an interleaved memory system that uses the circuit of Figure 17–6. Notice how the ALE0 and ALE1 signals are used to capture the address for either section of memory. The memory in each bank is 16 bits wide. If accesses to memory require 8-bit data, the system causes wait states, in most cases. As a program executes, the 80386SX fetches instructions 16 bits at a time from normally sequential memory locations. Program execution uses inter- leaving in most cases. If a system is going to access mostly 8-bit data, it is doubtful that memory interleaving will reduce the number of wait states.

The 80186, 80188, and 80286 Microprocessors-0417The 80186, 80188, and 80286 Microprocessors-0418The 80186, 80188, and 80286 Microprocessors-0419

The access time allowed by an interleaved system, such as the one shown in Figure 17–7, is increased to 112 ns from 69 ns by using a 16 MHz system clock. (If a wait state is inserted, access time with a 16 MHz clock is 136 ns, which means that an interleaved system performs at about the same rate as a system with one wait state.) If the clock is increased to 20 MHz, the interleaved memory requires 89.6 ns, where standard, noninterleaved memory interfaces allow 48 ns for memory access. At this higher clock rate, 80 ns DRAMs function properly without wait states when the memory addresses are interleaved. If an access to the same section occurs, a wait state is inserted.

The Input/Output System

The 80386 input/output system is the same as that found in any Intel 8086 family microprocessor- based system. There are 64K different bytes of I/O space available if isolated I/O is implemented. With isolated I/O, the IN and OUT instructions are used to transfer I/O data between the microprocessor and I/O devices. The I/O port address appears on address bus connections A15–A2, with BE3 – BE0 used to select a byte, word, or doubleword of I/O data. If memory- mapped I/O is implemented, then the number of I/O locations can be any amount up to 4G bytes. With memory-mapped I/O, any instruction that transfers data between the microprocessor and memory system can be used for I/O transfers because the I/O device is treated as a memory

The 80186, 80188, and 80286 Microprocessors-0420

device. Almost all 80386 systems use isolated I/O because of the I/O protection scheme provided by the 80386 in protected mode operation.

Figure 17–8 shows the I/O map for the 80386 microprocessor. Unlike the I/O map of earlier Intel microprocessors, which were 16 bits wide, the 80386 uses a full 32-bit-wide I/O system divided into four banks. This is identical to the memory system, which is also divided into four banks. Most I/O transfers are 8 bits wide because we often use ASCII code (a 7-bit code) for transferring alphanumeric data between the microprocessor and printers and keyboards. This may change if Unicode, a 16-bit alphanumeric code, becomes common and replaces ASCII code. Recently, I/O devices that are 16 and even 32 bits wide have appeared for systems such as disk memory and video display interfaces. These wider I/O paths increase the data transfer rate between the microprocessor and the I/O device when compared to 8-bit transfers.

The I/O locations are numbered from 0000H to FFFFH. A portion of the I/O map is designated for the 80387 arithmetic coprocessor. Although the port numbers for the coprocessor are well above the normal I/O map, it is important that they be taken into account when decoding I/O space (overlaps). The coprocessor uses I/O locations 800000F8H–800000FFH for communications between the 80387 and 80386. The 80287 numeric coprocessor designed for use with the 80286 uses the I/O addresses 00F8H–00FFH for coprocessor communications. Because we often decode only address connections A15–A2 to select an I/O device, be aware that the coprocessor will activate devices 00F8H–00FFH unless address line A31 is also decoded. This should present no problem because you really should not be using I/O ports 00F8H–00FFH for any purpose.

The only new feature that was added to the 80386 with respect to I/O is the I/O privilege information added to the tail end of the TSS when the 80386 is operated in protected mode. As described in the section on memory management, an I/O location can be blocked or inhibited in the protected mode. If the blocked I/O location is addressed, an interrupt (type 13, general fault) is generated. This scheme is added so that I/O access can be prohibited in a multiuser environment. Blocking is an extension of the protected mode operation, as are privilege levels.

Memory and I/0 Control Signals

The memory and I/O are controlled with separate signals. The M>IO signal indicates whether the data transfer is between the microprocessor and the memory (M>IO 1) or I/O (M >IO 0). In addition to M>IO, the memory and I/O systems must read or write data. The W>R signal is a logic 0 for a read operation and a logic 1 for a write operation. The ADS signal is used to qualify the M>IO and W>R control signals. This is a slight deviation from earlier Intel microprocessors, which didn’t use ADS for qualification.

clip_image060See Figure 17–9 for a simple circuit that generates four control signals for the memory and I/O devices in the system. Notice that two control signals are developed for memory control (MRDC

The 80186, 80188, and 80286 Microprocessors-0421

and MWTC) and two for I/O control (IORC and IOWC). These signals are consistent with the memory and I/O control signals generated for use in earlier versions of the Intel microprocessor.

Timing

Timing is important for understanding how to interface memory and I/O to the 80386 microprocessor. Figure 17–10 shows the timing diagram of a nonpipelined memory read cycle. Note that the timing is referenced to the CLK2 input signal and that a bus cycle consists of four clocking periods.

Each bus cycle contains two clocking states with each state (T1 and T2) containing two clocking periods. Note in Figure 17–10 that the access time is listed as time number 3. The 16 MHz version allows memory an access time of 78 ns before wait states are inserted in this nonpipelined mode of operation. To select the nonpipelined mode, we place a logic 1 on the NA pin.

Figure 17–11 illustrates the read timing when the 80386 is operated in the pipelined mode. Notice that additional time is allowed to the memory for accessing data because the address is sent out early. Pipelined mode is selected by placing a logic 0 on the NA pin and by using address latches to capture the pipelined address. The clock pulse that is applied to the address

The 80186, 80188, and 80286 Microprocessors-0422The 80186, 80188, and 80286 Microprocessors-0423

latches comes from the ADS signal. Address latches must be used with a pipelined system, as well as with interleaved memory banks. The minimum number of interleaved banks of two and four have been successfully used in some applications.

Notice that the pipelined address appears one complete clocking state before it normally appears with nonpipelined addressing. In the 16 MHz version of the 80386, this allows an additional 62.5 ns for memory access. In a nonpipelined system, a memory access time of 78 ns is allowed to the memory system; in a pipelined system, 140.5 ns is allowed. The advantages of the pipelined system are that no wait states are required (in many, but not all bus cycles) and much lower-speed memory devices may be connected to the microprocessor. The disadvantage is that we need to interleave memory to use a pipe, which requires additional circuitry and occasional wait states.

The 80186, 80188, and 80286 Microprocessors-0424

Wait States

Wait states are needed if memory access times are long compared with the time allowed by the 80386 for memory access. In a nonpipelined 33 MHz system, memory access time is only 46 ns. Currently, only a few DRAM memories exist that have an access time of 46 ns. This means that often wait states must be introduced to access the DRAM (one wait for 60 ns DRAM) or an EPROM that has an access time of 100 ns (two waits). Note that this wait state is built into a motherboard and cannot be removed.

The READY input controls whether or not wait states are inserted into the timing. The READY input on the 80386 is a dynamic input that must be activated during each bus cycle. Figure 17–12 on the previous page shows a few bus cycles with one normal (no wait) cycle and one that contains a single wait state. Notice how the READY is controlled to cause 0 or 1 wait.

The 80186, 80188, and 80286 Microprocessors-0425

The READY signal is sampled at the end of a bus cycle to determine whether the clock cycle is T2 or TW. If READY 0 at this time, it is the end of the bus cycle or T2. If READY is 1 at the end of a clock cycle, the cycle is a TW and the microprocessor continues to test READY, searching for a logic 0 and the end of the bus cycle.

In the nonpipelined system, whenever ADS becomes a logic 0, a wait state is inserted if READY 1. After ADS returns to a logic 1, the positive edges of the clock are counted to generate the READY signal. The READY signal becomes a logic 0 after the first clock to insert 0 wait states. If one wait state is inserted, the READY line must remain a logic 1 until at least two clocks have elapsed. If additional wait states are desired, then additional time must elapse before READY is cleared. This essentially allows any number of wait states to be inserted into the timing.

Figure 17–13 on the previous page shows a circuit that inserts 0 through 3 wait states for various memory addresses. In the example, one wait state is produced for a DRAM access and two wait states for an EPROM access. The 74F164 clears whenever ADS is low and D>C is high. It begins to shift after ADS returns to a logic 1 level. As it shifts, the 00000000 in the shift register begins to fill with logic 1s from the QA connection toward the QH connection. The four different outputs are connected to an inverting multiplexer that generates the active low READY signal.