
THE PENTIUM AND PENTIUM PRO MICROPROCESSORS

INTRODUCTION

The Pentium microprocessor signals an improvement to the architecture found in the 80486 microprocessor. The changes include an improved cache structure, a wider data bus, a faster numeric coprocessor, a dual integer processor, and branch prediction logic. The cache has been reorganized into two caches that are each 8K bytes in size, one for data and the other for instructions. The data bus width has been increased from 32 bits to 64 bits. The numeric coprocessor operates approximately five times faster than the 80486 numeric coprocessor. The dual integer processor often allows two instructions to execute per clock. Finally, the branch prediction logic allows programs that branch to execute more efficiently. Notice that these changes are internal to the Pentium, so software written for earlier Intel 80X86 microprocessors remains upward-compatible. A later improvement to the Pentium was the addition of the MMX instructions.

The Pentium Pro is a still faster version of the Pentium. It contains a modified internal architecture that can schedule up to five instructions for execution, and an even faster floating-point unit. The Pentium Pro also contains a 256K-byte or 512K-byte level 2 cache in addition to the 16K-byte level 1 cache (8K bytes for data and 8K bytes for instructions). The Pentium Pro includes error-correction circuitry (ECC), as described in Chapter 10, to correct a one-bit error and indicate a two-bit error. Four additional address lines give the Pentium Pro access to an astounding 64G bytes of directly addressable memory space.

CHAPTER OBJECTIVES

Upon completion of this chapter, you will be able to:

1. Contrast the Pentium and Pentium Pro with the 80386 and 80486 microprocessors.

2. Describe the organization and interface of the 64-bit-wide Pentium memory system and its variations.

3. Contrast the changes in the memory-management unit and paging unit when compared to the 80386 and 80486 microprocessors.

4. Detail the new instructions found with the Pentium microprocessor.

5. Explain how the superscalar dual integer units improve performance of the Pentium microprocessor.

6. Describe the operation of the branch prediction logic.

7. Detail the improvements in the Pentium Pro when compared with the Pentium.

8. Explain how the dynamic execution architecture of the Pentium Pro functions.

INTRODUCTION TO THE PENTIUM MICROPROCESSOR

Before the Pentium or any other microprocessor can be used in a system, the function of each pin must be understood. This section of the chapter details the operation of each pin, along with the external memory system and I/O structures of the Pentium microprocessor.

Figure 18–1 illustrates the pin-out of the Pentium microprocessor, which is packaged in a huge 273-pin PGA (pin grid array). The Pentium was made available in two versions: the full-blown Pentium and the P24T version, called the Pentium OverDrive. The P24T version contains a 32-bit data bus, making it compatible with 80486 machines that contain the P24T socket. The P24T version also comes with a fan built into the unit. The most notable difference in the pin-out of the Pentium, when compared to earlier 80486 microprocessors, is that there are 64 data bus connections instead of 32, which require a larger physical footprint.

As with earlier versions of the Intel family of microprocessors, the early versions of the Pentium require a single +5.0 V power supply for operation. The power supply current averages 3.3 A for the 66 MHz version of the Pentium and 2.91 A for the 60 MHz version. Because these currents are significant, so are the power dissipations of these microprocessors: 13 W for the 66 MHz version and 11.9 W for the 60 MHz version. Later versions of the Pentium, 90 MHz and above, use a 3.3 V power supply with reduced current consumption. A good heat sink with considerable airflow is required to keep the Pentium cool. The Pentium contains multiple VCC and VSS connections that must all be connected to the supply (+5.0 V or +3.3 V) and to ground for proper operation. Some of the pins are labeled N/C (no connection) and must not be connected.

The latest versions of the Pentium have been improved to reduce the power dissipation. For example, the 233 MHz Pentium requires 3.4 A of current, which is only slightly more than the 3.3 A required by the early 66 MHz version.

Each Pentium output pin is capable of providing 4.0 mA of current at a logic 0 level and 2.0 mA at a logic 1 level. This represents an increase in drive current compared to the 2.0 mA available on the output pins of the earlier 8086, 8088, and 80286. Each input pin represents a small load requiring only 15 μA of current. In all but the smallest systems, these current levels require bus buffers.

The function of each Pentium group of pins follows:

A20M The address A20 mask input is asserted in the real mode to signal the Pentium to perform address wraparound, as in the 8086 microprocessor, for use by the HIMEM.SYS driver.

A31–A3 Address bus connections address any of the 512M × 64 memory locations found in the Pentium memory system. Note that A0, A1, and A2 are encoded in the bank enable signals (BE7–BE0), described below, to select any or all of the eight bytes in a 64-bit-wide memory location.

ADS The address data strobe becomes active whenever the Pentium has issued a valid memory or I/O address. This signal is combined with the W/R and M/IO signals to generate the separate read and write signals present in the earlier 8086–80286 microprocessor-based systems.

AHOLD Address hold is an input that causes the Pentium to hold the address and AP signals for the next clock.

AP Address parity provides even parity for the memory address on all Pentium-initiated memory and I/O transfers. The AP pin must also be driven with even parity information on all inquire cycles in the same clocking period as the EADS signal.

APCHK Address parity check becomes a logic 0 whenever the Pentium detects an address parity error.

BE7–BE0 Bank enable signals select the access of a byte, word, doubleword, or quadword of data. These signals are generated internally by the microprocessor from address bits A0, A1, and A2.

BOFF The back-off input aborts all outstanding bus cycles and floats the Pentium buses until BOFF is negated. After BOFF is negated, the Pentium restarts all aborted bus cycles in their entirety.

BP3–BP0 The breakpoint pins BP3–BP0 indicate a breakpoint match when the debug registers are programmed to monitor for matches.

PM1–PM0 The performance monitoring pins PM1 and PM0 indicate the settings of the performance monitoring bits in the debug mode control register.

BRDY The burst ready input signals the Pentium that the external system has placed data on, or taken data from, the data bus connections. This signal is used to insert wait states into the Pentium timing.

BREQ The bus request output indicates that the Pentium has generated a bus request.

BT3–BT0 The branch trace outputs provide bits 2–0 of the branch target linear address and the default operand size on BT3. These outputs become valid during a branch trace special message cycle.

BUSCHK The bus check input allows the system to signal the Pentium that the bus transfer has been unsuccessful.

CACHE The cache output indicates that the current Pentium cycle can cache data.

CLK The clock is driven by a clock signal that is at the operating frequency of the Pentium. For example, to operate the Pentium at 66 MHz, apply a 66 MHz clock to this pin.

D63–D0 Data bus connections transfer byte, word, doubleword, and quadword data between the microprocessor and its memory and I/O system.

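D/C Data/control indicates that the data bus contains data for or from memory or I/O when a logic 1. If D/C is a logic 0, the microprocessor is halted (or running another special cycle) or executing an interrupt acknowledge; when M/IO is a logic 1, a logic 0 on D/C instead indicates an instruction (code) fetch.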

DP7–DP0 The data parity connections carry one parity bit for each of the eight memory banks. The Pentium generates parity on these connections during writes and checks it during reads.

EADS The external address strobe input signals that the address bus contains an address for an inquire cycle.

EWBE The external write buffer empty input indicates whether the external write buffers are empty; while it is inactive, a write cycle is still pending in the external system.

FERR The floating-point error output is comparable to the ERROR line in the 80386 and shows that the internal coprocessor has erred.

FLUSH The flush cache input causes the Pentium to write back all modified lines and invalidate its internal caches. If the FLUSH input is a logic 0 during a reset operation, the Pentium enters its test mode.

FRCMC The functional redundancy check input is sampled during a reset to configure the Pentium in the master (1) or checker (0) mode.

HIT Hit shows that the internal cache contains valid data during an inquire cycle.

HITM Hit modified shows that the inquire cycle found a modified cache line. This output is used to inhibit other bus masters from accessing the data until the cache line is written to memory.

HOLD Hold requests a DMA action.

HLDA Hold acknowledge indicates that the Pentium is currently in a hold condition.

IBT Instruction branch taken indicates that the Pentium has taken an instruction branch.

IERR The internal error output shows that the Pentium has detected an internal parity error or functional redundancy error.

IGNNE The ignore numeric error input causes the Pentium to ignore a numeric coprocessor error.

INIT The initialization input performs a reset without initializing the caches, write-back buffers, and floating-point registers. INIT may not be used in place of RESET after power-up.

INTR The interrupt request is used by external circuitry to request an interrupt.

INV The invalidation input determines the cache line state after an inquiry.

IU The U-pipe instruction complete output shows that the instruction in the U-pipe is complete.

IV The V-pipe instruction complete output shows that the instruction in the V-pipe is complete.

KEN The cache enable input enables internal caching.

LOCK LOCK becomes a logic 0 whenever an instruction is prefixed with the LOCK: prefix. This is most often used to keep DMA controllers and other bus masters off the bus during a locked instruction.

M/IO Memory/IO selects a memory device when a logic 1 or an I/O device when a logic 0. During an I/O operation, the address bus contains a 16-bit I/O address on address connections A15–A3.

NA Next address indicates that the external memory system is ready to accept a new bus cycle.

NMI The non-maskable interrupt requests a non-maskable interrupt, just as on the earlier versions of the microprocessor.

PCD The page cache disable output shows that the internal page caching is disabled by reflecting the state of the CR3 PCD bit.

PCHK The parity check output signals a parity check error for data read from memory or I/O.

PEN The parity enable input enables the machine check interrupt or exception.

PRDY The probe ready output indicates that the probe mode has been entered for debugging.

PWT The page write-through output shows the state of the PWT bit in CR3.

R/S The run/stop input is provided for use with the Intel Debugging Port and causes an interrupt.

RESET Reset initializes the Pentium, causing it to begin executing software at memory location FFFFFFF0H. The Pentium is reset to the real mode and the leftmost 12 address connections remain logic 1s (FFFH) until a far jump or far call is executed. This allows compatibility with earlier microprocessors. See Table 18–1 for the state of the Pentium after a hardware reset.


SCYC The split cycle output signals a misaligned LOCKed bus cycle.

SMI The system management interrupt input causes the Pentium to enter the system management mode of operation.

SMIACT The system management interrupt active output shows that the Pentium is operating in the system management mode.

TCK The testability clock input provides the clock for the IEEE 1149.1 Boundary Scan interface.

TDI The test data input is used to shift test data and instructions into the Pentium, clocked by the TCK signal.

TDO The test data output is used to gather test data and instructions shifted out of the Pentium with TCK.

TMS The test mode select input controls the operation of the Pentium in test mode.

TRST The test reset input allows the test mode to be reset.

W/R Write/read indicates that the current bus cycle is a write when a logic 1 or a read when a logic 0.

WB/WT Write-back/write-through selects the operation for the Pentium data cache.
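As a sketch of how external logic can interpret the status pins described above, the Python function below decodes M/IO, D/C, and W/R into a bus-cycle type. The table follows Intel's bus-cycle definitions for the 80486 and Pentium rather than anything stated in this section, and the names BUS_CYCLES and decode_bus_cycle are hypothetical; treat it as an illustration, not a reference decoder. Note that with M/IO at logic 1, a logic 0 on D/C marks an instruction (code) fetch.

```python
# Hypothetical decoder for the Pentium bus-cycle status pins.
# Keys are (M/IO, D/C, W/R) sampled while ADS is active; 1 = logic 1.
BUS_CYCLES = {
    (0, 0, 0): "interrupt acknowledge",
    (0, 0, 1): "special cycle",        # halt, shutdown, flush, and so on
    (0, 1, 0): "I/O read",
    (0, 1, 1): "I/O write",
    (1, 0, 0): "code read",            # instruction fetch
    (1, 0, 1): "reserved",
    (1, 1, 0): "memory read",
    (1, 1, 1): "memory write",
}


def decode_bus_cycle(m_io: int, d_c: int, w_r: int) -> str:
    """Return the bus-cycle type for one set of sampled status pins."""
    for name, level in (("M/IO", m_io), ("D/C", d_c), ("W/R", w_r)):
        if level not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {level!r}")
    return BUS_CYCLES[(m_io, d_c, w_r)]


print(decode_bus_cycle(1, 1, 0))  # memory read
print(decode_bus_cycle(0, 1, 1))  # I/O write
```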

The Memory System

The memory system for the Pentium microprocessor is 4G bytes in size, just as in the 80386DX and 80486 microprocessors. The difference lies in the width of the memory data bus. The Pentium uses a 64-bit data bus to address memory organized in eight banks that each contain 512M bytes of data. See Figure 18–2 for the organization of the Pentium physical memory system.

The Pentium memory system is divided into eight banks, where each bank stores byte-wide data with a parity bit. The Pentium, like the 80486, employs internal parity generation and checking logic for the memory system's data bus information. (Note that most Pentium systems do not use parity checks, because ECC is available.) The 64-bit-wide memory is important to double-precision floating-point data. Recall that a double-precision floating-point number is 64 bits wide. Because of the change to a 64-bit-wide data bus, the Pentium is able to retrieve an aligned floating-point number with one read cycle, instead of two as in the 80486. This gives the Pentium a higher throughput than an 80486. As with earlier 32-bit Intel microprocessors, the memory system is numbered in bytes, from byte 00000000H to byte FFFFFFFFH.
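To make the per-bank parity concrete, the Python sketch below splits a double-precision value into its eight byte lanes and computes one parity bit per bank. It assumes even parity (the convention the AP pin uses) and an aligned operand; the function names even_parity_bit and data_parity are hypothetical.

```python
import struct
from typing import List


def even_parity_bit(byte: int) -> int:
    """Parity bit that makes the number of 1s in byte + parity even."""
    return bin(byte & 0xFF).count("1") & 1


def data_parity(value: float) -> List[int]:
    """DP0..DP7 for a double-precision number stored in one 64-bit location.

    struct.pack("<d") gives the little-endian byte order the Pentium uses,
    so byte 0 travels on D7-D0 (bank 0) and byte 7 on D63-D56 (bank 7).
    """
    lanes = struct.pack("<d", value)  # all 8 bytes move in one transfer
    return [even_parity_bit(b) for b in lanes]


print(data_parity(1.5))  # [0, 0, 0, 0, 0, 0, 1, 0]
```

The value 1.5 is stored as 3FF8000000000000H, so only bank 6 (byte F8H, five 1 bits) needs its parity bit set.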

Memory selection is accomplished with the bank enable signals (BE7–BE0). These separate memory banks allow the Pentium to access any single byte, word, doubleword, or quadword that lies within one 64-bit memory location with one memory transfer cycle. As with earlier memory selection logic, eight separate write strobes are generated for writing to the memory system.
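A minimal Python sketch of bank selection follows: the upper 29 bits of a byte address form the value on A31–A3, and the low three bits, together with the operand size, determine which bank enables are asserted. It assumes the bank enable pins are active low (BE7#–BE0# in Intel's notation) and that the access is a single contiguous operand; the helper names are hypothetical. An operand that crosses a 64-bit boundary is reported as needing more than one cycle rather than being split.

```python
from typing import Optional, Tuple


def bank_enables(addr: int, size: int) -> Optional[Tuple[int, int]]:
    """Return (A31-A3 value, active-high bank mask) for a single transfer.

    Bit n of the mask selects bank n. Returns None when the operand crosses
    a 64-bit boundary, because such an access cannot finish in one cycle.
    """
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4, or 8 bytes")
    quadword = addr >> 3             # the value driven onto A31-A3
    first = addr & 0b111             # starting bank, from A2-A0
    if first + size > 8:
        return None
    return quadword, ((1 << size) - 1) << first


def be_pins(mask: int) -> str:
    """Levels on BE7..BE0 (left to right); the pins are active low."""
    return format(~mask & 0xFF, "08b")


qword, mask = bank_enables(0x1002, 4)
print(hex(qword), be_pins(mask))  # 0x200 11000011 (banks 2-5 enabled)
print(bank_enables(0x1006, 4))    # None (spans two quadwords)
```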

A new feature added to the Pentium is its capability to check and generate parity for the address bus (A31–A5) during certain operations. The AP pin provides the system with parity information, and the APCHK pin indicates a bad parity check for the address bus. The Pentium takes no action when an address parity error is detected. The system must assess the error and take appropriate action (such as an interrupt), if so desired.
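Address parity over A31–A5 can be sketched in a few lines of Python. The sketch assumes even parity, as stated for the AP pin, and models only the parity arithmetic, not when AP and APCHK are valid relative to ADS or EADS; the names are hypothetical.

```python
A31_A5_MASK = (1 << 27) - 1  # 27 address bits, A31 down to A5


def address_parity(addr: int) -> int:
    """AP level that gives even parity over A31-A5 plus AP."""
    return bin((addr >> 5) & A31_A5_MASK).count("1") & 1


def apchk(addr: int, ap: int) -> int:
    """Model the APCHK output: 0 (error) when A31-A5 plus AP has odd parity."""
    ones = bin((addr >> 5) & A31_A5_MASK).count("1") + (ap & 1)
    return 0 if ones % 2 else 1


addr = 0x00012360
print(address_parity(addr), apchk(addr, address_parity(addr)))  # 0 1
print(apchk(addr, 1))  # 0 (parity error detected)
```

Bits A4–A0 are excluded from the check, so addresses that differ only in those bits share the same AP value.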

Input/Output System

The input/output system of the Pentium is completely compatible with earlier Intel microprocessors. The I/O port number appears on address lines A15–A3 with the bank enable signals used to select the actual memory banks used for the I/O transfer.

As it has been since the 80386 microprocessor, I/O privilege information is held in the TSS when the Pentium is operated in the protected mode. Recall that this allows individual I/O ports to be selectively inhibited. If a blocked I/O location is accessed, the Pentium generates a type 13 interrupt (general protection fault) to signal an I/O privilege violation.
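The permission check can be sketched in Python. The layout below (one bit per port, a 1 meaning blocked) follows the 80386-style I/O permission bitmap; the processor consults the bitmap only when the current privilege level is numerically greater than IOPL (or in virtual 8086 mode), which this sketch does not model. The function name io_allowed is hypothetical.

```python
def io_allowed(bitmap: bytes, port: int, width: int) -> bool:
    """Check an I/O access against a TSS-style I/O permission bitmap.

    Bit n of the bitmap (byte n // 8, bit n % 8) covers port n; a 1 blocks
    the port. Every port touched by a multi-byte access must be allowed,
    and ports past the end of the bitmap are treated as blocked.
    """
    if width not in (1, 2, 4):
        raise ValueError("I/O width must be 1, 2, or 4 bytes")
    for p in range(port, port + width):
        byte_index, bit = divmod(p, 8)
        if byte_index >= len(bitmap) or (bitmap[byte_index] >> bit) & 1:
            return False  # the Pentium would raise a type 13 interrupt here
    return True


bitmap = bytearray(65536 // 8)          # all 64K ports allowed
bitmap[0x3F9 // 8] |= 1 << (0x3F9 % 8)  # block port 3F9H only
print(io_allowed(bitmap, 0x3F8, 1))     # True
print(io_allowed(bitmap, 0x3F8, 2))     # False: the word also touches 3F9H
```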

System Timing

As with any microprocessor, the system timing signals must be understood in order to interface the microprocessor. This portion of the text details the operation of the Pentium through its timing diagrams and shows how to determine memory access times.

The basic Pentium nonpipelined memory cycle consists of two clocking periods: T1 and T2. See Figure 18–3 for the basic nonpipelined read cycle. Notice from the timing diagram that the 66 MHz Pentium is capable of 33 million memory transfers per second. This assumes that the memory can operate at that speed.

Also notice from the timing diagram that the W/R signal becomes valid if ADS is a logic 0 at the positive edge of the clock (end of T1). This clock edge must be used to qualify the cycle as a read or a write.

During T1, the microprocessor issues the ADS, W/R, address, and M/IO signals. In order to qualify the W/R signal and generate the appropriate MRDC and MWTC signals, we use a flip-flop to latch the W/R signal. Then a two-line-to-one-line multiplexer generates the memory and I/O control signals. See Figure 18–4 for a circuit that generates the memory and I/O control signals for the Pentium microprocessor.
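The flip-flop and multiplexer arrangement can be expressed as a small logic model in Python. It is a sketch of the idea, not of the exact circuit in Figure 18–4: the cycle_active flag stands in for whatever qualifying signal a real design uses, and all names are hypothetical.

```python
from typing import Dict


def latch_w_r(ads: int, w_r: int, held: int) -> int:
    """Flip-flop model: capture W/R at the clock edge that ends T1 (ADS = 0),
    otherwise keep the previously latched value."""
    return w_r if ads == 0 else held


def control_strobes(w_r: int, m_io: int, cycle_active: bool) -> Dict[str, int]:
    """Active-low MRDC, MWTC, IORC, and IOWC from latched W/R and M/IO.

    M/IO acts as the multiplexer select (memory vs. I/O strobes) and W/R
    chooses read vs. write; cycle_active gates all outputs.
    """
    strobes = {"MRDC": 1, "MWTC": 1, "IORC": 1, "IOWC": 1}
    if cycle_active:
        if m_io == 1:
            strobes["MWTC" if w_r == 1 else "MRDC"] = 0
        else:
            strobes["IOWC" if w_r == 1 else "IORC"] = 0
    return strobes


w_r = latch_w_r(ads=0, w_r=0, held=1)   # a read cycle starts
print(control_strobes(w_r, m_io=1, cycle_active=True))  # MRDC = 0, others 1
```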

During T2, the data bus is sampled in synchronization with the end of T2 at the positive transition of the clock pulse. The setup time before the clock is given as 3.8 ns, and the hold time after the clock is given as 2.0 ns. This means that the data window around the clock is 5.8 ns. The address appears a maximum of 8.0 ns after the start of T1. This means that the Pentium microprocessor operating at 66 MHz allows 30.3 ns (two clocking periods), minus the address delay time of 8.0 ns and minus the data setup time of 3.8 ns. Memory access time without any wait states is therefore 30.3 – 8.0 – 3.8, or 18.5 ns. This is enough time to allow access to an SRAM, but not to any DRAM without inserting wait states into the timing. The SRAM is normally found in the form of an external level 2 cache.
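The access-time arithmetic above reduces to one line of Python. The 8.0 ns address delay and 3.8 ns setup time are the figures quoted in this section and serve as defaults; the function name access_time_ns is hypothetical.

```python
def access_time_ns(clock_mhz: float, wait_states: int = 0,
                   address_delay_ns: float = 8.0, setup_ns: float = 3.8) -> float:
    """Time available to memory in a nonpipelined T1-T2 read cycle."""
    period_ns = 1000.0 / clock_mhz      # one clocking period
    clocks = 2 + wait_states            # T1, T2, and any extra T2 states
    return clocks * period_ns - address_delay_ns - setup_ns


print(round(access_time_ns(66.0), 1))  # 18.5
```

Passing a nonzero wait_states adds one clocking period per wait state, which is how slower memory is accommodated.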

Wait states are inserted into the timing by controlling the BRDY input to the Pentium. The BRDY signal must become a logic 0 by the end of T2, or additional T2 states are inserted into the timing. See Figure 18–5 for a read cycle timing diagram that contains wait states for slower memory. The effect of inserting wait states is to lengthen the timing, allowing additional time for the memory to access data. In the timing shown, the access time has been lengthened so that standard 60 ns DRAM can be used in a system. Note that this requires the insertion of four wait states of 15.2 ns (one clocking period) each, which lengthens the access time to approximately 79 ns (18.5 + 4 × 15.2 ≈ 79.3 ns). This is enough time for the DRAM and any decoder in the system to function.
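Working backward from a memory's requirement gives the number of wait states. This Python sketch assumes the same 8.0 ns address delay and 3.8 ns setup time; the 15 ns decoder allowance in the example is hypothetical, chosen only to show how a 60 ns DRAM plus decoding lands on four wait states.

```python
import math


def wait_states_needed(required_ns: float, clock_mhz: float = 66.0,
                       address_delay_ns: float = 8.0, setup_ns: float = 3.8) -> int:
    """Smallest number of wait states that gives memory required_ns."""
    period_ns = 1000.0 / clock_mhz
    zero_wait_ns = 2 * period_ns - address_delay_ns - setup_ns
    if required_ns <= zero_wait_ns:
        return 0
    return math.ceil((required_ns - zero_wait_ns) / period_ns)


# Hypothetical budget: 60 ns DRAM plus 15 ns for the address decoder.
print(wait_states_needed(60.0 + 15.0))  # 4
```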

The BRDY signal is a synchronous signal generated by using the system clock. Figure 18–6 illustrates a circuit that can be used to generate BRDY for inserting any number of wait states into the Pentium timing diagram. You may recall a similar circuit that inserted wait states into the timing of the 80386 microprocessor. The ADS signal is delayed by between 0 and 7 clocking periods by the 74F164 shift register to generate the BRDY signal. The exact number of wait states is selected by the 74F151 eight-line-to-one-line multiplexer. In this example, the multiplexer selects the four-wait output from the shift register.
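The delay-and-select behavior can be simulated clock by clock in Python. This is an idealized model under stated assumptions: ADS is active for exactly one clock, the register shifts on every rising edge, and nothing clears the register between cycles. It does not reproduce the gate-level wiring of Figure 18–6, and the function name brdy_waveform is hypothetical.

```python
from typing import List


def brdy_waveform(ads: List[int], wait_states: int) -> List[int]:
    """Idealized shift-register wait-state generator.

    ads[i] is the ADS level during clock i (0 = active, as in T1).
    The shift register captures ADS at the end of each clock; the
    eight-input multiplexer picks stage `wait_states`, so BRDY goes low
    during the clock that ends T2 plus that many extra T2 states.
    """
    if not 0 <= wait_states <= 7:
        raise ValueError("an eight-input multiplexer selects 0-7")
    stages = [1] * 8                       # shift-register outputs, idle high
    brdy = []
    for level in ads:
        brdy.append(stages[wait_states])   # multiplexer output this clock
        stages = [level] + stages[:-1]     # clock edge: shift ADS in
    return brdy


# T1, T2, then five more clocks; ADS is low only during T1.
print(brdy_waveform([0, 1, 1, 1, 1, 1, 1], 4))  # [1, 1, 1, 1, 1, 0, 1]
```

BRDY is low during clock 5, so clocks 2 through 5 are the four inserted wait states.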

A more efficient method of reading memory data is the burst cycle. The burst cycle in the Pentium transfers four 64-bit numbers in five clocking periods. A burst without wait states requires the memory system to transfer data every 15.2 ns. If a level 2 cache is in place, this speed is no problem as long as the data are read from the cache. If the cache does not contain the data, then wait states must be inserted, which reduces the data throughput. See Figure 18–7 for the Pentium burst cycle transfer without wait states. As before, wait states can be inserted to allow the memory system more time for accesses.
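The advantage of the burst cycle shows up in a quick bandwidth estimate. This Python sketch assumes zero wait states and back-to-back cycles, which real systems rarely sustain, so the figures are peak values; the function name throughput_mb_s is hypothetical.

```python
def throughput_mb_s(clock_mhz: float, bytes_per_cycle: int, clocks_per_cycle: int) -> float:
    """Peak bus bandwidth in millions of bytes per second (no wait states)."""
    return clock_mhz * bytes_per_cycle / clocks_per_cycle


single = throughput_mb_s(66.0, 8, 2)     # nonpipelined: one quadword per T1-T2
burst = throughput_mb_s(66.0, 32, 5)     # burst: four quadwords in five clocks
print(single, burst)  # 264.0 422.4
```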


Branch Prediction Logic

The Pentium microprocessor uses branch prediction logic to reduce the delay that a branch normally causes inside the microprocessor. The delay is minimized because when a branch instruction (short or near only) is encountered, the microprocessor begins prefetching instructions at the branch address. The instructions are loaded into the instruction cache, so when the branch occurs, the instructions are present and the branch executes in one clocking period. If the branch prediction logic errs for any reason, the branch requires an extra three clocking periods to execute. In most cases, the branch prediction is correct and no delay ensues.
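The cost of a misprediction can be illustrated with a small Python model. The 2-bit saturating counter used here is a common prediction scheme assumed for illustration; this section does not describe the Pentium's prediction algorithm. The one-clock and three-extra-clock figures are the ones quoted above, and the function name branch_clocks is hypothetical.

```python
from typing import List


def branch_clocks(outcomes: List[bool], penalty: int = 3) -> int:
    """Clocks spent on one branch instruction across a run of executions.

    Uses a 2-bit saturating counter as the predictor (states 0-1 predict
    not taken, 2-3 predict taken). A correct prediction costs one clock;
    a misprediction costs `penalty` extra clocks.
    """
    state = 2          # assumed starting state: weakly predict taken
    clocks = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        clocks += 1 if predicted_taken == taken else 1 + penalty
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return clocks


loop = [True] * 9 + [False]          # a loop branch taken 9 times, then exits
print(branch_clocks(loop))           # 13
print(branch_clocks(loop * 2))       # 26
```

Because the counter needs two wrong guesses in a row to change its prediction, the single loop exit costs only one misprediction per pass.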

Cache Structure

The cache in the Pentium has been changed from the one found in the 80486 microprocessor. The Pentium contains two 8K-byte cache memories instead of one as in the 80486. There is an 8K-byte data cache and an 8K-byte instruction cache. The instruction cache stores only instructions, while the data cache stores data used by instructions.

In the 80486 with its unified cache, a data-intensive program quickly filled the cache, leaving little room for instructions. This slowed the execution speed of the 80486 microprocessor. In the Pentium, this cannot occur because of the separate instruction cache.
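To show how an 8K-byte cache divides an address, the Python sketch below computes the tag, set index, and byte offset. The 32-byte line size and two-way set associativity are defaults commonly documented for the Pentium level 1 caches but not given in this section, so treat them as adjustable parameters; the function name cache_fields is hypothetical. The instruction and data caches would each apply the same split independently, which is why heavy data traffic cannot evict instructions.

```python
from typing import Tuple


def cache_fields(addr: int, size: int = 8192, line: int = 32,
                 ways: int = 2) -> Tuple[int, int, int]:
    """Split an address into (tag, set index, byte offset) for one cache."""
    sets = size // (line * ways)
    for name, value in (("line", line), ("sets", sets)):
        if value <= 0 or value & (value - 1):
            raise ValueError(f"{name} must be a power of two")
    offset_bits = line.bit_length() - 1
    index_bits = sets.bit_length() - 1
    offset = addr & (line - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


tag, index, offset = cache_fields(0x12345678)
print(hex(tag), hex(index), hex(offset))  # 0x12345 0x33 0x18
```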

Superscalar Architecture

The Pentium microprocessor is organized with three execution units. One executes floating-point instructions, and the other two (the U-pipe and V-pipe) execute integer instructions. This means that it is possible to execute three instructions simultaneously. For example, the FADD ST,ST(2), MOV EAX,10H, and MOV EBX,12H instructions can all execute simultaneously because none of them depends on another. The FADD ST,ST(2) instruction is executed by the coprocessor, the MOV EAX,10H instruction by the U-pipe, and the MOV EBX,12H instruction by the V-pipe. Because the floating-point unit is also used for MMX instructions, a Pentium that has them can execute two integer instructions and one MMX instruction simultaneously.

Software should be written to take advantage of this feature: examine the instructions in a program and, wherever dependent instructions sit next to each other, separate them with nondependent instructions. These changes can improve execution speed by up to 40% in some software. Make sure that any new compiler or other application package takes advantage of this superscalar feature of the Pentium.
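The benefit of separating dependent instructions can be estimated with a deliberately simplified Python model of the U- and V-pipes. The pairing rule here (no register written by the first instruction may be read or written by the second) is a simplification assumed for illustration; the real Pentium pairing rules also restrict which instructions may enter the V-pipe and treat the flags specially. All names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Instr(NamedTuple):
    text: str
    writes: FrozenSet[str]
    reads: FrozenSet[str]


def ins(text: str, writes: Iterable[str] = (), reads: Iterable[str] = ()) -> Instr:
    return Instr(text, frozenset(writes), frozenset(reads))


def can_pair(u: Instr, v: Instr) -> bool:
    """Simplified rule: the V-pipe instruction may not read or write any
    register that the U-pipe instruction writes (flags and memory ignored)."""
    return not (u.writes & (v.reads | v.writes))


def issue_clocks(program: List[Instr]) -> int:
    """In-order issue, pairing adjacent instructions whenever allowed and
    assuming every instruction completes in one clock."""
    clocks, i = 0, 0
    while i < len(program):
        pair = i + 1 < len(program) and can_pair(program[i], program[i + 1])
        i += 2 if pair else 1
        clocks += 1
    return clocks


original = [
    ins("MOV EAX,[ESI]", writes={"EAX"}, reads={"ESI"}),
    ins("ADD EAX,EBX", writes={"EAX"}, reads={"EAX", "EBX"}),
    ins("MOV ECX,[EDI]", writes={"ECX"}, reads={"EDI"}),
    ins("ADD ECX,EDX", writes={"ECX"}, reads={"ECX", "EDX"}),
]
# Move the independent load up so each ADD is separated from its producer.
reordered = [original[0], original[2], original[1], original[3]]
print(issue_clocks(original), issue_clocks(reordered))  # 3 2
```

Reordering the same four instructions lets both pairs issue together, cutting the modeled clock count from three to two.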
