11.4 Intel 80486 Microprocessor
The Intel 80486 is an enhanced 80386 microprocessor with on-chip floating-point hardware.
11.4.1 Intel80486/80386 Comparison
Table 11.2 compares the basic features of the 80486 with those of the 80386.
11.4.2 Special Features of the 80486
The Intel 80486 is a 32-bit microprocessor, like the Intel 80386. It executes the complete instruction set of the 80386 and the 80387DX floating-point coprocessor. Unlike the 80386, the 80486 on-chip floating-point hardware eliminates the need for an external floating-point coprocessor chip and the on-chip cache minimizes the need for an external cache and associated control logic.
The 80486 is object code compatible with the 8086, 8088, 80186, 80286, and 80386 processors. It can perform a complete set of arithmetic and logical operations on 8-, 16-, and 32-bit data types using a full-width ALU and eight general-purpose registers. Four gigabytes of physical memory can be addressed directly via its separate 32-bit addresses and data paths. An on-chip memory management unit is added, which maintains the integrity of memory in the multitasking and virtual-memory environments. Both memory segmentation and paging are supported.
The 80486 has an internal 8 Kbyte cache memory. This provides fast access to recently used instructions and data. The internal write-through cache can hold 8 Kbytes of data or instructions. The on-chip floating-point unit performs floating-point operations on the 32-, 64-, and 80- bit arithmetic formats specified in the IEEE standard and is object code compatible with the 8087, 80287, and 80387 coprocessors. The fetching, decoding, execution, and address translation of instructions is overlapped within the 80486 processor using instruction pipelining. This allows a continuous execution rate of one clock cycle per instruction for most instructions.
Like the 80386, the 80486 processor can operate in three modes (set in software):
real, protected, and virtual 8086 mode. After reset or power up, the 80486 is initialized in real mode. This mode has the same base architecture as the 8086, but allows access to the 32-bit register set of the 80486 processor. Nearly all of the 80486 processor instructions are available, but the default operand size is 16 bits. The main purpose of real mode is to set up the processor for protected mode.
Protected mode, or protected virtual address mode, is where the complete capabilities of the 80486 become available. Segmentation and paging can both be used in protected mode. All 8086, 80286, and 386 processor software can be run under the 80486 processor’s hardware-assisted protection mechanism.
Virtual 8086 mode is a submode for protected mode. It allows 8086 programs to be run but adds the segmentation and paging protection mechanisms of protected mode. It is more flexible to run 8086 in this mode than in real mode because virtual 8086 mode can simultaneously execute the 80486 operating system and both 8086 and 80486 processor applications.
The 80486 is provided with a bus backofffeature. Using this, the 80486 will float its bus signals if another bus master needs control of the bus during a 80486 bus cycle and then restart its cycle when the bus again becomes available. The 80486 includes dynamic bus sizing. Using this feature, external controllers can dynamically alter the effective width of the data bus with 8-, 16-, or 32-bit bus widths.
In terms of programming models, the Intel 80386 has very few differences with the 80486 processor. The 80486 processor defines new bits in the EFLAGS, CRO, and CR3 registers. In the 80386 processor, these bits were reserved, so the new architectural features should be a compatibility issue.
11.4.3 80486 New Instructions Beyond Those of the 80386
There are six basic instructions plus floating-point instructions added to the 80486 instruction set beyond those of the 80386 instruction set as follows:
1. Three New Application Instructions
-
BSWAP
-
XADD
-
CMPXCHG
2. Three New System Instructions
-
INVD
-
WBINVD
-
INVLPG
The 80386 can execute all its floating-point instructions when the 80387 is present in the system. The 80486, on the other hand, can directly execute all its floating point instructions (same as the 80386 floating-point instructions) because it has the on-chip floating-point hardware.
The three new application instructions included with the 80486 are BSWAP reg32; XADD dest, source; and CMPXCHG dest, source. BSWAP reg32 reverses the byte order of a 32-bit register, converting a value in little/big endian form to big/little endian form. That is, the BSWAP instruction exchanges bits 7-0 with bits 31-24 and bits 15-8 with bits 23-16 of a 32-bit register. Executing this instruction twice in a row leaves the register with the original value. When BSWAP is used with a 16-bit operand size, the result left in the destination operand is undefined. Consider an example of a 32-bit operand: If (EAX) = 12345678H, then after BSWAP EAX, the contents of EAX are 78563412H. Note that little endian is a byte-oriented method in which the bytes are ordered (left to right) as 3, 2, 1, and 0, with byte 3 being the most significant byte. Big endian on the other hand, is also a byte-oriented method where the bytes are ordered (left to right) as 0, I, 2, and 3 with byte 0 being the most significant byte. The BSWAP instruction speeds up execution of decimal arithmetic by operating on four digits at a time.
XADD dest, source has the form
The XADD dest, source instruction loads the destination into the source and then loads the sum of the destination and the original value of the source into the destination. For example, if (AX)= Ol23H, (BX) = 9876H, then after XADD AX, BX, the contents of AX and BX are respectively 9999H and 0123H.
CMPXCHG dest, source has the form:
The CMPXCHG instruction compares the (AL, AX or EAX register) with the destination. If they are equal, the source is loaded into the destination; Otherwise, the destination is loaded into the AL,AX or EAX. For example, if (DX) = 4324H, (AX) = 4532H, and (BX) = 4532H, then after CMPXCHG BX, OX, the ZF flag is set to one and (BX) = 4324H.
11.5 Intel Pentium Microprocessor
Table 11.3 summarizes the fundamental differences between the basic features of 486 and Pentium families. Microprocessors have served largely separate markets and purposes: business PCs and engineering workstations. The PCs have used Microsoft’s DOS and Windows operating systems whereas the workstations have used various features of UNIX.
The PCs have not been utilized in the workstation market because of their relatively modest performance, especially with regard to complicated graphics display and floating-point calculations. Workstations have been kept out of the PC market partially because of their high prices and hard-to-use system software.
The Pentium has brought the PCs up to workstation-class computational performance with sophisticated graphics. The Intel Pentium is a 32-bit microprocessor with a 64-bit data bus. The Intel Pentium, like its predecessor the Intel 80486, is 100% object code compatible with 8086/80386 systems. BICMOS(Bipolar and CMOS) technology is used for the Pentium.
The Pentium processor has three modes of operation; real-address mode (also called "real mode"), protected mode, and system management mode. The mode determines which instructions and architecture features are accessible. In real-address mode, the Pentium processor runs programs written for 8086 or for the real-address mode of an 80386 or 80486.
The architecture of the Pentium processor in this mode is identical to that of the 8086 microprocessor. In protected mode, all instruction and architectural features of the Pentium are available to the programmer. Some of the architectural features of the Pentium processor include memory management, protection, multitasking, and multiprocessing. While in protected mode, the virtual 8086 (v86) mode can be enabled for any task. For the v86 mode, the Pentium can directly execute "real-address-mode" 8086 software in a protected, multitasking environment.
The Pentium processor is also provided with a system management mode (SMM) similar to the one used in the 80486SL, which allows to design for low power usage. SMM is entered through activation of an external interrupt pin (system management interrupt, SMI#). In December 1994, Intel detected a flaw in the Pentium chip while performing certain division calculations. The Pentium is not the first chip that Intel has had problems with. The first version of the Intel 80386 had a math flaw that Intel quickly fixed before there were any complaints. Some experts feel that Intel should have acknowledged the math problem in the Pentium when it was first discovered and then have offered to replace the chips. In that case, the problem with the Pentium most likely would have been ignored by the users. However, Intel was heavily criticized by computer magazines when the division flaw in the Pentium chip was first detected.
The flaw in the division algorithm in the Pentium was caused by a problem with a look-up table used in the division. Errors occur in the fourth through the fifteenth significant
decimal digits. This means that in a result such as 5.78346, the last three digits could be incorrect. For example, the correct answer for the operation 4,195,835 – (4,195,835 + 3,145,727) + (3,145,727) is zero. The Pentium provided a wrong answer of 256. IBM claimed this problem can occur once every 24 days. Intel eventually fixed the division flaw problem in the Pentium.
The Pentium microprocessor is based on a superscalar design. This means that the processor includes dual pipelining and executes more than one instruction per clock cycle; note that scalar microprocessors such as the 80486 family have only one pipeline and execute one instruction per clock cycle, and superscalar processors allow more than one instruction to be executed per clock cycle.
The Pentium microprocessor contains the complete 80486 instruction set along with some new ones that are discussed later. Pentium’s on-chip memory management unit is completely compatible with that of the 80486.
The Pentium includes faster floating-point on-chip hardware than the 80486.
Pentium’s on-chip floating-point hardware has been completely redesigned over the 80486. Faster algorithms provide up to ten times speed-up for common operations such as add, multiply, and load. The two instruction pipelines and on-chip floating-point unit are capable of independent operations. Each pipeline issues frequently used instructions in a single clock cycle. The dual pipelines can jointly issue two integer instructions in one clock cycle or one floating-point instruction (under certain circumstances, two floating point instructions) in one clock cycle.
Branch prediction is implemented in the Pentium by using two prefetch buffers, one to prefetch code in a linear fashion and one to prefetch code according to the contents of the branch target buffer (BTB), so the required code is almost always prefetched before it is needed for execution. Note that the branch addresses are stored in the branch target buffer (BTB).
There are two instruction pipelines, the U pipe and the V pipe, which are not equivalent and interchangeable. The U pipe can execute all integer and floating-point instructions, whereas the V pipe can only execute simple integer instructions and the floating-point exchange register contents (FXCH) instructions.
The instruction decode unit decodes the prefetched instructions so that the Pentium can execute them. The control ROM includes the microcode for the Pentium processor and has direct control over both pipelines. A barrel shifter is included in the chip for fast shift operations.
11.5.1 Pentium Registers
The Pentium processor includes the same registers as the 80486. Three new system flags are added to the 32-bit EFLAGS register.
11.5.2 Pentium Addressing Modes and Instructions
The Pentium includes the same addressing modes as the 80386/80486.
The Pentium microprocessor includes three new application instructions and four new system instructions beyond those of the 80486. One of the new application instruction is the CMPXCHGSB. As an example, CMPXCHGSB reg64 or mem64 compares the 64-bit value in EDX:EAX with the 64 bit contents of reg64 or mem64. If they are equal, the 64-bit value in ECX:EBX is stored in reg64 or mem64; otherwise the content ofreg64 or mem64 is loaded into EDX:EAX.
Pentium floating-point instructions execute much faster than those of the 80486 instructions.
For example, a 66-MHz Pentium microprocessor provides about three times the floating point performance of a 66-MHz Intel 80486 DX2 microprocessor.
11.5.3 Pentium versus 80486: Basic Differences in Registers, Paging, Stack Operations, and Exceptions
Registers of the Pentium Processor versus Those of the 80486
This section discusses the basic differences between the Pentium and 80486 control, debug, and test registers.
One new control register, CR4, is included in the Pentium. CR4 contains bits that enable certain extensions to the 80486 provided in the Pentium processor. These extensions include functions for handling certain hardware error conditions.
The Pentium processor defines the type of breakpoint access by two bits in DR7 to perform breakpoint functions such as break on instruction execution only, break on data writes only, and break on data reads or writes but not instruction fetches. The implementation of test registers on the 80486 used for testing the cache has been redesigned in the Pentium processor.
Paging
The Pentium processor provides an extension to the memory management/paging functions of the 80486 to support larger page sizes.
Stack Operations
The Pentium, 80486, and 80386 microprocessors push a different value of SP on the stack for a PUSH instruction than does the 8086. The 32-bit processors push the value of the SP before it is decremented whereas the 8086 pushes the value of the SP after it is decremented.
Exceptions
The Pentium processor implements new exceptions beyond those of the 80486. For example, a machine check exception is newly defined for reporting parity errors and other hardware errors.
External hardware interrupts on the Pentium may be recognized on different instruction boundaries due to the pipelined execution of the Pentium processor and possibly an extra instruction passing through the V pipe concurrently with an instruction in the U pipe. When the two instructions complete execution, the interrupt is then serviced. Therefore, the EIP pushed onto the stack when servicing the interrupt on the Pentium processor may be different than that for the 80486 (i.e., it is serviced later). The priority of exceptions is the same on both the Pentium and 80486.
11.5.4 Pentium Input/Output
The Pentium processor handles I/O in the same way as the 80486. The Pentium can use either standard I/O or memory-mapped I/O. Standard I/O is accomplished by using IN/OUT instructions and a hardware protection mechanism. When memory-mapped I/O is used, memory-reference instructions are used for input/output and the protection mechanism is provided via segmentation or paging.
The Pentium can transfer 8, 16, or 32 bits to a device. Like memory-mapped I/O, 16- bit ports using standard I/O should be aligned to even addresses so that all 16 bits can be transferred in a single bus cycle. Like double words in memory-mapped I/O, 32-bit ports in standard I/O should be aligned to addresses that are multiples of four. The Pentium supports I/O transfer to misaligned ports, but there is a performance penalty because an extra bus cycle must be used.
The INS and OUTS instructions move blocks of data between I/O ports and memory. The INS and OUTS instructions, when used with repeat prefixes, perform block input or output operations. The string I/O instructions can operate on byte (8-bit) strings, word (16-bit) strings, or double word (32-bit) strings. When the Pentium is running in protected mode, I/O operates as in real address mode with additional protection features.
11.5.5 Applications with the Pentium
The performance of the Pentium’s floating-point unit (FPU) makes it appropriate for wide areas of numeric applications:
-
Pentium’s FPU can accept decimal operands and produce extra decimal results of up to 18 digits. This greatly simplifies accounting programming. Financial calculations that use power functions can take advantage of exponential and logarithmic functions.
-
Many minicomputer and mainframe large simulation problems can be executed by the Pentium. These applications include complex electronic circuit simulations using SPICE and simulation of mechanical systems using finite element analysis.
-
The Pentium’s FPU can move and position machine control heads with accuracy in real time. Axis positioning can efficiently be performed by the hardware trigonometric support provided by the FPU. The Pentium can therefore be used for computer numerical control (CNC) machines.
-
The pipelined instruction feature of the Pentium processor makes it an ideal candidate for DSP (digital signal processing) and related applications for computing matrix multiplications and convolutions.
-
Other possible application areas for the Pentium include robotics, navigation, data acquisition, and process control.
11.5.6 Pentium versus Pentium Pro
The Pentium was first introduced by Intel in March 1993, and the Pentium Pro was introduced in November 1995. The Pentium processor provides pipelined superscalar architecture. The Pentium processor’s pipelined implementation uses five stages to extract high throughput and the Pentium Pro utilizes 12-stage, superpipelined implementation, trading less work per pipestage for more stages. The Pentium Pro processor reduced its pipestage time by 33% compared with a Pentium processor, which means the Pentium Pro processor can have a 33% higher clock speed than a Pentium processor and still be equally easy to produce from a semiconductor manufacturing process. A 200-MHz Pentium Pro is always faster than a 200-MHz Pentium for 32-bit applications such as computer-aided design (CAD), 3-D graphics, and multimedia applications.
The Pentium processor’s superscalar architecture, with its ability to execute two instructions per clock, was difficult to exceed without a new approach. The new approach used by the Pentium Pro processor removes the constraint of linear instruction sequencing between the traditional "fetch" and "execute" phases, and opens up a wide instruction pool. This approach allows the "execute" phase of the Pentium Pro processor to have much more visibility into the program’s instruction stream so that better scheduling may take place. This allows instructions to be started in any order but always be completed in the original program order.
Microprocessor speeds have increased tremendously over the past 10 years, but the speed of the main memory devices has only increased by 60 percent. This increasing
memory latency, relative to the microprocessor speed, is a fundamental problem that the Pentium Pro is designed to solve. The Pentium Pro processor "looks ahead" into its instruction pool at subsequent instructions and will do useful work rather than be stalled. The Pentium Pro executes instructions depending on their readiness to execute and not on their original program order. In summary, it is the unique combination of improved branch prediction, choosing the best order, and executing the instructions in the preferred order that enables the Pentium Pro processor to improve program execution over the Pentium processor. This unique combination is called "dynamic execution."
The Pentium Pro does a great job running some operating systems such as Windows NT or Unix. The first release of Windows 95 contains a significant amount of 16-bit code in the graphics subsystem. This causes operations on the Pentium Pro to be serialized instead of taking advantage ofthe dynamic execution architecture. Nevertheless, the Pentium Pro is up to 30% faster than the fastest Pentium in 32-bit applications. Table
11.4 compares the basic features the Pentium with those of the Pentium Pro.
11.5.7 Pentium II/ Celeron/ Pentium II XeonTM / Pentium III / Pentium 4
The 32-bit Pentium II processor is Intel’s latest addition to the Pentium line of microprocessors, which originated form the widely cloned 80×86 line. It basically takes attributes of the Pentium Pro processor plus the capabilities of MMX technology to yield processor speeds of 333, 300, 266, and 233 MHz. The Pentium II processor uses 0.25 micron technology (this refers to the width of the circuit lines on the silicon) to allow increased core frequencies and reduce power consumption. The Pentium II processor took advantage of four new technologies to achieve its performance ratings:
-
Dual Independent Bus Architecture (DIB)
-
Dynamic Execution
-
Intel MMX Technology
-
Single-Edge-Contact Cartridge
DIB was first implemented in the Pentium Pro processor to address bandwidth limitations. The DIB architecture consists of two independent buses, an L2 cache bus and a system bus, to offer three times the bandwidth performance of single bus architecture processors. The Pentium II processor can access data from both buses simultaneously to accelerate the flow of information within the system.
Dynamic execution was also first implemented in the Pentium Pro processor.
It consists of three processing techniques to improve the efficiency of executing instructions.
These techniques include multiple branch prediction, data flow analysis, and speculative execution. Multiple branch prediction uses an algorithm to determine the next instruction to be executed following a jump in the instruction flow. With data flow analysis, the processor determines the optimum sequence for processing a program after looking at software instructions to see if they are dependent on other instructions. Speculative execution increases the rate of execution by executing instructions ahead of the program counter that are likely to be needed.
MMX (matrix math extensions) technology is Intel’s greatest enhancement to its microprocessor architecture. MMX technology is intended for efficient multimedia and communications operations. To achieve this, 57 new instructions have been added to manipulate and process video, audio, and graphical data more efficiently. These instructions support single-instruction multiple-data (SIMD) techniques, which enable one instruction to perform the same function on multiple pieces of data. Programs written using the new instructions significantly enhance the capabilities of Pentium II.
The final feature in Intel’s Pentium II processor is single-edge-contact (SEC) packaging. In this packaging arrangement, the core and L2 cache are fully enclosed in a plastic and metal cartridge. The components are surface mounted directly to a substrate inside the cartridge to enable high-frequency operation.
Intel Celeron processor utilizes Pentium II as core .The Celeron processor family includes: 333 MHz, 300A MHz, 300 MHz, and 266 MHz processors.The Celeron 266 MHz and 300 MHz processors do not contain any level 2 cache. But the Celeron 300A MHz and 333 MHz processors incorporate an integrated L2 cache. All Celeron processors are based on Intel’s 0.25 micron CMOS technology. The Celeron processor is designed for inexpensive or "Basic PC" desktop systems and can run Windows 98. The Celeron processor offers good floating-point (3D geometry calculations) and multimedia (both video and audio) performance.
The Pentium II Xeon processor contains large, fast caches to transfer data at super high speed through the processor core. The processor can run at either 400 MHz or 450 MHz. The Pentium II Xeon is designed for any mid-range or higher Intel-based server or workstation.The 450 MHz Pentium II Xeon can be used in dual-processor (two-way) workstations and servers. The 450 MHz Pentium II Xeon processor with four-way servers is expected to be available in the future.
The Pentium III operates at 450 MHz and 500 MHz. It is designed for desktop PCs. The Pentium III enhances the multimedia capabilities of the PC, including full screen video and graphics. Pentium III Xeon processors run at 500 MHz and 550 MHz. They are designed for mid-range and higher Internet-based servers and workstations. It is compatible with Pentium II Xeon processor-based platforms. Pentium III Xeon is also designed for demanding workstation applications such as 3-D visualization, digital content creation, and dynamic Internet content development. Pentium III-based systems can run applications on Microsoft Windows NT or UNIX-based environments. The Pentium III Xeon is available in a number of L2 cache versions such as 512-Kbytes, 1-Mbyte, or 2-Mbytes (500 MHz); 512 Kbytes (550 MHz) to satisfy a variety of lnternet application requirements.
The Intel Pentium 4 is an enhanced Pentium III processor. It is currently available at 1.30, 1.40, 1.50, and 1.70 GHz. The chip’s all-new internal design contains Intel Net Burst™ micro-architecture. This provides the Pentium 4 with hyper pipelined technology ( which doubles the pipeline depth to 20 stages), a rapid execution engine ( which pushes the processor’s ALUs to twice the core frequency), and 400 MHz system bus. The Pentium 4 contains 144 new instructions. Furthermore, inclusion of an improved Advanced Dynamic Execution and an improved floating point pushes data efficiently through the pipeline.
This enhances digital audio, digital video and 3D graphics. Along with other features such as streaming SIMD Extensions 2 (SSE2) that extends MMX™ technology, the Pentium 4 gives the advanced technology to get the most out of the Internet. Finally, the Pentium 4 offers high performance when networking multiple PCs, or when attaching Pentium 4 based PC to home consumer electronic systems and new peripherals.