THE PENTIUM II, PENTIUM III, PENTIUM 4, AND CORE2 MICROPROCESSORS: THE PENTIUM 4 AND CORE2

THE PENTIUM 4 AND CORE2

The most recent version of the Pentium Pro architecture microprocessor is the Pentium 4 microprocessor and, more recently, the Core2 from Intel. The Pentium II, Pentium III, Pentium 4, and Core2 are all versions of the Pentium Pro architecture. The Pentium 4 was released initially in November 2000 with a speed of 1.3 GHz. It is currently available in speeds up to 3.8 GHz. Two packages are available for early versions of this integrated microprocessor, the 423-pin PGA and the 478-pin FC-PGA2. Both versions of the original issue of the Pentium 4 used the 0.18 micron technology for fabrication. The most recent versions use either the 0.13 micron technology or the 90 nm (0.09 micron) technology. Newer versions of the Pentium 4 use the LGA (land grid array) 775 package, which has 775 contacts. Intel is currently developing a 45 nm technology for future products.

As with earlier versions of the Pentium III, the Pentium 4 uses a 100 MHz memory bus speed, but because it is quad pumped, the bus speed can approach 400 MHz. More recent versions use the 133 MHz bus, listed as 533 MHz because of quad pumping, or the 200 MHz bus, listed as 800 MHz. Some newer versions use a 1066 MHz or 1333 MHz front side bus; another package called the LGA 771 has appeared in newer versions of the Xeon. Figure 19–5 illustrates the pin-out of the 423-pin PGA of the Pentium 4 microprocessor.
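The quad-pumping arithmetic can be sketched in a few lines of C++. This is only an illustration: the function names are hypothetical, and the 64-bit (8-byte) data path is an assumption taken from the width of the Pentium 4 data bus.

```cpp
#include <cstdint>

// Effective transfer rate, in millions of transfers per second, of a
// quad-pumped bus: four transfers take place on every base clock.
constexpr std::uint32_t quadPumpedRate(std::uint32_t baseClockMHz) {
    return baseClockMHz * 4;
}

// Peak bandwidth in megabytes per second, assuming a 64-bit (8-byte)
// data path between the chip set and the microprocessor.
constexpr std::uint32_t peakBandwidthMBps(std::uint32_t baseClockMHz) {
    return quadPumpedRate(baseClockMHz) * 8;
}
```

A 100 MHz base clock gives 400 million transfers per second, and a 200 MHz clock gives 800. The listed 533 MHz figure comes from a base clock that is nominally 133 MHz but is closer to 133.33 MHz, so integer arithmetic on 133 yields 532.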

Memory Interface

The memory interface to the Pentium 4 typically uses the Intel 945, 965, or 975 chip set. These chip sets provide a dual-pipe memory bus to the microprocessor with each pipe interfaced to a 32-bit-wide section of the memory. The two pipes function together to comprise the 64-bit-wide data path to the microprocessor. Because of the dual-pipe arrangement, the memory must be populated with pairs of DDR2 memory devices operating at 667 MHz, 800 MHz, or 1066 MHz. According to Intel, the DDR2 arrangement provides a 300% increase in speed over a memory populated with PC-100 memory.


Intel has abandoned RDRAM in favor of DDR2 (double data rate) memory beginning with the 965 and 975 chip sets. Apparently the claim of a 300% increase in RDRAM speed failed to prove factual. In addition to support for DDR2 memory, support for the serial ATA disk interface has also been added.

Newer chip sets such as the 945 and 965 contain the PCI Express interface and do not contain the AGP interface. The AGP interface is replaced by PCI Express for video support. IDE support remains for interface to legacy devices such as older HDD, CD-ROM, and DVD drives.

Register Set

The Pentium 4 and Core2 register set is nearly identical to all other versions of the Pentium except that the MMX registers are separate entities from the floating-point registers. In addition, eight 128-bit-wide XMM registers are added for use with the SIMD (single-instruction, multiple data) instructions, as explained in Chapter 14, and with the extended 128-bit packed double-precision floating-point numbers.

You might think of the XMM registers as double-wide MMX registers that can hold a pair of 64-bit double-precision floating-point numbers or four single-precision floating-point numbers. Likewise, they can hold 16-byte-wide numbers, just as the MMX registers hold 8-byte-wide numbers.
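The different ways of viewing the same 128 bits can be modeled in portable C++. The sketch below is not processor code: the type Xmm128 and the helpers setLane and getLane are hypothetical names introduced here, and the register is simply treated as 16 raw bytes that are copied in and out with memcpy.

```cpp
#include <cstdint>
#include <cstring>

// Model of one 128-bit XMM register as 16 raw bytes.
struct Xmm128 {
    alignas(16) std::uint8_t bytes[16];
};

static_assert(sizeof(Xmm128) == 16, "an XMM register is 128 bits wide");
static_assert(sizeof(double) * 2 == 16, "two doubles fill the register");
static_assert(sizeof(float) * 4 == 16, "four floats fill the register");

// Write element `index` of type T into the register image.
template <typename T>
void setLane(Xmm128& r, unsigned index, T value) {
    std::memcpy(r.bytes + index * sizeof(T), &value, sizeof(T));
}

// Read element `index` of type T back out of the register image.
template <typename T>
T getLane(const Xmm128& r, unsigned index) {
    T value;
    std::memcpy(&value, r.bytes + index * sizeof(T), sizeof(T));
    return value;
}
```

Because every view shares the same 16 bytes, writing double lane 1 overwrites float lanes 2 and 3, just as it would in the real register.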

If the new patch for MASM 6.15 is downloaded from Microsoft, programs can be assembled using both the MMX and XMM instructions. The ML.EXE program is also found in Microsoft Visual Studio .NET 2003. To assemble programs that include MMX instructions, use the .MMX switch. For programs that include the SIMD instructions, use the .XMM switch. Example 19–1 illustrates a very simple program that uses the MMX instructions to add two 8-byte-wide numbers together. Notice how the .MMX switch is used to select the MMX instruction set. The MOVQ instructions transfer numbers between memory and the MMX registers. The MMX registers are numbered from MM0 to MM7. You can also use the MMX and SIMD instructions in Microsoft Visual C++ using the inline assembler if you download the latest patch from Microsoft for Visual Studio version 6.0 or use a newer version of Visual Studio. It is recommended that Visual Studio Express, which contains the patch, be used for software development.


Similarly, the XMM software can be used in a program with the .XMM switch. Most modern programs use the XMM registers and the XMM instruction set to accomplish multimedia and other high-speed operations. Example 19–2 shows a short program that illustrates the use of a few XMM instructions. This program multiplies two sets of four single-precision floating-point numbers and stores the four products into the four doublewords at ANS. In order to enable access to octal words (128-bit-wide numbers), we use the OWORD PTR directive. Also notice that the FLAT model is used with the C profile. The SIMD instructions only function in protected mode, so the program uses the FLAT model format. This means that the .686 and .XMM switches are both placed before the model statement.
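The packed multiplication described above can be modeled in portable C++. This is not the Example 19–2 listing, only a scalar sketch of what a packed single-precision multiply (the SSE MULPS instruction) computes; the names PackedSingles and mulPacked are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Four single-precision values, the contents of one XMM register.
using PackedSingles = std::array<float, 4>;

// Lane-by-lane product: element i of the result is a[i] * b[i].
// A packed multiply instruction produces all four products at once;
// this loop computes the same values one lane at a time.
PackedSingles mulPacked(const PackedSingles& a, const PackedSingles& b) {
    PackedSingles ans{};
    for (std::size_t i = 0; i < ans.size(); ++i) {
        ans[i] = a[i] * b[i];
    }
    return ans;
}
```

The four products land in four consecutive 32-bit elements, which corresponds to the four doublewords stored at ANS in the text.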


Hyper-Threading Technology

The most recent innovation, new to the Pentium 4, is called hyper-threading technology. This significant advancement combines two microprocessors into a single package. To understand this new technology, refer to Figure 19–6, which shows a traditional dual processor system and a hyper-threaded system.

The hyper-threaded processor contains two execution units that each contain a complete set of the registers capable of running software independently or concurrently. These two separate machine contexts share a common bus interface unit. During machine operation each processor is capable of running a thread (process) independently, increasing the execution speed of an application that is written using multiple threads. The bus interface unit contains the level 2 and level 3 caches and the interface to the memory and I/O structure of the machine. When either microprocessor needs to access memory or I/O, it must share the bus interface unit.

The bus interface unit is used to access memory, but because memory is accessed in bursts that fill the caches, it is often idle. Because of this, a second processor can use this idle time to access memory while the other processor is busy executing instructions. Does the speed of the system double? Yes and no. Some threads can run independently of each other as long as they do not access the same area of memory. If each thread accesses the same area of memory, the machine can actually run slower with hyper-threading technology. This does not occur very often, so in most cases system performance increases with hyper-threading, achieving nearly the same performance as a dual processor system.

Eventually most machines will use hyper-threading technology, which means that more attention should be given to developing software that is multi-threaded. Each thread runs on a different processor in a system that has either dual processors or hyper-threaded processors, increasing performance. In the future the architecture may include even more processors to handle additional threads.


Multiple Core Technology

Most new versions of the Pentium 4 and Core2 contain either dual or quad cores. Each core is a separate version of the microprocessor that independently executes a separate task. Three versions are currently available: the Pentium D, which contains two cores with separate caches; a Core2 Duo version, which contains two cores and a shared cache; and a quad core version, which contains four cores. Intel seems to have migrated to a shared cache for multiple core microprocessors. A recent article from Intel stated that in the future the Pentium, or whatever it will be called, may contain up to 80 cores. The Core2 Duo contains either a 2M or 4M byte cache and operates at frequencies up to 3 GHz. It certainly appears that the speed race is over and the clock frequency has stabilized at between 3 and 4 GHz. Does this mean that in the future a 5 GHz version will never become available? It is possible, but at this time a much higher clock frequency appears to be impossible, so multiple cores using threaded applications seem to be the prospect for some time to come. It appears silicon technology has reached its apex. What this means is that efficient programming will become the avenue for increasing the speed of computer systems.

CPUID

As in earlier versions of the Pentium, the CPUID instruction accesses information that indicates the type of microprocessor as well as the features supported by the microprocessor. In the ever-evolving series of microprocessors it is important to be able to access this information so that efficient software can be written to operate on many different versions of the microprocessor.

Table 19–7 lists the latest features available to the CPUID instruction. To access these features, EAX is loaded with the input number listed in the table, then the CPUID instruction is executed. The CPUID instruction usually returns information in the EAX, EBX, ECX, and EDX registers in the real or protected mode. As can be gleaned from the table, additional features have been added to the CPUID instruction when compared to previous versions.


In Chapter 18 software was developed to read and display the data available after the CPUID instruction was invoked with EAX = 1. Here we deal with reading the processor brand string and preparing it for display in a Visual C++ function. The brand string, if supported, contains the frequency at which the microprocessor is certified to operate and also the genuine Intel keyword. The BrandString function (see Example 19–3) returns a CString that contains the information stored in the CPUID members 80000002H–80000004H. This software requires a Pentium 4 system for proper operation, as tested for in the BrandString function. The Convert function reads the contents of EAX, EBX, ECX, and EDX from the register specified as the parameter and converts them to a CString that is returned. The author’s system shows that the brand string is

“Intel(R) Pentium(R) 4 CPU 3.06GHz”
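How the twelve register values become a string can be sketched in portable C++. This is not the BrandString or Convert function from Example 19–3; it is a hypothetical helper that assumes the caller has already executed CPUID for the three inputs 80000002H–80000004H and saved the results. CpuidRegs, appendBytes, and brandString are names invented for this sketch, and std::string stands in for CString.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Register values captured after one CPUID invocation.
struct CpuidRegs {
    std::uint32_t eax, ebx, ecx, edx;
};

// Append the four bytes of a register, least significant byte first,
// because the first character sits in the low-order byte of EAX.
static void appendBytes(std::string& out, std::uint32_t reg) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<char>((reg >> shift) & 0xFF));
    }
}

// Build the brand string from the results of inputs 80000002H,
// 80000003H, and 80000004H (3 x 16 bytes = 48 bytes in total).
std::string brandString(const std::array<CpuidRegs, 3>& leaves) {
    std::string raw;
    for (const CpuidRegs& r : leaves) {
        appendBytes(raw, r.eax);
        appendBytes(raw, r.ebx);
        appendBytes(raw, r.ecx);
        appendBytes(raw, r.edx);
    }
    // The processor pads the 48 bytes with nulls; cut at the first one.
    return raw.substr(0, raw.find('\0'));
}
```

For instance, EAX = 65746E49H and EBX = 2952286CH from the first input decode to the eight characters "Intel(R)".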


The other information available about the system is returned in EAX, EBX, ECX, and EDX when CPUID is executed after loading EAX with a 1. The EAX register contains the version information as the model, family, and stepping information, as illustrated in Figure 19–7. The EBX register contains information about the cache, such as the size of the cache line flushed by the CLFLUSH instruction in bits 15–8 and the ID assigned to the local APIC on reset in bits 31–24. Bits 23–16 indicate how many internal processors are available to hyper-threading (two for the current Pentium 4 microprocessor). Example 19–4 shows a function that identifies the number of processors in a hyper-threaded CPU and returns it as a character string. If more than nine processors are eventually added to the microprocessor, then the software in Example 19–4 would need to be modified.
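The EBX bit fields described above can be separated with shifts and masks. The following is a minimal sketch rather than the Example 19–4 code: Leaf1Ebx and decodeLeaf1Ebx are hypothetical names, and the function expects an EBX value that was saved after CPUID ran with EAX = 1. The CLFLUSH field is reported in units of 8 bytes, so it is multiplied by 8 here.

```cpp
#include <cstdint>

// Fields of EBX returned by CPUID with EAX = 1.
struct Leaf1Ebx {
    unsigned clflushLineBytes;   // bits 15-8, scaled from 8-byte units
    unsigned logicalProcessors;  // bits 23-16
    unsigned initialApicId;      // bits 31-24
};

// Split a saved EBX value into its three documented fields.
Leaf1Ebx decodeLeaf1Ebx(std::uint32_t ebx) {
    Leaf1Ebx f;
    f.clflushLineBytes  = ((ebx >> 8) & 0xFF) * 8;
    f.logicalProcessors = (ebx >> 16) & 0xFF;
    f.initialApicId     = (ebx >> 24) & 0xFF;
    return f;
}
```

Returning the processor count as a number, instead of a single character, avoids the nine-processor limit mentioned in the text.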


Feature information for the microprocessor is returned in ECX and EDX as indicated in Figures 19–8 and 19–9. Each bit is a logic 1 if the feature is present. For example, if hyper-threading is needed in an application, bit position 28 is tested in EDX to see if hyper-threading is supported. This appears in Example 19–4 along with reading the number of processors found in a hyper-threaded microprocessor. The BT instruction tests the bit indicated and places it into the carry flag. If the bit under test is a 1, the resultant carry is one; if the bit under test is a 0, the resultant carry is zero.
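In C++ the same bit test can be written with a shift and a mask. The sketch below assumes the EDX value was saved after CPUID ran with EAX = 1; testBit, kHttBit, and supportsHyperThreading are hypothetical names, and the Boolean return value plays the role of the carry flag after BT.

```cpp
#include <cstdint>

// Return the selected bit of value, as BT would leave it in carry.
constexpr bool testBit(std::uint32_t value, unsigned bit) {
    return ((value >> bit) & 1u) != 0;
}

// Bit position of the hyper-threading feature flag in EDX.
constexpr unsigned kHttBit = 28;

// True when the saved EDX feature flags report hyper-threading.
constexpr bool supportsHyperThreading(std::uint32_t edx) {
    return testBit(edx, kHttBit);
}
```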


Model-Specific Registers

As with earlier versions of the Pentium, the Pentium 4 and Core2 also contain model-specific registers that are read with the RDMSR instruction and written with the WRMSR instruction. The model-specific registers of the Pentium 4 and Core2 are numbered from 0H to 6CFH, a range of 1744 possible register numbers.

Intel does not provide information on all of them. The registers not identified are either reserved by Intel or used for some undocumented feature or function.

Both the read and write model-specific register instructions function in the same manner. Register ECX is loaded with the register number to be accessed, and the data are transferred through the EDX:EAX register pair as a 64-bit number, where EDX holds the most significant 32 bits and EAX holds the least significant 32 bits. These registers must be accessed in either the real mode (DOS) or in ring 0 of protected mode. These registers are normally accessed by the operating system and cannot be accessed in normal Visual C++ programming.
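The EDX:EAX pairing amounts to simple shift arithmetic. The helpers below are a hypothetical sketch (the names are not from the text) showing how the two 32-bit halves form one 64-bit value and how a 64-bit value is split again before a write; they do not execute RDMSR or WRMSR themselves.

```cpp
#include <cstdint>

// Combine the two halves returned by a read into one 64-bit value.
constexpr std::uint64_t combineEdxEax(std::uint32_t edx, std::uint32_t eax) {
    return (static_cast<std::uint64_t>(edx) << 32) | eax;
}

// Most significant 32 bits, the half that belongs in EDX.
constexpr std::uint32_t highHalf(std::uint64_t value) {
    return static_cast<std::uint32_t>(value >> 32);
}

// Least significant 32 bits, the half that belongs in EAX.
constexpr std::uint32_t lowHalf(std::uint64_t value) {
    return static_cast<std::uint32_t>(value & 0xFFFFFFFFu);
}
```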

Performance-Monitoring Registers

Another feature in the Pentium 4 is a set of performance-monitoring registers (PMR) that, like the model-specific registers, can only be used in real mode or at ring 0 of protected mode. The only register that can be accessed via user software is the time-stamp counter, which is a performance-monitoring register. The remaining PMRs are accessed with the RDPMC instruction. This instruction is similar to the RDMSR instruction in that it uses ECX to specify the register number and the result appears in EDX:EAX. There is no write instruction for the PMRs.

64-Bit Extension Technology

Intel has released its 64-bit extension technology for most members of the Intel 32-bit architecture family. The instruction set and architecture are backwards compatible to the 8086, which means that the instructions and register set have remained compatible. (The only things that are not compatible are a few of the legacy instructions and some instructions that deal with AH, BH, CH, and DH.) What has changed is that the register set is stretched to 64 bits in width in place of the current 32-bit-wide registers. Refer to Figure 19–10 for the programming model of the Pentium 4 and Core2 in 64-bit mode.

Notice that the register set now contains sixteen 64-bit-wide general-purpose registers: RAX, RBX, RCX, RDX, RSP, RBP, RDI, RSI, and R8–R15. The instruction pointer is also stretched to a width of 64 bits, allowing the microprocessor to address memory using a 64-bit memory address. The amount of memory that can actually be addressed is limited by the number of address pins on the specific implementation of the microprocessor.

The registers are addressed as 64-bit, 32-bit, 16-bit, or 8-bit registers. An example is R8 (64 bits), R8D (32 bits), R8W (16 bits), and R8L (8 bits). There is no way to address the high byte (as in BH) for a numbered register; only the low byte of a numbered register can be addressed. Legacy addressing such as MOV AH,AL functions correctly, but addressing a legacy high-byte register and a numbered low-byte register in the same instruction is not allowed. In other words, MOV AH,R9L is not allowed, but MOV AL,R9L is allowed. If the MOV AH,R9L instruction is included in a program no error will occur; instead the instruction will be changed to MOV SPL,R9L. In such instructions the encodings for AH, CH, DH, and BH select SPL, BPL, SIL, and DIL, respectively, which are the low-order 8 bits (the L is for low order) of RSP, RBP, RSI, and RDI. Otherwise the legacy registers can be mixed with the new numbered registers R8–R15 as in MOV R11,RAX, MOV R11D,ECX, or MOV BX,R14W.

Another addition to the architecture is a set of additional SSE registers numbered XMM8–XMM15. These registers are accessed by the SSE, SSE2, or SSE3 instructions. Otherwise, the SSE unit has not been changed. The control and debug registers are expanded to 64 bits in width. A new model-specific register is added to control the extended features at address C0000080H. Figure 19–11 depicts the extended feature control register.

SCE The system CALL enable bit is set to enable the SYSCALL and SYSRET instructions in the 64-bit mode.

LME The mode enable bit is set to allow the microprocessor to use the 64-bit extended mode.


The protected mode descriptor table registers are expanded in the extended 64-bit mode so that each descriptor table register (GDTR, LDTR, and IDTR) and the task register (TR) holds a 64-bit base address instead of a 32-bit base address. The biggest change is that the base address and limits of the segment descriptors are ignored. The system uses a base address of 0000000000000000H for the code segment, and the DS, ES, and SS segments are ignored.

Paging is also modified to include a paging unit that supports the translation of a 64-bit linear address into a 52-bit physical address. Intel states that in the first version of this 64-bit Pentium the linear address will be 48 bits and the physical address will be 40 bits. This means that there will be a 40-bit address to support 1T (tera) byte of physical memory translated from a linear address space of 256T bytes. The 52-bit address accesses 4P (peta) bytes of memory and a 64-bit linear address accesses 16E (exa) bytes of memory. The translation is accomplished with additional tables in the paging unit. In place of two tables (a page directory and a page table), the 64-bit extended paging unit uses four levels of page tables.
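The address sizes and the four-level walk can be illustrated with a short C++ sketch. It assumes 4K-byte pages, in which case each of the four tables is indexed by 9 bits of the 48-bit linear address and the low 12 bits are the offset within the page. The names PageWalkIndexes, splitLinearAddress, and addressableBytes are hypothetical.

```cpp
#include <cstdint>

// Number of bytes reachable with an address of the given width
// (valid for widths below 64 bits).
constexpr std::uint64_t addressableBytes(unsigned addressBits) {
    return std::uint64_t{1} << addressBits;
}

// Table indexes and page offset taken from a 48-bit linear address,
// assuming 4K-byte pages.
struct PageWalkIndexes {
    unsigned level4;  // bits 47-39, index into the top-level table
    unsigned level3;  // bits 38-30
    unsigned level2;  // bits 29-21, page directory index
    unsigned level1;  // bits 20-12, page table index
    unsigned offset;  // bits 11-0, byte within the 4K-byte page
};

// Split a linear address into its four 9-bit indexes and 12-bit offset.
PageWalkIndexes splitLinearAddress(std::uint64_t linear) {
    PageWalkIndexes p;
    p.level4 = static_cast<unsigned>((linear >> 39) & 0x1FF);
    p.level3 = static_cast<unsigned>((linear >> 30) & 0x1FF);
    p.level2 = static_cast<unsigned>((linear >> 21) & 0x1FF);
    p.level1 = static_cast<unsigned>((linear >> 12) & 0x1FF);
    p.offset = static_cast<unsigned>(linear & 0xFFF);
    return p;
}
```

Four 9-bit indexes plus the 12-bit offset account for all 48 bits of the linear address. A 40-bit physical address reaches 1T byte and a 48-bit linear address spans 256T bytes, which matches the figures in the text.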
