■ SUMMARY
Memory is organized in a hierarchy in which the densest memory offers the least performance, whereas the greatest performance is realized with the memory that has the least density. In order to bridge the gap between the two, the principle of locality is exploited in cache and virtual memories.
A cache memory maintains the most frequently used blocks in a small, fast memory that is local to the CPU. A paged virtual memory augments a main memory with disk storage. The physical memory serves as a window on the paged virtual memory, which is maintained in its entirety on a hard magnetic disk.
Cache and paged virtual memories are commonly used on the same computer, but for different reasons. A cache memory improves the average access time to the main memory, whereas a paged virtual memory extends the size of the main memory.
In an example memory architecture, the Intel Pentium has a split L1 cache and a TLB that reside on the Pentium core, and a unified L2 cache that resides in the same package as the Pentium although on a different silicon die. When paging is implemented, the page table is located in main memory, with the TLB reducing the number of times that the TLB is referenced. The L1 and L2 cache memories then reduce the number of times main memory is accessed for data. The Pentium also supports segmentation, which has its own set of mapping registers and control hardware that resides on the Pentium.
■ FURTHER READING
(Stallings, 1993) and (Mano, 1991) give readable explanations of RAM. A number of memory databooks (Micron, 1992) and (Texas Instruments, 1991) give practical examples of memory organization. (Foster, 1976) is the seminal reference on CAM. (Mead and Conway, 1980) describe the H-tree structure in the context of VLSI design. (Franklin et al, 1982) explores issues in partitioning chips, which arise in splitting an H-tree for a CAM. (Sedra and Smith, 1997) discuss the implementation of several kinds of static and dynamic RAM.
(Hamacher et al, 1990) gives a classic treatment of cache memory. (Tanenbaum, 1990) gives a readable explanation of virtual memory. (Hennessy and Patterson, 1995) and (Przybylski, 1990) cover issues relating to cache performance. Seg- mentation on the Intel Pentium processor is covered in (Intel, 1993). Kingston Technology gives a broad tutorial on memory technologies at http://www.king- ston.com/king/mg0.htm.
Foster, C. C., Content Addressable Parallel Processors, Van Nostrand Reinhold Company, (1976).
Franklin, M. A., D. F. Wann, and W. J. Thomas, “Pin Limitations and Partitioning of VLSI Interconnection Networks,” IEEE Trans Comp., C-31, 1109, (Nov. 1982).
Hamacher, V. C., Z. G. Vranesic, and S. G. Zaky, Computer Organization, 3/e, McGraw Hill, (1990).
Hennessy, J. L. and D. A. Patterson, Computer Architecture: A Quantitative Approach, 2/e, Morgan Kaufmann Publishers, San Mateo, California, (1995).
Intel Corporation, Pentium Processor User’s Manual, Volume 3: Architecture and Programming Manual, (1993).
Knuth, D., The Art of Computer Programming: Fundamental Algorithms, vol. 1, 2/e, Addison-Wesley, (1974).
Mano, M., Digital Design, 2/e, Prentice Hall, (1991).
Mead, C. and L. Conway, Introduction to VLSI Systems, Addison Wesley, (1980). Micron, DRAM Data Book, Micron Technologies, Inc., 2805 East Columbia Road, Boise, Idaho, (1992).
Przybylski, S. A., Cache and Memory Hierarchy Design, Morgan Kaufmann Publishers, (1990).
Sedra, A., and Smith, K., Microelectronic Circuits, 4/e, Oxford University Press, New York, (1997).
Stallings, W., Computer Organization and Architecture, 3/e, MacMillan Publishing, New York, (1993).
Tanenbaum, A., Structured Computer Organization, 3/e, Prentice Hall, Englewood Cliffs, New Jersey, (1990).
Texas Instruments, MOS Memory: Commercial and Military Specifications Data Book, Texas Instruments, Literature Response Center, P.O. Box 172228, Denver, Colorado, (1991).
■ PROBLEMS
7.1 A ROM lookup table and two D flip-flops implement a state machine as shown in the diagram below. Construct a state table that describes the machine.
7.2 Fill in four memory locations for the lookup table shown in Figure 7-11 in which each of the four operations: add, subtract, multiply, and divide are performed on A=16 and B=4. Show the address and the value for each case.
7.3 Design an eight-word, 32-bit RAM using 8´8 RAMs.
7.4 Design a 16-word, four-bit RAM using 4´4 RAMs and a single external decoder.
7.5 Given a number of n-word by p-bit RAM chips:
(a) Show how to construct an n-word ´ 4p-bit RAM using these chips. Use any other logic components that you have seen in the text that you feel are needed.
(b) Show how to construct a 4n-word ´ p-bit RAM using these chips.
7.6 Draw the circuit for a 4-to-16 tree decoder, using a maximum fan in and fan-out of two.
7.7 A direct mapped cache consists of 128 slots. Main memory contains 16K blocks of 16 words each. Access time of the cache is 10 ns, and the time required to fill a cache slot is 200 ns. Load-through is not used; that is, when an accessed word is not found in the cache, the entire block is brought into the cache, and the word is then accessed through the cache. Initially, the cache is empty. Note: When referring to memory, 1K = 1024.
(a) Show the format of the memory address.
(b) Compute the hit ratio for a program that loops 10 times from locations 15 – 200. Note that although the memory is accessed twice during a miss (once for the miss, and once again to satisfy the reference), a hit does not occur for this case. To a running program, only a single memory reference is observed.
(c) Compute the effective access time for this program.
7.8 A fully associative mapped cache has 16 blocks, with eight words per block. The size of main memory is 216 words, and the cache is initially empty. Access time of the cache is 40 ns, and the time required to transfer eight words between main memory and the cache is 1 µs.
(a) Compute the sizes of the tag and word fields.
(b) Compute the hit ratio for a program that executes from 20–45, then loops four times from 28–45 before halting. Assume that when there is a miss, that the entire cache slot is filled in 1 µs, and that the first word is not seen by the CPU until the entire slot is filled. That is, assume load-through is not used. Initially, the cache is empty.
(c) Compute the effective access time for the program described in part (b) above.
7.9 Compute the total number of bits of storage needed for the associative mapped cache shown in Figure 7-13 and the direct mapped cache shown in Figure 7-14. Include Valid, Dirty, and Tag bits in your count. Assume that the word size is eight bits.
7.10 (a) How far apart do memory references need to be spaced to cause a miss on every cache access using the direct mapping parameters shown in Figure 7-14?
(b) Using your solution for part (a) above, compute the hit ratio and effective access time for that program with TMiss = 1000 ns, and THit = 10 ns. Assume that load-through is used.
7.11 A computer has 16 pages of virtual address space but only four physical page frames. Initially the physical memory is empty. A program references the virtual pages in the order: 0 2 4 5 2 4 3 11 2 10.
(a) Which references cause a page fault with the LRU page replacement pol- icy?
(b) Which references cause a page fault with the FIFO page replacement pol- icy?
7.12 On some computers, the page table is stored in memory. What would happen if the page table is swapped out to disk? Since the page table is used for every memory reference, is there a page replacement policy that guarantees that the page table will not get swapped out? Assume that the page table is small enough to fit into a single page (although usually it is not).
7.13 A virtual memory system has a page size of 1024 words, eight virtual pages, four physical page frames, and uses the LRU page replacement policy. The page table is as follows:
(a) What is the main memory address for virtual address 4096?
(b) What is the main memory address for virtual address 1024?
(c) A fault occurs on page 0. Which page frame will be used for virtual page 0?
7.14 When running a particular program with N memory accesses, a computer with a cache and paged virtual memory generates a total of M cache misses and F page faults. T1 is the time for a cache hit; T2 is the time for a main memory hit; and T3 is the time to load a page into main memory from the disk.
(a) What is the cache hit ratio?
(b) What is the main memory hit ratio? That is, what percentage of main memory accesses do not generate a page fault?
(c) What is the overall effective access time for the system?
7.15 A computer contains both cache and paged virtual memories. The cache can hold either physical or virtual addresses, but not both. What are the issues involved in choosing between caching virtual or physical addresses? How can these problems be solved by using a single unit that manages all memory map- ping functions?
7.16 How much storage is needed for the page table for a virtual memory that has 232 bytes, with 212 bytes per page, and 8 bytes per page table entry?
7.17 Compute the gate input count for the decoder(s) of a 64 ´ 1-bit RAM for both the 2D and the 2-1/2D cases. Assume that an unlimited fan-in/fan-out is allowed. For both cases, use ordinary two-level decoders. For the 2 1/2D case, treat the column decoder as an ordinary MUX. That is, ignore its behavior as a DEMUX during a write operation.
7.18 How many levels of decoding are needed for a 220 word 2D memory if a fan-in of four and a fan-out of four are used in the decoder tree?
7.19 A video game cartridge needs to store 220 bytes in a ROM.
(a) If a 2D organization is used, how many leaves will be at the deepest level of the decoder tree?
(b) How many leaves will there be at the deepest level of the decoder tree for a 2-1/2D organization?
7.20 The contents of a CAM are shown below. Which set of words will respond if a key of 00A00020 is used on fields 1 and 3? Fields 1 and 3 of the key must match the corresponding fields of a CAM word in order for that word to respond. The remaining fields are ignored during the matching process but are included in the retrieved words.
7.21 When the TLB shown in Figure 7-27 has a miss, it accesses the page table to resolve the reference. How many entries are in that page table?