PROGRAMMING THE MICROPROCESSOR: DISK FILES

DISK FILES

Data are stored on the disk in the form of files. A disk that uses the FAT file system is organized in four main parts: the boot sector, the file allocation table (FAT), the root directory, and the data storage areas. The Windows NTFS (New Technology File System) contains a boot sector and a master file table (MFT). The first sector on the disk is the boot sector, which is used to load the disk operating system (DOS) from the disk into memory when power is applied to the computer.

The FAT (or MFT) is where the names of files/subdirectories and their locations on the disk are stored by the operating system. All references to any disk file are handled through the FAT (or MFT). All other subdirectories and files are referenced through the root directory in the FAT system. The NTFS system does not have a root directory even though the file system may still appear to have a root directory. The disk files are all considered sequential access files, meaning that they are accessed a byte at a time, from the beginning of the file toward the end. Both the NTFS file system and the FAT file system are in use, with the hard disk drive on most modern Windows systems using NTFS and the floppy disk, CD-ROM, and DVD using the FAT system.

Disk Organization

Figure 8–7 illustrates the organization of sectors and tracks on the surface of the disk. This organization applies to both floppy and hard disk memory systems. The outer track is always track 0, and the inner track is 39 (double density) or 79 (high density) on floppy disks. The inner track on a hard disk is determined by the disk size, and could be 10,000 or higher for very large hard disks.

Figure 8–8 shows the organization of data on a disk. The length of the FAT is determined by the size of the disk. In the NTFS system, the length of the MFT is determined by the number of files stored on the disk. Likewise, the length of the root directory, in a FAT volume, is determined by the number of files and subdirectories located within it. The boot sector is always a single 512-byte-long sector located in the outer track at sector 0, the first sector.

The boot sector contains a bootstrap loader program that is read into RAM when the system is powered. The bootstrap loader then executes and loads the operating system into RAM. Next, the bootstrap loader passes control to the operating system program, allowing the computer to be under the control of and execute Windows, in most cases. This same sequence of events also occurs if the Linux operating system is found on the disk.

The FAT indicates which sectors are free, which are corrupted (unusable), and which contain data. The FAT is referenced each time that the operating system writes data to the disk so that it can find a free sector. Space is tracked in clusters: each free cluster is indicated by 0000H in the FAT, and each occupied cluster is indicated by a cluster number. A cluster can be anything from one sector to any number of sectors in length. Many hard disk memory systems use four sectors per cluster, which means that even the smallest file occupies 512 bytes × 4, or 2048 bytes, of disk space. In a system that uses NTFS, the cluster size is usually 4K bytes, which is eight sectors long.
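The cluster arithmetic in this paragraph can be checked with a few lines of standard C++. This is only a sketch: the names kSectorBytes, clusterBytes, and allocatedBytes are hypothetical and do not come from the text, and a zero-length file is assumed to need no cluster.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per sector on the disks described in the text.
constexpr std::uint64_t kSectorBytes = 512;

// Size of one cluster in bytes for a given sectors-per-cluster setting.
constexpr std::uint64_t clusterBytes(std::uint64_t sectorsPerCluster) {
    return sectorsPerCluster * kSectorBytes;
}

// Disk space consumed by a file, which is always a whole number of clusters.
// Assumption: a zero-length file is treated as occupying no cluster.
std::uint64_t allocatedBytes(std::uint64_t fileBytes, std::uint64_t sectorsPerCluster) {
    assert(sectorsPerCluster > 0);
    const std::uint64_t cluster = clusterBytes(sectorsPerCluster);
    const std::uint64_t clusters = (fileBytes + cluster - 1) / cluster;  // round up
    return clusters * cluster;
}
```

With four sectors per cluster, allocatedBytes(1, 4) yields 2048, so a one-byte file still uses a full 2048-byte cluster. With eight sectors per cluster, clusterBytes(8) gives the 4K-byte NTFS cluster.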

Figure 8–9 shows the format of each directory entry in the root, or in any other directory or subdirectory. Each entry contains the name, extension, attribute, time, date, location, and length. The length of the file is stored as a 32-bit number. This means that a file can have a maximum length of 4G bytes. The location is the starting cluster number.
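A 32-byte directory entry of this kind could be modeled as a C++ structure. The sketch below assumes the classic DOS layout (8-byte name, 3-byte extension, packed 16-bit time and date, 16-bit starting cluster), which may differ in detail from the figure. The names FatDirEntry, packDate, packTime, and yearOf are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// One 32-byte directory entry (assumed classic DOS layout, see the lead-in).
struct FatDirEntry {
    char          name[8];       // file name, padded with spaces
    char          ext[3];        // extension, padded with spaces
    std::uint8_t  attribute;     // read-only, hidden, system, archive, ...
    std::uint8_t  reserved[10];
    std::uint16_t time;          // hour (5 bits), minute (6 bits), seconds / 2 (5 bits)
    std::uint16_t date;          // year - 1980 (7 bits), month (4 bits), day (5 bits)
    std::uint16_t startCluster;  // location: the starting cluster number
    std::uint32_t length;        // file length in bytes, a 32-bit number
};
static_assert(sizeof(FatDirEntry) == 32, "a directory entry is 32 bytes long");

// Packs a calendar date into the 16-bit date field.
std::uint16_t packDate(int year, int month, int day) {
    assert(year >= 1980 && year <= 2107 && month >= 1 && month <= 12 && day >= 1 && day <= 31);
    return static_cast<std::uint16_t>(((year - 1980) << 9) | (month << 5) | day);
}

// Packs a time of day into the 16-bit time field (two-second resolution).
std::uint16_t packTime(int hour, int minute, int second) {
    assert(hour >= 0 && hour < 24 && minute >= 0 && minute < 60 && second >= 0 && second < 60);
    return static_cast<std::uint16_t>((hour << 11) | (minute << 5) | (second / 2));
}

// Recovers the year from a packed date field.
int yearOf(std::uint16_t date) { return 1980 + (date >> 9); }
```

Because length is a 32-bit field, the largest value it can hold is FFFFFFFFH, which is the 4G-byte limit mentioned above.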

Windows NTFS uses a much larger directory entry or record (1,024 bytes) than that of the FAT system (32 bytes). The MFT record contains the file name, file date, attribute, and data. The data can be the entire contents of the file, or a pointer to where the data is stored on the disk called a file run. Generally files that are smaller than about 1500 bytes fit into the MFT record. Longer files fit into a file run or file runs. A file run is a series of contiguous clusters that store the file data. Figure 8–10 illustrates an MFT record in the Windows NTFS file system. The information attribute contains the create date, last modification date, create time, last modification time, and file attributes such as read-only, archive, and so forth. The security attribute stores all security information for the file for limiting access to the file in the Windows system. The header stores information about the record type, size, name (optional), and whether it is resident or not.

File Names

Files and programs are stored on a disk and referenced both by a file name and an extension to the file name. With the DOS operating system, the file name may only be from one to eight characters long. The file name contains just about any ASCII character, except for spaces or the " . / \ [ ] * , : < > | ; ? = characters. In addition to the file name, the file can have an optional one- to three-character extension to the file name. Note that the name of a file and its extension are always separated by a period. If Windows 95 through Windows XP is in use, the file name can be of any length (up to 255 characters) and can even contain spaces. This is an improvement over the eight-character file name limitation of DOS. Also note that a Windows file can have more than one extension.
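The DOS naming rules can be expressed as a small checking function. The following is a minimal sketch in standard C++; the names kForbidden, validPart, and isValidDosName are hypothetical. It rejects the space and the punctuation characters that DOS does not allow in a name, including the backslash and the vertical bar.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// The space plus the punctuation characters that may not appear in a DOS name.
constexpr std::string_view kForbidden = " \"./\\[]*,:<>|;?=";

// True if part is 1..maxLen characters long and has no forbidden character.
bool validPart(std::string_view part, std::size_t maxLen) {
    assert(maxLen > 0);
    return !part.empty() && part.size() <= maxLen &&
           part.find_first_of(kForbidden) == std::string_view::npos;
}

// Checks NAME or NAME.EXT against the eight-character name and
// three-character extension limits.
bool isValidDosName(std::string_view file) {
    const std::size_t dot = file.find('.');
    if (dot == std::string_view::npos)
        return validPart(file, 8);               // no extension present
    return validPart(file.substr(0, dot), 8) &&  // part before the period
           validPart(file.substr(dot + 1), 3);   // a second period fails here
}
```

Under these rules TEST.TXT is accepted, while MY FILE.TXT (a space), LONGFILENAME.TXT (more than eight characters), and A.B.C (more than one extension) are rejected, even though Windows would allow all three.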

Directory and Subdirectory Names. The DOS file management system arranges the data and programs on a disk into directories and subdirectories. In Windows, directories and subdirectories are called file folders. The rules that apply to file names also apply to file folder names. The disk is structured so that it contains a root directory when first formatted. The root directory or folder for a hard disk used as drive C is C:\. Any other folder is placed in the root directory. For example, C:\DATA is folder DATA in the root directory. Each folder placed in the root directory can also have subdirectories or subfolders. Examples are the subfolders C:\DATA\AREA1 and C:\DATA\AREA2, in which the folder DATA contains two subfolders: AREA1 and AREA2. Subfolders can also have additional subfolders. For example, C:\DATA\AREA2\LIST depicts folder DATA, subfolder AREA2, which contains a subfolder called LIST.

Sequential Access Files

All DOS files and Windows files are sequential files. A sequential file is stored and accessed from the beginning of the file toward the end, so reading the last byte requires accessing the first byte and every byte between it and the last. Fortunately, files are read and written in C++ using the File class, which makes their access and manipulation easy. This section of the text describes how to create, read, write, delete, and rename a sequential access file. To gain access to the File class, a new using statement must be added to the list of using statements at the top of the program. If file access is needed, add a using namespace System::IO; statement to the program.

File Creation. Before a file can be used, it must exist on the disk. A file is created by the File class using its Create member, as illustrated in Example 8–29. Here the name of the file that is created by the program is stored in a String^ called FileName. Next, the File class is used to test whether the file already exists before creating it. Finally, in the if statement the file is created.

In this example, if the file cannot be created because the disk is full or the folder is not found, a Windows message box displays Cannot create file followed by the file name, and an exit from the program occurs when OK is clicked in the message box. To try this example, create a dialog application and place the code in the Load event handler. Choose a folder name that does not exist (test will probably work) and run the application. You should see the error message. If you change FileName so it does not include the folder, you will not get the error message.
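The same test-then-create sequence can be sketched in standard C++, using std::filesystem and std::ofstream in place of the File class. This is a parallel illustration, not the numbered example from the text; createFile and CreateResult are hypothetical names.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

enum class CreateResult { Created, AlreadyExists, Failed };

// Creates an empty file unless a file of that name is already on the disk.
CreateResult createFile(const std::string& fileName) {
    assert(!fileName.empty());
    std::error_code ec;  // non-throwing form of exists()
    if (std::filesystem::exists(fileName, ec))
        return CreateResult::AlreadyExists;

    // Opening for output creates the file. The open fails if, for example,
    // the folder named in the path does not exist.
    std::ofstream file(fileName, std::ios::binary);
    return file.is_open() ? CreateResult::Created : CreateResult::Failed;
}
```

A caller would display its own error message when the result is CreateResult::Failed, which is the case for a path that names a nonexistent folder.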

Writing to a File. Once a file exists, it can be written to. In fact, it would be highly unusual to create a file without writing something to it. Data are written to a file one byte at a time. The FileStream class is used to write a stream of data to the file. Data are always written starting at the very first byte in a file. Example 8–30 lists a program that creates a file in the root directory called Test1.txt and stores the letter A in each of its 256 bytes. If you execute this code and look at Test1.txt with Notepad, you will see a file filled with 256 letter As. Note that the file stream should be closed with the Close( ) function when it is no longer needed. Also notice in this example that an array of type byte is created as a garbage-collected array (with gcnew) in C++. It is important to create the array this way so that it is a managed array of data.
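For comparison, a standard C++ version of the same idea is sketched below: fill a 256-byte buffer with one letter and write it from the first byte of the file. It uses std::ofstream rather than FileStream, and the function name writeFill is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <fstream>
#include <string>

// Writes 256 copies of value to the file, replacing any previous contents.
bool writeFill(const std::string& fileName, char value) {
    assert(!fileName.empty());
    std::array<char, 256> buffer;
    buffer.fill(value);

    std::ofstream file(fileName, std::ios::binary | std::ios::trunc);
    if (!file)
        return false;                    // the file could not be opened
    file.write(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    file.close();                        // close the stream when finished
    return !file.fail();                 // report any write or close error
}
```

Calling writeFill("Test1.txt", 'A') produces a 256-byte file of letter As in the current folder.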

Suppose that a 32-bit integer must be written to a file. Because only bytes can be written, a method must be used to convert the four bytes of the integer into a form that can be written to a file. In C++ shifts are used to place the byte into the proper location to store in the array. Assembly language can also accomplish the same task in fewer bytes, as listed in Example 8–31. If you look at the assembly code for each method, you see that the assembly language method is much shorter and much faster. If speed and size are important, then the assembly code is by far the best choice, although in this case the code generated by C++ is fairly efficient.
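The shift technique can be sketched as follows. The byte order chosen here (least significant byte first, as the x86 family stores integers in memory) is an assumption, and the function names store32 and load32 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Places the four bytes of value into buffer, starting at offset,
// least significant byte first.
void store32(std::vector<std::uint8_t>& buffer, std::size_t offset, std::uint32_t value) {
    assert(offset + 4 <= buffer.size());
    buffer[offset]     = static_cast<std::uint8_t>(value);        // bits 7..0
    buffer[offset + 1] = static_cast<std::uint8_t>(value >> 8);   // bits 15..8
    buffer[offset + 2] = static_cast<std::uint8_t>(value >> 16);  // bits 23..16
    buffer[offset + 3] = static_cast<std::uint8_t>(value >> 24);  // bits 31..24
}

// Rebuilds the 32-bit integer from four bytes stored by store32.
std::uint32_t load32(const std::vector<std::uint8_t>& buffer, std::size_t offset) {
    assert(offset + 4 <= buffer.size());
    return  static_cast<std::uint32_t>(buffer[offset])
         | (static_cast<std::uint32_t>(buffer[offset + 1]) << 8)
         | (static_cast<std::uint32_t>(buffer[offset + 2]) << 16)
         | (static_cast<std::uint32_t>(buffer[offset + 3]) << 24);
}
```

Storing 12345678H at offset 0 places 78H, 56H, 34H, and 12H in the first four bytes of the array, ready to be written to the file.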

Reading File Data. File data are read from the beginning of the file toward the end using the OpenRead member of File. Example 8–32 shows an example that reads the file written in Example 8–30 into a buffer called buffer1. The Read function returns the number of bytes actually read from the file, but the value is not used in this example. This works fine if the size of the file is known, as it is here, but suppose that the length of the file is not known. The FileInfo class is used to find the length of a file, as illustrated in Example 8–33.
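A standard C++ sketch of reading a file of unknown length follows. It asks the file system for the length first, much as the text does with FileInfo, and then reads that many bytes. Here std::filesystem::file_size stands in for FileInfo, and readWholeFile is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

// Reads an entire file into a buffer. Returns an empty buffer on any failure.
std::vector<char> readWholeFile(const std::string& fileName) {
    assert(!fileName.empty());
    std::error_code ec;
    const std::uintmax_t length = std::filesystem::file_size(fileName, ec);
    if (ec)
        return {};                       // the length could not be determined

    std::vector<char> buffer(static_cast<std::size_t>(length));
    std::ifstream file(fileName, std::ios::binary);
    file.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));

    // gcount() is the number of bytes actually read, which may be fewer.
    buffer.resize(static_cast<std::size_t>(file.gcount()));
    return buffer;
}
```

The buffer's size after the call tells the caller how many bytes were actually read.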

An Example Binary Dump Program. One tool not available with Windows is a program that displays the contents of a file in hexadecimal code. Although this may not be used by most programmers, it is used whenever software is developed so that the actual contents of a file can be viewed in hexadecimal format. Start a forms application in Windows and call it HexDump. Place a control called a Rich Textbox onto the form as illustrated in Figure 8–11. Under Properties for the Rich Textbox Control, make sure you change Locked to true and Scroll bars to Vertical. If you display a very large file, you will want to be able to scroll down through the code. Very large files take some time to load in this program.

This program uses the function (Disph) shown earlier in Example 8–24 to display the address as an eight-digit hexadecimal address and also to display the contents of the address in hexadecimal form as a two-digit number. Add the Disph function to the program so it returns a String at the location addressed by char temp as the third parameter. The first two parameters contain two integers: one for the number and one for the number of digits called size, as shown in Example 8–34.

Example 8–34 shows the entire program required to perform a hexadecimal dump. Most of the program is generated by Visual C++; only the function at the top and a few at the end were entered to create the application. Note that changing the file for this program requires changing the name of the file in the program. This can be modified by using an edit box to enter the file name, but it was not done in this example for the sake of brevity. In this program 16 bytes are read at a time and formatted for display. This process continues until no bytes remain in the file. The ASCII data that are displayed at the end of the hexadecimal listing are filtered so that any ASCII character under 32 (a space) is displayed as a period. This is important because control characters such as line feed, backspace, and the like would otherwise destroy the screen formatting of the ASCII text.
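The formatting of one dump line (an eight-digit address, up to 16 two-digit bytes, then the filtered ASCII text) can be sketched in standard C++. The exact column layout of the program in the text is not reproduced here, so the spacing below is an assumption, and dumpLine is a hypothetical name that does not correspond to Disph.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats up to 16 bytes as "AAAAAAAA HH HH ... HH text".
std::string dumpLine(std::uint32_t address, const unsigned char* data, std::size_t count) {
    assert(data != nullptr && count >= 1 && count <= 16);
    char field[16];
    std::snprintf(field, sizeof field, "%08X ", static_cast<unsigned>(address));
    std::string line = field;            // eight-digit hexadecimal address
    std::string ascii;

    for (std::size_t i = 0; i < 16; ++i) {
        if (i < count) {
            std::snprintf(field, sizeof field, "%02X ", static_cast<unsigned>(data[i]));
            line += field;               // two-digit hexadecimal byte
            // Codes under 32 (control characters) are shown as a period.
            ascii += (data[i] < 32) ? '.' : static_cast<char>(data[i]);
        } else {
            line += "   ";               // pad a short final line
        }
    }
    return line + ascii;
}
```

A dump program would call this once for each 16-byte group read from the file, advancing the address by 16 each time.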

The File Pointer and Seek. When a file is opened, written, or read, the file pointer addresses the current location in the sequential file. When a file is opened, the file pointer always addresses the first byte of the file. If a file is 1024 bytes long, and a read function reads 1023 bytes, the file pointer addresses the last byte of the file, but not the end of the file.

The file pointer is a 32-bit number that addresses any byte in a file. The File Append member function is used to add new information to the end of a file. The file pointer can be moved from the start of the file or from the end of the file. Open moves the pointer to the start of the file. In practice, both are used to access different parts of the file. The FileStream member function Seek allows the file pointer to be moved relative to the start of a file (SeekOrigin::Begin), the end of a file (SeekOrigin::End), or the current location in the file (SeekOrigin::Current). The first number in the Seek function is the offset. If the third byte in the file is accessed, it is accessed with a Seek(2, SeekOrigin::Begin) function. (The third byte is at offset 2.) Note that the second number in the Write function is also an offset, but it is an offset into the array of data being written rather than into the file, so it does not take the place of a Seek.
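Seeking from the start and from the end of a file can be sketched in standard C++, where seekg with std::ios::beg and std::ios::end plays the role of Seek with SeekOrigin::Begin and SeekOrigin::End. The function names byteAt and byteFromEnd are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Returns the byte at offset from the start of the file, or -1 on failure.
int byteAt(const std::string& fileName, std::streamoff offset) {
    assert(offset >= 0);
    std::ifstream file(fileName, std::ios::binary);
    file.seekg(offset, std::ios::beg);   // offset 2 addresses the third byte
    const int value = file.get();
    return file ? value : -1;
}

// Returns the byte that lies back positions before the end of the file
// (1 is the last byte), or -1 on failure.
int byteFromEnd(const std::string& fileName, std::streamoff back) {
    assert(back >= 1);
    std::ifstream file(fileName, std::ios::binary);
    file.seekg(-back, std::ios::end);    // negative offset from the end
    const int value = file.get();
    return file ? value : -1;
}
```

For a file that contains ABCDEF, byteAt with an offset of 2 returns the code for C, and byteFromEnd with 1 returns the code for F.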

Suppose that a file exists on the disk and that you must append 256 bytes of new information to it. When the file is opened, the file pointer addresses the first byte of the file. If you attempt to write without moving the file pointer to the end of the file, the new data will overwrite the first 256 bytes of the file. Example 8–35 shows a sequence of instructions that appends 256 bytes of data to the end of the file and then closes the file. The 256 new bytes of data come from area Buffer.
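The need to move the file pointer before appending can be shown with a standard C++ sketch. The file is opened for update so that its contents are kept, the pointer is moved to the end, and only then is the new data written. This uses std::fstream rather than the classes in the text, and appendBytes is a hypothetical name.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Adds data to the end of an existing file. Returns false if the file
// does not exist or the write fails.
bool appendBytes(const std::string& fileName, const std::vector<char>& data) {
    assert(!data.empty());
    // Opening for input and output keeps the contents and leaves the
    // file pointer at the first byte.
    std::fstream file(fileName, std::ios::in | std::ios::out | std::ios::binary);
    if (!file)
        return false;

    file.seekp(0, std::ios::end);        // without this, the start is overwritten
    file.write(data.data(), static_cast<std::streamsize>(data.size()));
    return static_cast<bool>(file.flush());
}
```

Omitting the seekp call would overwrite the first bytes of the file instead of extending it.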

One of the more difficult file maneuvers is inserting new data in the middle of the file. Figure 8–12 shows how this is accomplished by creating a second file. Notice that the part of the file before the insertion point is copied into the new file. This is followed by the new information before the remainder of the file is appended after the insertion in the new file. Once the new file is complete, the old file is deleted and the new file is renamed to the old file name.

Example 8–36 shows a program that inserts new data into an old file. This program copies the Data.new file into the Data.old file at a point after the first 256 bytes of the Data.old file. The new data from buffer2 is next added to the file and then this is followed by the remainder of the old file. New File member functions are used to delete the old file and rename the new file to the old file name.
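The insertion technique (copy the first part, add the new data, copy the remainder, then delete and rename) can be sketched in standard C++ as follows. This is not the numbered example from the text. The function name insertAt and the temporary name formed by adding .new to the old name are hypothetical, and the sketch leaves the temporary file behind if it fails partway through.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Inserts newData into the file at byte position by building a second file.
bool insertAt(const std::string& oldName, std::streamsize position,
              const std::vector<char>& newData) {
    assert(position >= 0);
    const std::string tempName = oldName + ".new";   // hypothetical temporary name
    {
        std::ifstream in(oldName, std::ios::binary);
        std::ofstream out(tempName, std::ios::binary | std::ios::trunc);
        if (!in || !out)
            return false;

        // 1. Copy the part of the old file before the insertion point.
        std::vector<char> head(static_cast<std::size_t>(position));
        in.read(head.data(), position);
        if (in.gcount() != position)
            return false;                            // old file is too short
        out.write(head.data(), position);

        // 2. Add the new information.
        out.write(newData.data(), static_cast<std::streamsize>(newData.size()));

        // 3. Append the remainder of the old file.
        char chunk[4096];
        while (in.read(chunk, sizeof chunk) || in.gcount() > 0)
            out.write(chunk, in.gcount());

        out.flush();
        if (!out)
            return false;
    }   // both files are closed here

    // 4. Delete the old file and give the new file the old name.
    return std::remove(oldName.c_str()) == 0 &&
           std::rename(tempName.c_str(), oldName.c_str()) == 0;
}
```

Both streams are closed before the delete and rename so that the operating system allows the old file to be removed.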

Random Access Files

Random access files are developed through software using sequential access files. A random access file is addressed by a record number rather than by going through the file searching for data. The Seek function becomes very important when random access files are created. Random access files are much easier to use for large volumes of data, which are often called databases.

Creating a Random Access File. Planning is paramount when creating a random access file system. Suppose that a random access file is required for storing the names of customers. Each customer record requires 32 bytes for the last name, 32 bytes for the first name, and one byte for the middle initial. Each customer record contains two street address lines of 64 bytes each, a city line of 32 bytes, two bytes for the state code, and nine bytes for the Zip Code. The basic customer information alone requires 236 bytes; additional information expands the record to 512 bytes. Because the business is growing, provisions are made for 5000 customers. This means that the total random access file is 2,560,000 bytes long.
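The record plan can be written down as a C++ structure so that the compiler checks the arithmetic. This is a sketch of one possible layout: the field names, the reserved area, and the helper setField are hypothetical, and only the field sizes come from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// One 512-byte customer record with the field sizes given in the text.
struct CustomerRecord {
    char lastName[32];
    char firstName[32];
    char middleInitial;
    char street1[64];
    char street2[64];
    char city[32];
    char state[2];
    char zip[9];
    char reserved[276];   // room for the additional information
};
static_assert(offsetof(CustomerRecord, reserved) == 236, "basic information is 236 bytes");
static_assert(sizeof(CustomerRecord) == 512, "each record is 512 bytes");

constexpr std::size_t kCustomers = 5000;
constexpr std::size_t kFileBytes = kCustomers * sizeof(CustomerRecord);
static_assert(kFileBytes == 2560000, "5000 records of 512 bytes each");

// Copies text into a fixed-width field, truncating or zero-padding as needed.
void setField(char* field, std::size_t width, const std::string& text) {
    assert(field != nullptr && width > 0);
    std::memset(field, 0, width);
    std::memcpy(field, text.data(), text.size() < width ? text.size() : width);
}
```

Because every member is a character array, the compiler inserts no padding, and the static_assert lines confirm the 236-byte, 512-byte, and 2,560,000-byte figures.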

Example 8–37 illustrates a short program that creates a file called CUST.FIL and inserts 5000 blank records of 512 bytes each. A blank record contains 00H in each byte. This appears to be a large file, but it fits on the smallest of hard disks.
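Creating the blank file could look like the following in standard C++. This is a sketch that uses std::ofstream instead of the classes in the text; createBlankFile and kRecordBytes are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

constexpr std::size_t kRecordBytes = 512;   // record length from the text

// Creates a file that holds recordCount blank records of 00H bytes.
bool createBlankFile(const std::string& fileName, std::size_t recordCount) {
    assert(recordCount > 0);
    const std::vector<char> blank(kRecordBytes, 0);   // one blank record

    std::ofstream file(fileName, std::ios::binary | std::ios::trunc);
    if (!file)
        return false;
    for (std::size_t i = 0; i < recordCount; ++i)
        file.write(blank.data(), static_cast<std::streamsize>(blank.size()));
    file.close();
    return !file.fail();
}
```

Calling createBlankFile("CUST.FIL", 5000) produces a file that is 2,560,000 bytes long.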

Reading and Writing a Record. Whenever a record must be read or written, the record is located by using a Seek. Example 8–38 lists a function that is used to Seek to a record. This function assumes that a file has been opened as CustomerFile and that CUST.FIL remains open at all times.

Notice how the record number is multiplied by 512 to obtain a count to move the file pointer using a Seek. In each case, the file pointer is moved from the start of the file to the desired record.
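The record seek can be sketched in standard C++ as shown below, with the record number multiplied by 512 to form the offset from the start of the file. The functions seekRecord, writeRecord, and readRecord are hypothetical names, and the file is passed in as an open std::fstream rather than being held in a CustomerFile variable.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <vector>

constexpr std::size_t kRecordBytes = 512;   // record length from the text

// Moves the file pointer to the first byte of the given record.
void seekRecord(std::fstream& file, std::size_t recordNumber) {
    const auto offset = static_cast<std::streamoff>(recordNumber * kRecordBytes);
    file.clear();                        // forget any earlier end-of-file condition
    file.seekg(offset, std::ios::beg);   // position used for reading
    file.seekp(offset, std::ios::beg);   // position used for writing
}

// Writes one full record. Returns false if the write fails.
bool writeRecord(std::fstream& file, std::size_t recordNumber,
                 const std::vector<char>& record) {
    assert(record.size() == kRecordBytes);
    seekRecord(file, recordNumber);
    file.write(record.data(), static_cast<std::streamsize>(kRecordBytes));
    return static_cast<bool>(file.flush());
}

// Reads one full record. Returns false if the record lies past the end.
bool readRecord(std::fstream& file, std::size_t recordNumber, std::vector<char>& record) {
    record.assign(kRecordBytes, 0);
    seekRecord(file, recordNumber);
    file.read(record.data(), static_cast<std::streamsize>(kRecordBytes));
    return file.gcount() == static_cast<std::streamsize>(kRecordBytes);
}
```

Record 0 starts at offset 0, record 1 at offset 512, and so on, so any record is reached with a single seek rather than by reading through the file.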