Chapter 14. Internet
Surfing’s a breeze—and I do mean riding the Web and not a wave. Just point and click, and you can visit any Web site, anywhere in the world, in microseconds (or a few minutes, if you have a dial-up line). About the only thing easier is being dead, and that’s not nearly as much fun (or so I’ve been told).
The Web is easy to use because, believe it or not, it was designed that way. Scientists working around huge particle accelerators—the kind that smash the tiniest pieces of matter into one another at nearly the speed of light to create huge research budget deficits—spent a bit of their spare time developing the idea. Basically, they decided to put a pretty face on the work of the military-educational complex that let college professors in Berkeley play Pong with researchers at Princeton.
But what’s easy for you to use isn’t necessarily easy for your computer. And it’s not just your computer. Beneath all the fun and games are a snarl of cables more tangled than a planet-size bowl of spaghetti, millions of computers, an assortment of hardware stranger than the collection of an alien zoo (routers, switches, multiplexers, and demultiplexers), and some things so esoteric no one knows what to call them. That easy point-and-click interface of the Web covers up an international conspiracy of computers working together to put pop-up ads atop your every click.
As with any conspiracy, getting to the bottom of the Internet requires getting the answers to a couple of questions: What does your computer know, and when does it know it? The answers tell how the complex web of the Internet really operates.
Actually, getting to the bottom of the Internet is easy. You’re already there. The computer sitting in front of you is considered the lowest of the low, a mere client to the millions of servers on the Web. At the far end is a hallowed space—actually a super-superstitious 13 of them—housing the true masterminds of the Web, the 13 root name servers, the machines that are the ultimate organizers of the Web.
In between is the Internet. But it’s nothing you can get your hands on. It’s nothing real. Although the Internet is built using hardware, it is not hardware itself. Similarly, you need hardware to connect to the Internet, but that hardware only serves as a means to access what you really want: the information that the Internet can bring to your computer. Without the right hardware, you could not connect to the Internet, but having the hardware alone won’t make a new World Wide Web.
Despite its unitary name, there is no giant Internet in the sky or in some huge office complex somewhere. In fact, the Internet is the classic case of “there is no there there,” as Gertrude Stein observed in her book Everybody’s Autobiography (impress your friends with that gem). Like an artichoke, if you slice off individual petals or pieces of the Internet, you’ll soon have a pile of pieces and no Internet anywhere, and you won’t find it among the pieces. Rather, like the artichoke, the Internet is the overall combination of the pieces.
Those pieces are tied together both physically and logically. The physical aspect is a collection of wires, optical fibers, and microwave radio links that carry digital signals between computers. The combination of connections forms a redundant network. Computers are linked to one another in a web that provides multiple signal paths between any two machines.
The logical side is a set of standards for the signals that travel through that network. The Internet uses various protocols, depending on what kind of data is being transferred. The chief protocol and the defining standard of the Internet is TCP/IP, discussed in this chapter.
To work properly, the TCP/IP system requires that every computer (or device) connected to the Internet have a unique address, the IP address. Although simple in concept (the IP address is nothing more than a 32-bit binary number, at least for now), what that number means and the future of the entire addressing system are two of the most complex issues regarding the Internet.
Of course, 32-bit binary numbers probably don’t roll off your tongue, and remembering one for each Web site you want to visit sounds about as fun as an overnight study session with too much coffee and stress. Thankfully, the developers of the Internet concocted the Domain Name System (DNS), which assigns somewhat more memorable names to every Web site. Making DNS work, however, is one of the great challenges of the Internet.
The most visible piece of the Web is content. After all, if there weren’t anything worth surfing for, you’d probably turn your attention to something else—say, surfing for real when the tsunami warnings go out. Every Web page is, surprisingly, a computer program written in unique languages understood by your Web browser.
And finally, you’ve somehow got to make a connection with the Internet. That’s what you pay for.
The place to begin is the beginning—and with the Internet, we need to go way back. Although the Web is the medium of the moment, the Internet has a long history, and locating its origins depends on how primitive an ancestor you seek.
The thread of the development of the Internet stretches all the way back to 1958, if you pull on it hard enough. The Internet’s mother, the organization that gave birth to it, was itself born in the contrail of Sputnik. In October, 1957, the USSR took the world by surprise by launching the first artificial satellite and made the U.S. suddenly seem technologically backward. In response, President Dwight D. Eisenhower launched the Advanced Research Projects Agency (ARPA) as part of the Department of Defense in January, 1958.
Then, as now, ARPA’s work involved a lot of data processing, much of it at various university campuses across the country. Each computer, like the college that hosted it, was a world unto itself. To work on the computer, you had to be at the college. To share the results of the work on the computer, you needed a letter carrier with biceps built up from carrying stacks of nine-track tapes from campus to campus. Information flowed no faster between computers than did the mail.
Bob Taylor, working at ARPA in 1967, developed the idea of linking together into a redundant, packet-based network all the computers of major universities participating in the agency’s programs. In October, 1969, the first bytes crossed what was to become ARPAnet in tests linking Stanford Research Institute and the University of California at Los Angeles. By December, 1969, four nodes of the fledgling internetworking system were working.
The system began to assume its current identity with the first use of the Transmission Control Protocol in a network in July, 1977. As a demonstration, TCP was used to link together a packet radio network, SATnet, and ARPAnet. Then, in early 1978, the Transmission Control Protocol was split in two. One portion, which kept the name TCP, broke messages into packets, reassembled them after transmission, kept order among the packets, and handled error control. The second protocol, called the Internet Protocol (IP), concerned itself with the routing of packets through the linkage of the network. The two together made TCP/IP, the fundamental protocol of today’s Internet.
If the Internet actually has a birthday, it’s January 1, 1983, when ARPAnet switched over from the Network Control Protocol to TCP/IP. (By that time, ARPAnet was only one of many networks linked by TCP/IP.) To give a friendlier front end to communications with distant computer systems, Tim Berners-Lee, working at CERN in Geneva in 1990, invented the World Wide Web.
The final step in the development of today’s Internet came in 1991. In that year the National Science Foundation, which was overseeing the operation of the Internet, lifted its previous restrictions on its commercial use. The free-market free-for-all began.
If you want to go to a store, it’s useful to know where it is. You could cruise for hours or days looking for the store you want, and you might never find it. But once you have an address, you have an anchor. You know where to go, and you should be able to quickly figure out how to get there.
Although mindless cruising is one of the delights of the Web (after all, isn’t that what surfing really is?), when you’re more directed in your search (when you know what you’re looking for), having an address can be helpful. Even if you surf, you may want to return to the same page you’ve visited before, and knowing where to find it will shorten your journey back (but maybe make it less fun).
Just like physical stores, the Internet uses an address to anchor Web sites. In fact, not only sites but every device that connects to the Internet gets its own, unique address. The Internet Protocol itself defines the addressing scheme that’s used, so the addresses are naturally called IP addresses.
Under the Internet Protocol, an address is four bytes, a 32-bit binary number. Usually you’ll find it expressed as a series of four numbers called octets. The value of each octet is expressed in decimal notation, and the individual octets are separated by periods. Hence, an Internet address looks like this:
192.168.132.1
Every device gets its own, unique address. Do the math, and you’ll see this scheme allows for 4,294,967,296 unique devices (which is simply 2 to the 32nd power, exactly what a 32-bit binary address means).
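If you want to see the arithmetic for yourself, here is a minimal Python sketch that packs four octets into a single 32-bit number and unpacks them again. The function names are made up for this illustration; only the address comes from this chapter.

```python
def dotted_to_int(address: str) -> int:
    """Pack four decimal octets into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid dotted-decimal address: {address!r}")
    value = 0
    for octet in octets:
        # Shift what we have one byte to the left, then append the next octet.
        value = (value << 8) | octet
    return value


def int_to_dotted(value: int) -> str:
    """Unpack a 32-bit integer back into dotted-decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


if __name__ == "__main__":
    number = dotted_to_int("192.168.132.1")
    print(number)                 # the same address as one plain number
    print(int_to_dotted(number))  # and back to dotted decimal
    print(2 ** 32)                # how many distinct 32-bit addresses exist
```

The dotted form is purely a convenience for people. The machine only ever deals with the single 32-bit value.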
Dividing IP addresses into octets is more than a matter of readability. By design, the IP address is structured to help make finding any given computer easier. The address has two parts: The first identifies a network connected to the Internet, and the last part identifies a specific computer attached to the network.
For example, the computer with the IP address of 192.168.132.1 actually is computer number 1 attached to the network named 192.168.132.0.
Structuring IP addresses in this way makes routing packets and messages across the Internet easier for the routers charged with the job. They only need to find the network and dump the packets on it. The network then routes the packets to the computer designated in the IP address.
You may detect the one small flaw in this addressing system—the IP address by itself does not indicate where the split between the network address and the device address occurs. Although it would be easy to define the last octet as the device address, that was too arbitrary and limiting for the folks who developed the Internet. Such a scheme would limit any network to only 256 addresses. The Internet designers preferred to permit greater versatility in configuring networks. After all, the Internet was specifically meant to allow colleges to exchange information, and most colleges now have substantially more than 256 students, each with his or her own computer tied into the network and Internet.
Instead of fixing the division between network and device addresses, the Internet designers chose to use a second number that defined the boundary. This second number is the subnet mask. You’ll encounter it nearly every time you tangle with IP addresses.
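A short sketch may make the mask's job clearer. It uses Python's standard ipaddress module to convert the text to numbers and a bitwise AND to pull the network part out of an address. The function name split_address is an invention for this example.

```python
import ipaddress


def split_address(address: str, mask: str):
    """Return (network address, device number) for an address and subnet mask."""
    addr_bits = int(ipaddress.IPv4Address(address))
    mask_bits = int(ipaddress.IPv4Address(mask))
    # Ones in the mask keep the network bits of the address.
    network_bits = addr_bits & mask_bits
    # The inverted mask keeps only the device bits.
    device_number = addr_bits & ~mask_bits & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(network_bits)), device_number


if __name__ == "__main__":
    print(split_address("192.168.132.1", "255.255.255.0"))
    print(split_address("192.168.132.1", "255.255.0.0"))
```

The same address yields a different network and device number under each mask, which is exactly why the address alone is not enough.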
The subnet mask takes the same form as an IP address, four octets in dotted-decimal format. Unlike IP addresses, however, which allow full variation to yield more than four billion distinct numbers, only 32 different subnet masks are allowed in the IP scheme of things. Once you think about it, this number makes sense because there are only 32 places to draw the line between network and device addresses. Table 14.1 lists all the valid subnet masks.
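You can generate the list of masks rather than look it up. The sketch below assumes the 32 masks correspond to runs of 1 through 32 leading one bits; the table remains the authority on which masks the chapter means, and the function names here are hypothetical.

```python
def mask_for_prefix(prefix_len: int) -> str:
    """Build the dotted-decimal mask with prefix_len leading one bits."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    # Start with 32 ones, push zeros in from the right, trim back to 32 bits.
    bits = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return ".".join(str((bits >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def all_masks():
    """Masks for every boundary from 1 network bit up to 32."""
    return [mask_for_prefix(n) for n in range(1, 33)]


if __name__ == "__main__":
    for ones, mask in enumerate(all_masks(), start=1):
        print(f"{ones:2d} network bits  {mask}")
```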
Understanding why these particular masks are the only ones allowed and why they were chosen requires examining the IP address and subnet mask numbers in their native binary form.
Although most network administrators look at IP addresses in the dotted-decimal format, computer equipment sees them as a series of 32 ones and zeros. For example, what you see as the IP address 192.168.132.1 looks like this to your computer:
11000000101010001000010000000001
Divided into octets, this number becomes the following:
11000000 10101000 10000100 00000001
In this format, the number of the subnet mask makes more sense, at least if you look at it with an engineer’s eyes. The only subnet masks allowed are those whose binary form is all ones on the left and all zeros on the right. For example, the subnet mask expressed as 255.255.255.128 can also be represented in binary as follows:
11111111 11111111 11111111 10000000
In this form, the columns filled with the ones represent the digits of the IP address that designate the network. The columns filled with zeros are the digits of the valid computer identification numbers.
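Here is a small sketch of that binary view in Python. It prints an address or mask as four groups of eight bits and checks the "ones on the left, zeros on the right" rule. The helper names are made up for this example.

```python
def to_binary_octets(address: str) -> str:
    """Render a dotted-decimal value as four space-separated 8-bit groups."""
    return " ".join(f"{int(octet):08b}" for octet in address.split("."))


def is_valid_mask(mask: str) -> bool:
    """A mask is valid when no zero bit is ever followed by a one bit."""
    bits = to_binary_octets(mask).replace(" ", "")
    return "01" not in bits


def network_bits(mask: str) -> int:
    """Count the one bits, which mark the network part of an address."""
    return to_binary_octets(mask).count("1")


if __name__ == "__main__":
    print(to_binary_octets("192.168.132.1"))
    print(to_binary_octets("255.255.255.128"))
    print(is_valid_mask("255.255.255.128"), network_bits("255.255.255.128"))
    print(is_valid_mask("255.0.255.0"))
```

A value such as 255.0.255.0 fails the check because its ones and zeros are interleaved, so it draws no single line between network and device.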
Don’t bother memorizing the table of subnet masks. If all you’re going to do is home networking, you only need to deal with one subnet mask: 255.255.255.0. You’ll find that this is Microsoft’s default when you set up TCP/IP on your system, and it is both necessary and sufficient for most home network setups.
Subnet masks move into prominence when you venture into serious networking. That is, when you move your network onto the Internet and have your own galaxy of computers linking into workgroups. To accommodate the really big kids with prodigious needs, the Internet was designed to be divvied up among governments and businesses in chunks that depended on need and, more likely, lobbying.
Some organizations need more Internet space than others. Some are able to demand more space than others. And some are able to use whatever forms of legal blackmail are available to extort more network space than others. In the days that InterNIC (the organization charged with administering the Internet at that time) assigned blocks of Internet addresses, it followed a classification scheme with five levels: Classes A through E. Although this scheme is no longer used, the addresses assigned under it remain. You can classify it as interesting Internet trivia that still creeps into our lives when we least expect it.
In any case, in each of the first three classes defined under the InterNIC scheme, the number of addresses available to an organization was defined by the subnet mask. In addition, InterNIC defined two more classes for special purposes: multicasting (sending packets to multiple computers but not all on the Internet) and experimental purposes. These classes were assigned their own ranges of special IP addresses. The five classes are as follows:
Class A— These Internet addresses use a subnet mask of 255.0.0.0. The first bit in a Class A address is always zero, so the first octet of a Class A address always falls in the range 0 to 127, inclusive; in practice, assigned Class A networks start with 1 to 126 because 0 and 127 are reserved. This classification leaves seven bits to identify the network and 24 bits to identify individual devices.
Class B— These Internet addresses use a subnet mask of 255.255.0.0. The first two bits of a Class B address are a one followed by a zero, so the first octet of a Class B address always falls in the range 128 to 191, inclusive. This classification leaves 14 bits to identify the network and 16 bits to identify individual devices.
Class C— These Internet addresses use a subnet mask of 255.255.255.0. The first three bits of a Class C address are always two ones followed by a zero, so Class C addresses always fall in the range with the first octet of 192 to 223, inclusive. This classification leaves 21 bits to identify the network and eight bits to identify individual devices.
Class D— These addresses always start with binary addresses of three ones followed by a zero, which translates into a first octet in dotted-decimal notation in the range 224 to 239, inclusive. The remaining 28 bits in the address identify the group of computers for which the multicast is meant.
Class E— These addresses always start with a binary address of four ones, which translates into a first octet in dotted-decimal notation in the range 240 to 255, inclusive. Unlike Class D, these addresses are reserved for experimental purposes rather than multicasting.
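Because each class is defined by its leading bits, you can tell the class from the first octet alone. The following is a minimal sketch of that test; the function name is hypothetical. It classifies purely by bit pattern, so a reserved first octet such as 127 still reports as Class A.

```python
def address_class(address: str) -> str:
    """Return the historical class letter implied by the first octet."""
    first_octet = int(address.split(".")[0])
    if not 0 <= first_octet <= 255:
        raise ValueError(f"first octet out of range in {address!r}")
    if first_octet < 128:   # leading bit 0
        return "A"
    if first_octet < 192:   # leading bits 10
        return "B"
    if first_octet < 224:   # leading bits 110
        return "C"
    if first_octet < 240:   # leading bits 1110 (multicast)
        return "D"
    return "E"              # leading bits 1111 (experimental)


if __name__ == "__main__":
    for sample in ("10.0.0.1", "172.16.0.1", "192.168.132.1", "224.0.0.1", "240.0.0.1"):
        print(sample, address_class(sample))
```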
Subnet masks are cumbersome in everyday use on the Internet. To help make routing messages between computer networks more efficient, Internet workers developed Classless Inter-Domain Routing (CIDR) to provide more flexibility than was possible with the subnet mask scheme. The CIDR system is now used by virtually every computer on the Internet’s backbone to route messages to their destinations.
Basically, the CIDR system distills the four-byte subnet mask into a single number, called a network prefix, that is appended to an IP address. The number in the network prefix describes the number of bits in the address that constitute the network designation part of the address, much as the subnet mask does. For example, in the CIDR network address
192.168.132.1/24
the first 24 bits indicate the address of a network, and the last eight bits identify an individual computer.
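Python's standard ipaddress module understands CIDR notation directly, so a short sketch can show how a prefix maps back to the older subnet mask. The address used here is only an illustration, and describe_cidr is a made-up helper name.

```python
import ipaddress


def describe_cidr(cidr: str) -> dict:
    """Break a CIDR string into network, mask, and bit counts."""
    interface = ipaddress.ip_interface(cidr)
    network = interface.network
    return {
        "network": str(network.network_address),
        "netmask": str(interface.netmask),
        "network_bits": network.prefixlen,
        # Whatever the prefix does not claim is left to identify devices.
        "device_bits": network.max_prefixlen - network.prefixlen,
    }


if __name__ == "__main__":
    print(describe_cidr("192.168.132.1/24"))
    print(describe_cidr("10.1.2.3/8"))
```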
Although four billion is a lot of computers for a network, the Internet’s administrators see the reserve of IP addresses quickly disappearing. They fear that sometime soon the world will run out of them and no new computer can connect.
They haven’t been asleep, however. They have developed a revised version of the Internet Protocol to accommodate longer addresses to break through the 32-bit addressing limit. Called Internet Protocol Version 6 (we use version 4 today), the revision allows for IP addresses 128 bits long. The result is that IPv6 accommodates more addresses than it is convenient to write down. Every person in the world could have four billion computers, each with its own IPv6 address, and there would still be roughly ten quintillion times more addresses available.
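To get a feel for the scale, you can let Python do the counting, since its integers never overflow. This sketch assumes a world population of eight billion, a round figure chosen for the example rather than one taken from this chapter.

```python
IPV4_BITS = 32
IPV6_BITS = 128
ASSUMED_POPULATION = 8_000_000_000  # assumption: a round figure for illustration


def address_count(bits: int) -> int:
    """Number of distinct addresses a field of the given width can hold."""
    return 2 ** bits


def leftover_ratio(population: int, computers_each: int) -> int:
    """How many times over the unused IPv6 space could repeat the used part."""
    used = population * computers_each
    remaining = address_count(IPV6_BITS) - used
    return remaining // used


if __name__ == "__main__":
    print(f"IPv4 addresses: {address_count(IPV4_BITS):,}")
    print(f"IPv6 addresses: {address_count(IPV6_BITS):,}")
    ratio = leftover_ratio(ASSUMED_POPULATION, address_count(IPV4_BITS))
    print(f"Leftover space is about {ratio:.1e} times what was handed out")
```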
The revisions of IPv6 don’t stop with longer addresses. Under IPv6, the packet header allows messages to be identified as part of a particular flow, such as a stream of audio or video. Properly identified, the packets can be routed to follow the same path to help them get reconstructed as a real-time stream. The header also includes extensions for authentication, error control, and privacy.
What Address to Use
When setting up a home or small office network, at some time or another you will be confronted with the choice of IP addresses to use—it’s one of those unwelcome choices that is given every network administrator. As far as I can tell, no readily available source even hints at what IP address you should use. But the choice is critical, and the people who govern such things in the Internet publish the addresses you should use.
In fact, the Internet Assigned Numbers Authority (IANA, which you can find on the Web at www.iana.org) reserves three blocks of IP addresses for use by private networks, that is, those that don’t connect directly to the Internet. Because it’s likely you will connect only through a gateway at your ISP, your home network falls into the private network class, and these reserved addresses are the ones you should choose from. Table 14.2 lists the addresses IANA reserves for private networks.
Clearly, any of these three ranges will have more than enough room for any conceivable home network. Microsoft uses the last of these, the range starting at 192.168.0.0, for the private networks it automatically sets up for home use.
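If you want to check whether an address you have picked falls in one of the reserved private blocks, a few lines of Python will do it. The three blocks below are the ones set aside for private networks in RFC 1918, written in CIDR form; the helper name is hypothetical.

```python
import ipaddress

# The three private-network blocks reserved by RFC 1918.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_reserved_private(address: str) -> bool:
    """True when the address lies inside one of the three reserved blocks."""
    candidate = ipaddress.ip_address(address)
    return any(candidate in block for block in PRIVATE_BLOCKS)


if __name__ == "__main__":
    for block in PRIVATE_BLOCKS:
        # Indexing a network gives its first and last addresses.
        print(block[0], "to", block[-1], f"({block.num_addresses:,} addresses)")
    print(is_reserved_private("192.168.0.1"))
    print(is_reserved_private("8.8.8.8"))
```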
Certainly you’re not limited to these values for your own private network. You could simply create your own IP address. If you’re not too clever for your own good, you may get away with it. Coming up with a valid address is not difficult, but it’s not a good idea either. Internet addresses are assigned, and the Internet governing bodies go to lengths to be sure there are no conflicts.
Addresses You Cannot Use
The IP addressing rules dictate that you cannot use certain addresses for computers on a network. With the usual 255.255.255.0 subnet mask, these fall at the two ends of the number range in the fourth octet. That is, addresses ending in zero, such as 192.168.155.0, and those ending in 255, such as 192.168.154.255, cannot be used as addresses for computers or other devices connected to a network. These addresses have a specifically defined meaning in the IP system.
Addresses ending in a zero refer to the network itself rather than any specific computer or device connected to it. Addresses ending in 255 are used to broadcast messages to all devices in the network, so all devices in a network will receive packets with the network address and a 255 at its end.
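The two off-limits addresses are easy to compute. This sketch uses the standard ipaddress module to find the network and broadcast addresses for a network and to list the addresses left over for real devices. The helper names are invented, and the /25 case is added only to show that the rule follows the mask rather than the digits 0 and 255.

```python
import ipaddress


def special_addresses(cidr: str):
    """Return (network address, broadcast address) for the given network."""
    network = ipaddress.ip_network(cidr, strict=False)
    return str(network.network_address), str(network.broadcast_address)


def usable_addresses(cidr: str):
    """List the addresses that can actually be given to devices."""
    network = ipaddress.ip_network(cidr, strict=False)
    # hosts() leaves out the network and broadcast addresses.
    return [str(host) for host in network.hosts()]


if __name__ == "__main__":
    print(special_addresses("192.168.155.0/24"))
    hosts = usable_addresses("192.168.155.0/24")
    print(len(hosts), hosts[0], hosts[-1])
    print(special_addresses("192.168.155.0/25"))
```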
If you find IP addresses confusing, you’re not the only one. Keeping track of all the addresses used by a network can be confusing, indeed. What you really need is someone who excels at organization, who rigidly assigns addresses and keeps track of every detail as if he were a machine. In fact, a machine such as a computer would be a good choice to take over the job.
Using the Dynamic Host Configuration Protocol (DHCP), you can move the responsibility for assigning and organizing the IP addresses your network uses to one of its servers.
DHCP is an automatic method for assigning addresses to devices. When a device wants to join the network, it queries the DHCP server, and the server sends back a unique IP address for the device.
Not just any address will do. In the Microsoft scheme of things, the addresses assigned by a server are drawn from within a scope, a range of no more than 255 contiguous addresses. All the devices in a workgroup must be within the same scope, although a network may have many intercommunicating scopes.
Setting up a DHCP server usually is more work than most normal people want to do. That’s why it’s usually left to network administrators. But if you buy an inexpensive router in order to share a high-speed Internet connection, odds are it has a DHCP server built in to it. When you log in to your network in preparation for sharing your connection, the DHCP server automatically sends your computer its own unique IP address so it can join the network.
Note that you should have only one DHCP server in a network. If you install a dedicated server to act as your DHCP server and you install a router for Internet sharing, the two DHCP servers may come into conflict, possibly preventing your network from operating, or just preventing some computers from seeing others on the network. To avoid problems, make sure you have only one DHCP server.
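The bookkeeping a DHCP server does can be pictured with a toy model. The class below is purely hypothetical: it keeps a pool of free addresses drawn from a scope and hands one to each device that asks. It does not speak the real DHCP protocol or track lease times, and every name in it is made up for illustration.

```python
import ipaddress


class ToyDhcpServer:
    """In-memory address pool that imitates DHCP's bookkeeping only."""

    def __init__(self, first: str, last: str):
        start = int(ipaddress.IPv4Address(first))
        end = int(ipaddress.IPv4Address(last))
        if end < start:
            raise ValueError("scope must run from a lower to a higher address")
        self._free = [ipaddress.IPv4Address(n) for n in range(start, end + 1)]
        self._leases = {}  # device identifier -> assigned address

    def request(self, device_id: str) -> str:
        """Hand out an unused address, or repeat the one already assigned."""
        if device_id in self._leases:
            return str(self._leases[device_id])
        if not self._free:
            raise RuntimeError("scope exhausted: no addresses left to assign")
        address = self._free.pop(0)
        self._leases[device_id] = address
        return str(address)

    def release(self, device_id: str) -> None:
        """Return a device's address to the pool when it leaves the network."""
        address = self._leases.pop(device_id, None)
        if address is not None:
            self._free.append(address)


if __name__ == "__main__":
    server = ToyDhcpServer("192.168.0.2", "192.168.0.4")
    print(server.request("laptop"))
    print(server.request("printer"))
    print(server.request("laptop"))  # same device, same address
```

Two such servers working from overlapping pools would each believe an address was free and could hand it out twice, which is the conflict described above.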
Domain Name System
Although DHCP does a good job of hiding your own computer’s IP address from you, it does nothing to make IP addresses manageable on the Web. So that you don’t have to type into your browser the IP address of each computer you want to visit on the Web (you can if you want, by the way), the Internet’s developers created the Domain Name System (DNS).
From your perspective, DNS works by assigning a structured name to every Web site, the familiar whatever-dot-com you use every day. In the language of the Internet, that dot-com is called the domain name of the Web site.
Most domain names take the form of a word, a period, and another few letters. Those letters after the period make up the top-level domain, the primary organizing structure of the Web. You can’t use just anything as a top-level domain. The organization charged with administering Web names, the Internet Corporation for Assigned Names and Numbers (ICANN), maintains tight control of top-level domains.
There are two kinds of top-level domains: organizational and national. Originally, only six organizational top-level domains were allowed, but on November 16, 2000, ICANN added seven more. Table 14.3 lists the currently recognized top-level domains.
In addition, each nation in the world is also given its own top-level domain, a two-character country code. These are listed in Table 14.4.
To the left of the period is the name of the actual domain assigned the Web site through a registry. If there is more than one period in the name, the leftmost portion is a subdomain of the next domain name to the right. The DNS system allows for multiple subdomains. Each subdomain (or domain, if there are no subdomains) specifies an actual server on the network.
To the right of the top-level domain is the directory path to a particular file on the designated server containing a Web page or other data. In Internet lingo, this composite construction of the domain name and directory path is called a Uniform Resource Locator, because the name itself holds all the information computers on the Web need in order to find a particular page or file.
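You can take a URL apart along these lines with Python's standard urllib.parse module. The sketch below is deliberately naive: it treats the last label as the top-level domain and the last two labels as the registered domain, which would be wrong for names such as those ending in .co.uk. The URL and the function name are examples only.

```python
from urllib.parse import urlsplit


def dissect_url(url: str) -> dict:
    """Split a URL into the pieces discussed in the text."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host name found in {url!r}")
    labels = parts.hostname.split(".")
    return {
        "top_level_domain": labels[-1],   # right of the last period
        "domain": ".".join(labels[-2:]),  # naive: name plus top-level domain
        "subdomains": labels[:-2],        # anything further to the left
        "path": parts.path,               # directory path on the server
    }


if __name__ == "__main__":
    print(dissect_url("http://www.example.com/docs/index.html"))
```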
The best view of the Internet comes with following a packet sent from your computer. When you log in to a Web site, you actually send a command to a distant server telling it to download a page of data to your computer. Your Web browser packages that command into a packet labeled with the address of the server storing the page that you want. Your computer sends the packet to your modem (or terminal adapter), which transmits it across your telephone or other connection to your Internet Service Provider (ISP).
How DNS Works
The instant you press your mouse button with the cursor pointing at a particularly juicy image on your monitor, your computer drops everything to obey your command. The mouse sends your microprocessor an interrupt to make it pay immediate attention. The mouse driver checks to see whether a signal has come in saying you’ve pressed the mouse button. The driver passes this vital information to your operating system, which takes a peek into its private memory to see what location value it has stored for the location of your mouse’s cursor on the screen. The operating system then checks to see whether the mouse location corresponds to a hotspot on the Web page you’re viewing, indicating to your browser that there is a hyperlink instruction to send you to a new location on the Web. When there is, the fun on the Web begins.
Your browser has no idea where to find the page you want on the Web. All it has is a name—it’s sort of like finding an isolated name on a message pad when you awake from a drunken stupor. You recognize it as a name, but you don’t know why you wrote down the name or how to get in touch with the person to find out.
You might start with the white pages of your telephone book, but your computer can’t even open a book, let alone look something up. The only thing it can do is send out an electrical signal down the connection with the Internet. It doesn’t send out just any signal. It sends out the name of the Web site you’re looking for in a special data packet. Rather than the white pages, the name goes to a resolver.
A resolver is not a special machine. Rather, the term resolver defines a particular function of a special kind of server on the network, called a name server. The resolver does what its name says—it tries to resolve the address of a site on the Web. It looks at the name your computer has sent to it in a standard form known as a Uniform Resource Locator (URL).
Like a mailman sorting mail and looking at the bottom of the address first (for the ZIP Code and state), the resolver examines the last part of the URL first—the part of the name to the right of the rightmost period—the familiar .com, .org, or .edu. This portion of the name is the top-level domain, which tells the resolver how to find the location of the Web site. In Internet terms, the top-level domain is about as general as describing a creature as being in the animal kingdom.
The top-level domain doesn’t tell the resolver where to look for the Web site. Rather, it tells the resolver where to look for a list of site names in the top-level domain. Resolvers aren’t stupid. If they’ve looked up a top-level domain before, they probably already know where to look. If not, however, they call on one of the 13 root name servers to tell them which servers store the information about each top-level domain.
The root name server, or, more likely, the resolver, passes your request for the Web site to one of the name servers assigned to the top-level domain of the URL you’re looking for. Hundreds of thousands of servers may track this information. It’s kept in multiple copies for speed and reliability. Speed because one server is not burdened with finding every requested URL in its domain, and reliability because if one server becomes unavailable, there are hundreds of others that can take its place.
This server matches your requested URL with the domain name server (also abbreviated DNS) that handles the Web site. The server sends the requested URL to the IP address—a block of four bytes of binary code—of the DNS. The DNS knows all the names of the Web sites it serves. It passes the IP address of the Web site you want back to your computer so it can use this address to find the page you want.
When your computer signals to the IP address, it sends a request for the page listed in the hotspot you clicked. The server at the Web site diligently finds the page and passes it back to your computer. Your operating system passes it to your browser, which formats the page for the screen and passes it back to your operating system, which sends it, in turn, to your display driver and then your monitor screen.
All these requests travel from server to server at nearly the speed of light, so everything happens fast. Your computer should know the IP address of the page you want and start loading the page in a fraction of a second. Meanwhile, you’ve probably become impatient and clicked on something else, starting the whole process over again.
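The chain of lookups can be imitated with a few dictionaries. Everything in this sketch is invented for illustration: the server names, the zone contents, and the addresses, which come from a range reserved for documentation. It models only the order of the questions a resolver asks, not the real DNS packet format.

```python
# Hypothetical data: which server handles each top-level domain...
ROOT_ZONE = {"com": "tld-server-com", "org": "tld-server-org"}
# ...which name server each top-level-domain server lists for a domain...
TLD_SERVERS = {
    "tld-server-com": {"example.com": "ns.example.com"},
    "tld-server-org": {"example.org": "ns.example.org"},
}
# ...and which IP address each name server holds for a host name.
NAME_SERVERS = {
    "ns.example.com": {"www.example.com": "192.0.2.10"},
    "ns.example.org": {"www.example.org": "192.0.2.20"},
}


class ToyResolver:
    """Walks root -> top-level domain -> name server, caching root answers."""

    def __init__(self):
        self.tld_cache = {}
        self.root_queries = 0

    def resolve(self, hostname: str) -> str:
        hostname = hostname.lower()
        labels = hostname.split(".")
        tld = labels[-1]
        if tld not in self.tld_cache:
            # Only bother a root server for a top-level domain not seen before.
            self.root_queries += 1
            self.tld_cache[tld] = ROOT_ZONE[tld]
        tld_server = self.tld_cache[tld]
        name_server = TLD_SERVERS[tld_server][".".join(labels[-2:])]
        return NAME_SERVERS[name_server][hostname]


if __name__ == "__main__":
    resolver = ToyResolver()
    print(resolver.resolve("www.example.com"))
    print(resolver.resolve("www.example.com"), resolver.root_queries)
```

On a machine with a live connection, Python's standard socket.gethostbyname asks the operating system's resolver to perform a real lookup.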
Root Name Servers
The part of the Web in charge of identifying each site and getting its address to you is called the Root Name Server System. The master plan that makes it work is the DNS protocol, which describes the packets that need to be exchanged and provides the roadmap for them to follow. The root name servers hold the key to locating the indexes containing the IP addresses you need. The root zone file is the index itself.
The root name servers are arguably the most important computers on the World Wide Web. Only they store the official records of the locations of the registries for each top-level domain, the rightmost part of each Web address. Because this information is so vital to the operation of the Web, it’s stored not in triplicate but in 13 duplicate copies in separate computers spread across the world.
Well, not quite. The 13 root name servers actually represent only six distinct geographic locations. Six are clustered around Washington, D.C., two are co-located (and co-operated) in Marina del Rey, California, two are in the Silicon Valley area, one is in Japan, one is in the U.K., and one is in Sweden. Table 14.5 lists the 13 root name servers.
There’s another piece to the IP naming system: the names your computers wear when you access them over your own network using the networking capabilities of Windows. You assign these names using the Windows Internet Name Service (WINS). This service is responsible for converting the names you assign to IP addresses for routing messages through networks using the Internet Protocol. In effect, WINS works like DNS at the local level.
In the WINS system, you assign your computer a name when you set up networking on that machine. The computer then sends its name to the server, and the server stores the names you assign in a database, which the server references to resolve requests for IP addresses.
The “wheels” of the Internet are the Transmission Control Protocol/Internet Protocol or, as it is more commonly known, TCP/IP.
TCP/IP sees everything in terms of packets. Instead of moving data in a long stream like unraveling a roll of movie film, the protocol breaks it into pieces. Having a bunch of short chunks automatically ensures that there will be breaks in the flow of data during which other computers can negotiate for time to send their own packets. At the distant end of the connection, the packets get reassembled to put the data back into its original form.
Each packet has a predefined structure, with a header that contains address, routing, and control information and a payload of data. The payload moves through the network intact and unexamined, so its content is irrelevant to the network. Packets might contain anything, from program code for Unix computers, to bits of video images, to cream cheese on celery (if you could fit that into computer data).
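The break-apart-and-reassemble idea is simple enough to sketch. The packet layout below is a toy invented for this example: real TCP/IP headers carry many more fields and number their data differently. It does show how a header lets the far end rebuild the original data even when packets arrive out of order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    # Header: where the packet is going and where it fits in the message.
    source: str
    destination: str
    sequence: int
    total: int
    # Payload: carried through untouched.
    payload: bytes


def packetize(data: bytes, source: str, destination: str,
              payload_size: int = 8) -> List[Packet]:
    """Cut data into payload_size chunks and wrap each in a header."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    chunks = [data[i:i + payload_size]
              for i in range(0, len(data), payload_size)] or [b""]
    return [Packet(source, destination, seq, len(chunks), chunk)
            for seq, chunk in enumerate(chunks)]


def reassemble(packets: List[Packet]) -> bytes:
    """Put packets back in sequence order and join their payloads."""
    ordered = sorted(packets, key=lambda packet: packet.sequence)
    if not ordered or len(ordered) != ordered[0].total:
        raise ValueError("cannot reassemble: packets are missing")
    return b"".join(packet.payload for packet in ordered)


if __name__ == "__main__":
    message = b"cream cheese on celery"
    packets = packetize(message, "192.168.0.2", "192.0.2.10")
    print(len(packets), "packets")
    print(reassemble(list(reversed(packets))))  # arrival order does not matter
```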
Moving packets around the Internet is a lot like modern psychotherapy. It is nondirected. The packets simply ramble around until they happen upon the place where they are going. They may follow any one of a near-infinite number of paths between the two communicating computers. The Internet imposes no fixed structure. That’s one of its greatest strengths: because messages don’t have an assigned path, an interruption of any one path won’t stop data from flowing. The data simply finds another path to its destination.
Of course, having packets floating all over the Internet is not the most efficient way of moving information. Consequently, routers (the machines that forward packets) build tables of paths to send the messages along, routing them according to IP address. When a message first goes to an IP address, a router notes the path it followed and can reuse the same path (or start the packet along its way using the same path) for subsequent packets.
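A table of paths keyed by IP address might look something like the sketch below. It uses longest-prefix matching, a common lookup rule that the text above does not spell out, so treat it as one plausible approach rather than a description of any particular machine. The networks and next-hop names are invented for the example.

```python
import ipaddress

# Hypothetical table: destination network -> name of the next hop.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream-carrier",   # default route
    ipaddress.ip_network("10.0.0.0/8"): "regional-link",
    ipaddress.ip_network("10.20.0.0/16"): "local-link",
}


def next_hop(destination: str) -> str:
    """Pick the most specific network that contains the destination."""
    address = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if address in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


print(next_hop("10.20.30.40"))
print(next_hop("10.99.0.1"))
print(next_hop("8.8.8.8"))
```

The default route at the top of the table guarantees that every IPv4 address matches something, so a packet always has somewhere to go next.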
The wonderful thing about TCP/IP is that any computer system can use it. Microprocessor types, architectures, and programming languages mean nothing to TCP/IP. Think of TCP/IP packets as being the shipping containers of the Internet. A computer at one end of the connection fills a packet up, and another machine at the far end of the connection empties it out. It doesn’t matter to the freight line what’s inside (although customs may take a peek if it crosses borders—the Internet’s equivalent of customs is the firewall). Using TCP/IP, the Internet will carry anything. It doesn’t matter whether it’s useful or even compatible with the recipient system. That’s a matter left to the two communicating systems. The network doesn’t care.
The World Wide Web is the most visually complicated and compelling aspect of the Internet. Despite its appearances, however, the Web is nothing more than another file transfer protocol. When you call up a page from the Web, the remote server simply downloads a file to your computer. Your Web browser then decodes the page, executing commands embedded in it to alter the typeface and to display images at the appropriate place. Most browsers cache several pages (or even megabytes of them) so that when you step back, you need not wait for the same page to download once again.
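The caching trick is simple enough to sketch. The `PageCache` class and the `fake_fetch` function below are hypothetical stand-ins; a real browser cache also worries about expiration dates and disk space, which this sketch ignores.

```python
class PageCache:
    """Keep downloaded pages so a repeat visit needs no second download."""

    def __init__(self, fetch):
        self._fetch = fetch    # function that actually downloads a page
        self._pages = {}       # URL -> page contents
        self.downloads = 0     # how many real downloads took place

    def get(self, url):
        if url not in self._pages:
            self._pages[url] = self._fetch(url)
            self.downloads += 1
        return self._pages[url]


def fake_fetch(url):
    # Stand-in for a real download, so the example needs no network.
    return "<html>contents of " + url + "</html>"


cache = PageCache(fake_fetch)
cache.get("http://example.com/a")
cache.get("http://example.com/b")
cache.get("http://example.com/a")   # stepping back: served from the cache
print(cache.downloads)
```

Three page requests cost only two downloads, because the return visit never touches the network.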
The commands for displaying text use their own language, called the Hypertext Markup Language (HTML). As exotic and daunting as HTML sounds, it’s nothing more than a coding system that combines formatting information in textual form with the readable text of a document. Your browser reads the formatting commands, which are set off by a special prefix so that the browser knows they are commands, and organizes the text in accordance with them, arranging it on the page, selecting the appropriate font and emphasis, and intermixing graphical elements. Writing in HTML is only a matter of knowing the right codes and where to put them. Web authoring tools embed the proper commands using menu-driven interfaces so that you don’t have to do the memorization.
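In HTML the special prefix is the angle bracket: commands are written as tags such as `<b>`, and everything outside the brackets is readable text. The sketch below uses Python's standard `html.parser` module to pull the two apart. The `TagSplitter` class and the sample page are made up for illustration, and a real browser does far more than this.

```python
from html.parser import HTMLParser


class TagSplitter(HTMLParser):
    """Collect formatting commands (tags) and readable text separately."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Called for each opening tag, such as <h1> or <b>.
        self.tags.append(tag)

    def handle_data(self, data):
        # Called for the readable text between tags; skip pure whitespace.
        if data.strip():
            self.text.append(data.strip())


page = "<h1>Surfing</h1><p>Just <b>point</b> and click.</p>"
splitter = TagSplitter()
splitter.feed(page)
splitter.close()
print(splitter.tags)
print(splitter.text)
```

The first list holds what the browser obeys and the second holds what you actually read.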
In truth, the Internet was not designed to link computers but rather to tie together computer networks. As its name implies, the Internet allows data to flow between networks. Even if you only have a single computer, when you connect with the Internet, you must run a network protocol the same as if you had slung miles of Ethernet cable through your home and office. Whether you like it or not, you end up tangled in the web of networking when you connect to the Internet.
The ISP actually operates as a message forwarder. At the ISP, your message gets combined with those from other computers and sent through a higher-speed connection (at least you should hope it is a high-speed connection) to yet another concentrator that eventually sends your packet to one of five regional centers (located in New York, Chicago, San Francisco, Los Angeles, and Maryland). There, the major Internet carriers exchange signals, routing the packets from your modem to the carrier that will haul them to their destination based on their Internet address.
Okay, so your Internet access through your modem or digital connection isn’t as fast as you’d like. Welcome to the club. As the Duchess of Windsor never said, “You can never be too rich or thin or have an Internet connection that’s fast enough.” Everyone would like Web pages to download instantly. Barring that, they’d like them to load in a few seconds. Barring that, they’d just like them to load before the next Ice Age.
The most tempting way to increase your Internet speed is to update your modem by moving from dial-up to a broadband service such as DSL, cable, or satellite. Once you do, you may discover the dirty secret of the Internet: You’re working on the wrong bottleneck. You may have a high-speed connection, but the server you want to download pages from may be someone’s ten-year-old computer hogtied by a similar-vintage 9600bps modem. Or a server with heavy-duty equipment may be overwhelmed by more requests than it can handle. Remember, your packets may get slowed anywhere along their way through the Web.
You can easily check your Internet bottleneck and see what you can do about it. Pick a large file and download it at your normal online time. Then, pry yourself out of bed early and try downloading the same file at 6 a.m. EST or earlier when Internet traffic is likely to be low. If you notice an appreciable difference in response and download times, a faster modem won’t likely make your online sessions substantially speedier. The constraints aren’t in your computer but in the server and network itself.
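A little arithmetic shows why the slowest link matters more than your own modem. The sketch below assumes an idealized path with no protocol overhead or congestion, and the link speeds are invented for illustration. The download time is simply the file size in bits divided by the speed of the slowest link.

```python
def download_seconds(file_bytes: int, link_speeds_bps) -> float:
    """Estimate transfer time; the slowest link along the path sets the pace."""
    bottleneck = min(link_speeds_bps)
    return file_bytes * 8 / bottleneck


one_megabyte = 1_000_000

# Hypothetical path: 1.5Mbps DSL line, 45Mbps backbone, server on a 9600bps modem.
slow_server = download_seconds(one_megabyte, [1_500_000, 45_000_000, 9_600])

# Same path, but the server now has a 10Mbps connection.
fast_server = download_seconds(one_megabyte, [1_500_000, 45_000_000, 10_000_000])

print(round(slow_server), "seconds when the server is the bottleneck")
print(round(fast_server, 1), "seconds when your own line is the bottleneck")
```

In the first case no upgrade on your end would help, because the server's 9600bps modem caps the whole transfer at nearly 14 minutes.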
Another way to check is with one of the many services designed for checking DSL speed. To find one, simply perform a Web search for “DSL speed test.” One choice is www.dslreports.com.
As originally conceived, the Internet was not just a means for moving messages between computers. It was designed as a link between computer systems that allowed scientists to share machines. One researcher in Boston could, for example, run programs on a computer system in San Francisco. Commands for computer systems move across wires just as easily as words and images. To the computer and the Internet, they are all just data.
Much of the expense businesses put into connecting to the Internet involves undoing the work of the original Internet creators. The first thing they install is a firewall, which blocks outsiders from taking control of the business’s internal computer network. They must remain constantly vigilant that some creative soul doesn’t discover yet another flaw in the security systems built into the Internet itself.
Can someone break into your computer through the Internet? It’s certainly possible. Truth be told, however, rummaging through someone’s computer is about as interesting as burrowing into his sock drawer. Moreover, the number of computers out there makes it statistically unlikely any given errant James Bond will commandeer your computer, particularly when there’s stuff much more interesting (and challenging to break into) such as the networks of multibillion-dollar companies, colleges, government agencies, and the military.
The one weakness to this argument is that it assumes whoever would break into your computer uses a degree of intelligence. Even a dull, uninteresting computer loaded with naught but a two-edition-old copy of Office can be the target of the computer terrorist. Generally, someone whose thinking process got stalled on issues of morality, the computer terrorist doesn’t target you as much as the rest of the world that causes him so much frustration or boredom. His digital equivalent of a bomb is the computer virus.
A computer virus is program code added to your computer without your permission. The name, as a metaphor for human disease, is apt. As with a human virus, a computer virus cannot reproduce by itself—it takes command of your computer and uses its resources to duplicate itself. Computer viruses are contagious in that they can be passed along from one machine to another. And computer viruses vary in their effects, from deadly (wiping out the entire contents of your hard disk) to trivial (posting a message on your screen). But computer viruses are nothing more than digital code, and they are machine specific. Neither you nor your toaster nor your PDA can catch a computer virus from your computer.
Most computer viruses latch onto your computer and lie in wait. When a specific event occurs (a key date, for example), they swing into action, performing whatever dreadful act their designers got a chuckle from. To continue infecting other computers, they also clone themselves and copy themselves to whatever disks you use in your computer. In general, viruses add their code to another program in your computer. They can’t do anything until the program they attach themselves to begins running. Virus writers like to attach their viruses to parts of the operating system so that the code will load every time you run your computer. Because antivirus programs and operating systems now readily detect such viruses, the virus terrorists have developed other tactics. One of the latest is the macro virus, which runs as a macro to a program. In effect, the virus is written in a higher-level language that escapes detection by the antivirus software.
Viruses get into your computer because you let them. They come through any connection your computer has with the outside world, including floppy disks and online connections. Browsing Web pages ordinarily won’t put you at risk because HTTP doesn’t pass along executable programs. Plug-ins may, however. Whenever you download a file, you run a risk of bringing a virus with it. Software and drivers that you download are the most likely carriers. Most Webmasters do their best to ensure that they don’t pass along viruses. However, you should always be wary when you download a program from a less reputable site. The same warning applies to e-mail from unknown senders.
There is no such thing as a sub-band or sub-carrier virus that sneaks into your computer through a “sub-band” of your modem’s transmissions. Even were it possible to fiddle with the operation of a modem and add a new, invisible modulation to it, the information encoded on it could never get to your computer. Every byte from an analog modem must go through the UART in the modem or serial port, then be read by your computer’s microprocessor. The modem has no facility to link a sideband signal (even if there were such a thing) to that data stream.