9.4 Network Architecture: The Internet
In the early days of computing, computers were centralized facilities that contained most or all of the resources used by the populations they serviced. Data was transferred between computers via media (punched paper cards, paper tapes, magnetic tapes, and magnetic disks), hand-carried by an operator.
As the number of computers increased, and costs shifted away from hardware and more toward labor, it became economical to directly link computers so that resources could be shared. This is what networking is about. We briefly explored local area networks in the context of the traditional 7-layer OSI model. Here, we take a deeper look at architectural aspects of computer networks in the context of the Internet model.
9.4.1 THE INTERNET MODEL
In a telecommunication system there may be many sources and many destinations. An example of this form of communication is a long distance telephone network. For every telephone to be reachable from every other telephone, there must be a path, or channel, between each source and destination. If there are $10^7$ telephones in New York City and $10^7$ telephones in Chicago, then for everyone in one city to be able to call everyone in the other city, $10^7 \times 10^7 = 10^{14}$ channels must exist between the cities. Fortunately, not everyone in New York City wants to talk with everyone in Chicago at the same time, and a smaller number of channels between New York City and Chicago can be shared among all telephones in those cities. On the other hand, there must be at least one line from each telephone to the telephone company’s central office, and there must be sufficient lines between central offices to handle the maximum number of simultaneously held conversations.
A small number of physical connections, on the order of a few to a few thousand depending on whether fibers or wires are used, are all that are needed to connect the cities because it is never the case that everyone in one city wants to call someone in the other city at the same time. The information carrying capacity of the connections (called bandwidth) is shared among all of the users so that a dramatic reduction in cost is realized. A control mechanism must be created, however, so that the bandwidth can be shared properly.
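The saving from sharing can be made concrete with a little arithmetic. The sketch below uses the telephone counts from the text; the figure of 1000 shared inter-city connections is only an assumed value chosen from within the "few to a few thousand" range, and the function names are hypothetical.

```python
# Telephone count per city, as given in the text.
PHONES_PER_CITY = 10**7

# ASSUMPTION: an illustrative number of shared inter-city connections,
# picked from the "few to a few thousand" range mentioned in the text.
ASSUMED_SHARED_CONNECTIONS = 1000


def dedicated_channels(phones_a: int, phones_b: int) -> int:
    """One private channel for every (source, destination) pair."""
    return phones_a * phones_b


def shared_lines(phones_a: int, phones_b: int, shared: int) -> int:
    """One line per telephone to its central office, plus the shared pool."""
    return phones_a + phones_b + shared


dedicated = dedicated_channels(PHONES_PER_CITY, PHONES_PER_CITY)
shared = shared_lines(PHONES_PER_CITY, PHONES_PER_CITY, ASSUMED_SHARED_CONNECTIONS)
print(dedicated == 10**14)   # True
print(shared)                # 20001000
```

Even with one line per telephone counted in, the shared arrangement needs about twenty million lines instead of $10^{14}$ dedicated channels.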
Layering in the TCP/IP Protocol Suite
An “internet” is a collection of interconnected networks. The “Internet” is probably the best-known internet, using the TCP/IP protocol and IP addresses in what is known as the TCP/IP protocol suite (more on this below). The 7-layer OSI model has been simplified somewhat in the Internet, which can be thought of as having only 4 layers, as illustrated by the protocol stack shown in Figure 9-15. At the bottom of the protocol stack is the Link layer, which is made up of the medium access control (MAC) and physical (PHY) sublayers. The Link layer resolves contention for the medium when more than one device wants to transmit, manages the logical grouping of bits into frames, and implements error protection.
The Link layer is responsible for simply getting a frame of bits from one machine to a directly connected machine. This is fine for point-to-point communication between two cooperating processes on different machines. In order for multiple processes to share the same link, however, a protocol is needed to coordinate which data goes to what process. This is the responsibility of the Network layer, which is implemented with the Internet Protocol (IP) for the Internet.
The Network layer deals with hop-by-hop communication. The Transport layer deals with end-to-end communication, in which there may be a number of intervening systems between the sender and receiver. The Transport layer deals with retransmission (for errors, or packets dropped due to congestion), sequencing (packets may arrive out of order), flow control (applying back-pressure to the source to relieve congestion) and error protection (the Link layer does not do enough error protection on its own). For the Internet, the Transport layer is implemented with the Transmission Control Protocol (TCP). The TCP/IP combination at the Network and Transport layers is the predominant Internet protocol suite. Any other appropriate protocols can be used at the Link and Application layers, and there are also other protocols used within the Network and Transport layers.
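The sequencing and retransmission duties of the Transport layer can be illustrated with a toy receiver. This is a minimal sketch and not how TCP is actually implemented (TCP numbers bytes rather than whole packets); the function name and the packet representation as `(sequence_number, payload)` pairs are assumptions made for illustration.

```python
def reassemble(packets, expected_count):
    """Put packets back in order and report any that never arrived.

    `packets` is an iterable of (sequence_number, payload) pairs that may be
    out of order and may contain duplicates caused by retransmission.
    Returns (data, missing): `data` is the joined payload, or None if some
    sequence numbers are missing and must be retransmitted.
    """
    received = {}
    for seq, payload in packets:
        # Keep the first copy; a retransmitted duplicate is ignored.
        received.setdefault(seq, payload)

    missing = [seq for seq in range(expected_count) if seq not in received]
    if missing:
        return None, missing
    return b"".join(received[seq] for seq in range(expected_count)), []


# Packets arrive out of order, with packet 0 duplicated.
print(reassemble([(2, b"c"), (0, b"a"), (1, b"b"), (0, b"a")], 3))  # (b'abc', [])
# Packet 1 was dropped, so the receiver knows what to ask for again.
print(reassemble([(0, b"a"), (2, b"c")], 3))                        # (None, [1])
```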
At the Application layer, a process can exchange data with another process anywhere on the Internet and treat the connection as if it were a file on the local system, reading and writing bytes with ordinary read and write system calls. These calls are frequently implemented with sockets, which are pathways to the network through the operating system.
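The file-like view of a connection can be sketched in a few lines. As an assumption made to keep the example self-contained, a locally connected pair of sockets stands in for a real connection across the Internet; the function name is hypothetical.

```python
import socket


def exchange_line(message: str) -> str:
    """Write a line into one end of a connected socket pair, read it at the other."""
    # ASSUMPTION: socketpair() stands in for a connection to a remote process.
    left, right = socket.socketpair()
    try:
        # makefile() wraps each socket in an ordinary file object, so the
        # same write/readline calls used for local files apply here.
        with left.makefile("w", encoding="utf-8") as writer, \
                right.makefile("r", encoding="utf-8") as reader:
            writer.write(message + "\n")
            writer.flush()
            return reader.readline().rstrip("\n")
    finally:
        left.close()
        right.close()


print(exchange_line("hello over a socket"))  # hello over a socket
```

Apart from creating the sockets, nothing in the function distinguishes the connection from a file being written in one place and read in another.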
Internet Addresses
Every interface on the Internet has a unique IP address. Version 4 of the IP protocol, known as IPv4, is still widely used but is gradually being replaced by IPv6, which uses addresses that are four times larger and includes several enhancements and simplifications relative to IPv4. An example of an IPv4 address in “dotted decimal notation” is shown below:
Each number that is delimited by a dot is an unsigned byte in the range from 0 through 255. The equivalent bit pattern for the IPv4 address shown above is then:
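As a minimal sketch of this conversion, the code below turns a dotted decimal address into its 32-bit pattern. The address used, 165.230.140.67, is the one given for cereal.rutgers.edu in the Domain Name System discussion later in this section, chosen here only for illustration; the function name is hypothetical.

```python
def dotted_to_bits(address: str) -> str:
    """Return the 32-bit pattern of a dotted decimal IPv4 address, one group per byte."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    # Each number is an unsigned byte, so it is written as exactly 8 bits.
    return " ".join(format(octet, "08b") for octet in octets)


print(dotted_to_bits("165.230.140.67"))
# 10100101 11100110 10001100 01000011
```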
The leftmost bits determine the class of the address. Figure 9-16 shows the five IPv4 classes. Class A has 7 bits for the network identification (ID) and 24 bits for the host ID. There can thus be at most $2^7$ class A networks and $2^{24}$ hosts on each class A network. A number of these addresses are reserved, and so the number of addresses that can be assigned to hosts is fewer than the number of possible addresses.
Class B addresses use 14 bits for the network ID and 16 bits for the host ID. Class C addresses use 21 bits for the network ID and 8 bits for the host ID. Class D addresses are used for multicast groups, to which an end-system that has a class A, B, or C address subscribes, and thereby receives all network traffic intended for that group. This is an efficient mechanism for sending the same packets to multiple subscribers, without flooding the network with broadcasts, and without the sender needing to keep track of all of the current subscribers. Class E addresses are unused.
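The class of an address can be read off its first byte. The sketch below assumes the standard classful leading-bit patterns (0 for class A, 10 for B, 110 for C, 1110 for D, 1111 for E), which is presumably what Figure 9-16 depicts; the network and host bit counts are those stated in the text, and the function names are hypothetical.

```python
# (network ID bits, host ID bits) for the classes described in the text.
NETWORK_HOST_BITS = {"A": (7, 24), "B": (14, 16), "C": (21, 8)}


def ipv4_class(address: str) -> str:
    """Classify an IPv4 address by the leftmost bits of its first byte."""
    first = int(address.split(".")[0])
    if first < 128:      # 0xxxxxxx
        return "A"
    if first < 192:      # 10xxxxxx
        return "B"
    if first < 224:      # 110xxxxx
        return "C"
    if first < 240:      # 1110xxxx (multicast)
        return "D"
    return "E"           # 1111xxxx


def capacity(address_class: str):
    """Upper bounds on (networks, hosts per network), before reserved addresses."""
    network_bits, host_bits = NETWORK_HOST_BITS[address_class]
    return 2**network_bits, 2**host_bits


print(ipv4_class("165.230.140.67"))  # B
print(capacity("A"))                 # (128, 16777216)
```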
The available supply of IPv4 addresses will run out soon after the year 2000, and so it is important that IPv6 be widely adopted soon. Already, many networks reuse IP addresses that are simultaneously in use elsewhere (using a protocol that allows for sharing of IP addresses), and others assign IP addresses only for the duration of a session (such as for a dialup line through a modem).
Ports
Loosely speaking, a port is how a process is known to the world. A port number identifies the source process, and a port number also identifies the destination process. Strictly speaking, the port identifies a network entry point for a process. Ports 0-1023 are well-known ports for server processes. For example, the telnet port is 23. On a Unix system, the following command:
will connect the user to system cereal.rutgers.edu. If the 23 is not present on the command line, then 23 is assumed. If 23 is replaced with another port, such as 13 for the daytime server, then a different process will be reached, with different resulting behavior.
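The idea that the port selects the process can be sketched on a single machine. In the example below, two stand-in servers listen on the loopback address; because binding to well-known ports such as 23 or 13 normally requires privileges, the operating system is asked for any free port instead. The reply strings and function names are hypothetical.

```python
import socket
import threading


def start_server(reply: bytes) -> int:
    """Listen on an OS-chosen loopback port, answer one client, and return the port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0 asks the OS for any free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve_once():
        conn, _ = listener.accept()
        with conn:
            conn.sendall(reply)
        listener.close()

    threading.Thread(target=serve_once, daemon=True).start()
    return port


def ask(port: int) -> bytes:
    """Connect to the given loopback port and read everything the server sends."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as conn:
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks)


# Same host, different ports, different processes answering.
login_port = start_server(b"login: ")                  # stand-in for a telnet-style server
daytime_port = start_server(b"stand-in daytime reply\n")  # stand-in for a daytime-style server
print(ask(login_port))
print(ask(daytime_port))
```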
Encapsulation
Network data is encapsulated as it passes through the network layers, as illustrated in Figure 9-17. The user data is sent to the network using read and write system calls similar to those used for reading and writing files. The Application layer sends user data to the Transport layer, where the operating system adds a TCP header that identifies the source and destination ports, forming a TCP segment. The TCP segment is passed down to the Network layer, where it is repackaged into IP datagrams, each with an IP header identifying the source and destination systems. The IP datagrams are sent to the Link layer, where they are encapsulated into Ethernet frames (for this example). The reverse process takes place on the receiving system.
A single TCP segment may be decomposed into a number of IP datagrams that are independently routed through the Internet. Each IP datagram contains the source and destination IP addresses (in the IP header), the source and destination ports (in the TCP header), and the protocol at the next layer of encapsulation (in the IP header – TCP is only one of the transport layer protocols used in the Internet). Collectively, these five parameters uniquely identify each IP datagram as it traverses the Internet, which helps ensure that the datagrams arrive at the correct receiving process.
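The nesting of headers can be sketched with deliberately simplified formats. The headers below are toy layouts that carry only the fields named in the text; they are not the real TCP, IP, or Ethernet header formats. The client port 1025 and the hardware addresses are hypothetical, while the IP addresses and the telnet port are taken from this section.

```python
import socket
import struct

TCP_PROTOCOL = 6  # protocol number that identifies TCP in the IP header


def tcp_segment(src_port: int, dst_port: int, data: bytes) -> bytes:
    """TOY TCP header: just the 16-bit source and destination ports."""
    return struct.pack("!HH", src_port, dst_port) + data


def ip_datagram(src_ip: str, dst_ip: str, protocol: int, segment: bytes) -> bytes:
    """TOY IP header: protocol byte plus 32-bit source and destination addresses."""
    header = struct.pack("!B4s4s", protocol,
                         socket.inet_aton(src_ip), socket.inet_aton(dst_ip))
    return header + segment


def ethernet_frame(dst_mac: bytes, src_mac: bytes, datagram: bytes) -> bytes:
    """TOY Ethernet header: 6-byte destination and source addresses only."""
    return dst_mac + src_mac + datagram


def five_tuple(frame: bytes):
    """Peel the layers back off, as the receiving system would."""
    datagram = frame[12:]                                   # strip the link header
    protocol, src_ip, dst_ip = struct.unpack("!B4s4s", datagram[:9])
    src_port, dst_port = struct.unpack("!HH", datagram[9:13])
    return (socket.inet_ntoa(src_ip), src_port,
            socket.inet_ntoa(dst_ip), dst_port, protocol)


segment = tcp_segment(1025, 23, b"hello")
datagram = ip_datagram("165.230.140.67", "165.230.44.67", TCP_PROTOCOL, segment)
frame = ethernet_frame(bytes.fromhex("020000000002"),
                       bytes.fromhex("020000000001"), datagram)
print(five_tuple(frame))
# ('165.230.140.67', 1025, '165.230.44.67', 23, 6)
```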
The Domain Name System
The Domain Name System (DNS) is a distributed database that maps between hostnames and IP addresses, and provides mail routing information. For example, cereal.rutgers.edu maps to 165.230.140.67 (and vice versa), and all three of the names internet.rutgers.edu, www.internet.rutgers.edu, and mulder.rutgers.edu map to 165.230.44.67. The DNS is responsible for interacting with programs that need to map between names and addresses.
Each domain (like rutgers.edu) maintains its own database of information, and runs a server that other systems across the Internet can query. Access to the DNS is provided through a resolver which is embodied in library routines that are silently linked into high-level programs that access the network.
The Network Information Center (NIC, also known as the InterNIC) manages the top-level domains, and delegates authority for second-level domains. Within a zone, a local administrator maintains the name server database. There must be a primary name server, which loads its database from a file, and secondary name servers, which get their information from the primary name server. Caching is used, so that once a query has caused other servers to be contacted, later queries for the same information can be answered without contacting those servers again.
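The effect of caching can be sketched with a toy resolver. This is only an illustration under stated assumptions: a dictionary stands in for the remote name servers, entries never expire (real DNS caches discard entries after a time limit), and the class and attribute names are hypothetical. The name-to-address pairs are the ones quoted in this section.

```python
class CachingResolver:
    """Toy resolver that remembers answers so repeated queries stay local."""

    def __init__(self, lookup_upstream):
        self._lookup_upstream = lookup_upstream  # stand-in for contacting other servers
        self._cache = {}
        self.upstream_queries = 0

    def resolve(self, hostname: str) -> str:
        if hostname not in self._cache:
            # Cache miss: another server has to be contacted.
            self.upstream_queries += 1
            self._cache[hostname] = self._lookup_upstream(hostname)
        return self._cache[hostname]


# ASSUMPTION: this dictionary plays the role of the remote name server database.
ZONE = {
    "cereal.rutgers.edu": "165.230.140.67",
    "mulder.rutgers.edu": "165.230.44.67",
}

resolver = CachingResolver(ZONE.__getitem__)
print(resolver.resolve("cereal.rutgers.edu"))  # 165.230.140.67
print(resolver.resolve("cereal.rutgers.edu"))  # 165.230.140.67 (answered from the cache)
print(resolver.upstream_queries)               # 1
```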
The World Wide Web
The World Wide Web (or simply, the “Web”) is made up of client processes (Web browsers) and Web servers running the Hypertext Transfer Protocol (HTTP), at the Application layer of the Internet. As distinctions get blurred in everyday usage, it is important to keep in mind that the Web is built on top of the Internet – the Web is not the Internet itself.
In 1989, Tim Berners-Lee at CERN (the European high-energy physics facility) developed a text-based Web, for exchanging technical documents among colleagues. In February 1993, the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign released a graphical version of the Mosaic Web browser, as well as an HTTP server, both free of charge, and the Web exploded to where it is today.
9.4.2 BRIDGES AND ROUTERS REVISITED, AND SWITCHES
A hub is a central connection point for end systems. A hub is also known as a bridge when an end system is another hub. A hub simply copies packets from one network interface to all of the others, as illustrated in the configuration shown in Figure 9-18a. Hubs and bridges have modest intelligence these days: they isolate collisions on single network links (that is, if two packets collide on a span of the network, which is a normal but unwanted condition, the collision signal is not propagated to the other network links), and they limit certain types of traffic from being sent to all other interfaces.
A router connects one network to another (see Figure 9-18b), and makes decisions with respect to forwarding packets across its boundaries. A router by definition has more than one network interface and forwards packets between interfaces. The network protocols used on either side of a router can differ.
A router forwards packets based on the protocol, whereas a switch forwards packets based only on the destination address. A switch is a high-speed hub with no shared bandwidth, as illustrated in Figure 9-18c. A switch eliminates media access conflicts because there is no contention for the media.
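The difference between a hub and a switch can be sketched as two forwarding rules. This is a minimal illustration: the port numbering, the address table, and the single-letter addresses are hypothetical, and the handling of unknown destinations is omitted.

```python
def hub_forward(in_port: int, num_ports: int):
    """A hub copies a packet to every interface except the one it arrived on."""
    return [port for port in range(num_ports) if port != in_port]


def switch_forward(dst_address: str, address_table: dict):
    """A switch sends a packet only to the port that leads to its destination."""
    return [address_table[dst_address]]


# ASSUMPTION: a 4-port device with one end system, named A to D, on each port.
ADDRESS_TABLE = {"A": 0, "B": 1, "C": 2, "D": 3}

print(hub_forward(in_port=0, num_ports=4))   # [1, 2, 3]
print(switch_forward("C", ADDRESS_TABLE))    # [2]
```

With the hub, a packet from A to C occupies the links to B and D as well; with the switch, only the link to C is used, which is why there is no contention for the media.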
An example of a switch is discussed in Section 10.9.2, in which an external controller sets up source-to-destination paths. An enhancement is a self-routing network that sets up source-to-destination connections on the fly, based on the destination addresses in the headers of packets traversing the network.
As an example, consider designing a 4-input, 4-output self-routing switch. We can accomplish this using the bubble sort algorithm, in which packets with the smallest addresses are bubbled to the top, by making pairwise exchanges starting from the top and working toward the bottom, dropping the packet with the largest address to the bottom on each pass. For n channels, there are n(n-1)/2 comparisons that need to be made. For this case, n = 4, and so 4(4-1)/2 = 6 comparisons need to be made, which means that the switch needs 6 comparison boxes.
The corresponding 4-input, 4-output self-routing switch is shown in Figure 9-19. Unsorted packets enter at the left, and emerge in sorted order by destination address on the right. (See problem A-28 for the design of one of the comparison boxes.)
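The behavior of the bubble sort switch can be simulated in software. The sketch below models each comparison box as a compare-and-exchange on two adjacent channels and records every box that is used; the exact wiring of Figure 9-19 is not reproduced here, so the ordering of the boxes is an assumption based on the description in the text, and the function name is hypothetical.

```python
def bubble_sort_switch(addresses):
    """Route packets through a fixed bubble sort network of comparison boxes.

    Returns (sorted_addresses, boxes), where `boxes` lists the pair of
    adjacent channels handled by each comparison box, in order.
    """
    packets = list(addresses)
    boxes = []
    n = len(packets)
    # Each pass drops the largest remaining address to the bottom channel.
    for last in range(n - 1, 0, -1):
        for i in range(last):
            boxes.append((i, i + 1))
            if packets[i] > packets[i + 1]:
                packets[i], packets[i + 1] = packets[i + 1], packets[i]
    return packets, boxes


ordered, boxes = bubble_sort_switch([3, 0, 2, 1])
print(ordered)      # [0, 1, 2, 3]
print(len(boxes))   # 6
```

The set of boxes does not depend on the incoming addresses, which is what allows the comparisons to be built as fixed hardware: for n = 4 the count is always n(n-1)/2 = 6.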