
Separating Contenders from Pretenders


Mark Gambino, IBM TPF Development

The TCP/IP native stack road map for TPF 4.1 was presented at the Fall 2003 TPF Users Group meeting. A copy of this presentation is available at http://www.ibm.com/software/htp/tpf/tpfug/tgf03/tgf03.htm

Each Sold Separately and Some Assembly Required

The purpose of this presentation was twofold: first, to organize the more than 60 native stack enhancements into logical categories such as availability, performance, and security; second, to point out that not all TCP/IP stacks are the same and that many quality of service (QoS) features that you expect and require are either optional features in the architecture or purely roll-your-own (RYO). This article expands on the second point to help explain how TPF differentiates itself from other servers and TCP/IP stacks. Besides implementing the base requirements of the Internet Protocol (IP) and Transmission Control Protocol (TCP) architectures, TPF supports several optional IP and TCP features that have been developed in recent years, along with numerous features that are unique to the TPF platform.

 

A Long Time Ago In a Galaxy Far, Far Away...

Request for Comments (RFC) documents 791 and 793 define the IP and TCP architectures, respectively. These documents define the bits and bytes of packets that flow between nodes and the sequence of events for starting and ending a given session. Both RFCs came out in 1981. The buffer sizes and network speeds that were considered large in 1981 are orders of magnitude smaller than today's; therefore, modern high-end servers need to account for much higher throughput requirements and for how those requirements affect the various TCP/IP timers and algorithms. The base IP architecture was designed when a given application ran on only one server instance, the physical network interface on the server was a single point of failure, all packets in the network (destined for different servers, applications, or both) flowed at the same priority, and security was limited to physical connectivity built around private trusted networks. Oh, how times have changed!

 

What's Mine Is Mine

Today, most applications run on multiple server instances, generically called server farms or clusters. TPF has had this capability for decades with its loosely coupled feature. Some servers are dedicated to a single application type (like Web servers, file servers, or mail servers), while others, including most TPF systems, run a variety of different applications on the same server. In a distributed transaction environment, multiple heterogeneous servers are involved in processing a given transaction. For example, an end user sends a request message to server X, causing server X to send an authorization or availability query to server Y. Once server X receives the response to its query, it sends a reply message to the end user. The TCP/IP stack of most systems implements process-scoped sockets. This means that a given socket (TCP/IP session) is tied to a given process: the process that creates a socket is the only process that can use that socket and, if the process ends for any reason, the socket is cleaned up. In the previous example, if server X implements process-scoped sockets, the process must remain active while waiting for the answer from server Y. If server Y takes, on average, 1 second to respond and server X must handle 500 messages per second, then 500 processes would need to be active on server X at any given time. As the message rate or the response time of server Y increases, the scalability problem with this design becomes more and more evident, because server X eventually cannot support that many active processes.
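The 500-process figure is an instance of Little's law, which states that the average number of requests in progress equals the arrival rate multiplied by the average time each request spends in the system. The short Python sketch below (the function name is illustrative only) reproduces the arithmetic and shows how the blocked-process count grows when either factor grows.

```python
def required_active_processes(arrival_rate_per_sec: float, avg_wait_sec: float) -> float:
    """Little's law: average concurrency L = arrival rate (lambda) * time in system (W)."""
    return arrival_rate_per_sec * avg_wait_sec


# The article's example: 500 messages per second, server Y answers in 1 second on average.
print(required_active_processes(500, 1.0))   # 500.0
# Doubling both the rate and the response time quadruples the blocked processes.
print(required_active_processes(1000, 2.0))  # 2000.0
```

With process-scoped sockets, every one of those concurrent requests pins a live process; the kernel-scoped design described next removes that coupling.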

 

We Share Because We Care

One of the fundamental design points of the TPF TCP/IP native stack is kernel-scoped sockets, where the system owns all sockets rather than a socket being tied to a process (in TPF, an ECB represents a process). Part of this design includes a TPF-unique capability called activate_on_receipt (AOR). Let's look at the distributed transaction example again.

ECB 1 in TPF (server X) receives the request from the end user over socket 1. After ECB 1 sends the query to server Y over socket 2, ECB 1 uses the AOR function to tell TPF that when the response from server Y arrives on socket 2, TPF is to create a new ECB (ECB 2) and pass the response to the specified application program in ECB 2. ECB 1 can exit after issuing AOR, which means that no ECBs (processes) are tied up (active) while waiting for data to arrive from server Y. When ECB 2 is created, it sends the reply message to the end user over socket 1 and then exits. By taking advantage of the kernel-scoped sockets and AOR capabilities of TPF, you can scale up to tens of thousands of messages per second on a single server image. Notice that ECB 1 received the request message from the end user over socket 1, but ECB 2 sent the reply message over that same socket. This is one example of shared sockets, where the same socket can be used by multiple ECBs (processes). Shared sockets are a very powerful feature. For example, you could create a single socket that is used to log data to a remote system and have all ECBs send data on that same socket, either directly or by using a higher-level messaging protocol like MQSeries.
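The following Python sketch is a toy simulation of the flow just described, not TPF code. The `Stack` class and its hypothetical `register_aor` method stand in for TPF's AOR function and do not reproduce its actual interface; ECBs are modeled as numbered calls to a Python function. It shows the two key properties: the ECB that registers for the response exits immediately, and a different ECB later replies over the socket the first ECB received on.

```python
import itertools
from typing import Callable, Dict, List, Tuple

# A "program" receives the stack, the ECB number it runs in, and the input data.
Program = Callable[["Stack", int, bytes], None]


class Stack:
    """Toy model of kernel-scoped sockets: the system, not a process, owns each socket."""

    def __init__(self) -> None:
        self._next_ecb = itertools.count(1)
        self._aor: Dict[int, Program] = {}            # socket -> program to activate
        self.sent: List[Tuple[int, int, bytes]] = []  # (ecb, socket, data)

    def run_in_new_ecb(self, program: Program, data: bytes) -> None:
        program(self, next(self._next_ecb), data)

    def send(self, ecb: int, sock: int, data: bytes) -> None:
        # Any ECB may send on any socket (shared sockets).
        self.sent.append((ecb, sock, data))

    def register_aor(self, sock: int, program: Program) -> None:
        # Hypothetical stand-in for AOR: remember what to run; the caller may now exit.
        self._aor[sock] = program

    def data_arrived(self, sock: int, data: bytes) -> None:
        # The system creates a fresh ECB and passes it the data (one-shot registration).
        self.run_in_new_ecb(self._aor.pop(sock), data)


END_USER_SOCK, SERVER_Y_SOCK = 1, 2


def handle_reply_from_y(stack: Stack, ecb: int, data: bytes) -> None:
    stack.send(ecb, END_USER_SOCK, b"reply:" + data)   # ECB 2 answers on socket 1


def handle_request(stack: Stack, ecb: int, data: bytes) -> None:
    stack.send(ecb, SERVER_Y_SOCK, b"query:" + data)   # ECB 1 queries server Y
    stack.register_aor(SERVER_Y_SOCK, handle_reply_from_y)
    # ECB 1 returns (exits) here; nothing waits on server Y.


stack = Stack()
stack.run_in_new_ecb(handle_request, b"seat 12A")   # ECB 1
stack.data_arrived(SERVER_Y_SOCK, b"ok")             # ECB 2 is created
print(stack.sent)  # [(1, 2, b'query:seat 12A'), (2, 1, b'reply:ok')]
```

Between the two calls at the bottom, no Python frame (no "process") is waiting on server Y; only the registration in `_aor` remembers the pending response.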

 

Mobile Homes Coming to a Network Near You

IP is connectionless. Only the two nodes that are the endpoints of a socket have knowledge of that individual socket. An IP packet contains the destination address, but does not specify what path to take to reach that destination. IP routers keep track of the available paths and select the path that a given packet will take. If a router in the middle of the network fails, IP reroutes traffic through another path (assuming alternate paths exist), enabling sockets to survive the failure of a network component. If a server has multiple network interfaces and one of those interfaces fails, do you want your sockets to fail? Of course not. If sockets are bound to a real IP address in the server, those sockets are tied to the physical network interface associated with that real IP address; if that network adapter fails, the sockets fail as well. Virtual IP addresses (VIPAs) were created to enable sockets to survive the failure of a network adapter on a server. Because a VIPA is not tied to any one physical interface, it can be moved from one network adapter to another adapter on that server. High-availability servers, like TPF, support VIPAs. TPF has extended the concept with movable VIPAs, which allow a VIPA to be moved from one server to a network adapter in another server in the loosely coupled complex. This not only provides even higher availability, but also gives you the ability to balance the load across servers in the TPF complex.
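The following toy Python model illustrates why binding sockets to a VIPA lets them survive an adapter failure. The adapter names, addresses, and the rule "move the VIPA to the first surviving adapter" are all assumptions made for illustration; the article does not describe how TPF chooses the takeover adapter.

```python
from typing import Dict, List, Set, Tuple


class Host:
    """Toy model of one server: sockets bind to a VIPA, never to a physical adapter."""

    def __init__(self, adapters: List[str]) -> None:
        self.up: Set[str] = set(adapters)
        self.vipa_owner: Dict[str, str] = {}                  # VIPA -> adapter carrying it
        self.sockets: Set[Tuple[str, int, str, int]] = set()  # (local IP, port, remote IP, port)

    def add_vipa(self, vipa: str, adapter: str) -> None:
        self.vipa_owner[vipa] = adapter

    def adapter_failed(self, adapter: str) -> None:
        self.up.discard(adapter)
        if not self.up:
            raise RuntimeError("no surviving adapter to take over the VIPAs")
        for vipa, owner in self.vipa_owner.items():
            if owner == adapter:
                # Move the VIPA to a surviving adapter; sockets keyed on the VIPA need no change.
                self.vipa_owner[vipa] = sorted(self.up)[0]


host = Host(["adapter_a", "adapter_b"])
host.add_vipa("198.51.100.10", "adapter_a")
host.sockets.add(("198.51.100.10", 5000, "192.0.2.7", 40000))
host.adapter_failed("adapter_a")
print(host.vipa_owner)    # {'198.51.100.10': 'adapter_b'}
print(len(host.sockets))  # 1
```

The socket's identity never mentions an adapter, so only the VIPA-to-adapter mapping changes. A movable VIPA extends the same idea by allowing the takeover adapter to belong to another server in the complex.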

 

Server Images Are Snowflakes: No Two Are Exactly the Same

In addition to movable VIPAs, the TPF Domain Name System (DNS) server is another method to balance traffic across TPF servers in the complex and across multiple network interfaces on a single TPF server. DNS allows multiple IP addresses to be defined for the host name representing the server complex. However, using external DNS servers to select the server IP address to be used for a given client session does not necessarily result in a balanced load because external DNS servers assume all server images are equal and active.

For example, what if one server image is running on a more powerful processor than the others, or one server image is currently running CPU-intensive utilities? In other words, not all active server images have the same processing power available for new transactions. In the case of a TPF complex, it is quite common to expand the complex (add more server images) during peak periods and then collapse the complex when the load drops off. To overcome the problems of external DNS servers assigning more work to an overloaded processor or selecting an IP address of an inactive server image, TPF has its own internal DNS server that can be used to balance traffic for connections to the TPF complex. If you want to know the status of the server complex (which server images and network interfaces are currently active, and what the current load is on each server image), you need to ask the server complex itself. The TPF DNS server always responds with a usable (active) IP address and is customizable, so that the path selection (load balancing) logic can take into consideration whatever factors are appropriate to your environment. The TPF DNS server has another important advantage: centralized load balancing logic. The more external DNS servers that you allow to do path selection for new sessions, the less likely it is that you will end up with a balanced load across the server complex.
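The article does not describe the interface for customizing the TPF DNS server, so the following Python function is only a sketch of the kind of selection logic such a customization might contain. The `Candidate` fields (`usable`, `capacity`, `utilization`) are hypothetical metrics, and "pick the most spare capacity" is just one possible policy. Its point is that only a view from inside the complex can exclude inactive images and discount busy ones.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    ip: str             # IP address of one network interface
    usable: bool        # both the interface and its server image are active
    capacity: float     # relative processing power of the server image (hypothetical metric)
    utilization: float  # current CPU utilization of the server image, 0.0 to 1.0


def choose_address(candidates: List[Candidate]) -> Optional[str]:
    """Return the usable address whose server image has the most spare capacity."""
    usable = [c for c in candidates if c.usable]
    if not usable:
        return None
    return max(usable, key=lambda c: c.capacity * (1.0 - c.utilization)).ip


complex_view = [
    Candidate("198.51.100.1", True, capacity=1.0, utilization=0.30),  # spare 0.7
    Candidate("198.51.100.2", True, capacity=2.0, utilization=0.50),  # spare 1.0
    Candidate("198.51.100.3", True, capacity=2.0, utilization=0.95),  # running utilities
    Candidate("198.51.100.4", False, capacity=4.0, utilization=0.0),  # image collapsed
]
print(choose_address(complex_view))  # 198.51.100.2
```

An external DNS server rotating through all four addresses would send a quarter of new sessions to the collapsed image and another quarter to the image that is busy running utilities.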

 

I'm Sorry, Sir. This Event Is by Invitation Only!

We have discussed the methods for deciding which path a client should use to reach the server; however, that assumes this particular client is allowed to connect to the server. That blind and trusting assumption is not wise in this security-conscious era. Instead, you should verify that this client is authorized to connect not only to the server node itself, but also to the requested server application. At the network level, this can be done using firewall filter rules or access control lists. A comprehensive security strategy should include firewalls at the edge routers of your private network and in the server nodes as well. The TPF TCP/IP native stack includes a built-in firewall that allows you to define filter rules to control access to TPF applications from external users as well as from users on your private network. Incorporating the firewall into the TPF native stack also makes it possible to detect and prevent denial of service (DoS) attacks that attempt to exploit holes in the TCP/IP architecture. For end-to-end security, you can implement Secure Sockets Layer (SSL) functionality in your applications. SSL-enabled applications are able to validate the identity of the partner and exchange data in a secure manner over public networks. Besides standard SSL support, TPF has shared SSL support that provides TPF-unique capabilities, such as the ability to share SSL sessions across multiple ECBs and AOR functionality for SSL sessions.
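Filter rules are commonly evaluated top to bottom, with the first matching rule deciding and anything unmatched denied. The Python sketch below shows that generic first-match, default-deny technique using the standard `ipaddress` module. The `Rule` fields and the example rules are hypothetical; this is not the TPF firewall's rule syntax or evaluation order, which the article does not describe.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    action: str  # "permit" or "deny"
    source: str  # client address block in CIDR notation
    port: int    # server application port; 0 means any port


def allowed(rules: List[Rule], client_ip: str, server_port: int) -> bool:
    """First matching rule wins; a connection that matches no rule is denied."""
    addr = ipaddress.ip_address(client_ip)
    for rule in rules:
        if addr in ipaddress.ip_network(rule.source) and rule.port in (0, server_port):
            return rule.action == "permit"
    return False  # default deny


rules = [
    Rule("deny", "203.0.113.0/24", 0),   # a known-bad network, every application
    Rule("permit", "10.0.0.0/8", 5000),  # private network may reach application on port 5000
    Rule("permit", "0.0.0.0/0", 80),     # anyone may reach the Web server
]
print(allowed(rules, "10.2.3.4", 5000))    # True
print(allowed(rules, "10.2.3.4", 6000))    # False: no rule matches
print(allowed(rules, "203.0.113.9", 80))   # False: the deny rule matches first
print(allowed(rules, "198.51.100.5", 80))  # True
```

Checking the application port as well as the client address is what lets the rules authorize access to a specific server application rather than to the node as a whole.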

 

We All Have Our Limits... Don't Push It!

A remote client requests a connection with the server. The rules state that this client is authorized to connect to the specified server application; therefore, it would seem that the server should accept the connection request. Not necessarily.

If multiple applications run in the server, you might want to limit the amount of resources that a given application can use so that one application does not monopolize the entire server. The TCP connection limiting support of TPF provides this capability by allowing you to define the maximum number of active sessions allowed for each TCP application in TPF. If the limit has been reached when a new connection request is received, the connection request is rejected. By limiting the number of active sessions, you can control the amount of network, CPU, and server resources that the application can use. Connection limiting is valuable both in overload situations, where the traffic rate is much higher than normal, and against intentional floods during DoS attacks aimed at taking down the server.
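The connection limiting rule amounts to a per-application counter checked at connect time. The Python sketch below models only that rule. The class, method names, and the `booking` application are hypothetical and do not represent how TPF connection limits are actually defined.

```python
from typing import Dict


class ConnectionLimiter:
    """Toy per-application limit on the number of active TCP sessions."""

    def __init__(self, limits: Dict[str, int]) -> None:
        self.limits = dict(limits)                     # application -> max active sessions
        self.active = {app: 0 for app in self.limits}  # application -> current sessions

    def on_connect_request(self, app: str) -> bool:
        if self.active[app] >= self.limits[app]:
            return False  # limit reached: reject the new connection
        self.active[app] += 1
        return True

    def on_close(self, app: str) -> None:
        self.active[app] -= 1  # a session ended, freeing a slot


limiter = ConnectionLimiter({"booking": 2})
print([limiter.on_connect_request("booking") for _ in range(3)])  # [True, True, False]
limiter.on_close("booking")
print(limiter.on_connect_request("booking"))  # True
```

Because the check happens only when a session starts, this model says nothing about how much traffic an accepted session sends afterward.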

 

OK, You Can Come in, but We'll Be Keeping a Close Eye on You!

Some applications, like Web servers and mail servers, use short-lived connections where a socket is started, only one or a few transactions flow, and then the socket is closed. Connection limiting works very well for this type of application. However, many applications use long-lived socket connections, where the connection is started, remains active for hours or even days, and is used for thousands of transactions. For applications like these, it is not enough to make resource checks only when the connection is first started; resource checks must be made throughout the life of the connection.

This is where TPF traffic limiting support comes into play. Traffic limiting allows you to define the maximum message rate (in messages per second) for each socket and for each application. If the socket or application limit has been reached and the application attempts to read another message over this socket, the application is blocked, making it look as if no message is available to read even if messages are waiting. Once the current time interval expires, if a message is available to read, the application is posted and passed the message. Traffic limiting controls the rate at which input messages are given to an application without requiring any changes to the application program. Like connection limiting, traffic limiting is valuable both in overload situations, where the traffic rate is much higher than normal, and against intentional floods as part of a DoS attack aimed at taking down the server. Traffic limiting has the additional benefit that it can be used for UDP applications as well as TCP applications. For TCP applications, no messages are lost, even if the traffic limits are exceeded, because TCP flow control makes the sender wait when the receive buffer fills. For UDP applications, if messages arrive faster than they are allowed to be given to the application (based on the defined traffic limits) and the socket receive buffer fills up, some input messages may be lost. This is consistent with UDP behavior: even without traffic limiting, messages can arrive faster than a UDP application reads them, and if the socket receive buffer becomes full, some input messages are lost.
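The Python sketch below models traffic limiting as a fixed one-second window with per-socket and per-application counters. When a limit is reached, `read` returns `None`, as if no data were available, while the message stays queued. The window scheme, class, and parameter names are assumptions for illustration; the article does not say how TPF measures its intervals. The optional `buffer_limit` mimics a full UDP receive buffer dropping input.

```python
from collections import deque
from typing import Deque, Dict, Optional


class TrafficLimiter:
    """Toy fixed-window limiter: at most N reads per one-second interval."""

    def __init__(self, socket_limit: int, app_limit: int) -> None:
        self.socket_limit = socket_limit  # max messages per second for each socket
        self.app_limit = app_limit        # max messages per second for the application
        self.interval: Optional[int] = None
        self.app_count = 0
        self.sock_count: Dict[int, int] = {}
        self.queues: Dict[int, Deque[str]] = {}

    def arrive(self, sock: int, msg: str, buffer_limit: Optional[int] = None) -> bool:
        """Queue an inbound message; with a buffer_limit (UDP-like), a full buffer drops it."""
        q = self.queues.setdefault(sock, deque())
        if buffer_limit is not None and len(q) >= buffer_limit:
            return False  # dropped, as UDP would when its receive buffer is full
        q.append(msg)
        return True

    def read(self, sock: int, now: float) -> Optional[str]:
        """Return the next message, or None if none is available or a limit is reached."""
        interval = int(now)
        if interval != self.interval:  # a new one-second interval: reset the counters
            self.interval = interval
            self.app_count = 0
            self.sock_count = {}
        q = self.queues.get(sock)
        if not q:
            return None
        if self.app_count >= self.app_limit or self.sock_count.get(sock, 0) >= self.socket_limit:
            return None  # looks like "no data"; the message stays queued
        self.app_count += 1
        self.sock_count[sock] = self.sock_count.get(sock, 0) + 1
        return q.popleft()


lim = TrafficLimiter(socket_limit=2, app_limit=10)
for i in range(4):
    lim.arrive(7, f"m{i}")
print([lim.read(7, now=0.1), lim.read(7, now=0.2), lim.read(7, now=0.3)])  # ['m0', 'm1', None]
print(lim.read(7, now=1.05))  # m2: a new interval has started
print(len(lim.queues[7]))     # 1: m3 is still queued, nothing was lost
```

The application code only ever calls `read`; pacing happens entirely inside the limiter, mirroring the point that no application changes are required.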

 

You Cannot Put 10 Pounds in a 5-Pound Bag!

We have seen that TPF has methods for controlling the rate at which traffic flows from clients to the TPF server, but what about traffic flowing from the server to a client? For TCP sockets, the remote client controls the rate at which traffic flows from server to client by advertising how much data it is ready to receive. This is based purely on the available resources of the client node, but there are other factors to consider. For example, just because the client says that it is ready for 100 KB of data from the server does not mean that the network can handle a burst of traffic that large. If the network cannot handle the data rate, the server needs to slow down the rate at which it sends data. TPF has congestion control built into the TCP layer based on RFC 2001. TPF has also implemented TCP congestion avoidance mechanisms. Congestion control is reactive, while congestion avoidance is proactive. What does that mean?

Let's say that snow is falling and my sidewalk is getting slippery. If I were purely reactive, I would wait until someone slipped and fell, and then shovel some snow off the sidewalk. Next, I would wait until the next person fell and then shovel more snow. If I were proactive, I would see the snow building up and would start shoveling before anyone has fallen in an effort to reduce the likelihood that anyone does fall. Similarly, TCP congestion control (reactive) waits for problems to occur (packets to become lost in the network) and then takes action (reduces the rate at which data is sent). TCP congestion avoidance (proactive) monitors the round-trip times (RTTs) of messages to anticipate when congestion is likely to occur and takes action before any packets are lost. Congestion control mixed with congestion avoidance is a powerful combination. It greatly reduces packet loss and increases end-to-end throughput.
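The following Python model contrasts the two styles once per round trip. The reactive part follows the RFC 2001 timeout behavior: slow start, linear growth, and on loss halving the threshold and restarting from one segment. RFC 2001 itself calls the linear-growth phase "congestion avoidance", which differs from the RTT-based proactive sense used in this article. The proactive rule is loosely modeled on TCP Vegas. That algorithm choice and the 3-segment `queue_limit` are assumptions, because the article does not say which RTT-based method TPF uses. Fast retransmit and fast recovery are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CongestionWindow:
    cwnd: float = 1.0                 # congestion window, in segments
    ssthresh: float = 64.0            # slow-start threshold, in segments
    base_rtt: Optional[float] = None  # smallest RTT observed, in ms
    queue_limit: float = 3.0          # proactive back-off threshold, in segments (assumed)

    def on_round_trip(self, rtt_ms: float, lost: bool, proactive: bool = True) -> None:
        """Update the window once per round trip."""
        if self.base_rtt is None or rtt_ms < self.base_rtt:
            self.base_rtt = rtt_ms
        if lost:
            # Reactive: a loss (timeout) halves the threshold and restarts slow start.
            self.ssthresh = max(self.cwnd / 2, 2.0)
            self.cwnd = 1.0
            return
        if proactive:
            # Proactive (Vegas-style): RTT growth above base_rtt means segments are queuing
            # in the network, so back off before any packet is dropped.
            queued = self.cwnd * (1 - self.base_rtt / rtt_ms)
            if queued > self.queue_limit:
                self.cwnd = max(self.cwnd - 1, 1.0)
                return
        if self.cwnd < self.ssthresh:
            self.cwnd = min(self.cwnd * 2, self.ssthresh)  # slow start: doubles per round trip
        else:
            self.cwnd += 1  # linear growth: one more segment per round trip


w = CongestionWindow(ssthresh=8.0)
for _ in range(4):
    w.on_round_trip(10.0, lost=False)
print(w.cwnd)                      # 9.0  (1 -> 2 -> 4 -> 8 -> 9)
w.on_round_trip(30.0, lost=False)  # RTT tripled: about 6 segments queued, so back off
print(w.cwnd)                      # 8.0
w.on_round_trip(30.0, lost=True)
print(w.cwnd, w.ssthresh)          # 1.0 4.0
```

In the snow analogy, the RTT check is the shoveling that starts when snow builds up, and the loss branch is the cleanup after someone falls.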

 

What Is This, a Television Mini-Series?

This concludes part 1 of our discussion about the capabilities of the TCP/IP native stack that differentiate TPF from other platforms. This article touched on high availability, load balancing, shared sockets, security, and methods for controlling traffic flowing in and out of TPF. Part 2 will follow in a subsequent TPF Systems Technical Newsletter edition and will discuss performance, advanced socket features available to TPF applications, and the many diagnostic tools that are available.