ND exports a virtual IP address (VIP) for each service that it scales. ND does not modify packets destined to the VIP. Consequently, each server in the cluster must accept packets destined to the VIP without advertising (via ARP) the VIP. This is accomplished by aliasing the loopback device with the VIP. An intended effect of this design is that servers can respond directly to clients without sending their responses through ND. We call this approach a half-proxy. For services where responses are considerably larger than requests, such as the web, a half-proxy offers a significant scaling advantage: ND can spend all of its time distributing requests and managing the load without processing any responses.

ND keeps a connection table that associates each TCP connection with a particular server in the cluster. Connections are allocated using a weighted round-robin algorithm; the ratio of the weights determines how connections are allocated. For example, if there are two servers A and B with weights 10 and 5 respectively, then server A will receive twice as many connections as server B. The weights are set by a user-level manager process that monitors the load on the servers. The manager monitors load using counters kept in ND, by measuring the response time for simple requests issued from the ND machine, and by taking configurable metrics at each server. The results of these measurements are combined in a configurable manner and compared with the current weights to determine whether any weight changes are necessary.
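The connection table and weighted round-robin allocation can be illustrated with a minimal sketch. The class and field names below are hypothetical, and the schedule built by naive weight expansion is a simplification of whatever interleaving ND actually uses; it reproduces the A/B example above, where weights 10 and 5 give server A twice as many connections as server B.

```python
import itertools

class Dispatcher:
    """Sketch of ND-style connection allocation (illustrative names,
    not ND's actual implementation)."""

    def __init__(self, weights):
        # weights: server name -> integer weight set by the manager process
        self.weights = dict(weights)
        # connection table: (client_ip, client_port) -> server
        self.connection_table = {}
        # Expand each server into the schedule in proportion to its weight.
        # A real dispatcher would interleave more smoothly.
        schedule = [s for s, w in self.weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(schedule)

    def allocate(self, conn_id):
        # Packets of an established connection reuse the table entry;
        # only new connections consume a round-robin slot.
        if conn_id not in self.connection_table:
            self.connection_table[conn_id] = next(self._cycle)
        return self.connection_table[conn_id]

d = Dispatcher({"A": 10, "B": 5})
counts = {"A": 0, "B": 0}
for i in range(1500):
    # Each distinct (ip, port) pair stands in for a new TCP connection.
    counts[d.allocate(("10.0.0.%d" % (i % 250), 1024 + i))] += 1
# With weights 10 and 5, A receives twice as many connections as B.
```

After 1500 new connections, `counts` holds 1000 for A and 500 for B, matching the 2:1 weight ratio.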
For a service to be scalable via ND, every server must provide the same service. ND is a technology for distributing requests and does not enforce this requirement. Another technology, for example NFS, DFS, a distributed database, file copying, or application-specific programming, must be used to make the server images identical.
Initially, ND routed separate connections from the same client independently. However, this approach limits ND to scaling services that keep no client state from request to request. This was considered too severe a restriction, so ND was extended to support client affinity of various types, for example for SSL and FTP.
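The affinity extension can be sketched as a second lookup keyed by client rather than by connection. The class below is hypothetical and uses plain round-robin for first contact; ND's per-protocol affinity mechanisms (e.g. for SSL session reuse or FTP control/data pairs) are more involved, but the core idea is the same: once a client is bound to a server, later connections from that client go to the same server so that server-side client state remains usable.

```python
class AffinityDispatcher:
    """Illustrative sketch of client affinity, not ND's implementation."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.affinity = {}  # client_ip -> server bound on first contact
        self._next = 0

    def allocate(self, client_ip):
        # First connection from a client picks a server (plain round-robin
        # here); every later connection from that client reuses it.
        if client_ip not in self.affinity:
            self.affinity[client_ip] = self.servers[self._next % len(self.servers)]
            self._next += 1
        return self.affinity[client_ip]

d = AffinityDispatcher(["A", "B"])
first = d.allocate("192.0.2.7")   # e.g. an SSL handshake
second = d.allocate("192.0.2.7")  # a later connection from the same client
# Both connections reach the same server, preserving client state.
```

A stateless service would not need this table at all, which is why the original, connection-independent routing was sufficient before affinity support was added.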
The initial version of ND was primarily concerned with TCP-based services such as the web. Ideally, however, all TCP/IP-based services should be scalable. Several issues remain with the scaling approach that ND uses. First, introducing an ND node creates a single point of failure: what are the proper methods of providing high availability and fault tolerance for the ND node? Since ND is a half-connection proxy, how can it be extended to support servers anywhere on the Internet? ND is aimed at high-load Web sites; an unloaded (very low connection rate) site requires different policies, along with a mechanism to switch automatically between low-rate and high-rate policies. What is the proper location of ND-type technology in a system: should it be integrated into products that want to scale, or provided as a generic system service? ND was initially targeted at HTTP and uses a single connection request as its unit of allocation; as HTTP evolves and the load associated with a connection request increases, how should ND and its algorithms change to continue to provide effective load sharing? ND attempts to interpose itself "transparently" between clients and servers, which limits the types of services that can be scaled via ND; how do we maintain transparency while modifying factors such as the unit of allocation or the point of allocation to broaden the types of services that can be supported? Finally, how can this approach be extended to other protocols, such as UDP?
This project is joint work among German Goldszmidt, Guerney Hunt, Richard King, Eric Levy, Eric Nahum, and John Tracey, all of the IBM T. J. Watson Research Center.