ISSUES IN USING RPCS FOR DISTRIBUTED COMMUNICATION

 

 

 

Bahram Khalili

Department of Computer Science and Engineering

Southern Methodist University

Dallas, TX  75275-0122

ABSTRACT

 

     Remote Procedure Calls (RPCs) (Buhle, 1993) are among the most commonly used data communication protocols within client/server distributed systems (Muhlhauser et al., 1993).  An RPC is an invocation of a pre-defined stored procedure that resides at the server level.  Stored procedures perform various functions such as accessing designated databases, communicating with other servers, or invoking gateways (Gray and Reuter, 1993) in order to communicate with large-scale enterprise servers.  RPCs are simple to create, change, and control; however, they impose numerous restrictions on any distributed system demanding high levels of performance.  This paper explores the fundamental limitations of the RPC protocol within distributed systems.  The requirements for a more intelligent protocol are deduced from an analysis of these limitations.

 

 

1   INTRODUCTION

 

      A Remote Procedure Call is an extension of the standard procedure call mechanism (Gray and Reuter, 1993) provided in most high-level programming languages.  In the standard procedure call mechanism, a procedure is implemented in one process and may be called only by processes that share the same memory address space.  This restriction is eliminated in the remote procedure call mechanism.  Figure 1 illustrates the general form of an RPC within a client/server distributed environment.  It should be noted that remote procedure calls may also take place locally; in such cases both the client and the server process are located on the same hardware platform.  In spite of its restrictions, RPC is widely used in many client/server distributed systems due to its conceptual and practical simplicity.  The protocol also complies with the open systems concept, since it may be implemented within heterogeneous (Notkin et al., 1988) environments.  Nevertheless, its restrictions have forced various groups, ranging from software vendors to full system solution providers, to devise work-around solutions to overcome them.
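      To make the mechanism of Figure 1 concrete, the sketch below shows a minimal remote procedure call in Go, using its standard net/rpc package.  It only illustrates the general RPC pattern discussed in this paper; the service SP1, the Args structure, and the address are invented for the example and do not refer to any system cited here.

// A minimal sketch of a remote procedure call in Go, using the standard
// net/rpc package. The service SP1, the Args structure, and the address
// 127.0.0.1:9000 are invented for this illustration.
package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
)

// Args is the input parameter block the client sends with the call.
type Args struct{ CustomerID int }

// SP1 plays the role of the server-side stored procedure of Figure 1.
type SP1 struct{}

// Lookup performs the designated operation (a trivial computation standing in
// for a database access) and fills in the reply for the caller.
func (s *SP1) Lookup(args Args, reply *string) error {
	*reply = fmt.Sprintf("record for customer %d", args.CustomerID)
	return nil
}

func main() {
	// Server side: register the procedure and accept connections.
	if err := rpc.Register(new(SP1)); err != nil {
		log.Fatal(err)
	}
	ln, err := net.Listen("tcp", "127.0.0.1:9000")
	if err != nil {
		log.Fatal(err)
	}
	go rpc.Accept(ln)

	// Client side: connect and make a synchronous (blocking) call, which is
	// the default RPC behavior examined in Section 2.1.
	client, err := rpc.Dial("tcp", "127.0.0.1:9000")
	if err != nil {
		log.Fatal(err)
	}
	var reply string
	if err := client.Call("SP1.Lookup", Args{CustomerID: 42}, &reply); err != nil {
		log.Fatal(err)
	}
	fmt.Println(reply)
}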

 

 

2   RPC RESTRICTIONS

 

      The most outstanding restrictions of existing RPCs are the lack of support for: asynchronous communication, dynamic data resolution, server selection and failure recovery, message queuing, performance/flow-control monitoring, and local data caching.  These restrictions are responsible for the differing implementations of the RPC, which in turn have violated the open systems philosophy in practice.  In most cases, RPC is used as the primary communication protocol in conjunction with a number of RPC-unrelated techniques that overcome one or more of these restrictions.  The remainder of this section analyzes these restrictions in more detail:

 

 

2.1          ASYNCHRONOUS COMMUNICATION

 

     The most outstanding restriction of the RPC protocol is the lack of support for asynchronous communication (Menon et al., 1993).  This restriction forces the client process that invokes the RPC into a wait state until the server has returned the results, which in turn limits the degree of communication concurrency in any distributed environment.  A number of solutions have been offered to address this restriction (Ananda and Koh, 1991; Walker et al., 1990); however, they are mostly work-around remedies rather than true solutions and do not eliminate the synchronous limitation.  Two of the more common approaches are briefly described below:

     Microsoft (Rymer et al., 1994) addresses the synchronous limitation of its RPC by generating an internal process that is transparent to the caller of the RPC.  This internal process allows the caller to continue with its own operation while the internal process waits for the synchronous operation to complete.  Upon completion of the RPC request, the internal process posts a completion confirmation, together with any data, to the original process via dedicated messages and subsequently destroys itself.  The main drawback of this mechanism is that a number of hidden processes are actively executing and consuming system resources, which could slow down or even halt the client platform as the number of RPCs increases.  Moreover, this mechanism does not eliminate the synchronous limitation of the RPC protocol itself.

      The RPC of Sybase (Kramer, 1993) provides two modes of operation for invoking the protocol.  The first mode is the original synchronous model, in which the caller must wait while the operation is in progress; this approach is used by all processes with no real-time requirement for data.  The second mode allows the calling process to continue with other operations immediately after the RPC request is made; however, it must request the results of the RPC at a later time using a polling (Gray and Reuter, 1993) mechanism.  That is, the process invoking the RPC must obtain the results by repeatedly interrogating the server to see whether the request is complete.  In this mode the caller must eventually wait for the results, but it can optimize its operations by requesting them whenever it is most efficient or convenient, thereby providing a level of concurrency between the client and the server operations.  This mode is particularly useful if the client does not require any results back from the server, as when updating a remote database.
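      As a rough approximation of these two work-arounds, and not of either vendor's actual mechanism, the sketch below uses Go's net/rpc: client.Go issues the request without blocking and posts a completion confirmation on a Done channel, standing in for the hidden helper process, while the select/default loop stands in for the polling mode.  Args and the SP1.Lookup service are the invented names from the previous sketch.

// An approximation, in Go, of the asynchronous work-arounds described above;
// it does not reproduce either vendor's actual mechanism.
package sketch

import (
	"fmt"
	"log"
	"net/rpc"
	"time"
)

type Args struct{ CustomerID int }

func asyncLookup(client *rpc.Client) {
	var reply string

	// Issue the call in the background; a confirmation (carrying any error)
	// arrives on call.Done when the server has finished.
	call := client.Go("SP1.Lookup", Args{CustomerID: 42}, &reply, nil)

	// Poll for completion instead of blocking: the caller keeps doing its own
	// work and interrogates the pending call whenever it is convenient.
	for {
		select {
		case done := <-call.Done: // completion confirmation has arrived
			if done.Error != nil {
				log.Println("RPC failed:", done.Error)
				return
			}
			fmt.Println("asynchronous result:", reply)
			return
		default:
			time.Sleep(10 * time.Millisecond) // stand-in for useful client work
		}
	}
}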

 

 

2.2          DYNAMIC DATA RESOLUTION

 

      Consider a common situation in which a client process invokes an RPC in order to retrieve remote data located in the server's database.  Assume, without loss of generality, that the client receives the requested data X units of time after the request and requires Y units of time to complete the processing of the data, for a total of X + Y units of time:

 

 

 


[Timeline illustration: processing time when the complete result is delivered at once]

Where:

A:  Client requests data from the server via an RPC

B:  Server returns the results to the client after X units of time

C:  Client completes processing of the data after Y units of time

 

      Therefore, processing a request via an RPC, where partial data access is not allowed, takes X + Y units of time.  Now consider a modified scenario in which the client process is capable of receiving partial data from the server, as illustrated below:

 

 

 

 

 


[Timeline illustration: processing time when partial data is delivered in pieces]

Where:

A, B, C:      have the same definitions as before

y1, y2, y3:  are the client processing times for the partial data blocks returned by the server, where y1 + y2 + y3 = Y

 

      The total client time for processing a request where partial data access is allowed is X + y3, since the processing of the earlier pieces (y1 and y2) overlaps with the remaining data transfer; this is substantially less than X + Y because y3 << Y.

 

      As the above scenarios illustrate, there is a significant reduction in the total processing time of a distributed transaction if the communication protocol provides for partial delivery of data.  To achieve this, the client process must be able to instruct the communication protocol on how the data is to be delivered.  The client may, for example, request that data be delivered in 1-kilobyte (KB) blocks for the first 20 KB and choose to receive the remainder in one large block.  The RPC protocol neither allows partial delivery of data nor offers the flexibility to select different resolution levels.
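      The following sketch, in Go, illustrates the kind of partial delivery described above; it does not extend any real RPC library.  The hypothetical fetchInBlocks routine hands the client fixed-size blocks as they become available, so the per-block processing (the y1, y2, ... above) overlaps with the remaining transfer.

// A sketch of partial data delivery: the client processes each block as it
// arrives instead of waiting for the complete reply. fetchInBlocks and
// processBlock are invented names for this illustration.
package sketch

import "fmt"

// fetchInBlocks simulates a server delivering a payload in blockSize-byte
// pieces; a real protocol would read these from the network as they arrive.
func fetchInBlocks(payload []byte, blockSize int) <-chan []byte {
	out := make(chan []byte)
	go func() {
		defer close(out)
		for start := 0; start < len(payload); start += blockSize {
			end := start + blockSize
			if end > len(payload) {
				end = len(payload)
			}
			out <- payload[start:end]
		}
	}()
	return out
}

// processBlock stands in for the client-side work on one partial result.
func processBlock(block []byte) int { return len(block) }

func partialDeliveryDemo() {
	payload := make([]byte, 20*1024) // e.g. the first 20 KB, requested in 1 KB blocks
	total := 0
	for block := range fetchInBlocks(payload, 1024) {
		total += processBlock(block) // processing overlaps with the remaining delivery
	}
	fmt.Println("processed", total, "bytes incrementally")
}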

 

 

2.3          SERVER SELECTION AND FAILURE RECOVERY

 

     The ability to handle abnormal situations is a fundamental requirement for any distributed system (King and Garcia-Molina, 1990).  The abnormalities could be due to either software or hardware components and may occur at any system level.  In all cases, the behavior of a distributed transaction may be placed in one of the following categories:

a)  Normal:  The transaction has terminated as expected.  In this category, data may or may not be returned by the server even though the operation is considered normal.  For example, if an invalid request is received by a server, the server may issue an error message questioning the validity of the request, which is a correct response in this situation.

b)  Retry/Time-out:  The transaction has timed out due to slower-than-expected response from one or more of the system components.  The client process must be offered the option of retrying the transaction or voluntarily terminating it.

c)  Reconnection:  The transaction has terminated abnormally due to the unavailability or failure of one or more system components such as network connections, servers, or databases.  The system should automatically attempt to complete the transaction via alternative routes (Huang and Jalote, 1989).

d)  Hard Fail:  The system has failed to complete the distributed transaction and, unlike the reconnection case, will not allow any further attempts through alternative routes.  This is usually the case when the system detects that a crucial component of the final destination is unavailable regardless of the route traversed.

The RPC protocol fully supports only the normal case (a).  For cases (b), (c), and (d), RPC merely informs the client of the status; the client process may then choose to invoke a new RPC to retry the same transaction or to select an alternative route manually.
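A sketch of the recovery behavior this section finds missing follows, in Go.  The outcome categories mirror (a) through (d) above; the server list, the retry limit, and the callFunc signature are illustrative assumptions rather than features of any existing RPC implementation.

// A sketch of client-side retry and failover: time-outs are retried a bounded
// number of times, failed routes fall back to alternative servers, and a hard
// failure stops all further attempts.
package sketch

import (
	"errors"
	"time"
)

type Outcome int

const (
	Normal    Outcome = iota // (a) completed as expected
	Retry                    // (b) timed out; the same route may be retried
	Reconnect                // (c) this route failed; try an alternative server
	HardFail                 // (d) destination unavailable regardless of route
)

// callFunc stands in for one RPC attempt against a single server.
type callFunc func(addr, request string) (reply string, outcome Outcome)

// invokeWithRecovery attempts the request against each server in turn,
// retrying time-outs a bounded number of times before moving on.
func invokeWithRecovery(servers []string, request string, call callFunc) (string, error) {
	const maxRetries = 3
	for _, addr := range servers {
		for attempt := 0; attempt < maxRetries; attempt++ {
			reply, outcome := call(addr, request)
			switch outcome {
			case Normal:
				return reply, nil
			case Retry:
				time.Sleep(100 * time.Millisecond) // back off, then retry the same route
				continue
			case HardFail:
				return "", errors.New("destination unavailable on all routes")
			}
			break // Reconnect: abandon this route and move to the next server
		}
	}
	return "", errors.New("all servers and retries exhausted")
}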

 

 

2.4          MESSAGE QUEUING

 

      In distributed systems, concurrent delivery of data and messages is essential regardless of the underlying communication protocol.  Consider the messaging mechanism of a client/server environment that uses RPC as its communication protocol.  An RPC invoked by the client process triggers a sequence of activities that involve multiple computing platforms and network layers.  The activities begin at the client platform with the packaging and sending of remote requests across Local Area Network (LAN) and/or Wide Area Network (WAN) (Gray and Reuter, 1993) media.  The request is eventually received by the designated server, provided no anomalies arise at the client or in the communication layers.  The server process in turn either services the request locally or invokes other servers in order to reach the final destination.  The final results are subsequently propagated back to the requester of the service.  There is significant potential for problems or delays at any stage of this two-way communication, which in turn leads to the generation of appropriate error, warning, or informational messages.

      The existing RPC protocol is capable of delivering messages in near real time while preserving the order of messages as they arrive; that is, a message that was generated first must be delivered first, which is not a simple task considering the non-deterministic nature of distributed environments (Shyh-Wei and Virgil, 1990).  However, the near-real-time delivery and order-preservation capabilities, even though necessary, may lower the throughput of the system.  Consider a common situation in which the sender of a message (the network and/or the server) is blocked due to the lack of an immediate response by the receiver (the client).  This blocking prevents full utilization of the distributed components and is not addressed by the RPC protocol.
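      The sketch below shows, in Go, the kind of ordered buffering that would avoid this blocking: messages are queued in arrival order and the client drains them at its own pace.  The Message type and the buffer size are illustrative assumptions.

// A sketch of an ordered message queue that decouples the sender from a slow
// receiver while preserving first-generated, first-delivered order.
package sketch

import "fmt"

type Message struct {
	Seq  int    // arrival sequence number, preserved by the queue
	Kind string // "error", "warning", or "informational"
	Body string
}

func messageQueueDemo() {
	// The buffer lets the producer run ahead of the consumer by up to 64
	// messages before it has to wait, instead of blocking on each one.
	queue := make(chan Message, 64)

	// Producer: stands in for the network/server side generating messages.
	go func() {
		for i := 0; i < 5; i++ {
			queue <- Message{Seq: i, Kind: "informational", Body: "status update"}
		}
		close(queue)
	}()

	// Consumer: the client processes messages in the order they were generated.
	for m := range queue {
		fmt.Printf("message %d (%s): %s\n", m.Seq, m.Kind, m.Body)
	}
}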

 

 

2.5          PERFORMANCE AND FLOW CONTROL MONITORING

 

     The flow of information within any distributed computing environment is often very complex due to the non-deterministic behavior of the system.  As the state diagram presented in Figure 2 illustrates, distributed communication is clearly non-deterministic, since the flow of control between any two designated points may take place across different communication paths at different times, primarily because of the varying availability of hardware and network components.  The RPC protocol does not provide the capability to record the various state changes while remote communication is in progress.  Such control-flow (Danzig, 1994) information can be used for client/server load balancing (Lin and Raghavendra, 1991; Nishikawa and Steenkiste, 1993) or as a diagnostic tool during failure conditions.  It also provides the ability to measure the performance of the various components of the distributed environment while a transaction is in process, and helps discover the capacity, security, and speed bottlenecks of the system.
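     As an illustration of what such monitoring might look like, and not a feature of any existing RPC, the Go sketch below records each state change with a timestamp while a transaction is in progress and reports the time spent in every stage; the state names loosely follow Figure 2, and the Trace type is invented for the example.

// A sketch of control-flow monitoring: state transitions are timestamped so
// per-stage latency can be measured and bottlenecks located afterwards.
package sketch

import (
	"fmt"
	"time"
)

type Transition struct {
	State string // e.g. "client:request sent", "server:request received"
	At    time.Time
}

type Trace struct{ transitions []Transition }

// Record notes that the transaction has entered the given state.
func (t *Trace) Record(state string) {
	t.transitions = append(t.transitions, Transition{State: state, At: time.Now()})
}

// Report prints the time spent between consecutive state changes, which is
// the per-component performance figure referred to above.
func (t *Trace) Report() {
	for i := 1; i < len(t.transitions); i++ {
		fmt.Printf("%-24s -> %-24s %v\n",
			t.transitions[i-1].State, t.transitions[i].State,
			t.transitions[i].At.Sub(t.transitions[i-1].At))
	}
}

func traceDemo() {
	var trace Trace
	trace.Record("client:request sent")
	time.Sleep(5 * time.Millisecond) // stand-in for network transit
	trace.Record("server:request received")
	time.Sleep(20 * time.Millisecond) // stand-in for server processing
	trace.Record("client:reply received")
	trace.Report()
}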

 

 

2.6          LOCAL DATA CACHING

 

     The ability to cache remote data (Korner, 1990) at the client level could potentially increase the system's efficiency by reducing communication in certain circumstances.  To best realize the impact of data caching in distributed environments, consider the following definitions.  Let tk be the time it takes to transfer one kilobyte (KB) of data between the client and the server, and let Δt be a fixed time period.  The efficiency of the distributed environment can then be measured, based on the size and type of the data, as follows:

 

Efficiency of the distributed environment  =  E  =  (size of data in KB) · (time to transfer 1 KB of data) / Δt

                                                 =  (size of data in KB) · tk / Δt          (1)

 

      That is, if a given set of remote tasks takes t units of time to complete in distributed environment A, and the same tasks take 2t units to complete in distributed environment B, then environment A is twice as efficient as environment B.

 

Also Let:

 

Communicated data in KB  =  Fixed data in KB  +  Variable data in KB

                         =  F  +  V                                          (2)

 

Where:

      Fixed data (F) is remote data that changes very seldom within the given time period Δt.  Example: a customer's monthly statement, which changes only once a month.

      Variable data (V) is any data that is not considered fixed.  Example: satellite weather data or Dow Jones financial data, which are of a real-time nature.

Therefore:

 

      E  =  (F + V) · tk / Δt  =  F · tk / Δt  +  V · tk / Δt          (3)

 

Notice that the need for communicating the fixed portion of the data (F) is greatly reduced if local data caching is provided at the client level.  Under this scenario, the term F · tk / Δt of formula (3) approaches 0, since the fixed data no longer needs to cross the network once it is cached.  That is:

 

      E  =  F · tk / Δt  +  V · tk / Δt  =  0  +  V · tk / Δt  =  V · tk / Δt          (4)

 

Therefore, in general:

 

Efficiency of distributed systems without data caching
-------------------------------------------------------  =  (3) / (4)
Efficiency of distributed systems with data caching

      =  (F · tk / Δt  +  V · tk / Δt)  /  (V · tk / Δt)  =  1  +  (F / V)          (5)

 

      From (5) we conclude that distributed communication protocols that provide a local data-caching mechanism are roughly a factor of (1 + F/V) more efficient than non-caching protocols.  The RPC protocol does not support data caching.
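      To make the idea concrete: if, say, F = 900 KB of a 1000 KB transfer is fixed data and V = 100 KB is variable, the ratio in (5) is 1 + 900/100 = 10.  The Go sketch below shows one way a client-side cache for the fixed portion might look; the cache key, the time-to-live, and the fetch callback are illustrative assumptions, not part of any existing RPC implementation.

// A sketch of a client-side cache: once fetched, fixed data (F) is answered
// locally until it expires, so only the variable data (V) still crosses the
// network within the period Δt.
package sketch

import (
	"sync"
	"time"
)

type entry struct {
	data    []byte
	fetched time.Time
}

type ClientCache struct {
	mu      sync.Mutex
	ttl     time.Duration // how long fixed data stays valid (e.g. a month for statements)
	entries map[string]entry
}

func NewClientCache(ttl time.Duration) *ClientCache {
	return &ClientCache{ttl: ttl, entries: make(map[string]entry)}
}

// Get returns the cached copy if it is still fresh; otherwise it performs the
// remote fetch (one RPC) and stores the result for subsequent requests.
func (c *ClientCache) Get(key string, fetch func(string) ([]byte, error)) ([]byte, error) {
	c.mu.Lock()
	e, ok := c.entries[key]
	c.mu.Unlock()
	if ok && time.Since(e.fetched) < c.ttl {
		return e.data, nil // served locally: no network traffic for fixed data
	}
	data, err := fetch(key) // remote call for data that is absent or expired
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.entries[key] = entry{data: data, fetched: time.Now()}
	c.mu.Unlock()
	return data, nil
}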

 

 

3   INTELLIGENT PROTOCOL REQUIREMENTS

 

     The demand for near-real-time transmission of data is a fundamental requirement of most heterogeneous client/server systems, and it is further complicated as the volume of data increases.  A number of data communication protocols are currently available; however, they are primarily static by design.  That is, they have been designed to operate in a pre-defined format, without the ability to accommodate new functional requirements or adapt to new distributed environments.  This severely limits the efficiency of interprocess communication within client/server distributed systems, and the obsolescence of these protocols is inherently unavoidable as distributed systems evolve.

      For example, the fundamental problem with the RPC protocol is that it was initially designed for interprocess communication within centralized computing systems and was later extended to serve in distributed environments.  Distributed systems involve multiple heterogeneous platforms with non-deterministic behavior and therefore require new protocol design approaches.  We are currently considering a new approach for designing data communication protocols.  The underlying methodology is to design protocols as dynamically re-definable organizations with processing intelligence.  This model, similar to the organization of a computer, consists of a memory unit to handle local data caching, a synchronization unit to allow process concurrency, and a rule-based control unit to perform all logical operations.  The rule-based control unit provides the invoking processes with the run-time capability to modify the protocol's behavior automatically.  The model monitors the non-deterministic formation of distributed systems and provides near-real-time responses to client requests.  The complete design of one such model is currently under development and is beyond the scope of this paper.
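      Purely to make this organization concrete, the speculative Go sketch below composes the three units into a single structure whose behavior rules can be installed at run time by the invoking process.  Every name in it is an assumption; it is an illustration of the idea, not the design left for future work.

// A speculative sketch: a memory unit (cache), a synchronization unit
// (mutex), and a rule-based control unit (run-time installable rules)
// composed into one protocol object.
package sketch

import "sync"

// Rule lets a calling process redefine part of the protocol's behavior at run
// time, e.g. rewrite a request to ask for 1 KB blocks or add a retry hint.
type Rule func(request string) string

type IntelligentProtocol struct {
	syncUnit sync.Mutex        // synchronization unit: guards concurrent callers
	cache    map[string][]byte // memory unit: local data caching
	rules    []Rule            // rule-based control unit: run-time redefinable logic
}

// AddRule installs a new behavior rule while the protocol is running.
func (p *IntelligentProtocol) AddRule(r Rule) {
	p.syncUnit.Lock()
	defer p.syncUnit.Unlock()
	p.rules = append(p.rules, r)
}

// Invoke applies the current rules to the request, consults the local cache,
// and only then hands the request to the underlying transport (the send
// callback provided by the caller).
func (p *IntelligentProtocol) Invoke(request string, send func(string) ([]byte, error)) ([]byte, error) {
	p.syncUnit.Lock()
	for _, rule := range p.rules {
		request = rule(request)
	}
	if data, ok := p.cache[request]; ok {
		p.syncUnit.Unlock()
		return data, nil
	}
	p.syncUnit.Unlock()

	data, err := send(request)
	if err != nil {
		return nil, err
	}

	p.syncUnit.Lock()
	if p.cache == nil {
		p.cache = make(map[string][]byte)
	}
	p.cache[request] = data
	p.syncUnit.Unlock()
	return data, nil
}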

 

 

4   SUMMARY

 

     The main limitations of the RPC protocol were presented in this paper.  These limitations are attributed to the fact that RPC is derived from the standard procedure call, which was designed for centralized systems with one common memory address space.  This paper also outlined a new approach for modeling distributed protocols; the underlying methodology suggests designing distributed protocols as dynamically re-definable organizations with processing intelligence in order to support ongoing distributed requirements.

 

 

REFERENCES

 

      Ananda, A., Koh, E., 1991, “ASTRA - An asynchronous remote procedure call facility”, IEEE 11th International conference on distributed computing systems, pp. 172-179.

      Buhle, L., 1993, “Distributed programming: an RPC primer”, Digital Systems Journal, Professional Press Inc., Vol. 15, No. 6, pp. 5-14.

      Danzig, P., 1994, “Flow control for limited buffer multicast”, IEEE Transactions on Software Engineering, Vol. 20, No. 1, pp. 1-12.

      Gray, J., Reuter, A., 1993, “Transaction processing: Concepts and techniques”, Morgan Kaufmann publishers Inc., San Mateo.

      Huang, Y., Jalote, P., 1989, “Availability analysis of the primary site approach for fault tolerance”, IEEE Eighth symposium on reliable distributed systems, pp. 130-136.

      King, R., Garcia-Molina, H., 1990, “Overview of disaster recovery for transaction processing systems”, IEEE 10th International conference on distributed computing  systems, pp. 286-293.

      Korner, K., 1990, “Intelligent caching for remote file service”, IEEE 10th International conference on distributed computing systems, pp. 200-226.

      Kramer, M., 1993, “TransAccess: mainframe, client/server integration”, Distributed Computing Monitor, Patricia Seybold Group, Vol. 8, No. 5, pp. 27-31.

      Lin, H., Raghavendra, C., 1991, “A dynamic load balancing policy with a central job dispatcher (LBC)”, IEEE 11th International conference on distributed computing systems, pp. 264-271.

      Menon, S., Dasgupta, P., LeBlanc, R., 1993, “Asynchronous event handling in distributed object-based systems”, IEEE 13th International conference on distributed computing systems, pp. 383-390.

      Muhlhauser, M., Gerteis, W., Heuser, L., 1993, “DOCASE: a methodic approach to distributed programming”, Communications of the ACM, Vol. 36, No. 9, pp. 127-139.

      Nishikawa, H., Steenkiste, P., 1993, “A general architecture for load balancing in a distributed-memory environment”, IEEE 13th international conference on distributed computing systems, pp. 47-54.

      Notkin, D., Black, A., Lazowska, E., Levy, H.,  Sanislo, J.,  Zahorjan, J., 1988, “Interconnecting Heterogeneous computer systems”, Communications of the ACM, Vol. 31, No. 3, pp. 258-273.

      Rymer, J., Guttman, M., Matthews, J., 1994, “Microsoft OLE 2.0 and the Road to Cairo; how Object Linking and Embedding will lead to Distributed Object Computing”, Distributed Computing Monitor, Patricia Seybold Group, Vol. 9, No. 1, pp. 3-27.

      Shyh-Wei, L., Virgil, D.G., 1990, “On Reply detection in distributed systems”, IEEE 10th International conference on distributed computing systems, pp. 188-195.

      Walker, E., Floyd, R., Neves, P., 1990, “Asynchronous remote operation execution in distributed systems”, IEEE 10th International conference on distributed computing systems, pp. 253-259.

FIGURES

 

Fig. 1   An illustration of the RPC within a client/server environment

 

      The stored procedure SP1 is called by the RPC with a set of input parameters, similar to the standard procedure call mechanism.  SP1 performs a designated operation such as retrieving data from a database or accessing other servers or gateways.  The results, if any, are returned to the calling process.

 

 

 


Fig. 2   A state diagram of data flow within a client/server environment

This figure presents a simplified state diagram of a distributed environment where the client, network, and server states are described as follows:

Client States:

      a) One or more client processes may be in an idle state

      b) Client processes may be communicating with other client processes (local communication)

      c) Client processes may be communicating with server processes across communication media

      d) Client processes may be communicating with server processes that reside on the client platform (no network medium)

Network States:

      e) The network may be in an idle state

      f) The network may be accessing other network layers

      g) The network is passing client data to server processes

      h) The network is passing server data to client processes

Server States:

      i) One or more server processes are idle

      j) Server processes are accessing other server processes (local or remote servers)

      k) Server processes are communicating with client processes across network media

      l) Local server processes are communicating with local client processes (no network needed)