diff options
Diffstat (limited to 'storage/bdb/rpc_server/clsrv.html')
-rw-r--r-- | storage/bdb/rpc_server/clsrv.html | 453 |
1 files changed, 453 insertions, 0 deletions
diff --git a/storage/bdb/rpc_server/clsrv.html b/storage/bdb/rpc_server/clsrv.html new file mode 100644 index 00000000000..599ad56f557 --- /dev/null +++ b/storage/bdb/rpc_server/clsrv.html @@ -0,0 +1,453 @@ +<!doctype html public "-//w3c//dtd html 4.0 transitional//en"> +<html> +<head> + <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> + <meta name="GENERATOR" content="Mozilla/4.76 [en] (X11; U; FreeBSD 4.3-RELEASE i386) [Netscape]"> +</head> +<body> + +<center> +<h1> + Client/Server Interface for Berkeley DB</h1></center> + +<center><i>Susan LoVerso</i> +<br><i>sue@sleepycat.com</i> +<br><i>Rev 1.3</i> +<br><i>1999 Nov 29</i></center> + +<p>We provide an interface allowing client/server access to Berkeley DB. +Our goal is to provide a client and server library to allow users to separate +the functionality of their applications yet still have access to the full +benefits of Berkeley DB. The goal is to provide a totally seamless +interface with minimal modification to existing applications as well. +<p>The client/server interface for Berkeley DB can be broken up into several +layers. At the lowest level there is the transport mechanism to send +out the messages over the network. Above that layer is the messaging +layer to interpret what comes over the wire, and bundle/unbundle message +contents. The next layer is Berkeley DB itself. +<p>The transport layer uses ONC RPC (RFC 1831) and XDR (RFC 1832). +We declare our message types and operations supported by our program and +the RPC library and utilities pretty much take care of the rest. +The +<i>rpcgen</i> program generates all of the low level code needed. +We need to define both sides of the RPC. +<br> +<h2> +<a NAME="DB Modifications"></a>DB Modifications</h2> +To achieve the goal of a seamless interface, it is necessary to impose +a constraint on the application. That constraint is simply that all database +access must be done through an open environment. I.e. this model +does not support standalone databases. The reason for this constraint +is so that we have an environment structure internally to store our connection +to the server. Imposing this constraint means that we can provide +the seamless interface just by adding a single environment method: <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server()</a>. +<p>The planned interface for this method is: +<pre>DBENV->set_rpc_server(dbenv, /* DB_ENV structure */ + hostname /* Host of server */ + cl_timeout, /* Client timeout (sec) */ + srv_timeout,/* Server timeout (sec) */ + flags); /* Flags: unused */</pre> +This new method takes the hostname of the server, establishes our connection +and an environment on the server. If a server timeout is specified, +then we send that to the server as well (and the server may or may not +choose to use that value). This timeout is how long the server will +allow the environment to remain idle before declaring it dead and releasing +resources on the server. The pointer to the connection is stored +on the client in the DBENV structure and is used by all other methods to +figure out with whom to communicate. If a client timeout is specified, +it indicates how long the client is willing to wait for a reply from the +server. If the values are 0, then defaults are used. Flags +is currently unused, but exists because we always need to have a placeholder +for flags and it would be used for specifying authentication desired (were +we to provide an authentication scheme at some point) or other uses not +thought of yet! +<p>This client code is part of the monolithic DB library. The user +accesses the client functions via a new flag to <a href="../docs/api_c/db_env_create.html">db_env_create()</a>. +That flag is DB_CLIENT. By using this flag the user indicates they +want to have the client methods rather than the standard methods for the +environment. Also by issuing this flag, the user needs to connect +to the server via the <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server()</a> +method. +<p>We need two new fields in the <i>DB_ENV </i>structure. One is +the socket descriptor to communicate to the server, the other field is +the client identifier the server gives to us. The <i>DB, </i>and<i> +DBC </i>only need one additional field, the client identifier. The +<i>DB_TXN</i> +structure does not need modification, we are overloading the <i>txn_id +</i>field. +<h2> +Issues</h2> +We need to figure out what to do in case of client and server crashes. +Both the client library and the server program are stateful. They +both consume local resources during the lifetime of the connection. +Should one end drop that connection, the other side needs to release those +resources. +<p>If the server crashes, then the client will get an error back. +I have chosen to implement time-outs on the client side, using a default +or allowing the application to specify one through the <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server()</a> +method. Either the current operation will time-out waiting for the +reply or the next operation called will time out (or get back some other +kind of error regarding the server's non-existence). In any case, +if the client application gets back such an error, it should abort any +open transactions locally, close any databases, and close its environment. +It may then decide to retry to connect to the server periodically or whenever +it comes back. If the last operation a client did was a transaction +commit that did not return or timed out from the server, the client cannot +determine if the transaction was committed or not but must release the +local transaction resources. Once the server is back up, recovery must +be run on the server. If the transaction commit completed on +the server before the crash, then the operation is redone, if the transaction +commit did not get to the server, the pieces of the transaction are undone +on recover. The client can then re-establish its connection and begin +again. This is effectively like beginning over. The client +cannot use ID's from its previous connection to the server. However, +if recovery is run, then consistency is assured. +<p>If the client crashes, the server needs to somehow figure this out. +The server is just sitting there waiting for a request to come in. +A server must be able to time-out a client. Similar to ftpd, if a +connection is idle for N seconds, then the server decides the client is +dead and releases that client's resources, aborting any open transactions, +closing any open databases and environments. The server timing +out a client is not a trivial issue however. The generated function +for the server just calls <i>svc_run()</i>. The server code I write +contains procedures to do specific things. We do not have access +to the code calling <i>select()</i>. Timing out the select is not +good enough even if we could do so. We want to time-out idle environments, +not simply cause a time-out if the server is idle a while. See the +discussion of the <a href="#The Server Program">server program</a> for +a description of how we accomplish this. +<p>Since rpcgen generates the main() function of the server, I do not yet +know how we are going to have the server multi-threaded or multi-process +without changing the generated code. The RPC book indicates that +the only way to accomplish this is through modifying the generated code +in the server. <b>For the moment we will ignore this issue while +we get the core server working, as it is only a performance issue.</b> +<p>We do not do any security or authentication. Someone could get +the code and modify it to spoof messages, trick the server, etc. +RPC has some amount of authentication built into it. I haven't yet +looked into it much to know if we want to use it or just point a user at +it. The changes to the client code are fairly minor, the changes +to our server procs are fairly minor. We would have to add code to +a <i>sed</i> script or <i>awk</i> script to change the generated server +code (yet again) in the dispatch routine to perform authentication. +<p>We will need to get an official program number from Sun. We can +get this by sending mail to <i>rpc@sun.com</i> and presumably at some point +they will send us back a program number that we will encode into our XDR +description file. Until we release this we can use a program number +in the "user defined" number space. +<br> +<h2> +<a NAME="The Server Program"></a>The Server Program</h2> +The server is a standalone program that the user builds and runs, probably +as a daemon like process. This program is linked against the Berkeley +DB library and the RPC library (which is part of the C library on my FreeBSD +machine, others may have/need <i>-lrpclib</i>). The server basically +is a slave to the client process. All messages from the client are +synchronous and two-way. The server handles messages one at a time, +and sends a reply back before getting another message. There are +no asynchronous messages generated by the server to the client. +<p>We have made a choice to modify the generated code for the server. +The changes will be minimal, generally calling functions we write, that +are in other source files. The first change is adding a call to our +time-out function as described below. The second change is changing +the name of the generated <i>main()</i> function to <i>__dbsrv_main()</i>, +and adding our own <i>main()</i> function so that we can parse options, +and set up other initialization we require. I have a <i>sed</i> script +that is run from the distribution scripts that massages the generated code +to make these minor changes. +<p>Primarily the code needed for the server is the collection of the specified +RPC functions. Each function receives the structure indicated, and +our code takes out what it needs and passes the information into DB itself. +The server needs to maintain a translation table for identifiers that we +pass back to the client for the environment, transaction and database handles. +<p>The table that the server maintains, assuming one client per server +process/thread, should contain the handle to the environment, database +or transaction, a link to maintain parent/child relationships between transactions, +or databases and cursors, this handle's identifier, a type so that we can +error if the client passes us a bad id for this call, and a link to this +handle's environment entry (for time out/activity purposes). The +table contains, in entries used by environments, a time-out value and an +activity time stamp. Its use is described below for timing out idle +clients. +<p>Here is how we time out clients in the server. We have to modify +the generated server code, but only to add one line during the dispatch +function to run the time-out function. The call is made right before +the return of the dispatch function, after the reply is sent to the client, +so that client's aren't kept waiting for server bookkeeping activities. +This time-out function then runs every time the server processes a request. +In the time-out function we maintain a time-out hint that is the youngest +environment to time-out. If the current time is less than the hint +we know we do not need to run through the list of open handles. If +the hint is expired, then we go through the list of open environment handles, +and if they are past their expiration, then we close them and clean up. +If they are not, we set up the hint for the next time. +<p>Each entry in the open handle table has a pointer back to its environment's +entry. Every operation within this environment can then update the +single environment activity record. Every environment can have a +different time-out. The <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server +</a>call +takes a server time-out value. If this value is 0 then a default +(currently 5 minutes) is used. This time-out value is only a hint +to the server. It may choose to disregard this value or set the time-out +based on its own implementation. +<p>For completeness, the flaws of this time-out implementation should be +pointed out. First, it is possible that a client could crash with +open handles, and no other requests come in to the server. Therefore +the time-out function never gets run and those resources are not released +(until a request does come in). Similarly, this time-out is not exact. +The time-out function uses its hint and if it computes a hint on one run, +an earlier time-out might be created before that time-out expires. +This issue simply yields a handle that doesn't get released until that +original hint expires. To illustrate, consider that at the time that +the time-out function is run, the youngest time-out is 5 minutes in the +future. Soon after, a new environment is opened that has a time-out +of 1 minute. If this environment becomes idle (and other operations +are going on), the time-out function will not release that environment +until the original 5 minute hint expires. This is not a problem since +the resources will eventually be released. +<p>On a similar note, if a client crashes during an RPC, our reply generates +a SIGPIPE, and our server crashes unless we catch it. Using <i>signal(SIGPIPE, +SIG_IGN) </i>we can ignore it, and the server will go on. This is +a call in our <i>main()</i> function that we write. Eventually +this client's handles would be timed out as described above. We need +this only for the unfortunate window of a client crashing during the RPC. +<p>The options below are primarily for control of the program itself,. +Details relating to databases and environments should be passed from the +client to the server, since the server can serve many clients, many environments +and many databases. Therefore it makes more sense for the client +to set the cache size of its own environment, rather than setting a default +cachesize on the server that applies as a blanket to any environment it +may be called upon to open. Options are: +<ul> +<li> +<b>-t </b> to set the default time-out given to an environment.</li> + +<li> +<b>-T</b> to set the maximum time-out allowed for the server.</li> + +<li> +<b>-L</b> to log the execution of the server process to a specified file.</li> + +<li> +<b>-v</b> to run in verbose mode.</li> + +<li> +<b>-M</b> to specify the maximum number of outstanding child server +processes/threads we can have at any given time. The default is 10. +<b>[We +are not yet doing multiple threads/processes.]</b></li> +</ul> + +<h2> +The Client Code</h2> +The client code contains all of the supported functions and methods used +in this model. There are several methods in the <i>__db_env +</i>and +<i>__db</i> +structures that currently do not apply, such as the callbacks. Those +fields that are not applicable to the client model point to NULL to notify +the user of their error. Some method functions remain unchanged, +as well such as the error calls. +<p>The client code contains each method function that goes along with the +<a href="#Remote Procedure Calls">RPC +calls</a> described elsewhere. The client library also contains its +own version of <a href="../docs/api_c/env_create.html">db_env_create()</a>, +which does not result in any messages going over to the server (since we +do not yet know what server we are talking to). This function sets +up the pointers to the correct client functions. +<p>All of the method functions that handle the messaging have a basic flow +similar to this: +<ul> +<li> +Local arg parsing that may be needed</li> + +<li> +Marshalling the message header and the arguments we need to send to the +server</li> + +<li> +Sending the message</li> + +<li> +Receiving a reply</li> + +<li> +Unmarshalling the reply</li> + +<li> +Local results processing that may be needed</li> +</ul> + +<h2> +Generated Code</h2> +Almost all of the code is generated from a source file describing the interface +and an <i>awk</i> script. This awk script generates six (6) +files for us. It also modifies one. The files are: +<ol> +<li> +Client file - The C source file created containing the client code.</li> + +<li> +Client template file - The C template source file created containing interfaces +for handling client-local issues such as resource allocation, but with +a consistent interface with the client code generated.</li> + +<li> +Server file - The C source file created containing the server code.</li> + +<li> +Server template file - The C template source file created containing interfaces +for handling server-local issues such as resource allocation, calling into +the DB library but with a consistent interface with the server code generated.</li> + +<li> +XDR file - The XDR message description file created.</li> + +<li> +Server sed file - A sed script that contains commands to apply to the server +procedure file (i.e. the real source file that the server template file +becomes) so that minor interface changes can be consistently and easily +applied to the real code.</li> + +<li> +Server procedure file - This is the file that is modified by the sed script +generated. It originated from the server template file.</li> +</ol> +The awk script reads a source file, <i>db_server/rpc.src </i>that describes +each operation and what sorts of arguments it takes and what it returns +from the server. The syntax of the source file describes the interface +to that operation. There are four (4) parts to the syntax: +<ol> +<li> +<b>BEGIN</b> <b><i>function version# codetype</i></b> - begins a new functional +interface for the given <b><i>function</i></b>. Each function has +a <b><i>version number</i></b>, currently all of them are at version number +one (1). The <b><i>code type</i></b> indicates to the awk script +what kind of code to generate. The choices are:</li> + +<ul> +<li> +<b>CODE </b>- Generate all code, and return a status value. If specified, +the client code will simply return the status to the user upon completion +of the RPC call.</li> + +<li> +<b>RETCODE </b>- Generate all code and call a return function in the client +template file to deal with client issues or with other returned items. +If specified, the client code generated will call a function of the form +<i>__dbcl_<name>_ret() +</i>where +<name> is replaced with the function name given here. This function +is placed in the template file because this indicates that something special +must occur on return. The arguments to this function are the same +as those for the client function, with the addition of the reply message +structure.</li> + +<li> +<b>NOCLNTCODE - </b>Generate XDR and server code, but no corresponding +client code. (This is used for functions that are not named the same thing +on both sides. The only use of this at the moment is db_env_create +and db_create. The environment create call to the server is actually +called from the <a href="../docs/api_c/env_set_rpc_server.html">DBENV->set_rpc_server()</a> +method. The db_create code exists elsewhere in the library and we +modify that code for the client call.)</li> +</ul> + +<li> +<b>ARG <i>RPC-type C-type varname [list-type]</i></b>- each line of this +describes an argument to the function. The argument is called <b><i>varname</i></b>. +The <b><i>C-type</i></b> given is what it should look like in the C code +generated, such as <b>DB *, u_int32_t, const char *</b>. The +<b><i>RPC-type</i></b> +is an indication about how the RPC request message should be constructed. +The RPC-types allowed are described below.</li> + +<li> +<b>RET <i>RPC-type C-type varname [list-type]</i></b>- each line of this +describes what the server should return from this procedure call (in addition +to a status, which is always returned and should not be specified). +The argument is called <b><i>varname</i></b>. The <b><i>C-type</i></b> +given is what it should look like in the C code generated, such as <b>DB +*, u_int32_t, const char *</b>. The <b><i>RPC-type</i></b> is an +indication about how the RPC reply message should be constructed. +The RPC-types are described below.</li> + +<li> +<b>END </b>- End the description of this function. The result is +that when the awk script encounters the <b>END</b> tag, it now has all +the information it needs to construct the generated code for this function.</li> +</ol> +The <b><i>RPC-type</i></b> must be one of the following: +<ul> +<li> +<b>IGNORE </b>- This argument is not passed to the server and should be +ignored when constructing the XDR code. <b>Only allowed for an ARG +specfication.</b></li> + +<li> +<b>STRING</b> - This argument is a string.</li> + +<li> +<b>INT </b>- This argument is an integer of some sort.</li> + +<li> +<b>DBT </b>- This argument is a DBT, resulting in its decomposition into +the request message.</li> + +<li> +<b>LIST</b> - This argument is an opaque list passed to the server (NULL-terminated). +If an argument of this type is given, it must have a <b><i>list-type</i></b> +specified that is one of:</li> + +<ul> +<li> +<b>STRING</b></li> + +<li> +<b>INT</b></li> + +<li> +<b>ID</b>.</li> +</ul> + +<li> +<b>ID</b> - This argument is an identifier.</li> +</ul> +So, for example, the source for the DB->join RPC call looks like: +<pre>BEGIN dbjoin 1 RETCODE +ARG ID DB * dbp +ARG LIST DBC ** curs ID +ARG IGNORE DBC ** dbcpp +ARG INT u_int32_t flags +RET ID long dbcid +END</pre> +Our first line tells us we are writing the dbjoin function. It requires +special code on the client so we indicate that with the RETCODE. +This method takes four arguments. For the RPC request we need the +database ID from the dbp, we construct a NULL-terminated list of IDs for +the cursor list, we ignore the argument to return the cursor handle to +the user, and we pass along the flags. On the return, the reply contains +a status, by default, and additionally, it contains the ID of the newly +created cursor. +<h2> +Building and Installing</h2> +I need to verify with Don Anderson, but I believe we should just build +the server program, just like we do for db_stat, db_checkpoint, etc. +Basically it can be treated as a utility program from the building and +installation perspective. +<p>As mentioned early on, in the section on <a href="#DB Modifications">DB +Modifications</a>, we have a single library, but allowing the user to access +the client portion by sending a flag to <a href="../docs/api_c/env_create.html">db_env_create()</a>. +The Makefile is modified to include the new files. +<p>Testing is performed in two ways. First I have a new example program, +that should become part of the example directory. It is basically +a merging of ex_access.c and ex_env.c. This example is adequate to +test basic functionality, as it does just does database put/get calls and +appropriate open and close calls. However, in order to test the full +set of functions a more generalized scheme is required. For the moment, +I am going to modify the Tcl interface to accept the server information. +Nothing else should need to change in Tcl. Then we can either write +our own test modules or use a subset of the existing ones to test functionality +on a regular basis. +</body> +</html> |