<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="GENERATOR" CONTENT="Mozilla/4.08 [en] (X11; I; FreeBSD 3.3-RELEASE i386) [Netscape]">
</HEAD>
<BODY>
<CENTER>
<H1>
Client/Server Interface for Berkeley DB</H1></CENTER>
<CENTER><I>Susan LoVerso</I>
<BR><I>sue@sleepycat.com</I>
<BR><I>Rev 1.3</I>
<BR><I>1999 Nov 29</I></CENTER>
<P>We provide an interface allowing client/server access to Berkeley DB.
Our goal is to provide a client and server library so that users can separate
the functionality of their applications yet still have access to the full
benefits of Berkeley DB. The interface should also be seamless,
requiring minimal modification to existing applications.
<P>The client/server interface for Berkeley DB can be broken up into several
layers. At the lowest level there is the transport mechanism to send
out the messages over the network. Above that layer is the messaging
layer to interpret what comes over the wire, and bundle/unbundle message
contents. The next layer is Berkeley DB itself.
<P>The transport layer uses ONC RPC (RFC 1831) and XDR (RFC 1832).
We declare our message types and operations supported by our program and
the RPC library and utilities take care of most of the rest.
The
<I>rpcgen</I> program generates all of the low-level code needed.
We need to define both sides of the RPC.
<BR>
<H2>
<A NAME="DB Modifications"></A>DB Modifications</H2>
To achieve the goal of a seamless interface, it is necessary to impose
a constraint on the application. That constraint is simply that all database
access must be done through an open environment; that is, this model
does not support standalone databases. The constraint exists
so that we have an environment structure internally in which to store our connection
to the server. Imposing this constraint means that we can provide
the seamless interface just by adding a single environment method: <A HREF="../docs/api_c/env_set_server.html">DBENV->set_server()</A>.
<P>The planned interface for this method is:
<PRE>DBENV->set_server(dbenv,        /* DB_ENV structure */
                  hostname,     /* Host of server */
                  cl_timeout,   /* Client timeout (sec) */
                  srv_timeout,  /* Server timeout (sec) */
                  flags);       /* Flags: unused */</PRE>
This new method takes the hostname of the server, establishes our connection
and an environment on the server. If a server timeout is specified,
then we send that to the server as well (and the server may or may not
choose to use that value). This timeout is how long the server will
allow the environment to remain idle before declaring it dead and releasing
resources on the server. The pointer to the connection is stored
on the client in the DBENV structure and is used by all other methods to
figure out with whom to communicate. If a client timeout is specified,
it indicates how long the client is willing to wait for a reply from the
server. If either time-out value is 0, then a default is used. The flags argument
is currently unused; it exists as a placeholder
that could later specify the authentication desired (were
we to provide an authentication scheme at some point) or serve other uses not
yet thought of.
<P>This client code is part of the monolithic DB library. The user
accesses the client functions via a new flag to <A HREF="../docs/api_c/db_env_create.html">db_env_create()</A>.
That flag is DB_CLIENT. By using this flag the user indicates they
want to have the client methods rather than the standard methods for the
environment. Having specified this flag, the user must also connect
to the server via the <A HREF="../docs/api_c/env_set_server.html">DBENV->set_server()</A>
method.
<P>We need two new fields in the <I>DB_ENV</I> structure. One is
the socket descriptor used to communicate with the server; the other is
the client identifier the server gives to us. The <I>DB</I> and
<I>DBC</I> structures need only one additional field, the client identifier. The
<I>DB_TXN</I>
structure does not need modification, because we overload the <I>txn_id</I>
field.
<H2>
Issues</H2>
We need to figure out what to do in case of client and server crashes.
Both the client library and the server program are stateful. They
both consume local resources during the lifetime of the connection.
Should one end drop that connection, the other side needs to release those
resources.
<P>If the server crashes, then the client will get an error back.
I have chosen to implement time-outs on the client side, using a default
or allowing the application to specify one through the <A HREF="../docs/api_c/env_set_server.html">DBENV->set_server()</A>
method. Either the current operation will time out waiting for the
reply or the next operation called will time out (or get back some other
kind of error regarding the server's non-existence). In any case,
if the client application gets back such an error, it should abort any
open transactions locally, close any databases, and close its environment.
It may then decide to retry connecting to the server periodically or whenever
the server comes back. If the last operation a client did was a transaction
commit that did not return or timed out from the server, the client cannot
determine whether the transaction was committed, but it must still release the
local transaction resources. Once the server is back up, recovery must
be run on the server. If the transaction commit completed on
the server before the crash, then the operation is redone; if the transaction
commit did not reach the server, the pieces of the transaction are undone
during recovery. The client can then re-establish its connection and begin
again. This is effectively starting over: the client
cannot use IDs from its previous connection to the server. However,
if recovery is run, then consistency is assured.
<P>If the client crashes, the server needs to somehow figure this out.
The server is just sitting there waiting for a request to come in.
A server must be able to time out a client. Similar to ftpd, if a
connection is idle for N seconds, then the server decides the client is
dead and releases that client's resources, aborting any open transactions
and closing any open databases and environments. The server timing
out a client is not a trivial issue, however. The generated function
for the server just calls <I>svc_run()</I>. The server code I write
contains procedures to do specific things. We do not have access
to the code calling <I>select()</I>. Timing out the select would not be
good enough even if we could do so: we want to time out idle environments,
not simply cause a time-out if the server as a whole is idle for a while. See the
discussion of the <A HREF="#The Server Program">server program</A> for
a description of how we accomplish this.
<P>Since rpcgen generates the main() function of the server, I do not yet
know how we are going to make the server multi-threaded or multi-process
without changing the generated code. The RPC book indicates that
the only way to accomplish this is through modifying the generated code
in the server. <B>For the moment we will ignore this issue while
we get the core server working, as it is only a performance issue.</B>
<P>We do not do any security or authentication. Someone could get
the code and modify it to spoof messages, trick the server, etc.
RPC has some amount of authentication built into it. I have not yet
looked into it enough to know whether we want to use it or just point a user at
it. The changes to the client code and to our server procedures
would both be fairly minor. We would have to add code to
a <I>sed</I> script or <I>awk</I> script to change the generated server
code (yet again) in the dispatch routine to perform authentication.
<P>We will need to get an official program number from Sun. We can
get this by sending mail to <I>rpc@sun.com</I> and presumably at some point
they will send us back a program number that we will encode into our XDR
description file. Until we release this we can use a program number
in the "user defined" number space.
<BR>
<H2>
<A NAME="The Server Program"></A>The Server Program</H2>
The server is a standalone program that the user builds and runs, probably
as a daemon-like process. This program is linked against the Berkeley
DB library and the RPC library (which is part of the C library on my FreeBSD
machine; other systems may need <I>-lrpclib</I>). The server basically
is a slave to the client process. All messages from the client are
synchronous and two-way. The server handles messages one at a time,
and sends a reply back before getting another message. There are
no asynchronous messages generated by the server to the client.
<P>We have made a choice to modify the generated code for the server.
The changes will be minimal, generally calling functions we write, that
are in other source files. The first change is adding a call to our
time-out function as described below. The second change is changing
the name of the generated <I>main()</I> function to <I>__dbsrv_main()</I>,
and adding our own <I>main()</I> function so that we can parse options,
and set up other initialization we require. I have a <I>sed</I> script
that is run from the distribution scripts that massages the generated code
to make these minor changes.
<P>Primarily the code needed for the server is the collection of the specified
RPC functions. Each function receives the structure indicated, and
our code takes out what it needs and passes the information into DB itself.
The server needs to maintain a translation table for identifiers that we
pass back to the client for the environment, transaction and database handles.
<P>The table that the server maintains, assuming one client per server
process/thread, should contain, for each entry: the handle to the environment, database
or transaction; a link to maintain parent/child relationships between transactions,
or between databases and cursors; this handle's identifier; a type, so that we can
return an error if the client passes us a bad id for this call; and a link to this
handle's environment entry (for time-out/activity purposes). In addition, the
entries used by environments contain a time-out value and an
activity time stamp. Their use is described below for timing out idle
clients.
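<P>As a rough illustration, one entry of such a table might be laid out as
follows. This is only a sketch: the type name <I>ct_entry</I>, the field names,
and the helpers <I>ct_lookup()</I> and <I>ct_touch()</I> are hypothetical and
are not taken from the server source.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical names throughout; the document does not give the real ones. */
typedef enum { CT_ENV, CT_TXN, CT_DB, CT_CURSOR } ct_type;

typedef struct ct_entry {
    long             id;       /* identifier handed back to the client */
    ct_type          type;     /* lets us reject an id of the wrong kind */
    void            *handle;   /* the real environment, txn, db or cursor */
    struct ct_entry *parent;   /* parent txn, or owning db for a cursor */
    struct ct_entry *env;      /* this handle's environment entry */
    long             timeout;  /* seconds; used by CT_ENV entries only */
    time_t           active;   /* last activity; CT_ENV entries only */
    struct ct_entry *next;     /* the table is a simple linked list here */
} ct_entry;

/* Find an entry by id; NULL if it is missing or of the wrong type. */
static ct_entry *ct_lookup(ct_entry *head, long id, ct_type want)
{
    for (ct_entry *e = head; e != NULL; e = e->next)
        if (e->id == id)
            return e->type == want ? e : NULL;
    return NULL;
}

/* Record activity on the environment that owns this handle. */
static void ct_touch(ct_entry *e, time_t now)
{
    ct_entry *env;

    assert(e != NULL);
    env = (e->type == CT_ENV) ? e : e->env;
    if (env != NULL)
        env->active = now;
}
```

<P>A server procedure would look up the identifier it was passed with the
type it expects, fail the call if the lookup returns NULL, and otherwise
touch the entry before calling into DB.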
<P>Here is how we time out clients in the server. We have to modify
the generated server code, but only to add one line to the dispatch
function to run the time-out function. The call is made right before
the return of the dispatch function, after the reply is sent to the client,
so that clients aren't kept waiting for server bookkeeping activities.
This time-out function then runs every time the server processes a request.
In the time-out function we maintain a time-out hint, which is the earliest
time at which any environment will time out. If the current time is less than the hint
we know we do not need to run through the list of open handles. If
the hint has expired, then we go through the list of open environment handles,
and if they are past their expiration, then we close them and clean up.
If they are not, we set up the hint for the next time.
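<P>The hint logic can be sketched in a few lines of C. The record layout and
the names <I>env_rec</I>, <I>timeout_hint</I> and <I>timeout_check()</I> are
assumptions made for this illustration, and closing an environment is reduced
to clearing a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical environment record; field names are illustrative only. */
typedef struct env_rec {
    time_t          last_active; /* updated on every request for this env */
    long            timeout;     /* idle seconds allowed */
    int             open;        /* 1 while the environment is open */
    struct env_rec *next;
} env_rec;

static time_t timeout_hint = 0;  /* earliest known expiry; 0 means none */

/*
 * Called at the end of the dispatch routine, after the reply is sent.
 * Returns the number of environments closed on this call.
 */
static int timeout_check(env_rec *envs, time_t now)
{
    int closed = 0;
    time_t next_hint = 0;

    /* Fast path: nothing can have expired before the hint. */
    if (timeout_hint != 0 && now < timeout_hint)
        return 0;

    for (env_rec *e = envs; e != NULL; e = e->next) {
        time_t expires;

        if (!e->open)
            continue;
        assert(e->timeout > 0);     /* 0 was replaced by a default earlier */
        expires = e->last_active + e->timeout;
        if (expires <= now) {
            e->open = 0;            /* real code aborts txns, closes dbs */
            closed++;
        } else if (next_hint == 0 || expires < next_hint) {
            next_hint = expires;    /* remember the earliest future expiry */
        }
    }
    timeout_hint = next_hint;
    return closed;
}
```

<P>Because the hint is recomputed only after it expires, the common case costs
a single comparison per request.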
<P>Each entry in the open handle table has a pointer back to its environment's
entry. Every operation within this environment can then update the
single environment activity record. Every environment can have a
different time-out. The <A HREF="../docs/api_c/env_set_server.html">DBENV->set_server()</A> call
takes a server time-out value. If this value is 0 then a default
(currently 5 minutes) is used. This time-out value is only a hint
to the server. It may choose to disregard this value or set the time-out
based on its own implementation.
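<P>One plausible policy, shown only as a sketch, is to substitute the server
default when the client passes 0 and to cap the result at a server-wide
maximum (the <B>-t</B> and <B>-T</B> options of the server). The
capping rule and the name <I>choose_timeout()</I> are assumptions; the text
above only says the server may disregard the client's value.

```c
#include <assert.h>

#define DEFAULT_TIMEOUT_SECS (5 * 60)  /* the document's current default */

/*
 * Pick the idle time-out for a new environment.
 *   requested  - value sent by the client, 0 meaning "no preference"
 *   default_to - server default, in seconds
 *   max_to     - server ceiling in seconds, 0 meaning "no ceiling"
 */
static long choose_timeout(long requested, long default_to, long max_to)
{
    long t;

    assert(requested >= 0 && default_to > 0 && max_to >= 0);
    t = (requested == 0) ? default_to : requested;
    if (max_to != 0 && t > max_to)
        t = max_to;                    /* assumed policy: clamp to ceiling */
    return t;
}
```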
<P>For completeness, the flaws of this time-out implementation should be
pointed out. First, it is possible that a client could crash with
open handles, and no other requests come in to the server. In that case
the time-out function never gets run and those resources are not released
(until a request does come in). Second, this time-out is not exact.
The time-out function relies on its hint, and after it computes a hint on one run,
an environment with an earlier time-out might be created before that hint expires.
The result is simply a handle that doesn't get released until the
original hint expires. To illustrate, suppose that when
the time-out function is run, the earliest time-out is 5 minutes in the
future. Soon after, a new environment is opened that has a time-out
of 1 minute. If this environment becomes idle (while other operations
are going on), the time-out function will not release that environment
until the original 5 minute hint expires. This is not a problem since
the resources will eventually be released.
<P>On a similar note, if a client crashes during an RPC, our reply generates
a SIGPIPE, and our server crashes unless we catch it. Using <I>signal(SIGPIPE,
SIG_IGN) </I>we can ignore it, and the server will go on. This is
a call in our <I>main()</I> function that we write. Eventually
this client's handles would be timed out as described above. We need
this only for the unfortunate window of a client crashing during the RPC.
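<P>The effect of ignoring the signal can be seen in a small stand-alone
sketch. A pipe with its read end closed stands in for the crashed client; the
helper names <I>ignore_sigpipe()</I> and <I>write_to_dead_peer()</I> are
made up for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Ignore SIGPIPE so a write to a dead client fails with EPIPE instead. */
static int ignore_sigpipe(void)
{
    return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/*
 * Demonstration helper: write one byte to a pipe whose read end is
 * already closed, standing in for a reply to a crashed client.
 * Returns the errno of the failed write, or 0 if the write succeeded.
 */
static int write_to_dead_peer(void)
{
    int fds[2], err = 0;
    void (*prev)(int) = signal(SIGPIPE, SIG_IGN);

    /* Precondition: the caller already ignored SIGPIPE. */
    assert(prev == SIG_IGN);
    (void)prev;

    if (pipe(fds) != 0)
        return errno;
    close(fds[0]);                      /* the "client" goes away */
    if (write(fds[1], "x", 1) == -1)
        err = errno;                    /* EPIPE, not a fatal signal */
    close(fds[1]);
    return err;
}
```

<P>With the signal ignored, the failed write surfaces as an ordinary error
return and the server keeps running.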
<P>The options below are primarily for control of the program itself.
Details relating to databases and environments should be passed from the
client to the server, since the server can serve many clients, many environments
and many databases. It therefore makes more sense for the client
to set the cache size of its own environment than to set a default
cache size on the server that applies as a blanket to any environment it
may be called upon to open. Options are:
<UL>
<LI>
<B>-t </B> to set the default time-out given to an environment.</LI>
<LI>
<B>-T</B> to set the maximum time-out allowed for the server.</LI>
<LI>
<B>-L</B> to log the execution of the server process to a specified file.</LI>
<LI>
<B>-v</B> to run in verbose mode.</LI>
<LI>
<B>-M</B> to specify the maximum number of outstanding child server
processes/threads we can have at any given time. The default is 10.
<B>[We
are not yet doing multiple threads/processes.]</B></LI>
</UL>
<H2>
The Client Code</H2>
The client code contains all of the supported functions and methods used
in this model. There are several methods in the <I>__db_env</I> and
<I>__db</I>
structures that currently do not apply, such as the callbacks. Those
fields that are not applicable to the client model point to NULL to notify
the user of their error. Some method functions remain unchanged
as well, such as the error calls.
<P>The client code contains each method function that goes along with the
<A HREF="#Remote Procedure Calls">RPC
calls</A> described elsewhere. The client library also contains its
own version of <A HREF="../docs/api_c/env_create.html">db_env_create()</A>,
which does not result in any messages going over to the server (since we
do not yet know what server we are talking to). This function sets
up the pointers to the correct client functions.
<P>All of the method functions that handle the messaging have a basic flow
similar to this:
<UL>
<LI>
Local arg parsing that may be needed</LI>
<LI>
Marshalling the message header and the arguments we need to send to the
server</LI>
<LI>
Sending the message</LI>
<LI>
Receiving a reply</LI>
<LI>
Unmarshalling the reply</LI>
<LI>
Local results processing that may be needed</LI>
</UL>
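<P>The six steps above can be made concrete with a hand-rolled sketch. In the
real client the rpcgen-generated stubs and the RPC library do the marshalling
and transport; here they are replaced by a byte buffer and a function pointer
so that each step is visible. Every name in the block (<I>client_db_close()</I>,
<I>transport_fn</I>, <I>fake_server()</I>, the operation number and the error
codes) is invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode/decode one 32-bit value in big-endian order, as XDR does. */
static void put_u32(unsigned char *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v >> 24); p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);  p[3] = (unsigned char)v;
}

static uint32_t get_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical transport: sends req, fills rep, returns 0 or -1. */
typedef int (*transport_fn)(const unsigned char *req, size_t reqlen,
                            unsigned char *rep, size_t replen);

#define OP_DB_CLOSE 7u    /* made-up operation number */
#define ERR_ARG     (-1)  /* made-up local error codes */
#define ERR_NOSRV   (-2)

/* Client-side method following the six steps in the list. */
static int client_db_close(transport_fn xport, uint32_t dbid, uint32_t flags)
{
    unsigned char req[12], rep[4];

    if (xport == NULL || flags != 0)        /* 1. local argument checks */
        return ERR_ARG;
    put_u32(req, OP_DB_CLOSE);              /* 2. marshal the header ... */
    put_u32(req + 4, dbid);                 /*    ... and the arguments */
    put_u32(req + 8, flags);
    if (xport(req, sizeof(req), rep, sizeof(rep)) != 0)  /* 3, 4. send, receive */
        return ERR_NOSRV;
    return (int)get_u32(rep);               /* 5. unmarshal; 6. nothing more */
}

/* Stand-in server for the sketch: status 0 for id 42, otherwise 22. */
static int fake_server(const unsigned char *req, size_t reqlen,
                       unsigned char *rep, size_t replen)
{
    if (reqlen != 12 || replen != 4 || get_u32(req) != OP_DB_CLOSE)
        return -1;
    put_u32(rep, get_u32(req + 4) == 42 ? 0 : 22);
    return 0;
}
```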
<H2>
Generated Code</H2>
Almost all of the code is generated from a source file describing the interface
and an <I>awk</I> script. This awk script generates six (6)
files for us. It also modifies one. The files are:
<OL>
<LI>
Client file - The C source file created containing the client code.</LI>
<LI>
Client template file - The C template source file created containing interfaces
for handling client-local issues such as resource allocation, but with
a consistent interface with the client code generated.</LI>
<LI>
Server file - The C source file created containing the server code.</LI>
<LI>
Server template file - The C template source file created containing interfaces
for handling server-local issues such as resource allocation, calling into
the DB library but with a consistent interface with the server code generated.</LI>
<LI>
XDR file - The XDR message description file created.</LI>
<LI>
Server sed file - A sed script that contains commands to apply to the server
procedure file (i.e. the real source file that the server template file
becomes) so that minor interface changes can be consistently and easily
applied to the real code.</LI>
<LI>
Server procedure file - This is the file that is modified by the sed script
generated. It originated from the server template file.</LI>
</OL>
The awk script reads a source file, <I>db_server/rpc.src </I>that describes
each operation and what sorts of arguments it takes and what it returns
from the server. The syntax of the source file describes the interface
to that operation. There are four (4) parts to the syntax:
<OL>
<LI>
<B>BEGIN</B> <B><I>function version# codetype</I></B> - begins a new functional
interface for the given <B><I>function</I></B>. Each function has
a <B><I>version number</I></B>; currently all of them are at version number
one (1). The <B><I>code type</I></B> indicates to the awk script
what kind of code to generate. The choices are:</LI>
<UL>
<LI>
<B>CODE </B>- Generate all code, and return a status value. If specified,
the client code will simply return the status to the user upon completion
of the RPC call.</LI>
<LI>
<B>RETCODE </B>- Generate all code and call a return function in the client
template file to deal with client issues or with other returned items.
If specified, the client code generated will call a function of the form
<I>__dbcl_&lt;name&gt;_ret()</I>, where
&lt;name&gt; is replaced with the function name given here. This function
is placed in the template file because this indicates that something special
must occur on return. The arguments to this function are the same
as those for the client function, with the addition of the reply message
structure.</LI>
<LI>
<B>NOCLNTCODE - </B>Generate XDR and server code, but no corresponding
client code. (This is used for functions that are not named the same thing
on both sides. The only uses of this at the moment are db_env_create
and db_create. The environment create call to the server is actually
called from the <A HREF="../docs/api_c/env_set_server.html">DBENV->set_server()</A>
method. The db_create code exists elsewhere in the library and we
modify that code for the client call.)</LI>
</UL>
<LI>
<B>ARG <I>RPC-type C-type varname [list-type]</I></B>- each line of this
describes an argument to the function. The argument is called <B><I>varname</I></B>.
The <B><I>C-type</I></B> given is what it should look like in the C code
generated, such as <B>DB *, u_int32_t, const char *</B>. The
<B><I>RPC-type</I></B>
is an indication about how the RPC request message should be constructed.
The RPC-types allowed are described below.</LI>
<LI>
<B>RET <I>RPC-type C-type varname [list-type]</I></B>- each line of this
describes what the server should return from this procedure call (in addition
to a status, which is always returned and should not be specified).
The argument is called <B><I>varname</I></B>. The <B><I>C-type</I></B>
given is what it should look like in the C code generated, such as <B>DB
*, u_int32_t, const char *</B>. The <B><I>RPC-type</I></B> is an
indication about how the RPC reply message should be constructed.
The RPC-types are described below.</LI>
<LI>
<B>END </B>- End the description of this function. The result is
that when the awk script encounters the <B>END</B> tag, it now has all
the information it needs to construct the generated code for this function.</LI>
</OL>
The <B><I>RPC-type</I></B> must be one of the following:
<UL>
<LI>
<B>IGNORE </B>- This argument is not passed to the server and should be
ignored when constructing the XDR code. <B>Only allowed for an ARG
specification.</B></LI>
<LI>
<B>STRING</B> - This argument is a string.</LI>
<LI>
<B>INT </B>- This argument is an integer of some sort.</LI>
<LI>
<B>DBT </B>- This argument is a DBT, resulting in its decomposition into
the request message.</LI>
<LI>
<B>LIST</B> - This argument is an opaque list passed to the server (NULL-terminated).
If an argument of this type is given, it must have a <B><I>list-type</I></B>
specified that is one of:</LI>
<UL>
<LI>
<B>STRING</B></LI>
<LI>
<B>INT</B></LI>
<LI>
<B>ID</B>.</LI>
</UL>
<LI>
<B>ID</B> - This argument is an identifier.</LI>
</UL>
So, for example, the source for the DB->join RPC call looks like:
<PRE>BEGIN dbjoin 1 RETCODE
ARG ID DB * dbp
ARG LIST DBC ** curs ID
ARG IGNORE DBC ** dbcpp
ARG INT u_int32_t flags
RET ID long dbcid
END</PRE>
Our first line tells us we are writing the dbjoin function. It requires
special code on the client so we indicate that with the RETCODE.
This method takes four arguments. For the RPC request we need the
database ID from the dbp, we construct a NULL-terminated list of IDs for
the cursor list, we ignore the argument to return the cursor handle to
the user, and we pass along the flags. On the return, the reply contains
a status, by default, and additionally, it contains the ID of the newly
created cursor.
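<P>The real generator is an awk script, but the shape of the syntax can be
illustrated with a short C sketch that reads a description line by line and
tallies what would go on the wire. The names <I>rpc_desc</I> and
<I>rpc_desc_line()</I> are hypothetical, and the sketch looks only at the
keyword and the RPC-type of each line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Summary of one BEGIN..END block; a hypothetical structure. */
typedef struct rpc_desc {
    char name[32];      /* function name from the BEGIN line */
    int  version;       /* version number from the BEGIN line */
    char codetype[16];  /* CODE, RETCODE or NOCLNTCODE */
    int  nsent;         /* ARG lines that go into the request */
    int  nignored;      /* ARG lines marked IGNORE */
    int  nret;          /* RET lines (the status is implicit) */
    int  complete;      /* set when END is seen */
} rpc_desc;

/* Feed one line of the description; returns 0, or -1 if malformed. */
static int rpc_desc_line(rpc_desc *d, const char *line)
{
    char kw[16], type[16];
    int n;

    assert(d != NULL && line != NULL);
    n = sscanf(line, "%15s %15s", kw, type);
    if (n < 1)
        return -1;
    if (strcmp(kw, "BEGIN") == 0) {
        memset(d, 0, sizeof(*d));
        return sscanf(line, "%*s %31s %d %15s",
                      d->name, &d->version, d->codetype) == 3 ? 0 : -1;
    }
    if (strcmp(kw, "END") == 0) {
        d->complete = 1;
        return 0;
    }
    if (n < 2)
        return -1;                  /* ARG and RET need an RPC-type */
    if (strcmp(kw, "ARG") == 0) {
        if (strcmp(type, "IGNORE") == 0)
            d->nignored++;          /* stays on the client */
        else
            d->nsent++;             /* becomes a request field */
        return 0;
    }
    if (strcmp(kw, "RET") == 0) {
        d->nret++;
        return 0;
    }
    return -1;
}
```

<P>Fed the dbjoin description, this counts three request fields, one ignored
argument and one returned value beyond the status.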
<H2>
Building and Installing</H2>
I need to verify with Don Anderson, but I believe we should just build
the server program, just like we do for db_stat, db_checkpoint, etc.
Basically it can be treated as a utility program from the building and
installation perspective.
<P>As mentioned early on, in the section on <A HREF="#DB Modifications">DB
Modifications</A>, we have a single library, and the user accesses
the client portion by passing a flag to <A HREF="../docs/api_c/env_create.html">db_env_create()</A>.
The Makefile is modified to include the new files.
<P>Testing is performed in two ways. First, I have a new example program
that should become part of the example directory. It is basically
a merging of ex_access.c and ex_env.c. This example is adequate to
test basic functionality, as it just does database put/get calls and
appropriate open and close calls. However, in order to test the full
set of functions a more generalized scheme is required. For the moment,
I am going to modify the Tcl interface to accept the server information.
Nothing else should need to change in Tcl. Then we can either write
our own test modules or use a subset of the existing ones to test functionality
on a regular basis.
</BODY>
</HTML>