1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
|
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Mapping the terrain: theory and practice</title>
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
<meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
<link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
<link rel="up" href="intro.html" title="Chapter 1. Introduction" />
<link rel="prev" href="intro.html" title="Chapter 1. Introduction" />
<link rel="next" href="intro_dbis.html" title="What is Berkeley DB?" />
</head>
<body>
<div xmlns="" class="navheader">
<div class="libver">
<p>Library Version 12.1.6.1</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">Mapping the terrain: theory and
practice</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="intro.html">Prev</a> </td>
<th width="60%" align="center">Chapter 1. Introduction </th>
<td width="20%" align="right"> <a accesskey="n" href="intro_dbis.html">Next</a></td>
</tr>
</table>
<hr />
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="intro_terrain"></a>Mapping the terrain: theory and
practice</h2>
</div>
</div>
</div>
<div class="toc">
<dl>
<dt>
<span class="sect2">
<a href="intro_terrain.html#idm3285464">Data access and data management</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="intro_terrain.html#idm2046400">Relational databases</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="intro_terrain.html#idm3331136">Object-oriented databases</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="intro_terrain.html#idm2845328">Network databases</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="intro_terrain.html#idm1493864">Clients and servers</a>
</span>
</dt>
</dl>
</div>
<p>
The first step in selecting a database system is figuring
out what the choices are. Decades of research and real-world
deployment have produced countless systems. We need to
organize them somehow to reduce the number of options.
</p>
<p>
One obvious way to group systems is to use the common labels
that vendors apply to them. The buzzwords here include
"network," "relational," "object-oriented," and "embedded,"
with some cross-fertilization like "object-relational" and
"embedded network". Understanding the buzzwords is important.
Each has some grounding in theory, but has also evolved into a
practical label for categorizing systems that work in a
certain way.
</p>
<p>
All database systems, regardless of the buzzwords that apply
to them, provide a few common services. All of them store
data, for example. We'll begin by exploring the common
services that all systems provide, and then examine the
differences among the different kinds of systems.
</p>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="idm3285464"></a>Data access and data management</h3>
</div>
</div>
</div>
<p>
Fundamentally, database systems provide two
services.
</p>
<p>
The first service is <span class="emphasis"><em>data access</em></span>.
Data access means adding new data to the database
(inserting), finding data of interest (searching),
changing data already stored (updating), and removing data
from the database (deleting). All databases provide these
services. How they work varies from category to category,
and depends on the record structure that the database
supports.
</p>
<p>
Each record in a database is a collection of values. For
example, the record for a Web site customer might include
a name, email address, shipping address, and payment
information. Records are usually stored in tables. Each
table holds records of the same kind. For example, the
<span class="bold"><strong>customer</strong></span> table at an
e-commerce Web site might store the customer records for
every person who shopped at the site. Often, database
records have a different structure from the structures or
instances supported by the programming language in which
an application is written. As a result, working with
records can mean:
</p>
<div class="itemizedlist">
<ul type="disc">
<li>
using database operations like searches and
updates on records; and
</li>
<li>
converting between programming language
structures and database record types in the
application.
</li>
</ul>
</div>
<p>
The second service is <span class="emphasis"><em>data
management</em></span>. Data management is more
complicated than data access. Providing good data
management services is the hard part of building a
database system. When you choose a database system to use
in an application you build, making sure it supports the
data management services you need is critical.
</p>
<p>
Data management services include allowing multiple users
to work on the database simultaneously (concurrency),
allowing multiple records to be changed instantaneously
(transactions), and surviving application and system
crashes (recovery). Different database systems offer
different data management services. Data management
services are entirely independent of the data access
services listed above. For example, nothing about
relational database theory requires that the system
support transactions, but most commercial relational
systems do.
</p>
<p>
Concurrency means that multiple users can operate on the
database at the same time. Support for concurrency ranges
from none (single-user access only) to complete (many
readers and writers working simultaneously).
</p>
<p>
Transactions permit users to make multiple changes
appear at once. For example, a transfer of funds between
bank accounts needs to be a transaction because the
balance in one account is reduced and the balance in the
other increases. If the reduction happened before the
increase, than a poorly-timed system crash could leave the
customer poorer; if the bank used the opposite order, then
the same system crash could make the customer richer.
Obviously, both the customer and the bank are best served
if both operations happen at the same instant.
</p>
<p>
Transactions have well-defined properties in database
systems. They are <span class="emphasis"><em>atomic</em></span>, so that the
changes happen all at once or not at all. They are
<span class="emphasis"><em>consistent</em></span>, so that the database
is in a legal state when the transaction begins and when
it ends. They are typically <span class="emphasis"><em>isolated</em></span>,
which means that any other users in the database cannot
interfere with them while they are in progress. And they
are <span class="emphasis"><em>durable</em></span>, so that if the system or
application crashes after a transaction finishes, the
changes are not lost. Together, the properties of
<span class="emphasis"><em>atomicity</em></span>,
<span class="emphasis"><em>consistency</em></span>,
<span class="emphasis"><em>isolation</em></span>, and
<span class="emphasis"><em>durability</em></span> are known as the ACID
properties.
</p>
<p>
As is the case for concurrency, support for transactions
varies among databases. Some offer atomicity without
making guarantees about durability. Some ignore
isolatability, especially in single-user systems; there's
no need to isolate other users from the effects of changes
when there are no other users.
</p>
<p>
Another important data management service is recovery.
Strictly speaking, recovery is a procedure that the system
carries out when it starts up. The purpose of recovery is
to guarantee that the database is complete and usable.
This is most important after a system or application
crash, when the database may have been damaged. The
recovery process guarantees that the internal structure of
the database is good. Recovery usually means that any
completed transactions are checked, and any lost changes
are reapplied to the database. At the end of the recovery
process, applications can use the database as if there had
been no interruption in service.
</p>
<p>
Finally, there are a number of data management services
that permit copying of data. For example, most database
systems are able to import data from other sources, and to
export it for use elsewhere. Also, most systems provide
some way to back up databases and to restore in the event
of a system failure that damages the database. Many
commercial systems allow <span class="emphasis"><em>hot backups</em></span>,
so that users can back up databases while they are in use.
Many applications must run without interruption, and
cannot be shut down for backups.
</p>
<p>
A particular database system may provide other data
management services. Some provide browsers that show
database structure and contents. Some include tools that
enforce data integrity rules, such as the rule that no
employee can have a negative salary. These data management
services are not common to all systems, however.
Concurrency, recovery, and transactions are the data
management services that most database vendors
support.
</p>
<p>
Deciding what kind of database to use means
understanding the data access and data management services
that your application needs. Berkeley DB is an embedded
database that supports fairly simple data access with a
rich set of data management services. To highlight its
strengths and weaknesses, we can compare it to other
database system categories.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="idm2046400"></a>Relational databases</h3>
</div>
</div>
</div>
<p>
Relational databases are probably the best-known
database variant, because of the success of companies like
Oracle. Relational databases are based on the mathematical
field of set theory. The term "relation" is really just a
synonym for "set" -- a relation is just a set of records
or, in our terminology, a table. One of the main
innovations in early relational systems was to insulate
the programmer from the physical organization of the
database. Rather than walking through arrays of records or
traversing pointers, programmers make statements about
tables in a high-level language, and the system executes
those statements.
</p>
<p>
Relational databases operate on
<span class="emphasis"><em>tuples</em></span>, or records, composed of
values of several different data types, including
integers, character strings, and others. Operations
include searching for records whose values satisfy some
criteria, updating records, and so on.
</p>
<p>
Virtually all relational databases use the Structured
Query Language, or SQL. This language permits people and
computer programs to work with the database by writing
simple statements. The database engine reads those
statements and determines how to satisfy them on the
tables in the database.
</p>
<p>
SQL is the main practical advantage of relational
database systems. Rather than writing a computer program
to find records of interest, the relational system user
can just type a query in a simple syntax, and let the
engine do the work. This gives users enormous flexibility;
they do not need to decide in advance what kind of
searches they want to do, and they do not need expensive
programmers to find the data they need. Learning SQL
requires some effort, but it's much simpler than a
full-blown high-level programming language for most
purposes. And there are a lot of programmers who have
already learned SQL.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="idm3331136"></a>Object-oriented databases</h3>
</div>
</div>
</div>
<p>
Object-oriented databases are less common than
relational systems, but are still fairly widespread. Most
object-oriented databases were originally conceived as
persistent storage systems closely wedded to particular
high-level programming languages like C++. With the spread
of Java, most now support more than one programming
language, but object-oriented database systems
fundamentally provide the same class and method
abstractions as do object-oriented programming
languages.
</p>
<p>
Many object-oriented systems allow applications to
operate on objects uniformly, whether they are in memory
or on disk. These systems create the illusion that all
objects are in memory all the time. The advantage to
object-oriented programmers who simply want object storage
and retrieval is clear. They need never be aware of
whether an object is in memory or not. The application
simply uses objects, and the database system moves them
between disk and memory transparently. All of the
operations on an object, and all its behavior, are
determined by the programming language.
</p>
<p>
Object-oriented databases aren't nearly as widely
deployed as relational systems. In order to attract
developers who understand relational systems, many of the
object-oriented systems have added support for query
languages very much like SQL. In practice, though,
object-oriented databases are mostly used for persistent
storage of objects in C++ and Java programs.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="idm2845328"></a>Network databases</h3>
</div>
</div>
</div>
<p>
The "network model" is a fairly old technique for
managing and navigating application data. Network
databases are designed to make pointer traversal very
fast. Every record stored in a network database is allowed
to contain pointers to other records. These pointers are
generally physical addresses, so fetching the record to
which it refers just means reading it from disk by its
disk address.
</p>
<p>
Network database systems generally permit records to
contain integers, floating point numbers, and character
strings, as well as references to other records. An
application can search for records of interest. After
retrieving a record, the application can fetch any record
to which it refers, quickly.
</p>
<p>
Pointer traversal is fast because most network systems
use physical disk addresses as pointers. When the
application wants to fetch a record, the database system
uses the address to fetch exactly the right string of
bytes from the disk. This requires only a single disk
access in all cases. Other systems, by contrast, often
must do more than one disk read to find a particular
record.
</p>
<p>
The key advantage of the network model is also its main
drawback. The fact that pointer traversal is so fast means
that applications that do it will run well. On the other
hand, storing pointers all over the database makes it very
hard to reorganize the database. In effect, once you store
a pointer to a record, it is difficult to move that record
elsewhere. Some network databases handle this by leaving
forwarding pointers behind, but this defeats the speed
advantage of doing a single disk access in the first
place. Other network databases find, and fix, all the
pointers to a record when it moves, but this makes
reorganization very expensive. Reorganization is often
necessary in databases, since adding and deleting records
over time will consume space that cannot be reclaimed
without reorganizing. Without periodic reorganization to
compact network databases, they can end up with a
considerable amount of wasted space.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="idm1493864"></a>Clients and servers</h3>
</div>
</div>
</div>
<p>
Database vendors have two choices for system
architecture. They can build a server to which remote
clients connect, and do all the database management inside
the server. Alternatively, they can provide a module that
links directly into the application, and does all database
management locally. In either case, the application
developer needs some way of communicating with the
database (generally, an Application Programming Interface
(API) that does work in the process or that communicates
with a server to get work done).
</p>
<p>
Almost all commercial database products are implemented
as servers, and applications connect to them as clients.
Servers have several features that make them
attractive.
</p>
<p>
First, because all of the data is managed by a separate
process, and possibly on a separate machine, it's easy to
isolate the database server from bugs and crashes in the
application.
</p>
<p>
Second, because some database products (particularly
relational engines) are quite large, splitting them off as
separate server processes keeps applications small, which
uses less disk space and memory. Relational engines
include code to parse SQL statements, to analyze them and
produce plans for execution, to optimize the plans, and to
execute them.
</p>
<p>
Finally, by storing all the data in one place and
managing it with a single server, it's easier for
organizations to back up, protect, and set policies on
their databases. The enterprise databases for large
companies often have several full-time administrators
caring for them, making certain that applications run
quickly, granting and denying access to users, and making
backups.
</p>
<p>
However, centralized administration can be a
disadvantage in some cases. In particular, if a programmer
wants to build an application that uses a database for
storage of important information, then shipping and
supporting the application is much harder. The end user
needs to install and administer a separate database
server, and the programmer must support not just one
product, but two. Adding a server process to the
application creates new opportunity for installation
mistakes and run-time problems.
</p>
</div>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href="intro.html">Prev</a> </td>
<td width="20%" align="center">
<a accesskey="u" href="intro.html">Up</a>
</td>
<td width="40%" align="right"> <a accesskey="n" href="intro_dbis.html">Next</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Chapter 1. Introduction </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
<td width="40%" align="right" valign="top"> What is Berkeley DB?</td>
</tr>
</table>
</div>
</body>
</html>
|