Diffstat (limited to 'bdb/docs/ref/intro/terrain.html')
-rw-r--r-- bdb/docs/ref/intro/terrain.html 248
1 files changed, 0 insertions, 248 deletions
diff --git a/bdb/docs/ref/intro/terrain.html b/bdb/docs/ref/intro/terrain.html
deleted file mode 100644
index f2a7089135c..00000000000
--- a/bdb/docs/ref/intro/terrain.html
+++ /dev/null
@@ -1,248 +0,0 @@
-<!--$Id: terrain.so,v 10.3 2000/12/14 20:52:03 bostic Exp $-->
-<!--Copyright 1997, 1998, 1999, 2000 by Sleepycat Software, Inc.-->
-<!--All rights reserved.-->
-<html>
-<head>
-<title>Berkeley DB Reference Guide: Mapping the terrain: theory and practice</title>
-<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
-<meta name="keywords" content="embedded,database,programmatic,toolkit,b+tree,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,java,C,C++">
-</head>
-<body bgcolor=white>
-<table><tr valign=top>
-<td><h3><dl><dt>Berkeley DB Reference Guide:<dd>Introduction</dl></h3></td>
-<td width="1%"><a href="../../ref/intro/data.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/intro/dbis.html"><img src="../../images/next.gif" alt="Next"></a>
-</td></tr></table>
-<p>
-<h1 align=center>Mapping the terrain: theory and practice</h1>
-<p>The first step in selecting a database system is figuring out what the
-choices are. Decades of research and real-world deployment have produced
-countless systems. We need to organize them somehow to reduce the number
-of options.
-<p>One obvious way to group systems is to use the common labels that
-vendors apply to them. The buzzwords here include "network,"
-"relational," "object-oriented," and "embedded," with some
-cross-fertilization like "object-relational" and "embedded network."
-Understanding the buzzwords is important. Each has some grounding in
-theory, but has also evolved into a practical label for categorizing
-systems that work in a certain way.
-<p>All database systems, regardless of the buzzwords that apply to them,
-provide a few common services. All of them store data, for example.
-We'll begin by exploring the common services that all systems provide,
-and then examine the differences among the different kinds of systems.
-<h3>Data access and data management</h3>
-<p>Fundamentally, database systems provide two services.
-<p>The first service is <i>data access</i>. Data access means adding
-new data to the database (inserting), finding data of interest
-(searching), changing data already stored (updating), and removing data
-from the database (deleting). All databases provide these services. How
-they work varies from category to category, and depends on the record
-structure that the database supports.
-<p>Each record in a database is a collection of values. For example, the
-record for a Web site customer might include a name, email address,
-shipping address, and payment information. Records are usually stored
-in tables. Each table holds records of the same kind. For example, the
-<b>customer</b> table at an e-commerce Web site might store the
-customer records for every person who shopped at the site. Often,
-database records have a different structure from the structures or
-instances supported by the programming language in which an application
-is written. As a result, working with records can mean:
-<ul type=disc>
-<li>using database operations like searches and updates on records; and
-<li>converting between programming language structures and database record
-types in the application.
-</ul>
-<p>The second service is <i>data management</i>. Data management is
-more complicated than data access. Providing good data management
-services is the hard part of building a database system. When you
-choose a database system to use in an application you build, making sure
-it supports the data management services you need is critical.
-<p>Data management services include allowing multiple users to work on the
-database simultaneously (concurrency), allowing multiple records to be
-changed as a single unit (transactions), and surviving application and
-system crashes (recovery). Different database systems offer different
-data management services. Data management services are entirely
-independent of the data access services listed above. For example,
-nothing about relational database theory requires that the system
-support transactions, but most commercial relational systems do.
-<p>Concurrency means that multiple users can operate on the database at
-the same time. Support for concurrency ranges from none (single-user
-access only) to complete (many readers and writers working
-simultaneously).
-<p>Transactions permit users to make multiple changes appear at once. For
-example, a transfer of funds between bank accounts needs to be a
-transaction because the balance in one account is reduced and the
-balance in the other increases. If the reduction happened before the
-increase, then a poorly timed system crash could leave the customer
-poorer; if the bank used the opposite order, then the same system crash
-could make the customer richer. Obviously, both the customer and the
-bank are best served if both operations happen at the same instant.
-<p>Transactions have well-defined properties in database systems. They are
-<i>atomic</i>, so that the changes happen all at once or not at all.
-They are <i>consistent</i>, so that the database is in a legal state
-when the transaction begins and when it ends. They are typically
-<i>isolated</i>, which means that any other users in the database
-cannot interfere with them while they are in progress. And they are
-<i>durable</i>, so that if the system or application crashes after
-a transaction finishes, the changes are not lost. Together, the
-properties of <i>atomicity</i>, <i>consistency</i>,
-<i>isolation</i>, and <i>durability</i> are known as the ACID
-properties.
-<p>As is the case for concurrency, support for transactions varies among
-databases. Some offer atomicity without making guarantees about
-durability. Some ignore isolation, especially in single-user
-systems; there's no need to isolate other users from the effects of
-changes when there are no other users.
-<p>Another important data management service is recovery. Strictly
-speaking, recovery is a procedure that the system carries out when it
-starts up. The purpose of recovery is to guarantee that the database is
-complete and usable. This is most important after a system or
-application crash, when the database may have been damaged. The recovery
-process guarantees that the internal structure of the database is good.
-Recovery usually means that any completed transactions are checked, and
-any lost changes are reapplied to the database. At the end of the
-recovery process, applications can use the database as if there had been
-no interruption in service.
-<p>Finally, there are a number of data management services that permit
-copying of data. For example, most database systems are able to import
-data from other sources, and to export it for use elsewhere. Also, most
-systems provide some way to back up databases and to restore them in the
-event of a system failure that damages the database. Many commercial
-systems allow <i>hot backups</i>, so that users can back up
-databases while they are in use. Many applications must run without
-interruption, and cannot be shut down for backups.
-<p>A particular database system may provide other data management services.
-Some provide browsers that show database structure and contents. Some
-include tools that enforce data integrity rules, such as the rule that
-no employee can have a negative salary. These data management services
-are not common to all systems, however. Concurrency, recovery, and
-transactions are the data management services that most database vendors
-support.
-<p>Deciding what kind of database to use means understanding the data
-access and data management services that your application needs. Berkeley DB
-is an embedded database that supports fairly simple data access with a
-rich set of data management services. To highlight its strengths and
-weaknesses, we can compare it to other database system categories.
-<h3>Relational databases</h3>
-<p>Relational databases are probably the best-known database variant,
-because of the success of companies like Oracle. Relational databases
-are based on the mathematical field of set theory. The term "relation"
-is really just a synonym for "set" -- a relation is just a set of
-records or, in our terminology, a table. One of the main innovations in
-early relational systems was to insulate the programmer from the
-physical organization of the database. Rather than walking through
-arrays of records or traversing pointers, programmers make statements
-about tables in a high-level language, and the system executes those
-statements.
-<p>Relational databases operate on <i>tuples</i>, or records, composed
-of values of several different data types, including integers, character
-strings, and others. Operations include searching for records whose
-values satisfy some criteria, updating records, and so on.
-<p>Virtually all relational databases use the Structured Query Language,
-or SQL. This language permits people and computer programs to work with
-the database by writing simple statements. The database engine reads
-those statements and determines how to satisfy them on the tables in
-the database.
-<p>SQL is the main practical advantage of relational database systems.
-Rather than writing a computer program to find records of interest, the
-relational system user can just type a query in a simple syntax, and
-let the engine do the work. This gives users enormous flexibility; they
-do not need to decide in advance what kind of searches they want to do,
-and they do not need expensive programmers to find the data they need.
-Learning SQL requires some effort, but it's much simpler than a
-full-blown high-level programming language for most purposes. And there
-are a lot of programmers who have already learned SQL.
-<h3>Object-oriented databases</h3>
-<p>Object-oriented databases are less common than relational systems, but
-are still fairly widespread. Most object-oriented databases were
-originally conceived as persistent storage systems closely wedded to
-particular high-level programming languages like C++. With the spread
-of Java, most now support more than one programming language, but
-object-oriented database systems fundamentally provide the same class
-and method abstractions as do object-oriented programming languages.
-<p>Many object-oriented systems allow applications to operate on objects
-uniformly, whether they are in memory or on disk. These systems create
-the illusion that all objects are in memory all the time. The advantage
-to object-oriented programmers who simply want object storage and
-retrieval is clear. They need never be aware of whether an object is in
-memory or not. The application simply uses objects, and the database
-system moves them between disk and memory transparently. All of the
-operations on an object, and all its behavior, are determined by the
-programming language.
-<p>Object-oriented databases aren't nearly as widely deployed as relational
-systems. In order to attract developers who understand relational
-systems, many of the object-oriented systems have added support for
-query languages very much like SQL. In practice, though, object-oriented
-databases are mostly used for persistent storage of objects in C++ and
-Java programs.
-<h3>Network databases</h3>
-<p>The "network model" is a fairly old technique for managing and
-navigating application data. Network databases are designed to make
-pointer traversal very fast. Every record stored in a network database
-is allowed to contain pointers to other records. These pointers are
-generally physical addresses, so fetching the referenced record just
-means reading it from disk by its disk address.
-<p>Network database systems generally permit records to contain integers,
-floating point numbers, and character strings, as well as references to
-other records. An application can search for records of interest. After
-retrieving a record, the application can fetch any referenced record
-quickly.
-<p>Pointer traversal is fast because most network systems use physical disk
-addresses as pointers. When the application wants to fetch a record,
-the database system uses the address to fetch exactly the right string
-of bytes from the disk. This requires only a single disk access in all
-cases. Other systems, by contrast, often must do more than one disk read
-to find a particular record.
-<p>The key advantage of the network model is also its main drawback. The
-fact that pointer traversal is so fast means that applications that do
-it will run well. On the other hand, storing pointers all over the
-database makes it very hard to reorganize the database. In effect, once
-you store a pointer to a record, it is difficult to move that record
-elsewhere. Some network databases handle this by leaving forwarding
-pointers behind, but this defeats the speed advantage of doing a single
-disk access in the first place. Other network databases find, and fix,
-all the pointers to a record when it moves, but this makes
-reorganization very expensive. Reorganization is often necessary in
-databases, since adding and deleting records over time will consume
-space that cannot be reclaimed without reorganizing. Without periodic
-reorganization to compact network databases, they can end up with a
-considerable amount of wasted space.
-<h3>Clients and servers</h3>
-<p>Database vendors have two choices for system architecture. They can
-build a server to which remote clients connect, and do all the database
-management inside the server. Alternatively, they can provide a module
-that links directly into the application, and does all database
-management locally. In either case, the application developer needs
-some way of communicating with the database: generally, an Application
-Programming Interface (API) that either does the work in the process or
-communicates with a server to get the work done.
-<p>Almost all commercial database products are implemented as servers, and
-applications connect to them as clients. Servers have several features
-that make them attractive.
-<p>First, because all of the data is managed by a separate process, and
-possibly on a separate machine, it's easy to isolate the database server
-from bugs and crashes in the application.
-<p>Second, because some database products (particularly relational engines)
-are quite large, splitting them off as separate server processes keeps
-applications small, which uses less disk space and memory. Relational
-engines include code to parse SQL statements, to analyze them and
-produce plans for execution, to optimize the plans, and to execute
-them.
-<p>Finally, by storing all the data in one place and managing it with a
-single server, it's easier for organizations to back up, protect, and
-set policies on their databases. The enterprise databases for large
-companies often have several full-time administrators caring for them,
-making certain that applications run quickly, granting and denying
-access to users, and making backups.
-<p>However, centralized administration can be a disadvantage in some cases.
-In particular, if a programmer wants to build an application that uses
-a database for storage of important information, then shipping and
-supporting the application is much harder. The end user needs to install
-and administer a separate database server, and the programmer must
-support not just one product, but two. Adding a server process to the
-application creates new opportunities for installation mistakes and
-run-time problems.
-<table><tr><td><br></td><td width="1%"><a href="../../ref/intro/data.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/intro/dbis.html"><img src="../../images/next.gif" alt="Next"></a>
-</td></tr></table>
-<p><font size=1><a href="http://www.sleepycat.com">Copyright Sleepycat Software</a></font>
-</body>
-</html>