diff options
Diffstat (limited to 'bdb/docs/ref/intro/terrain.html')
-rw-r--r-- | bdb/docs/ref/intro/terrain.html | 248 |
1 files changed, 0 insertions, 248 deletions
diff --git a/bdb/docs/ref/intro/terrain.html b/bdb/docs/ref/intro/terrain.html deleted file mode 100644 index f2a7089135c..00000000000 --- a/bdb/docs/ref/intro/terrain.html +++ /dev/null @@ -1,248 +0,0 @@ -<!--$Id: terrain.so,v 10.3 2000/12/14 20:52:03 bostic Exp $--> -<!--Copyright 1997, 1998, 1999, 2000 by Sleepycat Software, Inc.--> -<!--All rights reserved.--> -<html> -<head> -<title>Berkeley DB Reference Guide: Mapping the terrain: theory and practice</title> -<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit."> -<meta name="keywords" content="embedded,database,programmatic,toolkit,b+tree,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,java,C,C++"> -</head> -<body bgcolor=white> -<table><tr valign=top> -<td><h3><dl><dt>Berkeley DB Reference Guide:<dd>Introduction</dl></h3></td> -<td width="1%"><a href="../../ref/intro/data.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/intro/dbis.html"><img src="../../images/next.gif" alt="Next"></a> -</td></tr></table> -<p> -<h1 align=center>Mapping the terrain: theory and practice</h1> -<p>The first step in selecting a database system is figuring out what the -choices are. Decades of research and real-world deployment have produced -countless systems. We need to organize them somehow to reduce the number -of options. -<p>One obvious way to group systems is to use the common labels that -vendors apply to them. The buzzwords here include "network," -"relational," "object-oriented," and "embedded," with some -cross-fertilization like "object-relational" and "embedded network". -Understanding the buzzwords is important. Each has some grounding in -theory, but has also evolved into a practical label for categorizing -systems that work in a certain way. -<p>All database systems, regardless of the buzzwords that apply to them, -provide a few common services. All of them store data, for example. -We'll begin by exploring the common services that all systems provide, -and then examine the differences among the different kinds of systems. -<h3>Data access and data management</h3> -<p>Fundamentally, database systems provide two services. -<p>The first service is <i>data access</i>. Data access means adding -new data to the database (inserting), finding data of interest -(searching), changing data already stored (updating), and removing data -from the database (deleting). All databases provide these services. How -they work varies from category to category, and depends on the record -structure that the database supports. -<p>Each record in a database is a collection of values. For example, the -record for a Web site customer might include a name, email address, -shipping address, and payment information. Records are usually stored -in tables. Each table holds records of the same kind. For example, the -<b>customer</b> table at an e-commerce Web site might store the -customer records for every person who shopped at the site. Often, -database records have a different structure from the structures or -instances supported by the programming language in which an application -is written. As a result, working with records can mean: -<ul type=disc> -<li>using database operations like searches and updates on records; and -<li>converting between programming language structures and database record -types in the application. -</ul> -<p>The second service is <i>data management</i>. Data management is -more complicated than data access. Providing good data management -services is the hard part of building a database system. When you -choose a database system to use in an application you build, making sure -it supports the data management services you need is critical. -<p>Data management services include allowing multiple users to work on the -database simultaneously (concurrency), allowing multiple records to be -changed instantaneously (transactions), and surviving application and -system crashes (recovery). Different database systems offer different -data management services. Data management services are entirely -independent of the data access services listed above. For example, -nothing about relational database theory requires that the system -support transactions, but most commercial relational systems do. -<p>Concurrency means that multiple users can operate on the database at -the same time. Support for concurrency ranges from none (single-user -access only) to complete (many readers and writers working -simultaneously). -<p>Transactions permit users to make multiple changes appear at once. For -example, a transfer of funds between bank accounts needs to be a -transaction because the balance in one account is reduced and the -balance in the other increases. If the reduction happened before the -increase, than a poorly-timed system crash could leave the customer -poorer; if the bank used the opposite order, then the same system crash -could make the customer richer. Obviously, both the customer and the -bank are best served if both operations happen at the same instant. -<p>Transactions have well-defined properties in database systems. They are -<i>atomic</i>, so that the changes happen all at once or not at all. -They are <i>consistent</i>, so that the database is in a legal state -when the transaction begins and when it ends. They are typically -<i>isolated</i>, which means that any other users in the database -cannot interfere with them while they are in progress. And they are -<i>durable</i>, so that if the system or application crashes after -a transaction finishes, the changes are not lost. Together, the -properties of <i>atomicity</i>, <i>consistency</i>, -<i>isolation</i>, and <i>durability</i> are known as the ACID -properties. -<p>As is the case for concurrency, support for transactions varies among -databases. Some offer atomicity without making guarantees about -durability. Some ignore isolatability, especially in single-user -systems; there's no need to isolate other users from the effects of -changes when there are no other users. -<p>Another important data management service is recovery. Strictly -speaking, recovery is a procedure that the system carries out when it -starts up. The purpose of recovery is to guarantee that the database is -complete and usable. This is most important after a system or -application crash, when the database may have been damaged. The recovery -process guarantees that the internal structure of the database is good. -Recovery usually means that any completed transactions are checked, and -any lost changes are reapplied to the database. At the end of the -recovery process, applications can use the database as if there had been -no interruption in service. -<p>Finally, there are a number of data management services that permit -copying of data. For example, most database systems are able to import -data from other sources, and to export it for use elsewhere. Also, most -systems provide some way to back up databases and to restore in the -event of a system failure that damages the database. Many commercial -systems allow <i>hot backups</i>, so that users can back up -databases while they are in use. Many applications must run without -interruption, and cannot be shut down for backups. -<p>A particular database system may provide other data management services. -Some provide browsers that show database structure and contents. Some -include tools that enforce data integrity rules, such as the rule that -no employee can have a negative salary. These data management services -are not common to all systems, however. Concurrency, recovery, and -transactions are the data management services that most database vendors -support. -<p>Deciding what kind of database to use means understanding the data -access and data management services that your application needs. Berkeley DB -is an embedded database that supports fairly simple data access with a -rich set of data management services. To highlight its strengths and -weaknesses, we can compare it to other database system categories. -<h3>Relational databases</h3> -<p>Relational databases are probably the best-known database variant, -because of the success of companies like Oracle. Relational databases -are based on the mathematical field of set theory. The term "relation" -is really just a synonym for "set" -- a relation is just a set of -records or, in our terminology, a table. One of the main innovations in -early relational systems was to insulate the programmer from the -physical organization of the database. Rather than walking through -arrays of records or traversing pointers, programmers make statements -about tables in a high-level language, and the system executes those -statements. -<p>Relational databases operate on <i>tuples</i>, or records, composed -of values of several different data types, including integers, character -strings, and others. Operations include searching for records whose -values satisfy some criteria, updating records, and so on. -<p>Virtually all relational databases use the Structured Query Language, -or SQL. This language permits people and computer programs to work with -the database by writing simple statements. The database engine reads -those statements and determines how to satisfy them on the tables in -the database. -<p>SQL is the main practical advantage of relational database systems. -Rather than writing a computer program to find records of interest, the -relational system user can just type a query in a simple syntax, and -let the engine do the work. This gives users enormous flexibility; they -do not need to decide in advance what kind of searches they want to do, -and they do not need expensive programmers to find the data they need. -Learning SQL requires some effort, but it's much simpler than a -full-blown high-level programming language for most purposes. And there -are a lot of programmers who have already learned SQL. -<h3>Object-oriented databases</h3> -<p>Object-oriented databases are less common than relational systems, but -are still fairly widespread. Most object-oriented databases were -originally conceived as persistent storage systems closely wedded to -particular high-level programming languages like C++. With the spread -of Java, most now support more than one programming language, but -object-oriented database systems fundamentally provide the same class -and method abstractions as do object-oriented programming languages. -<p>Many object-oriented systems allow applications to operate on objects -uniformly, whether they are in memory or on disk. These systems create -the illusion that all objects are in memory all the time. The advantage -to object-oriented programmers who simply want object storage and -retrieval is clear. They need never be aware of whether an object is in -memory or not. The application simply uses objects, and the database -system moves them between disk and memory transparently. All of the -operations on an object, and all its behavior, are determined by the -programming language. -<p>Object-oriented databases aren't nearly as widely deployed as relational -systems. In order to attract developers who understand relational -systems, many of the object-oriented systems have added support for -query languages very much like SQL. In practice, though, object-oriented -databases are mostly used for persistent storage of objects in C++ and -Java programs. -<h3>Network databases</h3> -<p>The "network model" is a fairly old technique for managing and -navigating application data. Network databases are designed to make -pointer traversal very fast. Every record stored in a network database -is allowed to contain pointers to other records. These pointers are -generally physical addresses, so fetching the referenced record just -means reading it from disk by its disk address. -<p>Network database systems generally permit records to contain integers, -floating point numbers, and character strings, as well as references to -other records. An application can search for records of interest. After -retrieving a record, the application can fetch any referenced record -quickly. -<p>Pointer traversal is fast because most network systems use physical disk -addresses as pointers. When the application wants to fetch a record, -the database system uses the address to fetch exactly the right string -of bytes from the disk. This requires only a single disk access in all -cases. Other systems, by contrast, often must do more than one disk read -to find a particular record. -<p>The key advantage of the network model is also its main drawback. The -fact that pointer traversal is so fast means that applications that do -it will run well. On the other hand, storing pointers all over the -database makes it very hard to reorganize the database. In effect, once -you store a pointer to a record, it is difficult to move that record -elsewhere. Some network databases handle this by leaving forwarding -pointers behind, but this defeats the speed advantage of doing a single -disk access in the first place. Other network databases find, and fix, -all the pointers to a record when it moves, but this makes -reorganization very expensive. Reorganization is often necessary in -databases, since adding and deleting records over time will consume -space that cannot be reclaimed without reorganizing. Without periodic -reorganization to compact network databases, they can end up with a -considerable amount of wasted space. -<h3>Clients and servers</h3> -<p>Database vendors have two choices for system architecture. They can -build a server to which remote clients connect, and do all the database -management inside the server. Alternatively, they can provide a module -that links directly into the application, and does all database -management locally. In either case, the application developer needs -some way of communicating with the database (generally, an Application -Programming Interface (API) that does work in the process or that -communicates with a server to get work done). -<p>Almost all commercial database products are implemented as servers, and -applications connect to them as clients. Servers have several features -that make them attractive. -<p>First, because all of the data is managed by a separate process, and -possibly on a separate machine, it's easy to isolate the database server -from bugs and crashes in the application. -<p>Second, because some database products (particularly relational engines) -are quite large, splitting them off as separate server processes keeps -applications small, which uses less disk space and memory. Relational -engines include code to parse SQL statements, to analyze them and -produce plans for execution, to optimize the plans, and to execute -them. -<p>Finally, by storing all the data in one place and managing it with a -single server, it's easier for organizations to back up, protect, and -set policies on their databases. The enterprise databases for large -companies often have several full-time administrators caring for them, -making certain that applications run quickly, granting and denying -access to users, and making backups. -<p>However, centralized administration can be a disadvantage in some cases. -In particular, if a programmer wants to build an application that uses -a database for storage of important information, then shipping and -supporting the application is much harder. The end user needs to install -and administer a separate database server, and the programmer must -support not just one product, but two. Adding a server process to the -application creates new opportunity for installation mistakes and -run-time problems. -<table><tr><td><br></td><td width="1%"><a href="../../ref/intro/data.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/intro/dbis.html"><img src="../../images/next.gif" alt="Next"></a> -</td></tr></table> -<p><font size=1><a href="http://www.sleepycat.com">Copyright Sleepycat Software</a></font> -</body> -</html> |