Date: Tue, 17 Feb 2015 17:25:57 +0000 Subject: Imported from /home/lorry/working-area/delta_berkeleydb/db-6.1.23.tar.gz. --- docs/programmer_reference/intro_terrain.html | 608 ++++++++++++++++----------- 1 file changed, 369 insertions(+), 239 deletions(-) (limited to 'docs/programmer_reference/intro_terrain.html') diff --git a/docs/programmer_reference/intro_terrain.html b/docs/programmer_reference/intro_terrain.html index 4e5ce206..68b7e4cb 100644 --- a/docs/programmer_reference/intro_terrain.html +++ b/docs/programmer_reference/intro_terrain.html @@ -14,17 +14,16 @@

Library Version 11.2.5.3

Library Version 12.1.6.1

- + - +

Mapping the terrain: theory and practice			Mapping the terrain: theory and + practice
Prev	Chapter 1. - Introduction -	Chapter 1. Introduction	Next

@@ -34,7 +33,8 @@

Mapping the terrain: theory and practice

Mapping the terrain: theory and + practice

@@ -42,299 +42,431 @@

- Data access and data management + Data access and data management
- Relational databases + Relational databases
- Object-oriented databases + Object-oriented databases
- Network databases + Network databases
- Clients and servers + Clients and servers

The first step in selecting a database system is figuring out what the -choices are. Decades of research and real-world deployment have produced -countless systems. We need to organize them somehow to reduce the number -of options.

One obvious way to group systems is to use the common labels that -vendors apply to them. The buzzwords here include "network," -"relational," "object-oriented," and "embedded," with some -cross-fertilization like "object-relational" and "embedded network". -Understanding the buzzwords is important. Each has some grounding in -theory, but has also evolved into a practical label for categorizing -systems that work in a certain way.

All database systems, regardless of the buzzwords that apply to them, -provide a few common services. All of them store data, for example. -We'll begin by exploring the common services that all systems provide, -and then examine the differences among the different kinds of systems.

+ The first step in selecting a database system is figuring + out what the choices are. Decades of research and real-world + deployment have produced countless systems. We need to + organize them somehow to reduce the number of options. +

+ One obvious way to group systems is to use the common labels + that vendors apply to them. The buzzwords here include + "network," "relational," "object-oriented," and "embedded," + with some cross-fertilization like "object-relational" and + "embedded network". Understanding the buzzwords is important. + Each has some grounding in theory, but has also evolved into a + practical label for categorizing systems that work in a + certain way. +

+ All database systems, regardless of the buzzwords that apply + to them, provide a few common services. All of them store + data, for example. We'll begin by exploring the common + services that all systems provide, and then examine the + differences among the different kinds of systems. +

Data access and data management

Fundamentally, database systems provide two services.

The first service is data access. Data access means adding -new data to the database (inserting), finding data of interest -(searching), changing data already stored (updating), and removing data -from the database (deleting). All databases provide these services. How -they work varies from category to category, and depends on the record -structure that the database supports.

Each record in a database is a collection of values. For example, the -record for a Web site customer might include a name, email address, -shipping address, and payment information. Records are usually stored -in tables. Each table holds records of the same kind. For example, the -customer table at an e-commerce Web site might store the -customer records for every person who shopped at the site. Often, -database records have a different structure from the structures or -instances supported by the programming language in which an application -is written. As a result, working with records can mean:

+ Fundamentally, database systems provide two + services. +

+ The first service is data access. + Data access means adding new data to the database + (inserting), finding data of interest (searching), + changing data already stored (updating), and removing data + from the database (deleting). All databases provide these + services. How they work varies from category to category, + and depends on the record structure that the database + supports. +

+ Each record in a database is a collection of values. For + example, the record for a Web site customer might include + a name, email address, shipping address, and payment + information. Records are usually stored in tables. Each + table holds records of the same kind. For example, the + customer table at an + e-commerce Web site might store the customer records for + every person who shopped at the site. Often, database + records have a different structure from the structures or + instances supported by the programming language in which + an application is written. As a result, working with + records can mean: +

using database operations like searches and updates on records; and
converting between programming language structures and database record -types in the application.
+ using database operations like searches and + updates on records; and +
+ converting between programming language + structures and database record types in the + application. +

The second service is data management. Data management is -more complicated than data access. Providing good data management -services is the hard part of building a database system. When you -choose a database system to use in an application you build, making sure -it supports the data management services you need is critical.

Data management services include allowing multiple users to work on the -database simultaneously (concurrency), allowing multiple records to be -changed instantaneously (transactions), and surviving application and -system crashes (recovery). Different database systems offer different -data management services. Data management services are entirely -independent of the data access services listed above. For example, -nothing about relational database theory requires that the system -support transactions, but most commercial relational systems do.

Concurrency means that multiple users can operate on the database at -the same time. Support for concurrency ranges from none (single-user -access only) to complete (many readers and writers working -simultaneously).

Transactions permit users to make multiple changes appear at once. For -example, a transfer of funds between bank accounts needs to be a -transaction because the balance in one account is reduced and the -balance in the other increases. If the reduction happened before the -increase, than a poorly-timed system crash could leave the customer -poorer; if the bank used the opposite order, then the same system crash -could make the customer richer. Obviously, both the customer and the -bank are best served if both operations happen at the same instant.

Transactions have well-defined properties in database systems. They are -atomic, so that the changes happen all at once or not at all. -They are consistent, so that the database is in a legal state -when the transaction begins and when it ends. They are typically -isolated, which means that any other users in the database -cannot interfere with them while they are in progress. And they are -durable, so that if the system or application crashes after -a transaction finishes, the changes are not lost. Together, the -properties of atomicity, consistency, -isolation, and durability are known as the ACID -properties.

As is the case for concurrency, support for transactions varies among -databases. Some offer atomicity without making guarantees about -durability. Some ignore isolatability, especially in single-user -systems; there's no need to isolate other users from the effects of -changes when there are no other users.

Another important data management service is recovery. Strictly -speaking, recovery is a procedure that the system carries out when it -starts up. The purpose of recovery is to guarantee that the database is -complete and usable. This is most important after a system or -application crash, when the database may have been damaged. The recovery -process guarantees that the internal structure of the database is good. -Recovery usually means that any completed transactions are checked, and -any lost changes are reapplied to the database. At the end of the -recovery process, applications can use the database as if there had been -no interruption in service.

Finally, there are a number of data management services that permit -copying of data. For example, most database systems are able to import -data from other sources, and to export it for use elsewhere. Also, most -systems provide some way to back up databases and to restore in the -event of a system failure that damages the database. Many commercial -systems allow hot backups, so that users can back up -databases while they are in use. Many applications must run without -interruption, and cannot be shut down for backups.

A particular database system may provide other data management services. -Some provide browsers that show database structure and contents. Some -include tools that enforce data integrity rules, such as the rule that -no employee can have a negative salary. These data management services -are not common to all systems, however. Concurrency, recovery, and -transactions are the data management services that most database vendors -support.

Deciding what kind of database to use means understanding the data -access and data management services that your application needs. Berkeley DB -is an embedded database that supports fairly simple data access with a -rich set of data management services. To highlight its strengths and -weaknesses, we can compare it to other database system categories.

+ The second service is data + management. Data management is more + complicated than data access. Providing good data + management services is the hard part of building a + database system. When you choose a database system to use + in an application you build, making sure it supports the + data management services you need is critical. +

+ Data management services include allowing multiple users + to work on the database simultaneously (concurrency), + allowing multiple records to be changed instantaneously + (transactions), and surviving application and system + crashes (recovery). Different database systems offer + different data management services. Data management + services are entirely independent of the data access + services listed above. For example, nothing about + relational database theory requires that the system + support transactions, but most commercial relational + systems do. +

+ Concurrency means that multiple users can operate on the + database at the same time. Support for concurrency ranges + from none (single-user access only) to complete (many + readers and writers working simultaneously). +

+ Transactions permit users to make multiple changes + appear at once. For example, a transfer of funds between + bank accounts needs to be a transaction because the + balance in one account is reduced and the balance in the + other increases. If the reduction happened before the + increase, than a poorly-timed system crash could leave the + customer poorer; if the bank used the opposite order, then + the same system crash could make the customer richer. + Obviously, both the customer and the bank are best served + if both operations happen at the same instant. +

+ Transactions have well-defined properties in database + systems. They are atomic, so that the + changes happen all at once or not at all. They are + consistent, so that the database + is in a legal state when the transaction begins and when + it ends. They are typically isolated, + which means that any other users in the database cannot + interfere with them while they are in progress. And they + are durable, so that if the system or + application crashes after a transaction finishes, the + changes are not lost. Together, the properties of + atomicity, + consistency, + isolation, and + durability are known as the ACID + properties. +

+ As is the case for concurrency, support for transactions + varies among databases. Some offer atomicity without + making guarantees about durability. Some ignore + isolatability, especially in single-user systems; there's + no need to isolate other users from the effects of changes + when there are no other users. +

+ Another important data management service is recovery. + Strictly speaking, recovery is a procedure that the system + carries out when it starts up. The purpose of recovery is + to guarantee that the database is complete and usable. + This is most important after a system or application + crash, when the database may have been damaged. The + recovery process guarantees that the internal structure of + the database is good. Recovery usually means that any + completed transactions are checked, and any lost changes + are reapplied to the database. At the end of the recovery + process, applications can use the database as if there had + been no interruption in service. +

+ Finally, there are a number of data management services + that permit copying of data. For example, most database + systems are able to import data from other sources, and to + export it for use elsewhere. Also, most systems provide + some way to back up databases and to restore in the event + of a system failure that damages the database. Many + commercial systems allow hot backups, + so that users can back up databases while they are in use. + Many applications must run without interruption, and + cannot be shut down for backups. +

+ A particular database system may provide other data + management services. Some provide browsers that show + database structure and contents. Some include tools that + enforce data integrity rules, such as the rule that no + employee can have a negative salary. These data management + services are not common to all systems, however. + Concurrency, recovery, and transactions are the data + management services that most database vendors + support. +

+ Deciding what kind of database to use means + understanding the data access and data management services + that your application needs. Berkeley DB is an embedded + database that supports fairly simple data access with a + rich set of data management services. To highlight its + strengths and weaknesses, we can compare it to other + database system categories. +

Relational databases

Relational databases are probably the best-known database variant, -because of the success of companies like Oracle. Relational databases -are based on the mathematical field of set theory. The term "relation" -is really just a synonym for "set" -- a relation is just a set of -records or, in our terminology, a table. One of the main innovations in -early relational systems was to insulate the programmer from the -physical organization of the database. Rather than walking through -arrays of records or traversing pointers, programmers make statements -about tables in a high-level language, and the system executes those -statements.

Relational databases operate on tuples, or records, composed -of values of several different data types, including integers, character -strings, and others. Operations include searching for records whose -values satisfy some criteria, updating records, and so on.

Virtually all relational databases use the Structured Query Language, -or SQL. This language permits people and computer programs to work with -the database by writing simple statements. The database engine reads -those statements and determines how to satisfy them on the tables in -the database.

SQL is the main practical advantage of relational database systems. -Rather than writing a computer program to find records of interest, the -relational system user can just type a query in a simple syntax, and -let the engine do the work. This gives users enormous flexibility; they -do not need to decide in advance what kind of searches they want to do, -and they do not need expensive programmers to find the data they need. -Learning SQL requires some effort, but it's much simpler than a -full-blown high-level programming language for most purposes. And there -are a lot of programmers who have already learned SQL.

+ Relational databases are probably the best-known + database variant, because of the success of companies like + Oracle. Relational databases are based on the mathematical + field of set theory. The term "relation" is really just a + synonym for "set" -- a relation is just a set of records + or, in our terminology, a table. One of the main + innovations in early relational systems was to insulate + the programmer from the physical organization of the + database. Rather than walking through arrays of records or + traversing pointers, programmers make statements about + tables in a high-level language, and the system executes + those statements. +

+ Relational databases operate on + tuples, or records, composed of + values of several different data types, including + integers, character strings, and others. Operations + include searching for records whose values satisfy some + criteria, updating records, and so on. +

+ Virtually all relational databases use the Structured + Query Language, or SQL. This language permits people and + computer programs to work with the database by writing + simple statements. The database engine reads those + statements and determines how to satisfy them on the + tables in the database. +

+ SQL is the main practical advantage of relational + database systems. Rather than writing a computer program + to find records of interest, the relational system user + can just type a query in a simple syntax, and let the + engine do the work. This gives users enormous flexibility; + they do not need to decide in advance what kind of + searches they want to do, and they do not need expensive + programmers to find the data they need. Learning SQL + requires some effort, but it's much simpler than a + full-blown high-level programming language for most + purposes. And there are a lot of programmers who have + already learned SQL. +

Object-oriented databases

Object-oriented databases are less common than relational systems, but -are still fairly widespread. Most object-oriented databases were -originally conceived as persistent storage systems closely wedded to -particular high-level programming languages like C++. With the spread -of Java, most now support more than one programming language, but -object-oriented database systems fundamentally provide the same class -and method abstractions as do object-oriented programming languages.

Many object-oriented systems allow applications to operate on objects -uniformly, whether they are in memory or on disk. These systems create -the illusion that all objects are in memory all the time. The advantage -to object-oriented programmers who simply want object storage and -retrieval is clear. They need never be aware of whether an object is in -memory or not. The application simply uses objects, and the database -system moves them between disk and memory transparently. All of the -operations on an object, and all its behavior, are determined by the -programming language.

Object-oriented databases aren't nearly as widely deployed as relational -systems. In order to attract developers who understand relational -systems, many of the object-oriented systems have added support for -query languages very much like SQL. In practice, though, object-oriented -databases are mostly used for persistent storage of objects in C++ and -Java programs.

+ Object-oriented databases are less common than + relational systems, but are still fairly widespread. Most + object-oriented databases were originally conceived as + persistent storage systems closely wedded to particular + high-level programming languages like C++. With the spread + of Java, most now support more than one programming + language, but object-oriented database systems + fundamentally provide the same class and method + abstractions as do object-oriented programming + languages. +

+ Many object-oriented systems allow applications to + operate on objects uniformly, whether they are in memory + or on disk. These systems create the illusion that all + objects are in memory all the time. The advantage to + object-oriented programmers who simply want object storage + and retrieval is clear. They need never be aware of + whether an object is in memory or not. The application + simply uses objects, and the database system moves them + between disk and memory transparently. All of the + operations on an object, and all its behavior, are + determined by the programming language. +

+ Object-oriented databases aren't nearly as widely + deployed as relational systems. In order to attract + developers who understand relational systems, many of the + object-oriented systems have added support for query + languages very much like SQL. In practice, though, + object-oriented databases are mostly used for persistent + storage of objects in C++ and Java programs. +

Network databases

The "network model" is a fairly old technique for managing and -navigating application data. Network databases are designed to make -pointer traversal very fast. Every record stored in a network database -is allowed to contain pointers to other records. These pointers are -generally physical addresses, so fetching the record to which it refers -just means reading it from disk by its disk address.

Network database systems generally permit records to contain integers, -floating point numbers, and character strings, as well as references to -other records. An application can search for records of interest. After -retrieving a record, the application can fetch any record to which it -refers, quickly.

Pointer traversal is fast because most network systems use physical disk -addresses as pointers. When the application wants to fetch a record, -the database system uses the address to fetch exactly the right string -of bytes from the disk. This requires only a single disk access in all -cases. Other systems, by contrast, often must do more than one disk read -to find a particular record.

The key advantage of the network model is also its main drawback. The -fact that pointer traversal is so fast means that applications that do -it will run well. On the other hand, storing pointers all over the -database makes it very hard to reorganize the database. In effect, once -you store a pointer to a record, it is difficult to move that record -elsewhere. Some network databases handle this by leaving forwarding -pointers behind, but this defeats the speed advantage of doing a single -disk access in the first place. Other network databases find, and fix, -all the pointers to a record when it moves, but this makes -reorganization very expensive. Reorganization is often necessary in -databases, since adding and deleting records over time will consume -space that cannot be reclaimed without reorganizing. Without periodic -reorganization to compact network databases, they can end up with a -considerable amount of wasted space.

+ The "network model" is a fairly old technique for + managing and navigating application data. Network + databases are designed to make pointer traversal very + fast. Every record stored in a network database is allowed + to contain pointers to other records. These pointers are + generally physical addresses, so fetching the record to + which it refers just means reading it from disk by its + disk address. +

+ Network database systems generally permit records to + contain integers, floating point numbers, and character + strings, as well as references to other records. An + application can search for records of interest. After + retrieving a record, the application can fetch any record + to which it refers, quickly. +

+ Pointer traversal is fast because most network systems + use physical disk addresses as pointers. When the + application wants to fetch a record, the database system + uses the address to fetch exactly the right string of + bytes from the disk. This requires only a single disk + access in all cases. Other systems, by contrast, often + must do more than one disk read to find a particular + record. +

+ The key advantage of the network model is also its main + drawback. The fact that pointer traversal is so fast means + that applications that do it will run well. On the other + hand, storing pointers all over the database makes it very + hard to reorganize the database. In effect, once you store + a pointer to a record, it is difficult to move that record + elsewhere. Some network databases handle this by leaving + forwarding pointers behind, but this defeats the speed + advantage of doing a single disk access in the first + place. Other network databases find, and fix, all the + pointers to a record when it moves, but this makes + reorganization very expensive. Reorganization is often + necessary in databases, since adding and deleting records + over time will consume space that cannot be reclaimed + without reorganizing. Without periodic reorganization to + compact network databases, they can end up with a + considerable amount of wasted space. +

Clients and servers

Database vendors have two choices for system architecture. They can -build a server to which remote clients connect, and do all the database -management inside the server. Alternatively, they can provide a module -that links directly into the application, and does all database -management locally. In either case, the application developer needs -some way of communicating with the database (generally, an Application -Programming Interface (API) that does work in the process or that -communicates with a server to get work done).

Almost all commercial database products are implemented as servers, and -applications connect to them as clients. Servers have several features -that make them attractive.

First, because all of the data is managed by a separate process, and -possibly on a separate machine, it's easy to isolate the database server -from bugs and crashes in the application.

Second, because some database products (particularly relational engines) -are quite large, splitting them off as separate server processes keeps -applications small, which uses less disk space and memory. Relational -engines include code to parse SQL statements, to analyze them and -produce plans for execution, to optimize the plans, and to execute -them.

Finally, by storing all the data in one place and managing it with a -single server, it's easier for organizations to back up, protect, and -set policies on their databases. The enterprise databases for large -companies often have several full-time administrators caring for them, -making certain that applications run quickly, granting and denying -access to users, and making backups.

However, centralized administration can be a disadvantage in some cases. -In particular, if a programmer wants to build an application that uses -a database for storage of important information, then shipping and -supporting the application is much harder. The end user needs to install -and administer a separate database server, and the programmer must -support not just one product, but two. Adding a server process to the -application creates new opportunity for installation mistakes and -run-time problems.

+ Database vendors have two choices for system + architecture. They can build a server to which remote + clients connect, and do all the database management inside + the server. Alternatively, they can provide a module that + links directly into the application, and does all database + management locally. In either case, the application + developer needs some way of communicating with the + database (generally, an Application Programming Interface + (API) that does work in the process or that communicates + with a server to get work done). +

+ Almost all commercial database products are implemented + as servers, and applications connect to them as clients. + Servers have several features that make them + attractive. +

+ First, because all of the data is managed by a separate + process, and possibly on a separate machine, it's easy to + isolate the database server from bugs and crashes in the + application. +

+ Second, because some database products (particularly + relational engines) are quite large, splitting them off as + separate server processes keeps applications small, which + uses less disk space and memory. Relational engines + include code to parse SQL statements, to analyze them and + produce plans for execution, to optimize the plans, and to + execute them. +

+ Finally, by storing all the data in one place and + managing it with a single server, it's easier for + organizations to back up, protect, and set policies on + their databases. The enterprise databases for large + companies often have several full-time administrators + caring for them, making certain that applications run + quickly, granting and denying access to users, and making + backups. +

+ However, centralized administration can be a + disadvantage in some cases. In particular, if a programmer + wants to build an application that uses a database for + storage of important information, then shipping and + supporting the application is much harder. The end user + needs to install and administer a separate database + server, and the programmer must support not just one + product, but two. Adding a server process to the + application creates new opportunity for installation + mistakes and run-time problems. +

@@ -348,9 +480,7 @@ run-time problems.

Next - Chapter 1. - Introduction - + Chapter 1. Introduction Home -- cgit v1.2.1