summaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAgeFilesLines
* Fix breakage caused by conflicting patches, as evidenced by the buildfarm.Alvaro Herrera2008-06-081-1/+2
|
* Rewrite DROP's dependency traversal algorithm into an honest two-passTom Lane2008-06-081-1/+42
| | | | | | | | | | | | | algorithm, replacing the original intention of a one-pass search, which had been hacked up over time to be partially two-pass in hopes of handling various corner cases better. It still wasn't quite there, especially as regards emitting unwanted NOTICE messages. More importantly, this approach lets us fix a number of open bugs concerning concurrent DROP scenarios, because we can take locks during the first pass and avoid traversing to dependent objects that were just deleted by someone else. There is more that can be done here, but I'll go ahead and commit the base patch before working on the options.
* Move BufferGetPageSize and BufferGetPage from bufpage.h to bufmgr.h. It isAlvaro Herrera2008-06-087-11/+15
| | | | | | | | | | more logical that way, and also it reduces the amount of unnecessary includes in bufpage.h, which is widely used. Zdenek Kotala. My previous patch to bufpage.h should also have credited him as author, but I forgot (sorry about that).
* Set hidden field for guc enum missed in previous commit.Magnus Hagander2008-05-281-7/+7
|
* Alter the xxx_pattern_ops opclasses to use the regular equality operator ofTom Lane2008-05-271-10/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | the associated datatype as their equality member. This means that these opclasses can now support plain equality comparisons along with LIKE tests, thus avoiding the need for an extra index in some applications. This optimization was not possible when the pattern opclasses were first introduced, because we didn't insist that text equality meant bitwise equality; but we do now, so there is no semantic difference between regular and pattern equality operators. I removed the name_pattern_ops opclass altogether, since it's really useless: name's regular comparisons are just strcmp() and are unlikely to become something different. Instead teach indxpath.c that btree name_ops can be used for LIKE whether or not the locale is C. This might lead to a useful speedup in LIKE queries on the system catalogs in non-C locales. The ~=~ and ~<>~ operators are gone altogether. (It would have been nice to keep them for backward compatibility's sake, but since the pg_amop structure doesn't allow multiple equality operators per opclass, there's no way.) A not-immediately-obvious incompatibility is that the sort order within bpchar_pattern_ops indexes changes --- it had been identical to plain strcmp, but is now trailing-blank-insensitive. This will impact in-place upgrades, if those ever happen. Per discussions a couple months ago.
* Remove arbitrary 10MB limit on two-phase state file size. It's not that hardHeikki Linnakangas2008-05-191-3/+15
| | | | | | | | | | | | | | | | to go beoynd 10MB, as demonstrated by Gavin Sharry's example of dropping a schema with ~25000 objects. The really bogus thing about the limit was that it was enforced when a state file file was read in, not when it was written, so you would end up with a prepared transaction that you can't commit or abort, and the only recourse was to shut down the server and remove the file by hand. Raise the limit to MaxAllocSize, and enforce it also when a state file is written. We could've removed the limit altogether, but reading in a file larger than MaxAllocSize would fail anyway because we read it into a palloc'd buffer. Backpatch down to 8.1, where 2PC and this issue was introduced.
* Fix a subtle bug exposed by recent wal_sync_method rearrangements.Tom Lane2008-05-171-3/+1
| | | | | | | | Formerly, the default value of wal_sync_method was determined inside xlog.c, but now it is determined inside guc.c. guc.c was reading xlogdefs.h without having read <fcntl.h>, leading to wrong determination of DEFAULT_SYNC_METHOD. Obviously xlogdefs.h needs to include <fcntl.h> for itself to ensure stable results.
* Reduce unnecessary PANIC to ERROR, improve a couple of comments.Tom Lane2008-05-161-15/+11
|
* Extend GIN to support partial-match searches, and extend tsquery to supportTom Lane2008-05-163-23/+385
| | | | | | prefix matching using this facility. Teodor Sigaev and Oleg Bartunov
* Persuade GIN to react to control-C in a reasonable amount of timeTom Lane2008-05-161-1/+10
| | | | while building a GIN index.
* Remove the special variable for open_sync_bit used in O_SYNC and O_DSYNCMagnus Hagander2008-05-141-31/+32
| | | | | | | | | modes, replacing it with a call to a function that derives it from the sync_method variable, now that it has distinct values for these two cases. This means that assign_xlog_sync_method() no longer changes any settings, thus fixing the bug introduced in the change to use a guc enum for wal_sync_method.
* Don't try to close negative file descriptors, since this can causeMagnus Hagander2008-05-131-3/+6
| | | | | | | crashes on certain platforms. In particular, the MSVC runtime is known to do this. Fixes bug #4162, reported and diagnosed by Javier Pimas
* This is the patch replace offnum++ by OffsetNumberNext, to beBruce Momjian2008-05-131-2/+2
| | | | | | consistent. OffsetNumberNext() has some casting that makes it useful. Fujii Masao
* Improve snapshot manager by keeping explicit track of snapshots.Alvaro Herrera2008-05-121-9/+10
| | | | | | | | | | | | | There are two ways to track a snapshot: there's the "registered" list, which is used for arbitrary long-lived snapshots; and there's the "active stack", which is used for the snapshot that is considered "active" at any time. This also allows users of snapshots to stop worrying about snapshot memory allocation and freeing, and about using PG_TRY blocks around ActiveSnapshot assignment. This is all done automatically now. As a consequence, this allows us to reset MyProc->xmin when there are no more snapshots registered in the current backend, reducing the impact that long-running transactions have on VACUUM.
* Fix breakage by the wal_sync_method patch in installations that useMagnus Hagander2008-05-121-1/+2
| | | | O_DSYNC (specifically this broke all the Windows buildfarm members)
* Put back bufmgr.h in bufpage.h -- it is needed by some macros.Alvaro Herrera2008-05-127-14/+7
| | | | | Remove #include bufmgr.h from (most?) source files which already include bufpage.h.
* Report which WAL sync method we are trying to change *to* when it fails,Magnus Hagander2008-05-121-2/+2
| | | | not which one we had before (that worked, and thus is completley irrelevant)
* Convert wal_sync_method to guc enum.Magnus Hagander2008-05-121-40/+53
|
* Restructure some header files a bit, in particular heapam.h, by removing someAlvaro Herrera2008-05-1238-49/+99
| | | | | | | | | | | | unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.
* Change the rules for inherited CHECK constraints to be essentially the sameTom Lane2008-05-091-39/+10
| | | | | | | | | | | | | | | | as those for inherited columns; that is, it's no longer allowed for a child table to not have a check constraint matching one that exists on a parent. This satisfies the principle of least surprise (rows selected from the parent will always appear to meet its check constraints) and eliminates some longstanding bogosity in pg_dump, which formerly had to guess about whether check constraints were really inherited or not. The implementation involves adding conislocal and coninhcount columns to pg_constraint (paralleling attislocal and attinhcount in pg_attribute) and refactoring various ALTER TABLE actions to be more like those for columns. Alex Hunsaker, Nikhil Sontakke, Tom Lane
* Fix Assert introduced in previous patch.Heikki Linnakangas2008-05-091-2/+2
|
* Fix incorrect archive truncation point calculation in the %r recovery_commandHeikki Linnakangas2008-05-091-6/+30
| | | | | | | | | | | parameter. This fixes bug 4137 reported by Wojciech Strzalka, where a WAL file is deleted too early when starting the recovery of a warm standby server. Also add a sanity check in pg_standby so that it will refuse to delete anything earlier than the file being restored, and improve the debug message in case nothing is deleted. Simon Riggs. Backpatch to 8.3, which is where %r was introduced.
* Update error messages, per notes from Tom.Magnus Hagander2008-04-241-3/+4
| | | | Laurenz Albe
* Prevent shutdown in normal mode if online backup is running, andMagnus Hagander2008-04-231-1/+51
| | | | | | | | | have pg_ctl warn about this. Cancel running online backups (by renaming the backup_label file, thus rendering the backup useless) when shutting down in fast mode. Laurenz Albe
* Fix using too many LWLocks bug, reported by Craig RingerTeodor Sigaev2008-04-221-205/+157
| | | | | | | | | <craig@postnewspapers.com.au>. It was my mistake, I missed limitation of number of held locks, now GIN doesn't use continiuous locks, but still hold buffers pinned to prevent interference with vacuum's deletion algorithm. Backpatch is needed.
* Allow float8, int8, and related datatypes to be passed by value on machinesTom Lane2008-04-211-5/+39
| | | | | | | | | | where Datum is 8 bytes wide. Since this will break old-style C functions (those still using version 0 calling convention) that have arguments or results of these types, provide a configure option to disable it and retain the old pass-by-reference behavior. Likewise, provide a configure option to disable the recently-committed float4 pass-by-value change. Zoltan Boszormenyi, plus configurability stuff by me.
* Clean up a few places where Datums were being treated as pointers (and viceAlvaro Herrera2008-04-175-41/+42
| | | | | | versa) without going through DatumGetPointer. Gavin Sherry, with Feng Tian.
* Repair two places where SIGTERM exit could leave shared memory stateTom Lane2008-04-163-29/+38
| | | | | | | | | | | | | | corrupted. (Neither is very important if SIGTERM is used to shut down the whole database cluster together, but there's a problem if someone tries to SIGTERM individual backends.) To do this, introduce new infrastructure macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care of transiently pushing an on_shmem_exit cleanup hook. Also use this method for createdb cleanup --- that wasn't a shared-memory-corruption problem, but SIGTERM abort of createdb could leave orphaned files lying around. Backpatch as far as 8.2. The shmem corruption cases don't exist in 8.1, and the createdb usage doesn't seem important enough to risk backpatching further.
* Push index operator lossiness determination down to GIST/GIN opclassTom Lane2008-04-144-46/+116
| | | | | | | | | | | "consistent" functions, and remove pg_amop.opreqcheck, as per recent discussion. The main immediate benefit of this is that we no longer need 8.3's ugly hack of requiring @@@ rather than @@ to test weight-using tsquery searches on GIN indexes. In future it should be possible to optimize some other queries better than is done now, by detecting at runtime whether the index match is exact or not. Tom Lane, after an idea of Heikki's, and with some help from Teodor.
* Phase 2 of project to make index operator lossiness be determined at runtimeTom Lane2008-04-136-31/+63
| | | | | | | | | | | | instead of plan time. Extend the amgettuple API so that the index AM returns a boolean indicating whether the indexquals need to be rechecked, and make that rechecking happen in nodeIndexscan.c (currently the only place where it's expected to be needed; other callers of index_getnext are just erroring out for now). For the moment, GIN and GIST have stub logic that just always sets the recheck flag to TRUE --- I'm hoping to get Teodor to handle pushing that control down to the opclass consistent() functions. The planner no longer pays any attention to amopreqcheck, and that catalog column will go away in due course.
* Create new routines systable_beginscan_ordered, systable_getnext_ordered,Tom Lane2008-04-123-67/+110
| | | | | | | | | | | | | | systable_endscan_ordered that have API similar to systable_beginscan etc (in particular, the passed-in scankeys have heap not index attnums), but guarantee ordered output, unlike the existing functions. For the moment these are just very thin wrappers around index_beginscan/index_getnext/etc. Someday they might need to get smarter; but for now this is just a code refactoring exercise to reduce the number of direct callers of index_getnext, in preparation for changing that function's API. In passing, remove index_getnext_indexitem, which has been dead code for quite some time, and will have even less use than that in the presence of run-time-lossy indexes.
* Replace "amgetmulti" AM functions with "amgetbitmap", in which the wholeTom Lane2008-04-106-147/+122
| | | | | | | | | | | | | | | | | | indexscan always occurs in one call, and the results are returned in a TIDBitmap instead of a limited-size array of TIDs. This should improve speed a little by reducing AM entry/exit overhead, and it is necessary infrastructure if we are ever to support bitmap indexes. In an only slightly related change, add support for TIDBitmaps to preserve (somewhat lossily) the knowledge that particular TIDs reported by an index need to have their quals rechecked when the heap is visited. This facility is not really used yet; we'll need to extend the forced-recheck feature to plain indexscans before it's useful, and that hasn't been coded yet. The intent is to use it to clean up 8.3's horrid @@@ kluge for text search with weighted queries. There might be other uses in future, but that one alone is sufficient reason. Heikki Linnakangas, with some adjustments by me.
* Improve hash_any() to use word-wide fetches when hashing suitably alignedTom Lane2008-04-061-37/+193
| | | | | | | | | | | | | data. This makes for a significant speedup at the cost that the results now vary between little-endian and big-endian machines; which forces us to add explicit ORDER BYs in a couple of regression tests to preserve machine-independent comparison results. Also, force initdb by bumping catversion, since the contents of hash indexes will change (at least on big-endian machines). Kenneth Marshall and Tom Lane, based on work from Bob Jenkins. This commit does not adopt Bob's new faster mix() algorithm, however, since we still need to convince ourselves that that doesn't degrade the quality of the hashing.
* Have pg_stop_backup() wait for all archive files to be sent, rather thanBruce Momjian2008-04-051-6/+43
| | | | | | | returing right away. This guarantees that when pg_stop_backup() returns, you have a valid backup. Simon Riggs
* Remove heap_release_fetch, which is no longer used anywhere; this simplifiesTom Lane2008-04-031-29/+3
| | | | heap_fetch a little.
* Move the HTSU_Result enum definition into snapshot.h, to avoid includingAlvaro Herrera2008-03-266-6/+12
| | | | | | tqual.h into heapam.h. This makes all inclusion of tqual.h explicit. I also sorted alphabetically the includes on some source files.
* Rename snapmgmt.c/h to snapmgr.c/h, for consistency with other files.Alvaro Herrera2008-03-266-12/+12
| | | | Per complaint from Tom Lane.
* Separate snapshot management code from tuple visibility code, create aAlvaro Herrera2008-03-266-8/+12
| | | | | | | | | | | | | snapmgmt.c file for the former. The header files have also been reorganized in three parts: the most basic snapshot definitions are now in a new file snapshot.h, and the also new snapmgmt.h keeps the definitions for snapmgmt.c. tqual.h has been reduced to the bare minimum. This patch is just a first step towards managing live snapshots within a transaction; there is no functionality change. Per my proposal to pgsql-patches on 20080318191940.GB27458@alvh.no-ip.org and subsequent discussion.
* Simplify and standardize conversions between TEXT datums and ordinary CTom Lane2008-03-253-40/+16
| | | | | | | | | | | | | | | | | | | | strings. This patch introduces four support functions cstring_to_text, cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and two macros CStringGetTextDatum and TextDatumGetCString. A number of existing macros that provided variants on these themes were removed. Most of the places that need to make such conversions now require just one function or macro call, in place of the multiple notational layers that used to be needed. There are no longer any direct calls of textout or textin, and we got most of the places that were using handmade conversions via memcpy (there may be a few still lurking, though). This commit doesn't make any serious effort to eliminate transient memory leaks caused by detoasting toasted text objects before they reach text_to_cstring. We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few places where it was easy, but much more could be done. Brendan Jurd and Tom Lane
* More README src cleanups.Bruce Momjian2008-03-213-9/+7
|
* Make source code READMEs more consistent. Add CVS tags to all README files.Bruce Momjian2008-03-205-22/+35
|
* Enable probes to work with Mac OS X Leopard and other OSes that willPeter Eisentraut2008-03-171-4/+5
| | | | | | | | | | | support DTrace in the future. Switch from using DTRACE_PROBEn macros to the dynamically generated macros. Use "dtrace -h" to create a header file that contains the dynamically generated macros to be used in the source code instead of the DTRACE_PROBEn macros. A dummy header file is generated for builds without DTrace support. Author: Robert Lor <Robert.Lor@sun.com>
* Fix TransactionIdIsCurrentTransactionId() to use binary search instead ofTom Lane2008-03-172-60/+127
| | | | | | | | | linear search when checking child-transaction XIDs. This makes for an important speedup in transactions that have large numbers of children, as in a recent example from Craig Ringer. We can also get rid of an ugly kluge that represented lists of TransactionIds as lists of OIDs. Heikki Linnakangas
* When creating a large hash index, pre-sort the index entries by estimatedTom Lane2008-03-165-15/+179
| | | | | | | | | | bucket number, so as to ensure locality of access to the index during the insertion step. Without this, building an index significantly larger than available RAM takes a very long time because of thrashing. On the other hand, sorting is just useless overhead when the index does fit in RAM. We choose to sort when the initial index size exceeds effective_cache_size. This is a revised version of work by Tom Raney and Shreya Bhargava.
* Change hash index creation so that rather than always establishing exactlyTom Lane2008-03-153-22/+61
| | | | | | | | | | | two buckets at the start, we create a number of buckets appropriate for the estimated size of the table. This avoids a lot of expensive bucket-split actions during initial index build on an already-populated table. This is one of the two core ideas of Tom Raney and Shreya Bhargava's patch to reduce hash index build time. I'm committing it separately to make it easier for people to test the effects of this separately from the effects of their other core idea (pre-sorting the index entries by bucket number).
* Fix heap_page_prune's problem with failing to send cache invalidationTom Lane2008-03-131-12/+25
| | | | | | | | | | | messages if the calling transaction aborts later on. Collapsing out line pointer redirects is a done deal as soon as we complete the page update, so syscache *must* be notified even if the VACUUM FULL as a whole doesn't complete. To fix, add some functionality to inval.c to allow the pending inval messages to be sent immediately while heap_page_prune is still running. The implementation is a bit chintzy: it will only work in the context of VACUUM FULL. But that's all we need now, and it can always be extended later if needed. Per my trouble report of a week ago.
* Make TransactionIdIsInProgress check transam.c's single-item XID status cacheTom Lane2008-03-111-29/+48
| | | | | | | | before it goes groveling through the ProcArray. In situations where the same recently-committed transaction ID is checked repeatedly by tqual.c, this saves a lot of shared-memory searches. And it's cheap enough that it shouldn't hurt noticeably when it doesn't help. Concept and patch by Simon, some minor tweaking and comment-cleanup by Tom.
* Remove no-longer-used XLogCacheByte field of XLogCtl.Tom Lane2008-03-101-4/+1
| | | | Itagaki Takahiro
* Refactor heap_page_prune so that instead of changing item states on-the-fly,Tom Lane2008-03-082-196/+288
| | | | | | | | | | | | | | | | | it accumulates the set of changes to be made and then applies them. It had to accumulate the set of changes anyway to prepare a WAL record for the pruning action, so this isn't an enormous change; the only new complexity is to not doubly mark tuples that are visited twice in the scan. The main advantage is that we can substantially reduce the scope of the critical section in which the changes are applied, thus avoiding PANIC in foreseeable cases like running out of memory in inval.c. A nice secondary advantage is that it is now far clearer that WAL replay will actually do the same thing that the original pruning did. This commit doesn't do anything about the open problem that CacheInvalidateHeapTuple doesn't have the right semantics for a CTID change caused by collapsing out a redirect pointer. But whatever we do about that, it'll be a good idea to not do it inside a critical section.
* This patch addresses some issues in TOAST compression strategy thatTom Lane2008-03-071-26/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | were discussed last year, but we felt it was too late in the 8.3 cycle to change the code immediately. Specifically, the patch: * Reduces the minimum datum size to be considered for compression from 256 to 32 bytes, as suggested by Greg Stark. * Increases the required compression rate for compressed storage from 20% to 25%, again per Greg's suggestion. * Replaces force_input_size (size above which compression is forced) with a maximum size to be considered for compression. It was agreed that allowing large inputs to escape the minimum-compression-rate requirement was not bright, and that indeed we'd rather have a knob that acted in the other direction. I set this value to 1MB for the moment, but it could use some performance studies to tune it. * Adds an early-failure path to the compressor as suggested by Jan: if it's been unable to find even one compressible substring in the first 1KB (parameterizable), assume we're looking at incompressible input and give up. (Possibly this logic can be improved, but I'll commit it as-is for now.) * Improves the toasting heuristics so that when we have very large fields with attstorage 'x' or 'e', we will push those out to toast storage before considering inline compression of shorter fields. This also responds to a suggestion of Greg's, though my original proposal for a solution was a bit off base because it didn't fix the problem for large 'e' fields. There was some discussion in the earlier threads of exposing some of the compression knobs to users, perhaps even on a per-column basis. I have not done anything about that here. It seems to me that if we are changing around the parameters, we'd better get some experience and be sure we are happy with the design before we set things in stone by providing user-visible knobs.