summaryrefslogtreecommitdiff
path: root/src/backend/access/heap
Commit message (Collapse)AuthorAgeFilesLines
...
* Refactor how VACUUM passes around its XID cutoffs.Peter Geoghegan2022-12-222-361/+313
| | | | | | | | | | | | | | | | | | | | | | | | | | | Use a dedicated struct for the XID/MXID cutoffs used by VACUUM, such as FreezeLimit and OldestXmin. This state is initialized in vacuum.c, and then passed around by code from vacuumlazy.c to heapam.c freezing related routines. The new convention is that everybody works off of the same cutoff state, which is passed around via pointers to const. Also simplify some of the logic for dealing with frozen xmin in heap_prepare_freeze_tuple: add dedicated "xmin_already_frozen" state to clearly distinguish xmin XIDs that we're going to freeze from those that were already frozen from before. That way the routine's xmin handling code is symmetrical with the existing xmax handling code. This is preparation for an upcoming commit that will add page level freezing. Also refactor the control flow within FreezeMultiXactId(), while adding stricter sanity checks. We now test OldestXmin directly, instead of using FreezeLimit as an inexact proxy for OldestXmin. This is further preparation for the page level freezing work, which will make the function's caller cede control of page level freezing to the function where appropriate (where heap_prepare_freeze_tuple sees a tuple that happens to contain a MultiXactId in its xmax). Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Jeff Davis <pgsql@j-davis.com> Discussion: https://postgr.es/m/CAH2-WznS9TxXmz2_=SY+SyJyDFbiOftKofM9=aDo68BbXNBUMA@mail.gmail.com
* Add copyright notices to meson filesAndrew Dunstan2022-12-201-0/+2
| | | | Discussion: https://postgr.es/m/222b43a5-2fb3-2c1b-9cd0-375d376c8246@dunslane.net
* Static assertions cleanupPeter Eisentraut2022-12-151-5/+1
| | | | | | | | | | | | | | | | | | | | | Because we added StaticAssertStmt() first before StaticAssertDecl(), some uses as well as the instructions in c.h are now a bit backwards from the "native" way static assertions are meant to be used in C. This updates the guidance and moves some static assertions to better places. Specifically, since the addition of StaticAssertDecl(), we can put static assertions at the file level. This moves a number of static assertions out of function bodies, where they might have been stuck out of necessity, to perhaps better places at the file level or in header files. Also, when the static assertion appears in a position where a declaration is allowed, then using StaticAssertDecl() is more native than StaticAssertStmt(). Reviewed-by: John Naylor <john.naylor@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/941a04e7-dd6f-c0e4-8cdf-a33b3338cbda%40enterprisedb.com
* Generate pg_stat_get*() functions for tables using macrosMichael Paquier2022-12-061-1/+1
| | | | | | | | | | | | | The same code pattern is repeated 17 times for int64 counters (0 for missing entry) and 5 times for timestamps (NULL for missing entry) on table entries. This code is switched to use a macro for the basic code instead, shaving a few hundred lines of originally-duplicated code. The function names remain the same, but some fields of PgStat_StatTabEntry have to be renamed to cope with the new style. Author: Bertrand Drouvot Reviewed-by: Nathan Bossart Discussion: https:/postgr.es/m/20221204173207.GA2669116@nathanxps13
* Simplify vacuum_set_xid_limits() signature.Peter Geoghegan2022-11-231-6/+1
| | | | | | | | | | | | Pass VACUUM parameters (VacuumParams state) to vacuum_set_xid_limits() directly, rather than passing most individual VacuumParams fields as separate arguments. Also make vacuum_set_xid_limits() output parameter symbol names match those used by its vacuumlazy.c caller. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wz=TE7gW5DgSahDkf0UEZigFGAoHNNN6EvSrdzC=Kn+hrA@mail.gmail.com
* Don't test HEAP_XMAX_INVALID when freezing xmax.Peter Geoghegan2022-11-231-2/+5
| | | | | | | | | | | | | | | | | | | | We shouldn't ever need to rely on whether HEAP_XMAX_INVALID is set in t_infomask when considering whether or not an xmax should be deemed already frozen, since that status flag is just a hint. The only acceptable representation for an "xmax_already_frozen" raw xmax field is the transaction ID value zero (also known as InvalidTransactionId). Adjust code that superficially appeared to rely on HEAP_XMAX_INVALID to make the rule about xmax_already_frozen clear. Also avoid needlessly rereading the tuple's raw xmax. Oversight in bugfix commit d2599ecf. There is no evidence that this ever led to incorrect behavior, so no backpatch. The worst consequence of this bug was that VACUUM could hypothetically fail to notice and report on certain kinds of corruption, which seems fairly benign. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wzkh3DMCDRPfhZxj9xCq9v3WmzvmbiCpf1dNKUBPadhCbQ@mail.gmail.com
* Standardize rmgrdesc recovery conflict XID output.Peter Geoghegan2022-11-172-40/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Standardize on the name snapshotConflictHorizon for all XID fields from WAL records that generate recovery conflicts when in hot standby mode. This supersedes the previous latestRemovedXid naming convention. The new naming convention places emphasis on how the values are actually used by REDO routines. How the values are generated during original execution (details of which vary by record type) is deemphasized. Users of tools like pg_waldump can now grep for snapshotConflictHorizon to see all potential sources of recovery conflicts in a standardized way, without necessarily having to consider which specific record types might be involved. Also bring a couple of WAL record types that didn't follow any kind of naming convention into line. These are heapam's VISIBLE record type and SP-GiST's VACUUM_REDIRECT record type. Now every WAL record whose REDO routine calls ResolveRecoveryConflictWithSnapshot() passes through the snapshotConflictHorizon field from its WAL record. This is follow-up work to the refactoring from commit 9e540599 that made FREEZE_PAGE WAL records use a standard snapshotConflictHorizon style XID cutoff. No bump in XLOG_PAGE_MAGIC, since the underlying format of affected WAL records doesn't change. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CAH2-Wzm2CQUmViUq7Opgk=McVREHSOorYaAjR1ZpLYkRN7_dPw@mail.gmail.com
* Use correct type name in comments about freezing.Peter Geoghegan2022-11-171-1/+1
| | | | Oversight in commit 9e540599, which added freeze plan deduplication.
* Variable renaming in preparation for refactoringPeter Eisentraut2022-11-162-122/+122
| | | | | | | | Rename page -> block and dp -> page where appropriate. The old naming mixed up block and page in confusing ways. Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
* Remove useless castsPeter Eisentraut2022-11-162-7/+7
| | | | | Maybe these are left from when PageGetItem() was a macro, but now they are clearly useless.
* Turn HeapKeyTest macro into inline functionPeter Eisentraut2022-11-161-4/+4
| | | | | | | | It is easier to read as a function. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ@mail.gmail.com
* Deduplicate freeze plans in freeze WAL records.Peter Geoghegan2022-11-152-115/+270
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make heapam WAL records that describe freezing performed by VACUUM more space efficient by storing each distinct "freeze plan" once, alongside an array of associated page offset numbers (one per freeze plan). The freeze plans required for most heap pages tend to naturally have a great deal of redundancy, so this technique is very effective in practice. It often leads to freeze WAL records that are less than 20% of the size of equivalent WAL records generated using the previous approach. The freeze plan concept was introduced by commit 3b97e6823b, which fixed bugs in VACUUM's handling of MultiXacts. We retain the concept of freeze plans, but go back to using page offset number arrays. There is no loss of generality here because deduplication is an additive process that gets applied mechanically when FREEZE_PAGE WAL records are built. More than anything else, freeze plan deduplication is an optimization that reduces the marginal cost of freezing additional tuples on pages that will need to have at least one or two tuples frozen in any case. Ongoing work that adds page-level freezing to VACUUM will take full advantage of the improved cost profile through batching. Also refactor some of the details surrounding recovery conflicts needed to REDO freeze records in passing: make original execution responsible for generating a standard latestRemovedXid cutoff, rather than working backwards to get the same cutoff in the REDO routine. Bugfix commit 66fbcb0d2e did it the other way around, which is equivalent but obscures what's going on. Also rename the cutoff field from the WAL record/struct (rename the field cutoff_xid to latestRemovedXid to match similar WAL records). Processing of conflicts by REDO routines is already completely uniform, so tools like pg_waldump should present the information driving the process uniformly. There are two remaining WAL record types that still don't quite follow this convention (heapam's VISIBLE record type and SP-GiST's VACUUM_REDIRECT record type). They can be brought into line by later work that totally standardizes how the cutoffs are presented. Bump XLOG_PAGE_MAGIC. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Nathan Bossart <nathandbossart@gmail.com> Reviewed-By: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/CAH2-Wz=XytErMnb8FAyFd+OQEbiipB0Q2FmFdXrggPL4VBnRYQ@mail.gmail.com
* Document WAL rules related to PD_ALL_VISIBLE in README.Jeff Davis2022-11-122-8/+15
| | | | | | | Also improve comments. Discussion: https://postgr.es/m/a50005c1c537f89bb359057fd70e66bb83bce969.camel@j-davis.com Reviewed-by: Peter Geoghegan
* Fix theoretical torn page hazard.Jeff Davis2022-11-111-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | The original report was concerned with a possible inconsistency between the heap and the visibility map, which I was unable to confirm. The concern has been retracted. However, there did seem to be a torn page hazard when using checksums. By not setting the heap page LSN during redo, the protections of minRecoveryPoint were bypassed. Fixed, along with a misleading comment. It may have been impossible to hit this problem in practice, because it would require a page tear between the checksum and the flags, so I am marking this as a theoretical risk. But, as discussed, it did violate expectations about the page LSN, so it may have other consequences. Backpatch to all supported versions. Reported-by: Konstantin Knizhnik Reviewed-by: Konstantin Knizhnik Discussion: https://postgr.es/m/fed17dac-8cb8-4f5b-d462-1bb4908c029e@garret.ru Backpatch-through: 11
* Remove obsolete comments and code from prior to f8f4227976.Jeff Davis2022-11-111-20/+2
| | | | | | | | XLogReadBufferForRedo() and XLogReadBufferForRedoExtended() only return BLK_NEEDS_REDO if the record LSN is greater than the page LSN, so the redo routine doesn't need to do the LSN check again. Discussion: https://postgr.es/m/0c37b80e62b1f3007d5a6d1292bd8fa0c275627a.camel@j-davis.com
* Remove redundant breaks in HeapTupleSatisfiesVisibilityAndres Freund2022-11-051-7/+0
| | | | | Author: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_ZJg_N7zHtWP+JoSY_hrce4+GKioL137Y2c2En-kuXQ7g@mail.gmail.com
* Remove unneeded includes of <sys/stat.h>Michael Paquier2022-11-051-1/+0
| | | | | | | | | | Since bfb9dfd, none of the files updated in this commit have any stat() calls, so these inclusions are not necessary, for the same reasons as 233cf6e. Per discussion with John Naylor. Discussion: https://postgr.es/m/CAFBsxsGGGX7KD6RxbNoSJzuSc8Gz3hOxcfhTOMLB_hJcm68dKQ@mail.gmail.com
* Add doubly linked count list implementationDavid Rowley2022-11-021-14/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | We have various requirements when using a dlist_head to keep track of the number of items in the list. This, traditionally, has been done by maintaining a counter variable in the calling code. Here we tidy this up by adding "dclist", which is very similar to dlist but also keeps track of the number of items stored in the list. Callers may use the new dclist_count() function when they need to know how many items are stored. Obtaining the count is an O(1) operation. For simplicity reasons, dclist and dlist both use dlist_node as their node type and dlist_iter/dlist_mutable_iter as their iterator type. dclists have all of the same functionality as dlists except there is no function named dclist_delete(). To remove an item from a list dclist_delete_from() must be used. This requires knowing which dclist the given item is stored in. Additionally, here we also convert some dlists where additional code exists to keep track of the number of items stored and to make these use dclists instead. Author: David Rowley Reviewed-by: Bharath Rupireddy, Aleksander Alekseev Discussion: https://postgr.es/m/CAApHDvrtVxr+FXEX0VbViCFKDGxA3tWDgw9oFewNXCJMmwLjLg@mail.gmail.com
* Remove AssertArg and AssertStatePeter Eisentraut2022-10-281-1/+1
| | | | | | | | | These don't offer anything over plain Assert, and their usage had already been declared obsolescent. Author: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/20221009210148.GA900071@nathanxps13
* Allow nodeSort to perform Datum sorts for byref typesDavid Rowley2022-10-281-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | Here we add a new 'copy' parameter to tuplesort_getdatum so that we can instruct the function not to datumCopy() byref Datums before returning. Similar to 91e9e89dc, this can provide significant performance improvements in nodeSort when sorting by a single byref column and the sort's targetlist contains only that column. This allows us to re-enable Datum sorts for byref types which was disabled in 3a5817695 due to a reported memory leak. Additionally, here we slightly optimize DISTINCT aggregates so that we no longer perform any datumCopy() when we find the current value not to be distinct from the previous value. Previously the code would always take a copy of the most recent Datum and pfree the previous value, even when the values were the same. Testing shows a small but noticeable performance increase when aggregate transitions are skipped due to the current transition value being the same as the prior one. Author: David Rowley Discussion: https://postgr.es/m/CAApHDvqS6wC5U==k9Hd26E4EQXH3QR67-T4=Q1rQ36NGvjfVSg@mail.gmail.com Discussion: https://postgr.es/m/CAApHDvqHonfe9G1cVaKeHbDx70R_zCrM3qP2AGXpGrieSKGnhA@mail.gmail.com
* Rename shadowed local variablesDavid Rowley2022-10-051-9/+9
| | | | | | | | | | | | In a similar effort to f01592f91, here we mostly rename shadowed local variables to remove the warnings produced when compiling with -Wshadow=compatible-local. This fixes 63 warnings and leaves just 5. Author: Justin Pryzby, David Rowley Reviewed-by: Justin Pryzby Discussion https://postgr.es/m/20220817145434.GC26426%40telsasoft.com
* Avoid improbable PANIC during heap_update, redux.Tom Lane2022-09-301-18/+23
| | | | | | | | | | | | | | | | | | | | Commit 34f581c39 intended to ensure that RelationGetBufferForTuple would acquire a visibility-map page pin in case the otherBuffer's all-visible bit had become set since we last had lock on that page. But I missed a case: when we're extending the relation, VM concerns were dealt with only in the relatively-less-likely case that we fail to conditionally lock the otherBuffer. I think I'd believed that we couldn't need to worry about it if the conditional lock succeeds, which is true for the target buffer; but the otherBuffer was unlocked for awhile so its bit might be set anyway. So we need to do the GetVisibilityMapPins dance, and then also recheck the page's free space, in both cases. Per report from Jaime Casanova. Back-patch to v12 as the previous patch was (although there's still no evidence that the bug is reachable pre-v14). Discussion: https://postgr.es/m/E1lWLjP-00006Y-Ml@gemulon.postgresql.org
* Restore pg_pread and friends.Thomas Munro2022-09-291-1/+1
| | | | | | | | | | | | | | | | | | | | Commits cf112c12 and a0dc8271 were a little too hasty in getting rid of the pg_ prefixes where we use pread(), pwrite() and vectored variants. We dropped support for ancient Unixes where we needed to use lseek() to implement replacements for those, but it turns out that Windows also changes the current position even when you pass in an offset to ReadFile() and WriteFile() if the file handle is synchronous, despite its documentation saying otherwise. Switching to asynchronous file handles would fix that, but have other complications. For now let's just put back the pg_ prefix and add some comments to highlight the non-standard side-effect, which we can now describe as Windows-only. Reported-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/20220923202439.GA1156054%40nathanxps13
* Fix race condition where heap_delete() fails to pin VM page.Jeff Davis2022-09-221-11/+19
| | | | | | | | | | Similar to 5f12bc94dc, the code must re-check PageIsAllVisible() after buffer lock is re-acquired. Backpatching to the same version, 12. Discussion: https://postgr.es/m/CAEP4nAw9jYQDKd_5Y+-s2E4YiUJq1vqiikFjYGpLShtp-K3gag@mail.gmail.com Reported-by: Robins Tharakan Reviewed-by: Robins Tharakan Backpatch-through: 12
* meson: Add initial version of meson based build systemAndres Freund2022-09-211-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Autoconf is showing its age, fewer and fewer contributors know how to wrangle it. Recursive make has a lot of hard to resolve dependency issues and slow incremental rebuilds. Our home-grown MSVC build system is hard to maintain for developers not using Windows and runs tests serially. While these and other issues could individually be addressed with incremental improvements, together they seem best addressed by moving to a more modern build system. After evaluating different build system choices, we chose to use meson, to a good degree based on the adoption by other open source projects. We decided that it's more realistic to commit a relatively early version of the new build system and mature it in tree. This commit adds an initial version of a meson based build system. It supports building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD, Solaris and Windows (however only gcc is supported on aix, solaris). For Windows/MSVC postgres can now be built with ninja (faster, particularly for incremental builds) and msbuild (supporting the visual studio GUI, but building slower). Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM bitcode generation, documentation adjustments) are done in subsequent commits requiring further review. Other aspects (e.g. not installing test-only extensions) are not yet addressed. When building on Windows with msbuild, builds are slower when using a visual studio version older than 2019, because those versions do not support MultiToolTask, required by meson for intra-target parallelism. The plan is to remove the MSVC specific build system in src/tools/msvc soon after reaching feature parity. However, we're not planning to remove the autoconf/make build system in the near future. Likely we're going to keep at least the parts required for PGXS to keep working around until all supported versions build with meson. Some initial help for postgres developers is at https://wiki.postgresql.org/wiki/Meson With contributions from Thomas Munro, John Naylor, Stone Tickle and others. Author: Andres Freund <andres@anarazel.de> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Author: Peter Eisentraut <peter@eisentraut.org> Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de
* Harmonize heapam and tableam parameter names.Peter Geoghegan2022-09-193-37/+37
| | | | | | | | | | | | | | | | Make sure that function declarations use names that exactly match the corresponding names from function definitions. Having parameter names that are reliably consistent in this way will make it easier to reason about groups of related C functions from the same translation unit as a module. It will also make certain refactoring tasks easier. Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will do the same for other parts of the codebase. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
* Instrument freezing in autovacuum log reports.Peter Geoghegan2022-09-081-7/+20
| | | | | | | | | | | Add a new line to log reports from autovacuum (as well as VACUUM VERBOSE output) that shows information about freezing. Emphasis is placed on the total number of heap pages that had one or more tuples frozen by VACUUM. The total number of tuples frozen is also shown. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Jeff Janes <jeff.janes@gmail.com> Discussion: https://postgr.es/m/CAH2-WznTY6D0zyE8VLrC6Gd4kh_HGAXxnTPtcOQOOsxzLx9zog@mail.gmail.com
* Expand the use of get_dirent_type(), shaving a few calls to stat()/lstat()Michael Paquier2022-09-021-2/+5
| | | | | | | | | | | | | | | | | | | Several backend-side loops scanning one or more directories with ReadDir() (WAL segment recycle/removal in xlog.c, backend-side directory copy, temporary file removal, configuration file parsing, some logical decoding logic and some pgtz stuff) already know the type of the entry being scanned thanks to the dirent structure associated to the entry, on platforms where we know about DT_REG, DT_DIR and DT_LNK to make the difference between a regular file, a directory and a symbolic link. Relying on the direct structure of an entry saves a few system calls to stat() and lstat() in the loops updated here, shaving some code while on it. The logic of the code remains the same, calling stat() or lstat() depending on if it is necessary to look through symlinks. Authors: Nathan Bossart, Bharath Rupireddy Reviewed-by: Andres Freund, Thomas Munro, Michael Paquier Discussion: https://postgr.es/m/CALj2ACV8n-J-f=yiLUOx2=HrQGPSOZM3nWzyQQvLPcccPXxEdg@mail.gmail.com
* Derive freeze cutoff from nextXID, not OldestXmin.Peter Geoghegan2022-08-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before now, the cutoffs that VACUUM used to determine which XIDs/MXIDs to freeze were determined at the start of each VACUUM by taking related cutoffs that represent which XIDs/MXIDs VACUUM should treat as still running, and subtracting an XID/MXID age based value controlled by GUCs like vacuum_freeze_min_age. The FreezeLimit cutoff (XID freeze cutoff) was derived by subtracting an XID age value from OldestXmin, while the MultiXactCutoff cutoff (MXID freeze cutoff) was derived by subtracting an MXID age value from OldestMxact. This approach didn't match the approach used nearby to determine whether this VACUUM operation should be an aggressive VACUUM or not. VACUUM now uses the standard approach instead: it subtracts the same age-based values from next XID/next MXID (rather than subtracting from OldestXmin/OldestMxact). This approach is simpler and more uniform. Most of the time it will have only a negligible impact on how and when VACUUM freezes. It will occasionally make VACUUM more robust in the event of problems caused by long running transaction. These are cases where OldestXmin and OldestMxact are held back by so much that they attain an age that is a significant fraction of the value of age-based settings like vacuum_freeze_min_age. There is no principled reason why freezing should be affected in any way by the presence of a long-running transaction -- at least not before the point that the OldestXmin and OldestMxact limits used by each VACUUM operation attain an age that makes it unsafe to freeze some of the XIDs/MXIDs whose age exceeds the value of the relevant age-based settings. The new approach should at least make freezing degrade more gracefully than before, even in the most extreme cases. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Nathan Bossart <nathandbossart@gmail.com> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkOv5CEeyOO=c91XnT5WBR_0gii0Wn5UbZhJ=4TTykDYg@mail.gmail.com
* Adjust comments that called MultiXactIds "XMIDs".Peter Geoghegan2022-08-291-1/+1
| | | | Oversights in commits 0b018fab and f3c15cbe.
* Remove dead pread and pwrite replacement code.Thomas Munro2022-08-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | pread() and pwrite() are in SUSv2, and all targeted Unix systems have them. Previously, we defined pg_pread and pg_pwrite to emulate these function with lseek() on old Unixen. The names with a pg_ prefix were a reminder of a portability hazard: they might change the current file position. That hazard is gone, so we can drop the prefixes. Since the remaining replacement code is Windows-only, move it into src/port/win32p{read,write}.c, and move the declarations into src/include/port/win32_port.h. No need for vestigial HAVE_PREAD, HAVE_PWRITE macros as they were only used for declarations in port.h which have now moved into win32_port.h. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Greg Stark <stark@mit.edu> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA+hUKGJ3LHeP9w5Fgzdr4G8AnEtJ=z=p6hGDEm4qYGEUX5B6fQ@mail.gmail.com
* Change internal RelFileNode references to RelFileNumber or RelFileLocator.Robert Haas2022-07-064-59/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have been using the term RelFileNode to refer to either (1) the integer that is used to name the sequence of files for a certain relation within the directory set aside for that tablespace/database combination; or (2) that value plus the OIDs of the tablespace and database; or occasionally (3) the whole series of files created for a relation based on those values. Using the same name for more than one thing is confusing. Replace RelFileNode with RelFileNumber when we're talking about just the single number, i.e. (1) from above, and with RelFileLocator when we're talking about all the things that are needed to locate a relation's files on disk, i.e. (2) from above. In the places where we refer to (3) as a relfilenode, instead refer to "relation storage". Since there is a ton of SQL code in the world that knows about pg_class.relfilenode, don't change the name of that column, or of other SQL-facing things that derive their name from it. On the other hand, do adjust closely-related internal terminology. For example, the structure member names dbNode and spcNode appear to be derived from the fact that the structure itself was called RelFileNode, so change those to dbOid and spcOid. Likewise, various variables with names like rnode and relnode get renamed appropriately, according to how they're being used in context. Hopefully, this is clearer than before. It is also preparation for future patches that intend to widen the relfilenumber fields from its current width of 32 bits. Variables that store a relfilenumber are now declared as type RelFileNumber rather than type Oid; right now, these are the same, but that can now more easily be changed. Dilip Kumar, per an idea from me. Reviewed also by Andres Freund. I fixed some whitespace issues, changed a couple of words in a comment, and made one other minor correction. Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
* Revert changes in HOT handling of BRIN indexesTomas Vondra2022-06-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | This reverts commits 5753d4ee32 and fe60b67250 that modified HOT to ignore BRIN indexes. The commit message for 5753d4ee32 claims that: When determining whether an index update may be skipped by using HOT, we can ignore attributes indexed only by BRIN indexes. There are no index pointers to individual tuples in BRIN, and the page range summary will be updated anyway as it relies on visibility info. This is partially incorrect - it's true BRIN indexes don't point to individual tuples, so HOT chains are not an issue, but the visibitlity info is not sufficient to keep the index up to date. This can easily result in corrupted indexes, as demonstrated in the hackers thread. This does not mean relaxing the HOT restrictions for BRIN is a lost cause, but it needs to handle the two aspects (allowing HOT chains and updating the page range summaries) as separate. But that requires a major changes, and it's too late for that in the current dev cycle. Reported-by: Tomas Vondra Discussion: https://postgr.es/m/05ebcb44-f383-86e3-4f31-0a97a55634cf@enterprisedb.com
* Pre-beta mechanical code beautification.Tom Lane2022-05-122-13/+13
| | | | | Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.
* vacuumlazy.c: MultiXactIds are MXIDs, not XMIDs.Peter Geoghegan2022-04-201-3/+3
| | | | Oversights in commits 0b018fab and f3c15cbe.
* Fix multi-table VACUUM VERBOSE accounting.Peter Geoghegan2022-04-151-9/+15
| | | | | | | | | | | | | | | | Per-backend global variables like VacuumPageHit are initialized once per VACUUM command. This was missed by commit 49c9d9fc, which unified VACUUM VERBOSE and autovacuum logging. As a result of that oversight, incorrect values were shown when multiple relations were processed by a single VACUUM VERBOSE command. Relations that happened to be processed later on would show "buffer usage:" values that incorrectly included buffer accesses made while processing earlier unrelated relations. The same accesses were counted multiple times. To fix, take initial values for the tracker variables at the start of heap_vacuum_rel(), and report delta values later on.
* VACUUM VERBOSE: Show dead items for an empty table.Peter Geoghegan2022-04-151-20/+18
| | | | | | | | | | | | Be consistent about the lines that VACUUM VERBOSE outputs by including an "index scan not needed: " line for completely empty tables. This makes the output more readable, especially with multiple distinct VACUUM operations processed by the same VACUUM command. It's also more consistent; even empty tables can use the failsafe, which wasn't reported in the standard way until now. Follow-up to commit 6e20f460, which taught VACUUM VERBOSE to be more consistent about reporting on scanned pages with empty tables.
* Adjust VACUUM's removable cutoff log message.Peter Geoghegan2022-04-151-19/+15
| | | | | | | | | | | | | | | | | | | | | The age of OldestXmin (a.k.a. "removable cutoff") when VACUUM ends often indicates the approximate number of XIDs consumed while VACUUM ran. However, there is at least one important exception: the cutoff could be held back by a snapshot that was acquired before our VACUUM even began. Successive VACUUM operations may even use exactly the same old cutoff in extreme cases involving long held snapshots. The log messages that described how removable cutoff aged (which were added by commit 872770fd) created the impression that we were reporting on how VACUUM's usable cutoff advanced while VACUUM ran, which was misleading in these extreme cases. Fix by using a more general wording. Per gripe from Tom Lane. In passing, relocate related instrumentation code for clarity. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/1643035.1650035653@sss.pgh.pa.us
* Prevent access to no-longer-pinned buffer in heapam_tuple_lock().Tom Lane2022-04-132-19/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | heap_fetch() used to have a "keep_buf" parameter that told it to return ownership of the buffer pin to the caller after finding that the requested tuple TID exists but is invisible to the specified snapshot. This was thoughtlessly removed in commit 5db6df0c0, which broke heapam_tuple_lock() (formerly EvalPlanQualFetch) because that function needs to do more accesses to the tuple even if it's invisible. The net effect is that we would continue to touch the page for a microsecond or two after releasing pin on the buffer. Usually no harm would result; but if a different session decided to defragment the page concurrently, we could see garbage data and mistakenly conclude that there's no newer tuple version to chain up to. (It's hard to say whether this has happened in the field. The bug was actually found thanks to a later change that allowed valgrind to detect accesses to non-pinned buffers.) The most reasonable way to fix this is to reintroduce keep_buf, although I made it behave slightly differently: buffer ownership is passed back only if there is a valid tuple at the requested TID. In HEAD, we can just add the parameter back to heap_fetch(). To avoid an API break in the back branches, introduce an additional function heap_fetch_extended() in those branches. In HEAD there is an additional, less obvious API change: tuple->t_data will be set to NULL in all cases where buffer ownership is not returned, in particular when the tuple exists but fails the time qual (and !keep_buf). This is to defend against any other callers attempting to access non-pinned buffers. We concluded that making that change in back branches would be more likely to introduce problems than cure any. In passing, remove a comment about heap_fetch that was obsoleted by 9a8ee1dc6. Per bug #17462 from Daniil Anisimov. Back-patch to v12 where the bug was introduced. Discussion: https://postgr.es/m/17462-9c98a0f00df9bd36@postgresql.org
* Remove extraneous blank lines before block-closing bracesAlvaro Herrera2022-04-132-2/+0
| | | | | | | | | These are useless and distracting. We wouldn't have written the code with them to begin with, so there's no reason to keep them. Author: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com Discussion: https://postgr.es/m/attachment/133167/0016-Extraneous-blank-lines.patch
* Make XLogRecGetBlockTag() throw error if there's no such block.Tom Lane2022-04-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | All but a few existing callers assume without checking that this function succeeds. While it probably will, that's a poor excuse for not checking. Let's make it return void and instead throw an error if it doesn't find the block reference. Callers that actually need to handle the no-such-block case must now use the underlying function XLogRecGetBlockTagExtended. In addition to being a bit less error-prone, this should also serve to suppress some Coverity complaints about XLogRecGetBlockRefInfo. While at it, clean up some inconsistency about use of the XLogRecHasBlockRef macro: make XLogRecGetBlockTagExtended use that instead of open-coding the same condition, and avoid calling XLogRecHasBlockRef twice in relevant code paths. (That is, calling XLogRecHasBlockRef followed by XLogRecGetBlockTag is now deprecated: use XLogRecGetBlockTagExtended instead.) Patch HEAD only; this doesn't seem to have enough value to consider a back-branch API break. Discussion: https://postgr.es/m/425039.1649701221@sss.pgh.pa.us
* Remove comment about historic heap vacuuming issue.Peter Geoghegan2022-04-111-7/+0
| | | | | | | | | Remove comment block about how heap page vacuuming used to set tuples with storage to LP_UNUSED in a rare edge case that can no longer happen following commit 8523492d4e. The comments seem unnecessary now, since it's now generally clear that heap vacuuming only applies to LP_DEAD items from VACUUM's first heap pass following more recent work from commits 12b5ade902 and 4f8d9d1217.
* Fix various typos and spelling mistakes in code commentsDavid Rowley2022-04-111-1/+1
| | | | | Author: Justin Pryzby Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
* Truncate line pointer array during heap pruning.Peter Geoghegan2022-04-072-4/+11
| | | | | | | | | | | | | | | | | | | Reclaim space from the line pointer array when heap pruning leaves behind a contiguous group of LP_UNUSED items at the end of the array. This happens during subsequent page defragmentation. Certain kinds of heap line pointer bloat are ameliorated by this new optimization. Follow-up work to commit 3c3b8a4b26, which taught VACUUM to truncate the line pointer array in about the same way during VACUUM's second pass over the heap. We now apply line pointer array truncation during both the first and the second pass over the heap made by VACUUM. We can also perform line pointer array truncation during opportunistic pruning. Matthias van de Meent, with small tweaks by me. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAEze2WjgaQc55Y5f5CQd3L=eS5CZcff2Obxp=O6pto8-f0hC4w@mail.gmail.com Discussion: https://postgr.es/m/CAEze2Wg36%2B4at2eWJNcYNiW2FJmht34x3YeX54ctUSs7kKoNcA%40mail.gmail.com
* pgstat: stats collector references in comments.Andres Freund2022-04-062-6/+6
| | | | | | | | | | | | | | | | | | Soon the stats collector will be no more, with statistics instead getting stored in shared memory. There are a lot of references to the stats collector in comments. This commit replaces most of these references with "cumulative statistics system", with the remaining ones getting replaced as part of subsequent commits. This is done separately from the - quite large - shared memory statistics patch to make review easier. Author: Andres Freund <andres@anarazel.de> Reviewed-By: Justin Pryzby <pryzby@telsasoft.com> Reviewed-By: Thomas Munro <thomas.munro@gmail.com> Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20220303021600.hs34ghqcw6zcokdh@alap3.anarazel.de Discussion: https://postgr.es/m/20220308205351.2xcn6k4x5yivcxyd@alap3.anarazel.de
* vacuumlazy.c: Further consolidate resource allocation.Peter Geoghegan2022-04-041-42/+30
| | | | | | | | | Move remaining VACUUM resource allocation and deallocation code from lazy_scan_heap() to its caller, heap_vacuum_rel(). This finishes off work started by commit 73f6ec3d. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wzk3fNBa_S3Ngi+16GQiyJ=AmUu3oUY99syMDTMRxitfyQ@mail.gmail.com
* Adjust tuplesort API to have bitwise option flagsDavid Rowley2022-04-041-1/+1
| | | | | | | | | | | | | | | | This replaces the bool flag for randomAccess. An upcoming patch requires adding another option, so instead of breaking the API for that, then breaking it again one day if we add more options, let's just break it once. Any boolean options we add in the future will just make use of an unused bit in the flags. Any extensions making use of tuplesorts will need to update their code to pass TUPLESORT_RANDOMACCESS instead of true for randomAccess. TUPLESORT_NONE can be used for a set of empty options. Author: David Rowley Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/CAApHDvoH4ASzsAOyHcxkuY01Qf%2B%2B8JJ0paw%2B03dk%2BW25tQEcNQ%40mail.gmail.com
* Generalize how VACUUM skips all-frozen pages.Peter Geoghegan2022-04-031-163/+146
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Non-aggressive VACUUMs were at a gratuitous disadvantage (relative to aggressive VACUUMs) around advancing relfrozenxid and relminmxid before now. The issue only came up when concurrent activity unset some heap page's visibility map bit right as VACUUM was considering if the page should get counted in frozenskipped_pages. The non-aggressive case would recheck the all-frozen bit at this point. The aggressive case reasoned that the page (a skippable page) must have at least been all-frozen in the recent past, so skipping it won't make relfrozenxid advancement unsafe (which is never okay for aggressive VACUUMs). The recheck created a window for some other backend to confuse matters for VACUUM. If the page's VM bit turned out to be unset, VACUUM would conclude that the page was _never_ all-frozen. frozenskipped_pages was not incremented, and yet VACUUM couldn't back out of skipping at this late stage (it couldn't choose to scan the page instead). This made it unsafe to advance relfrozenxid later on. Consistently avoid the issue by generalizing how we skip frozen pages during aggressive VACUUMs: take the same approach when skipping any skippable page range during aggressive and non-aggressive VACUUMs alike. The new approach makes ranges (not individual pages) the fundamental unit of skipping using the visibility map. frozenskipped_pages is replaced with a boolean flag that represents whether some skippable range with one or more all-visible pages was actually skipped. It is safe for VACUUM to treat a page as all-frozen provided it at least had its all-frozen bit set after the OldestXmin cutoff was established. VACUUM is only required to scan pages that might have XIDs < OldestXmin (unfrozen XIDs) to be able to safely advance relfrozenxid. Tuples concurrently inserted on "skipped" pages can be thought of as equivalent to tuples concurrently inserted on a block >= rel_pages. It's possible that the issue this commit fixes hardly ever came up in practice. But we only had to be unlucky once to lose out on advancing relfrozenxid -- a single affected heap page was enough to throw VACUUM off. That seems like something to avoid on general principle. This is similar to an issue fixed by commit 44fa8488, which taught vacuumlazy.c to not give up on non-aggressive relfrozenxid advancement just because a cleanup lock wasn't immediately available on some heap page. Skipping an all-visible range is now explicitly structured as a choice made by non-aggressive VACUUMs, by weighing known costs (scanning extra skippable pages to freeze their tuples early) against known benefits (advancing relfrozenxid early). This works in essentially the same way as it always has (don't skip ranges < SKIP_PAGES_THRESHOLD). We could do much better here in the future by considering other relevant factors. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-Wzn6bGJGfOy3zSTJicKLw99PHJeSOQBOViKjSCinaxUKDQ@mail.gmail.com Discussion: https://postgr.es/m/CA%2BTgmoZiSOY6H7aadw5ZZGm7zYmfDzL6nwmL5V7GL4HgJgLF_w%40mail.gmail.com
* Set relfrozenxid to oldest extant XID seen by VACUUM.Peter Geoghegan2022-04-032-158/+330
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When VACUUM set relfrozenxid before now, it set it to whatever value was used to determine which tuples to freeze -- the FreezeLimit cutoff. This approach was very naive. The relfrozenxid invariant only requires that new relfrozenxid values be <= the oldest extant XID remaining in the table (at the point that the VACUUM operation ends), which in general might be much more recent than FreezeLimit. VACUUM now carefully tracks the oldest remaining XID/MultiXactId as it goes (the oldest remaining values _after_ lazy_scan_prune processing). The final values are set as the table's new relfrozenxid and new relminmxid in pg_class at the end of each VACUUM. The oldest XID might come from a tuple's xmin, xmax, or xvac fields. It might even come from one of the table's remaining MultiXacts. Final relfrozenxid values must still be >= FreezeLimit in an aggressive VACUUM (FreezeLimit still acts as a lower bound on the final value that aggressive VACUUM can set relfrozenxid to). Since standard VACUUMs still make no guarantees about advancing relfrozenxid, they might as well set relfrozenxid to a value from well before FreezeLimit when the opportunity presents itself. In general standard VACUUMs may now set relfrozenxid to any value > the original relfrozenxid and <= OldestXmin. Credit for the general idea of using the oldest extant XID to set pg_class.relfrozenxid at the end of VACUUM goes to Andres Freund. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Andres Freund <andres@anarazel.de> Reviewed-By: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com
* vacuumlazy.c: Clean up variable declarations.Peter Geoghegan2022-04-021-25/+23
| | | | | | | Move some of the heap_vacuum_rel() instrumentation related variables to the scope where they're actually needed. Also reorder some of the variable declarations at the start of heap_vacuum_rel() so that related variables appear together.