path: root/contrib/pageinspect/expected
Commit log, most recent first. Each entry shows the commit subject, author, date, number of files changed, and lines removed/added.
* pageinspect: Fix crash with gist_page_items() (Michael Paquier, 2023-03-02; 2 files, -7/+11)

Attempting to use this function with a raw page not coming from a GiST index would cause a crash, as it was missing the same sanity checks as gist_page_items_bytea(). This slightly refactors the code so that all the basic validation checks for GiST pages are done in a single routine, in the same fashion as the pageinspect functions for hash and BRIN.

This fixes an issue similar to 076f4d9. A test is added to stress this case. While on it, I have added a similar test for brin_page_items() with a combination made of a valid BRIN index and a raw btree page. This one was already protected, but it was not tested.

Reported-by: Egor Chindyaskin
Author: Dmitry Koval
Discussion: https://postgr.es/m/17815-fc4a2d3b74705703@postgresql.org
Backpatch-through: 14
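A rough sketch of the failure mode now covered (table and index names hypothetical, error wording approximate): feeding a raw btree page to the GiST function should now error out cleanly rather than crash.

    CREATE TABLE t (id int, p point);
    CREATE INDEX t_btree ON t USING btree (id);
    CREATE INDEX t_gist  ON t USING gist (p);
    -- A raw btree page fed to a GiST function: rejected, not a crash.
    SELECT * FROM gist_page_items(get_raw_page('t_btree', 0), 't_gist');
    -- ERROR:  input page is not a valid GiST page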
* Add bt_multi_page_stats() function to contrib/pageinspect. (Tom Lane, 2023-01-02; 1 file, -4/+117)

This is like the existing bt_page_stats() function, but it can report on a range of pages rather than just one at a time.

I don't have a huge amount of faith in the portability of the new test cases, but they do pass in a 32-bit FreeBSD VM here. Further adjustment may be needed depending on buildfarm results.

Hamid Akhtar, reviewed by Naeem Akhter, Bertrand Drouvot, Bharath Rupireddy, and myself

Discussion: https://postgr.es/m/CANugjht-=oGMRmNJKMqnBC69y7vr+wHDmm0ZK6-1pJsxoBKBbA@mail.gmail.com
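A hedged usage sketch (index name hypothetical): the function takes a starting block and a block count, and returns one row per page with the same columns as bt_page_stats().

    -- Stats for blocks 1 through 3 of a btree index in a single call.
    SELECT blkno, type, live_items, avg_item_size
    FROM bt_multi_page_stats('t_btree', 1, 3);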
* Prevent instability in contrib/pageinspect's regression test. (Tom Lane, 2022-11-21; 1 file, -1/+2)

pageinspect has occasionally failed on slow buildfarm members, with symptoms indicating that the expected effects of VACUUM FREEZE didn't happen. This is presumably because a background transaction such as auto-analyze was holding back global xmin. We can work around that by using a temp table in the test. Since commit a7212be8b, that will use an up-to-date cutoff xmin regardless of other processes. And pageinspect itself shouldn't really care whether the table is temp.

Back-patch to v14. There would be no point in older branches without back-patching a7212be8b, which seems like more trouble than the problem is worth.

Discussion: https://postgr.es/m/2892135.1668976646@sss.pgh.pa.us
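A minimal sketch of the pattern this relies on (names hypothetical): with a temp table, VACUUM FREEZE uses an up-to-date cutoff xmin, so the frozen bits become reliably visible to pageinspect.

    CREATE TEMP TABLE freeze_test (x int);
    INSERT INTO freeze_test VALUES (1);
    VACUUM FREEZE freeze_test;
    -- HEAP_XMIN_FROZEN is (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID) = 0x0300.
    SELECT (t_infomask & 768) = 768 AS frozen
    FROM heap_page_items(get_raw_page('freeze_test', 0));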
* pageinspect: Fix handling of all-zero pages (Michael Paquier, 2022-04-14; 6 files, -0/+95)

Getting an all-zero page from get_raw_page() is considered a valid case by the buffer manager, and it can happen for example when finding a corrupted page with zero_damaged_pages enabled (using zero_damaged_pages to look at corrupted pages does happen), or after a crash when a relation file is extended before any WAL for its new data is generated (before a vacuum or autovacuum job comes in to do some cleanup).

However, none of the pageinspect functions, whether for the index AMs (except hash, which has its own idea of new pages), heap, the FSM or the page header, have ever worked with all-zero pages, causing various crashes when going through the page internals.

This commit changes all the pageinspect functions to handle all-zero pages, choosing to return NULL, or no rows for SRFs, when finding a new page. get_raw_page() still works the same way, returning a batch of zeros in the bytea of the page retrieved. A hard error could have been used, but returning NULL is less invasive and is useful when scanning relation files in full to get a batch of results for a single relation in one query.

Tests are added for all the code paths impacted.

Reported-by: Daria Lepikhova
Author: Michael Paquier
Discussion: https://postgr.es/m/561e187b-3549-c8d5-03f5-525c14e65bd0@postgrespro.ru
Backpatch-through: 10
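For illustration, assuming the default 8kB BLCKSZ, a hand-built all-zero page is now accepted and simply yields no rows (for SRFs) or NULL:

    -- heap_page_items() on an all-zero (i.e. "new") page: zero rows, no crash.
    SELECT * FROM heap_page_items(decode(repeat('00', 8192), 'hex'));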
* pageinspect: Add more sanity checks to prevent out-of-bound reads (Michael Paquier, 2022-03-27; 5 files, -11/+61)

A couple of code paths use the special area on the page passed by the function caller, expecting to find some data in it. However, feeding an incorrect page can lead to out-of-bound reads when trying to access the page special area (like a heap page that has no special area, leading PageGetSpecialPointer() to grab a pointer outside the allocated page).

The functions used for hash and btree indexes have some protection already against that, while some other functions using a relation OID as argument would make sure that the access method involved is correct, but functions taking in input a raw page without knowing the relation the page is attached to would run into problems.

This commit improves the set of checks used in the code paths of BRIN, btree (including one check if a leaf page is found with a non-zero level), GIN and GiST to verify that the page given in input has a special area size that fits with each access method, which is done through PageGetSpecialSize(), before calling PageGetSpecialPointer().

The scope of the checks done is limited to work with pages that one would pass after getting a block with get_raw_page(), as it is possible to craft byteas that could bypass the existing code paths. Having too many checks would also impact the usability of pageinspect, as the existing code is very useful to look at the contents of a corrupted page, so the focus is really on avoiding out-of-bound reads, since this is never a good thing even for functions whose execution is limited to superusers.

The safest approach would be to rework the functions so that they fetch a block using a relation OID and a block number, but there are also cases where using a raw page is useful.

Tests are added to cover all the code paths that needed such checks, and an error message for hash indexes is reworded to fit better with what this commit adds.

Reported-By: Alexander Lakhin
Author: Julien Rouhaud, Michael Paquier
Discussion: https://postgr.es/m/16527-ef7606186f0610a1@postgresql.org
Discussion: https://postgr.es/m/561e187b-3549-c8d5-03f5-525c14e65bd0@postgrespro.ru
Backpatch-through: 10
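A sketch of the class of misuse now rejected (table name hypothetical, error wording approximate): a heap page carries no special area of the size a btree function expects, so the mismatch is caught before any pointer arithmetic.

    SELECT * FROM bt_page_items(get_raw_page('some_heap_table', 0));
    -- ERROR:  input page is not a valid btree page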
* pageinspect: Fix handling of page sizes and AM types (Michael Paquier, 2022-03-16; 6 files, -0/+73)

This commit fixes a set of issues related to the use of the SQL functions in this module when the caller is able to pass down raw page data as an input argument:

- The page size check was fuzzy in a couple of places, sometimes checking only a sub-range, but what we are looking for is an exact match on BLCKSZ. After considering a few options here, I have settled on a generalization of get_page_from_raw(). Most of the SQL functions already used that, and this is not strictly required if not accessing an 8-byte-wide value from a raw page, but this feels safer in the long run for alignment-picky environments, particularly if a code path begins to access such values. This also reduces the number of strings that need to be translated.

- The BRIN function brin_page_items() uses a Relation but did not check the access method of the opened index, potentially leading to crashes. All the other functions in need of a Relation already did that.

- Some code paths could fail on elog(), but we should use ereport() for failures that can be triggered by the user.

Tests are added to stress all the cases fixed as of this commit, with some junk raw pages (\set VERBOSITY ensures that this works across all page sizes) and unexpected index types when functions open relations.

Author: Michael Paquier, Justin Pryzby
Discussion: https://postgr.es/m/20220218030020.GA1137@telsasoft.com
Backpatch-through: 10
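For example (error wording approximate), a bytea whose length is not exactly BLCKSZ is now rejected before any field of the page is accessed:

    SELECT page_header('\x00010203'::bytea);
    -- ERROR:  invalid page size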
* Fix collection of typos in the code and the documentation (Michael Paquier, 2022-03-15; 1 file, -1/+1)

Some words were duplicated while other places were grammatically incorrect, including one variable name in the code.

Author: Otto Kekalainen, Justin Pryzby
Discussion: https://postgr.es/m/7DDBEFC5-09B6-4325-B942-B563D1A24BDC@amazon.com
* Reduce non-leaf keys overlap in GiST indexes produced by a sorted build (Alexander Korotkov, 2022-02-07; 1 file, -10/+8)

The GiST sorted build currently chooses split points according to page space utilization only. That may lead to higher overlap between non-leaf keys and, in turn, slower answers to search queries.

This commit makes the sorted build use the opclass's picksplit method. Once four pages at a level have been accumulated, the picksplit method is applied repeatedly until each split partition fits the page. Some of our split algorithms could show significant performance degradation while processing 4 times more data at once. But those opclasses haven't received sorted build support, and shouldn't receive it before their split algorithms are improved.

Discussion: https://postgr.es/m/CAHqSB9jqtS94e9%3D0vxqQX5dxQA89N95UKyz-%3DA7Y%2B_YJt%2BVW5A%40mail.gmail.com
Author: Aliaksandr Kalenik, Sergei Shoulbakov, Andrey Borodin
Reviewed-by: Björn Harrtell, Darafei Praliaskouski, Andres Freund
Reviewed-by: Alexander Korotkov
* pageinspect: Improve page_header() for pages of 32kB (Michael Paquier, 2021-07-12; 1 file, -0/+16)

pd_upper, pd_lower, pd_special and the page size have been using smallint as return type, which could cause those fields to return negative values in certain cases for builds configured with a page size of 32kB.

Bump pageinspect to 1.10. page_header() is able to handle the correct return type of those fields at runtime when using an older version of the extension, and some tests are added to cover that.

Author: Quan Zongliang
Reviewed-by: Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/8b8ec36e-61fe-14f9-005d-07bc85aa4eed@yeah.net
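A hedged sketch of the result (table name hypothetical): the offsets now come back as plain integers, so they stay positive even on 32kB builds.

    SELECT lower, upper, special, pagesize
    FROM page_header(get_raw_page('t', 0));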
* Improve error messages about mismatching relkind (Peter Eisentraut, 2021-07-08; 1 file, -2/+4)

Most error messages about a relkind that was not supported or appropriate for the command were of the pattern

    "relation \"%s\" is not a table, foreign table, or materialized view"

This style can become verbose and tedious to maintain. Moreover, it's not very helpful: if I'm trying to create a comment on a TOAST table, which is not supported, then the information that I could have created a comment on a materialized view is pointless.

Instead, write the primary error message shorter, saying more directly that what was attempted is not possible. Then, in the detail message, explain that the operation is not supported for the relkind of the object. To simplify that, add a new function errdetail_relkind_not_supported() that does this.

In passing, make use of RELKIND_HAS_STORAGE() where appropriate, instead of listing out the relkinds individually.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/dc35a398-37d0-75ce-07ea-1dd71d98f8ec@2ndquadrant.com
* Use full 64-bit XIDs in deleted nbtree pages. (Peter Geoghegan, 2021-02-24; 1 file, -11/+11)

Otherwise we risk "leaking" deleted pages by making them non-recyclable indefinitely. Commit 6655a729 did the same thing for deleted pages in GiST indexes. That work was used as a starting point here.

Stop storing an XID indicating the oldest btpo.xact across all deleted though unrecycled pages in nbtree metapages. There is no longer any reason to care about that condition/the oldest XID. It only ever made sense when wraparound was something _bt_vacuum_needs_cleanup() had to consider.

The btm_oldest_btpo_xact metapage field has been repurposed and renamed. It is now btm_last_cleanup_num_delpages, which is used to remember how many non-recycled deleted pages remain from the last VACUUM (in practice its value is usually the precise number of pages that were _newly deleted_ during the specific VACUUM operation that last set the field).

The general idea behind storing btm_last_cleanup_num_delpages is to use it to give _some_ consideration to non-recycled deleted pages inside _bt_vacuum_needs_cleanup() -- though never too much. We only really need to avoid leaving a truly excessive number of deleted pages in an unrecycled state forever. We only do this to cover certain narrow cases where no other factor makes VACUUM do a full scan, and yet the index continues to grow (and so actually misses out on recycling existing deleted pages).

These metapage changes result in a clear user-visible benefit: we no longer trigger full index scans during VACUUM operations solely due to the presence of only 1 or 2 known deleted (though unrecycled) blocks from a very large index. All that matters now is keeping the costs and benefits in balance over time.

Fix an issue that has been around since commit 857f9c36, which added the "skip full scan of index" mechanism (i.e. the _bt_vacuum_needs_cleanup() logic). The accuracy of btm_last_cleanup_num_heap_tuples accidentally hinged upon _when_ the source value gets stored. We now always store btm_last_cleanup_num_heap_tuples in btvacuumcleanup(). This fixes the issue because IndexVacuumInfo.num_heap_tuples (the source field) is expected to accurately indicate the state of the table _after_ the VACUUM completes inside btvacuumcleanup().

A backpatchable fix cannot easily be extracted from this commit. A targeted fix for the issue will follow in a later commit, though that won't happen today.

I (pgeoghegan) have chosen to remove any mention of deleted pages in the documentation of the vacuum_cleanup_index_scale_factor GUC/param, since the presence of deleted (though unrecycled) pages is no longer of much concern to users. The vacuum_cleanup_index_scale_factor description in the docs now seems rather unclear in any case, and it should probably be rewritten in the near future. Perhaps some passing mention of page deletion will be added back at the same time.

Bump XLOG_PAGE_MAGIC due to nbtree WAL records using full XIDs now.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznpdHvujGUwYZ8sihX=d5u-tRYhi-F4wnV2uN2zHpMUXw@mail.gmail.com
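The repurposed field is visible through pageinspect; a sketch against a hypothetical index (PostgreSQL 14 and later):

    SELECT version, last_cleanup_num_delpages FROM bt_metap('t_btree');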
* Add "LP_DEAD item?" column to GiST pageinspect functionsPeter Geoghegan2021-02-141-16/+16
| | | | | | | | | | This brings gist_page_items() and gist_page_items_bytea() in line with nbtree's bt_page_items() function. Minor follow-up to commit 756ab291, which added the GiST functions. Author: Andrey Borodin <x4mmm@yandex-team.ru> Discussion: https://postgr.es/m/E0794687-7315-4C29-A9C7-EC54D448596D@yandex-team.ru
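A usage sketch selecting the new column (index name hypothetical; block 0 is the root, which is also a leaf for a small index):

    SELECT itemoffset, ctid, dead
    FROM gist_page_items(get_raw_page('t_gist', 0), 't_gist');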
* Disable vacuum page skipping in selected test cases. (Tom Lane, 2021-01-20; 1 file, -13/+3)

By default VACUUM will skip pages that it can't immediately get exclusive access to, which means that even activities as harmless and unpredictable as checkpoint buffer writes might prevent a page from being processed. Ordinarily this is no big deal, but we have a small number of test cases that examine the results of VACUUM's processing and therefore will fail if the page of interest is skipped. This seems to be the explanation for some rare buildfarm failures. To fix, add the DISABLE_PAGE_SKIPPING option to the VACUUM commands in tests where this could be an issue.

In passing, remove a duplicated query in pageinspect/sql/page.sql.

Back-patch as necessary (some of these cases are as old as v10).

Discussion: https://postgr.es/m/413923.1611006484@sss.pgh.pa.us
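The fix in a nutshell (table name hypothetical): with the option, VACUUM waits for each page instead of skipping pinned ones.

    VACUUM (FREEZE, DISABLE_PAGE_SKIPPING) test_table;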
* pageinspect: Change block number arguments to bigint (Peter Eisentraut, 2021-01-19; 5 files, -0/+55)

Block numbers are 32-bit unsigned integers. Therefore, the smallest SQL integer type that they can fit in is bigint. However, in the pageinspect module, most input and output parameters dealing with block numbers were declared as int. The behavior with block numbers larger than a signed 32-bit integer was therefore dubious. Change these arguments to type bigint and add some more explicit error checking on the block range. (Other contrib modules appear to do this correctly already.)

Since we are changing argument types of existing functions, in order to not misbehave if the binary is updated before the extension is updated, we need to create new C symbols for the entry points, similar to how it's done in other extensions as well.

Reported-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/d8f6bdd536df403b9b33816e9f7e0b9d@G08CNEXMBPEKD05.g08.fujitsu.local
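With bigint arguments, out-of-range block numbers can be rejected explicitly instead of wrapping (table name hypothetical, error wording approximate):

    SELECT get_raw_page('t', -1);
    -- ERROR:  invalid block number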
* Fix test failure with wal_level=minimal. (Heikki Linnakangas, 2021-01-13; 1 file, -0/+10)

The newly-added gist pageinspect test prints the LSNs of GiST pages, expecting them all to be 1 (GistBuildLSN). But with wal_level=minimal, they got updated by the whole-relation WAL-logging at commit. Fix by wrapping the problematic tests in the same transaction with the CREATE INDEX.

Per buildfarm failure on thorntail.

Discussion: https://www.postgresql.org/message-id/3B4F97E5-40FB-4142-8CAA-B301CDFBF982%40iki.fi
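A sketch of the pattern (names hypothetical): inspected inside the transaction that creates the index, the pages still carry GistBuildLSN even with wal_level=minimal.

    BEGIN;
    CREATE TABLE g (p point);
    INSERT INTO g SELECT point(i, i) FROM generate_series(1, 10) i;
    CREATE INDEX g_idx ON g USING gist (p);
    SELECT lsn FROM gist_page_opaque_info(get_raw_page('g_idx', 0));  -- 0/1
    COMMIT;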
* Fix portability issues in the new gist pageinspect test. (Heikki Linnakangas, 2021-01-13; 1 file, -40/+12)

1. The raw bytea representation of the point-type keys used in the test depends on endianness. Remove the raw key_data column from the test.

2. The items stored on a non-leftmost gist page depend on how many items fit on the other pages. This showed up as a failure on 32-bit i386 systems. To fix, only test the gist_page_items() function on the leftmost leaf page.

Per Andrey Borodin and the buildfarm.

Discussion: https://www.postgresql.org/message-id/9FCEC1DC-86FB-4A57-88EF-DD13663B36AF%40yandex-team.ru
* Add functions to 'pageinspect' to inspect GiST indexes. (Heikki Linnakangas, 2021-01-13; 1 file, -0/+87)

Author: Andrey Borodin and me
Discussion: https://www.postgresql.org/message-id/3E4F9093-A1B5-4DF8-A292-0B48692E3954%40yandex-team.ru
* Add an explicit test to catch changes in checksumming calculations. (Tom Lane, 2020-03-08; 2 files, -0/+80)

Seems like a good idea in view of 006517432 and addd034ae.

Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/20200306075230.GA118430@paquier.xyz
* Teach pageinspect about nbtree deduplication. (Peter Geoghegan, 2020-02-29; 1 file, -0/+7)

Add a new bt_metap() column to display the metapage's allequalimage field. Also add three new columns to contrib/pageinspect's bt_page_items() function:

* Add a boolean column ("dead") that displays the LP_DEAD bit value for each non-pivot tuple.

* Add a TID column ("htid") that displays a single heap TID value for each tuple. This is the TID that is returned by BTreeTupleGetHeapTID(), so comparable values are shown for pivot tuples, plain non-pivot tuples, and posting list tuples.

* Add a TID array column ("tids") that displays TIDs from each tuple's posting list, if any. This works just like the "tids" column from pageinspect's gin_leafpage_items() function.

No version bump for the pageinspect extension, since there hasn't been a stable Postgres release since the last version bump (the last bump was part of commit 58b4cb30).

Author: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-WzmSMmU2eNvY9+a4MNP+z02h6sa-uxZvN3un6jY02ZVBSw@mail.gmail.com
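A usage sketch of the new columns (index name hypothetical; block 1 is typically the first leaf page):

    SELECT itemoffset, dead, htid, tids
    FROM bt_page_items('t_btree', 1);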
* Redesign pageinspect function printing infomask bits (Michael Paquier, 2019-09-19; 1 file, -128/+46)

After more discussion, the new function added by ddbd5d8 could have been designed in a better way. Based on an idea from Álvaro, instead of returning one column which includes both the raw and combined flags, use two columns, with one for the raw flags and one for the combined flags. This also takes care of some issues with HEAP_LOCKED_UPGRADED and HEAP_XMAX_IS_LOCKED_ONLY, which are not really combined flags as they depend on conditions defined by other raw bits, as mentioned by Amit.

While on it, fix an extra issue with combined flags. A combined flag was returned if at least one of its bits was set, but all its bits need to be set to include it in the result.

Author: Michael Paquier
Reviewed-by: Álvaro Herrera, Amit Kapila
Discussion: https://postgr.es/m/20190913114950.GA3824@alvherre.pgsql
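A hedged example of the resulting two-column shape (table name hypothetical):

    SELECT p.t_ctid, f.raw_flags, f.combined_flags
    FROM heap_page_items(get_raw_page('t', 0)) AS p,
         LATERAL heap_tuple_infomask_flags(p.t_infomask, p.t_infomask2) AS f
    WHERE p.t_infomask IS NOT NULL;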
* Add to pageinspect a function to make t_infomask/t_infomask2 human-readable (Michael Paquier, 2019-09-12; 1 file, -0/+180)

Flags of t_infomask and t_infomask2 for each tuple are already included in the information returned by heap_page_items as integers, and we lacked a way to make that information human-readable. Per discussion, the function includes an option which controls whether combined flags should be decomposed or not. The default is false, to not decompose combined flags.

The module is bumped to version 1.8.

Author: Craig Ringer, Sawada Masahiko
Reviewed-by: Peter Geoghegan, Robert Haas, Álvaro Herrera, Moon Insung, Amit Kapila, Michael Paquier, Tomas Vondra
Discussion: https://postgr.es/m/CAMsr+YEY7jeaXOb+oX+RhDyOFuTMdmHjGsBxL=igCm03J0go9Q@mail.gmail.com
* Revert "Avoid the creation of the free space map for small heap relations".Amit Kapila2019-05-071-26/+38
| | | | | | | | | | | | | | | | | | | | | This feature was using a process local map to track the first few blocks in the relation. The map was reset each time we get the block with enough freespace. It was discussed that it would be better to track this map on a per-relation basis in relcache and then invalidate the same whenever vacuum frees up some space in the page or when FSM is created. The new design would be better both in terms of API design and performance. List of commits reverted, in reverse chronological order: 06c8a5090e Improve code comments in b0eaa4c51b. 13e8643bfc During pg_upgrade, conditionally skip transfer of FSMs. 6f918159a9 Add more tests for FSM. 9c32e4c350 Clear the local map when not used. 29d108cdec Update the documentation for FSM behavior.. 08ecdfe7e5 Make FSM test portable. b0eaa4c51b Avoid creation of the free space map for small heap relations. Discussion: https://postgr.es/m/20190416180452.3pm6uegx54iitbt5@alap3.anarazel.de
* Make heap TID a tiebreaker nbtree index column. (Peter Geoghegan, 2019-03-20; 1 file, -1/+1)

Make nbtree treat all index tuples as having a heap TID attribute. Index searches can distinguish duplicates by heap TID, since heap TID is always guaranteed to be unique. This general approach has numerous benefits for performance, and is prerequisite to teaching VACUUM to perform "retail index tuple deletion".

Naively adding a new attribute to every pivot tuple has unacceptable overhead (it bloats internal pages), so suffix truncation of pivot tuples is added. This will usually truncate away the "extra" heap TID attribute from pivot tuples during a leaf page split, and may also truncate away additional user attributes. This can increase fan-out, especially in a multi-column index. Truncation can only occur at the attribute granularity, which isn't particularly effective, but works well enough for now. A future patch may add support for truncating "within" text attributes by generating truncated key values using new opclass infrastructure.

Only new indexes (BTREE_VERSION 4 indexes) will have insertions that treat heap TID as a tiebreaker attribute, or will have pivot tuples undergo suffix truncation during a leaf page split (on-disk compatibility with versions 2 and 3 is preserved). Upgrades to version 4 cannot be performed on-the-fly, unlike upgrades from version 2 to version 3. contrib/amcheck continues to work with version 2 and 3 indexes, while also enforcing stricter invariants when verifying version 4 indexes. These stricter invariants are the same invariants described by "3.1.12 Sequencing" from the Lehman and Yao paper.

A later patch will enhance the logic used by nbtree to pick a split point. This patch is likely to negatively impact performance without smarter choices around the precise point to split leaf pages at. Making these two mostly-distinct sets of enhancements into distinct commits seems like it might clarify their design, even though neither commit is particularly useful on its own.

The maximum allowed size of new tuples is reduced by an amount equal to the space required to store an extra MAXALIGN()'d TID in a new high key during leaf page splits. The user-facing definition of the "1/3 of a page" restriction is already imprecise, and so does not need to be revised. However, there should be a compatibility note in the v12 release notes.

Author: Peter Geoghegan
Reviewed-By: Heikki Linnakangas, Alexander Korotkov
Discussion: https://postgr.es/m/CAH2-WzkVb0Kom=R+88fDFb=JSxZMFvbHVC6Mn9LJ2n=X=kS-Uw@mail.gmail.com
* Make FSM test portable. (Amit Kapila, 2019-02-04; 1 file, -18/+5)

In b0eaa4c51b, we allow the FSM to be created only after 4 pages. One of the tests checks the FSM contents, and to do that it populates many tuples in the relation. The FSM contents depend on the availability of freespace in the page, and they could vary because of the alignment of tuples. This commit removes the dependency on FSM contents.

Author: Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KADF6K1bagr0--mGv3dMcZ%3DH_Z-Qtvdfbp5PjaC6PJJA%40mail.gmail.com
* Avoid creation of the free space map for small heap relations, take 2. (Amit Kapila, 2019-02-04; 1 file, -38/+39)

Previously, all heaps had FSMs. For very small tables, this means that the FSM took up more space than the heap did. This is wasteful, so now we refrain from creating the FSM for heaps with 4 pages or fewer. If the last known target block has insufficient space, we still try to insert into some other page before giving up and extending the relation, since doing otherwise leads to table bloat. Testing showed that trying every page penalized performance slightly, so we compromise and try every other page. This way, we visit at most two pages. Any pages with wasted free space become visible at next relation extension, so we still control table bloat. As a bonus, directly attempting one or two pages can even be faster than consulting the FSM would have been.

Once the FSM is created for a heap we don't remove it even if somebody deletes all the rows from the corresponding relation. We don't think it is a useful optimization as it is quite likely that the relation will again grow to the same size.

Author: John Naylor, Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Mithun C Y
Discussion: https://www.postgresql.org/message-id/CAJVSVGWvB13PzpbLEecFuGFc5V2fsO736BsdTakPiPAcdMM5tQ@mail.gmail.com
* Revert "Avoid creation of the free space map for small heap relations."Amit Kapila2019-01-281-39/+38
| | | | This reverts commit ac88d2962a96a9c7e83d5acfc28fe49a72812086.
* Avoid creation of the free space map for small heap relations. (Amit Kapila, 2019-01-28; 1 file, -38/+39)

Previously, all heaps had FSMs. For very small tables, this means that the FSM took up more space than the heap did. This is wasteful, so now we refrain from creating the FSM for heaps with 4 pages or fewer. If the last known target block has insufficient space, we still try to insert into some other page before giving up and extending the relation, since doing otherwise leads to table bloat. Testing showed that trying every page penalized performance slightly, so we compromise and try every other page. This way, we visit at most two pages. Any pages with wasted free space become visible at next relation extension, so we still control table bloat. As a bonus, directly attempting one or two pages can even be faster than consulting the FSM would have been.

Once the FSM is created for a heap we don't remove it even if somebody deletes all the rows from the corresponding relation. We don't think it is a useful optimization as it is quite likely that the relation will again grow to the same size.

Author: John Naylor with design inputs and some code contribution by Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Mithun C Y
Discussion: https://www.postgresql.org/message-id/CAJVSVGWvB13PzpbLEecFuGFc5V2fsO736BsdTakPiPAcdMM5tQ@mail.gmail.com
* pgstatindex, pageinspect: handle partitioned indexes (Alvaro Herrera, 2018-05-09; 1 file, -1/+5)

Commit 8b08f7d4820f failed to update these modules to at least give non-broken error messages for partitioned indexes. Add appropriate error support to them.

Peter G. was complaining about a problem of unfriendly error messages; while we haven't fixed that yet, subsequent discussion led to the discovery of these unhandled cases.

Author: Michaël Paquier
Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-WzkOKptQiE51Bh4_xeEHhaBwHkZkGtKizrFMgEkfUuRRQg@mail.gmail.com
* Skip full index scan during cleanup of B-tree indexes when possible (Teodor Sigaev, 2018-04-04; 1 file, -7/+9)

Vacuum of an index consists of two stages: multiple (zero or more) ambulkdelete calls and one amvacuumcleanup call. When the workload on a particular table is append-only, autovacuum isn't intended to touch this table. However, the user may run vacuum manually in order to fill the visibility map and get the benefit of index-only scans. Then ambulkdelete wouldn't be called for indexes of such a table (because no heap tuples were deleted); only amvacuumcleanup would be called. In this case, amvacuumcleanup would perform a full index scan for two objectives: putting recyclable pages into the free space map and updating index statistics.

This patch allows btvacuumcleanup to skip the full index scan when two conditions are satisfied: no pages are going to be put into the free space map, and the index statistics aren't stale.

In order to check the first condition, we store the oldest btpo_xact in the meta-page. When it precedes RecentGlobalXmin, there are some recyclable pages. In order to check the second condition, we store the number of heap tuples observed during the previous full index scan by cleanup. If the fraction of newly inserted tuples is less than vacuum_cleanup_index_scale_factor, the statistics aren't considered stale. vacuum_cleanup_index_scale_factor can be defined as both a reloption and a GUC (default).

This patch bumps the B-tree meta-page version. Upgrade of the meta-page is performed "on the fly": during VACUUM the meta-page is rewritten with the new version. No special handling in pg_upgrade is required.

Author: Masahiko Sawada, Alexander Korotkov
Review by: Peter Geoghegan, Kyotaro Horiguchi, Alexander Korotkov, Yura Sokolov
Discussion: https://www.postgresql.org/message-id/flat/CAD21AoAX+d2oD_nrd9O2YkpzHaFr=uQeGr9s1rKC3O4ENc568g@mail.gmail.com
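A short sketch of both knobs (index name hypothetical; the reloption and GUC existed from PostgreSQL 11 until the mechanism was replaced in 14):

    ALTER INDEX t_btree SET (vacuum_cleanup_index_scale_factor = 0.2);
    SET vacuum_cleanup_index_scale_factor = 0.2;  -- fallback when no reloption is set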
* Fix new test case to not be endian-dependent. (Tom Lane, 2018-01-04; 1 file, -3/+3)

Per buildfarm.

Discussion: https://postgr.es/m/ec295792-a69f-350f-6287-25a20e8f31d5@gmail.com
* Fix incorrect computations of length of null bitmap in pageinspect. (Tom Lane, 2018-01-04; 1 file, -0/+17)

Instead of using our standard macro for this calculation, this code did it itself ... and got it wrong, leading to incorrect display of the null bitmap in some cases. Noted and fixed by Maksim Milyutin. In passing, remove a uselessly duplicative error check.

Errors were introduced in commit d6061f83a; back-patch to 9.6 where that came in.

Maksim Milyutin, reviewed by Andrey Borodin

Discussion: https://postgr.es/m/ec295792-a69f-350f-6287-25a20e8f31d5@gmail.com
* hash: Increase the number of possible overflow bitmaps by 8x. (Robert Haas, 2017-08-04; 1 file, -3/+3)

Per a report from AP, it's not that hard to exhaust the supply of bitmap pages if you create a table with a hash index and then insert a few billion rows - and then you start getting errors when you try to insert additional rows. In the particular case reported by AP, there's another fix that we can make to improve recycling of overflow pages, which is another way to avoid the error, but there may be other cases where this problem happens and that fix won't help. So let's buy ourselves as much headroom as we can without rearchitecting anything.

The comments claim that the old limit was 64GB, but it was really only 32GB, because we didn't use all the bits in the page for bitmap bits - only the largest power of 2 that could fit after deducting space for the page header and so forth. Thus, we have 4kB per page for bitmap bits, not 8kB. The new limit is thus actually 8 times the old *real* limit but only 4 times the old *purported* limit.

Since this breaks on-disk compatibility, bump HASH_VERSION. We've already done this earlier in this release cycle, so this doesn't cause any incremental inconvenience for people using pg_upgrade from releases prior to v10. However, users who use pg_upgrade to reach 10beta3 or later from 10beta2 or earlier will need to REINDEX any hash indexes again.

Amit Kapila and Robert Haas
Discussion: http://postgr.es/m/20170704105728.mwb72jebfmok2nm2@zip.com.au
* pageinspect: Add bt_page_items function with bytea argument (Peter Eisentraut, 2017-04-04; 1 file, -0/+13)

Author: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
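A usage sketch (index name hypothetical): the bytea variant inspects a page image obtained earlier rather than touching the relation a second time.

    SELECT * FROM bt_page_items(get_raw_page('t_btree', 1));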
* Expand hash indexes more gradually. (Robert Haas, 2017-04-03; 1 file, -2/+2)

Since hash indexes typically have very few overflow pages, adding a new splitpoint essentially doubles the on-disk size of the index, which can lead to large and abrupt increases in disk usage (and perhaps long delays on occasion). To mitigate this problem to some degree, divide larger splitpoints into four equal phases. This means that, for example, instead of growing from 4GB to 8GB all at once, a hash index will now grow from 4GB to 5GB to 6GB to 7GB to 8GB, which is perhaps still not as smooth as we'd like but certainly an improvement.

This changes the on-disk format of the metapage, so bump HASH_VERSION from 2 to 3. This will force a REINDEX of all existing hash indexes, but that's probably a good idea anyway. First, hash indexes from pre-10 versions of PostgreSQL could easily be corrupted, and we don't want to confuse corruption carried over from an older release with any corruption caused despite the new write-ahead logging in v10. Second, it will let us remove some backward-compatibility code added by commit 293e24e507838733aba4748b514536af2d39d7f2.

Mithun Cy, reviewed by Amit Kapila, Jesper Pedersen and me. Regression test outputs updated by me.

Discussion: http://postgr.es/m/CAD__OuhG6F1gQLCgMQNnMNgoCvOLQZz9zKYJQNYvYmmJoM42gA@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoYty0jCf-pa+m+vYUJ716+AxM7nv_syvyanyf5O-L_i2A@mail.gmail.com
* pageinspect: Add page_checksum function (Peter Eisentraut, 2017-03-17; 1 file, -0/+6)

Author: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
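A usage sketch (table name hypothetical): the function computes the checksum the given page ought to carry for that block number, which can be compared against the checksum stored in the page header.

    SELECT page_checksum(get_raw_page('t', 0), 0);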
* pageinspect: Add test for page_header function (Peter Eisentraut, 2017-03-17; 1 file, -0/+6)
* hash: Add write-ahead logging support. (Robert Haas, 2017-03-14; 1 file, -1/+0)

The warning about hash indexes not being write-ahead logged and their use being discouraged has been removed. "snapshot too old" is now supported for tables with hash indexes. Most importantly, barring bugs, hash indexes will now be crash-safe and usable on standbys.

This commit doesn't yet add WAL consistency checking for hash indexes, as we now have for other index types; a separate patch has been submitted to cure that lack.

Amit Kapila, reviewed and slightly modified by me. The larger patch series of which this is a part has been reviewed and tested by Álvaro Herrera, Ashutosh Sharma, Mark Kirkwood, Jeff Janes, and Jesper Pedersen.

Discussion: http://postgr.es/m/CAA4eK1JOBX=YU33631Qh-XivYXtPSALh514+jR8XeD7v+K3r_Q@mail.gmail.com
* Add relkind checks to certain contrib modules (Stephen Frost, 2017-03-09; 1 file, -0/+9)

The contrib extensions pageinspect, pg_visibility and pgstattuple only work against regular relations which have storage. They don't work against foreign tables, partitioned (parent) tables, views, et al.

Add checks to the user-callable functions to return a useful error message to the user if they mistakenly pass an invalid relation to a function which doesn't accept that kind of relation.

In passing, improve some of the existing checks to use ereport() instead of elog(), add a function to consolidate common checks where appropriate, and add some regression tests.

Author: Amit Langote, with various changes by me
Reviewed by: Michael Paquier and Corey Huinker
Discussion: https://postgr.es/m/ab91fd9d-4751-ee77-c87b-4dd704c1e59c@lab.ntt.co.jp
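Illustrative only (error wording approximate and version-dependent): relations without storage are now rejected with a sensible message instead of a low-level failure.

    CREATE VIEW v AS SELECT 1 AS x;
    SELECT get_raw_page('v', 0);
    -- ERROR:  cannot get raw page from view "v"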
* pageinspect: Fix hash_bitmap_info not to read the underlying page. (Robert Haas, 2017-02-09; 1 file, -12/+6)

It did that to verify that the page was an overflow page rather than anything else, but that means that checking the status of all the overflow bits requires reading the entire index. So don't do that. The new code validates that the page is not a primary bucket page or bitmap page by looking at the metapage, so that using this on large numbers of pages can be reasonably efficient.

Ashutosh Sharma, per a complaint from me, and with further modifications by me.
* Cache hash index's metapage in rel->rd_amcache. (Robert Haas, 2017-02-07; 1 file, -4/+4)

This avoids a very significant amount of buffer manager traffic and contention when scanning hash indexes, because it's no longer necessary to lock and pin the metapage for every scan. We do need some way of figuring out when the cache is too stale to use any more, so that when we lock the primary bucket page to which the cached metapage points us, we can tell whether a split has occurred since we cached the metapage data. To do that, we use the hash_prevblkno field in the primary bucket page, which would otherwise always be set to InvalidBlockNumber.

This patch contains code so that it will continue working (although less efficiently) with hash indexes built before this change, but perhaps we should consider bumping the hash version and ripping out the compatibility code. That decision can be made later, though.

Mithun Cy, reviewed by Jesper Pedersen, Amit Kapila, and by me. Before committing, I made a number of cosmetic changes to the last posted version of the patch, adjusted _hash_getcachedmetap to be more careful about order of operation, and made some necessary updates to the pageinspect documentation and regression tests.
* pageinspect: Remove platform-dependent values from hash tests. (Robert Haas, 2017-02-03; 1 file, -17/+36)

Per a report from Tom Lane, the ffactor reported by hash_metapage_info and the free_size reported by hash_page_stats vary by platform.

Ashutosh Sharma and Robert Haas
* pageinspect: Support hash indexes. (Robert Haas, 2017-02-02; 1 file, -0/+150)

Patch by Jesper Pedersen and Ashutosh Sharma, with some error handling improvements by me. Tests from Peter Eisentraut. Reviewed by Álvaro Herrera, Michael Paquier, Jesper Pedersen, Jeff Janes, Peter Eisentraut, Amit Kapila, Mithun Cy, and me.

Discussion: http://postgr.es/m/e2ac6c58-b93f-9dd9-f4e6-d6d30add7fdf@redhat.com
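A brief usage sketch of the new functions (index name hypothetical; block 0 of a hash index is its metapage):

    SELECT hash_page_type(get_raw_page('t_hash', 0));   -- 'metapage'
    SELECT ntuples, ffactor
    FROM hash_metapage_info(get_raw_page('t_hash', 0));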
* Fix gin_leafpage_items(). (Tom Lane, 2016-11-04; 1 file, -2/+9)

On closer inspection, commit 84ad68d64 broke gin_leafpage_items(), because the aligned copy of the page got palloc'd in a short-lived context whereas it needs to be in the SRF's multi_call_memory_ctx. This was not exposed by the regression test, because the regression test doesn't actually exercise the function in a meaningful way. Fix the code bug, and extend the test in what I hope is a portable fashion.
* pageinspect: Make page test more portable (Peter Eisentraut, 2016-11-02; 1 file, -3/+3)

Choose test data that makes the output independent of endianness.
* pageinspect: Make btree test more portable (Peter Eisentraut, 2016-11-01; 1 file, -3/+3)

Choose test data that makes the output independent of endianness and alignment.
* pageinspect: Add tests (Peter Eisentraut, 2016-11-01; 4 files, -0/+199)