summaryrefslogtreecommitdiff
path: root/src/backend/replication/logical
Commit message (Collapse)AuthorAgeFilesLines
* Fix invalid memory access during the shutdown of the parallel apply worker.Amit Kapila2023-05-092-17/+37
| | | | | | | | | | | | | | | | | | | | | | | The callback function pa_shutdown() accesses MyLogicalRepWorker which may not be initialized if there is an error during the initialization of the parallel apply worker. The other problem is that by the time it is invoked even after the initialization of the worker, the MyLogicalRepWorker will be reset by another callback logicalrep_worker_onexit. So, it won't have the required information. To fix this, register the shutdown callback after we are attached to the worker slot. After this fix, we observed another issue which is that sometimes the leader apply worker tries to receive the message from the error queue that might already be detached by the parallel apply worker leading to an error. To prevent such an error, we ensure that the leader apply worker detaches from the parallel apply worker's error queue before stopping it. Reported-by: Sawada Masahiko Author: Hou Zhijie Reviewed-by: Sawada Masahiko, Amit Kapila Discussion: https://postgr.es/m/CAD21AoDo+yUwNq6nTrvE2h9bB2vZfcag=jxWc7QxuWCmkDAqcA@mail.gmail.com
* Fix assertion failure in apply worker.Amit Kapila2023-05-033-1/+15
| | | | | | | | | | | | | | During exit, the logical replication apply worker tries to release session level locks, if any. However, if the apply worker exits due to an error before its connection is initialized, trying to release locks can lead to assertion failure. The locks will be acquired once the worker is initialized, so we don't need to release them till the worker initialization is complete. Reported-by: Alexander Lakhin Author: Hou Zhijie based on inputs from Sawada Masahiko and Amit Kapila Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/2185d65f-5aae-3efa-c48f-fb42b173ef5c@gmail.com
* Fix typos in commentsMichael Paquier2023-05-021-2/+2
| | | | | | | | | The changes done in this commit impact comments with no direct user-visible changes, with fixes for incorrect function, variable or structure names. Author: Alexander Lakhin Discussion: https://postgr.es/m/e8c38840-596a-83d6-bd8d-cebc51111572@gmail.com
* Use elog to report unexpected action in handle_streamed_transaction().Masahiko Sawada2023-04-241-1/+1
| | | | | | | | An oversight in commit 216a784829c. Author: Masahiko Sawada Reviewed-by: Kyotaro Horiguchi, Amit Kapila Discussion: https://postgr.es/m/CAD21AoDDbM8_HJt-nMCvcjTK8K9hPzXWqJj7pyaUvR4mm_NrSg@mail.gmail.com
* Restart the apply worker if the 'password_required' option is changed.Amit Kapila2023-04-201-0/+1
| | | | | | | | | | | | The apply worker is restarted if any subscription option that affects the remote connection was changed. In commit c3afe8cf5a, we added the option 'password_required' which can affect the remote connection, so we should restart the worker if it was changed. Author: Amit Kapila Reviewed-by: Robert Haas Discussion: https://postgr.es/m/CAA4eK1+z9UDFEynXLsWeMMuUZc1iQkRwj2HNDtxUHTPo-u1F4A@mail.gmail.com Discussion: https://postgr.es/m/9DFC88D3-1300-4DE8-ACBC-4CEF84399A53@enterprisedb.com
* Fix various typos and incorrect/outdated name referencesDavid Rowley2023-04-191-1/+1
| | | | | Author: Alexander Lakhin Discussion: https://postgr.es/m/699beab4-a6ca-92c9-f152-f559caf6dc25@gmail.com
* Improve error messages introduced in be87200efd9 and 0fdab27ad68Andres Freund2023-04-122-2/+2
| | | | | | Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20230411.120301.93333867350615278.horikyota.ntt@gmail.com Discussion: https://postgr.es/m/20230412174244.6njadz4uoiez3l74@awork3.anarazel.de
* Allow logical decoding on standbysAndres Freund2023-04-082-17/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unsurprisingly, this requires wal_level = logical to be set on the primary and standby. The infrastructure added in 26669757b6a ensures that slots are invalidated if the primary's wal_level is lowered. Creating a slot on a standby waits for a xl_running_xact record to be processed. If the primary is idle (and thus not emitting xl_running_xact records), that can take a while. To make that faster, this commit also introduces the pg_log_standby_snapshot() function. By executing it on the primary, completion of slot creation on the standby can be accelerated. Note that logical decoding on a standby does not itself enforce that required catalog rows are not removed. The user has to use physical replication slots + hot_standby_feedback or other measures to prevent that. If catalog rows required for a slot are removed, the slot is invalidated. See 6af1793954e for an overall design of logical decoding on a standby. Bumps catversion, for the addition of the pg_log_standby_snapshot() function. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> (in an older version) Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: FabrÌzio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-By: Robert Haas <robertmhaas@gmail.com>
* Support invalidating replication slots due to horizon and wal_levelAndres Freund2023-04-071-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Needed for logical decoding on a standby. Slots need to be invalidated because of the horizon if rows required for logical decoding are removed. If the primary's wal_level is lowered from 'logical', logical slots on the standby need to be invalidated. The new invalidation methods will be used in a subsequent commit. Logical slots that have been invalidated can be identified via the new pg_replication_slots.conflicting column. See 6af1793954e for an overall design of logical decoding on a standby. Bumps catversion for the addition of the new pg_replication_slots column. Author: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Author: Andres Freund <andres@anarazel.de> Author: Amit Khandekar <amitdkhan.pg@gmail.com> (in an older version) Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de
* Prevent use of invalidated logical slot in CreateDecodingContext()Andres Freund2023-04-072-13/+16
| | | | | | | | | | | | | | Previously we had checks for this in multiple places. Support for logical decoding on standbys will add other forms of invalidation, making it worth while to centralize the checks. This slightly changes the error message for both the walsender and SQL interface. Particularly the SQL interface error was inaccurate, as the "This slot has never previously reserved WAL" portion was unreachable. Reviewed-by: "Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/20230407075009.igg7be27ha2htkbt@awork3.anarazel.de
* Add a run_as_owner option to subscriptions.Robert Haas2023-04-041-10/+36
| | | | | | | | | | | | | | | | | | This option is normally false, but can be set to true to obtain the legacy behavior where the subscription runs with the permissions of the subscription owner rather than the permissions of the table owner. The advantages of this mode are (1) it doesn't require that the subscription owner have permission to SET ROLE to each table owner and (2) since no role switching occurs, the SECURITY_RESTRICTED_OPERATION restrictions do not apply. On the downside, it allows any table owner to easily usurp the privileges of the subscription owner - basically, to take over their account. Because that's generally quite undesirable, we don't make this mode the default, but we do make it available, just in case the new behavior causes too many problems for someone. Discussion: http://postgr.es/m/CA+TgmoZ-WEeG6Z14AfH7KhmpX2eFh+tZ0z+vf0=eMDdbda269g@mail.gmail.com
* Perform logical replication actions as the table owner.Robert Haas2023-04-041-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up until now, logical replication actions have been performed as the subscription owner, who will generally be a superuser. Commit cec57b1a0fbcd3833086ba686897c5883e0a2afc documented hazards associated with that situation, namely, that any user who owns a table on the subscriber side could assume the privileges of the subscription owner by attaching a trigger, expression index, or some other kind of executable code to it. As a remedy, it suggested not creating configurations where users who are not fully trusted own tables on the subscriber. Although that will work, it basically precludes using logical replication in the way that people typically want to use it, namely, to replicate a database from one node to another without necessarily having any restrictions on which database users can own tables. So, instead, change logical replication to execute INSERT, UPDATE, DELETE, and TRUNCATE operations as the table owner when they are replicated. Since this involves switching the active user frequently within a session that is authenticated as the subscription user, also impose SECURITY_RESTRICTED_OPERATION restrictions on logical replication code. As an exception, if the table owner can SET ROLE to the subscription owner, these restrictions have no security value, so don't impose them in that case. Subscription owners are now required to have the ability to SET ROLE to every role that owns a table that the subscription is replicating. If they don't, replication will fail. Superusers, who normally own subscriptions, satisfy this property by default. Non-superusers users who own subscriptions will need to be granted the roles that own relevant tables. Patch by me, reviewed (but not necessarily in its entirety) by Jelte Fennema, Jeff Davis, and Noah Misch. Discussion: http://postgr.es/m/CA+TgmoaSCkg9ww9oppPqqs+9RVqCexYCE6Aq=UsYPfnOoDeFkw@mail.gmail.com
* Fix possible logical replication crash.Robert Haas2023-04-031-1/+3
| | | | | | | | | | | Commit c3afe8cf5a1e465bd71e48e4bc717f5bfdc7a7d6 added a new password_required option but forgot that you need database access to check whether an arbitrary role ID is a superuser. Report and patch by Hou Zhijie. I added a comment. Thanks to Alexander Lakhin for devising a way to reproduce the crash. Discussion: http://postgr.es/m/OS0PR01MB5716BFD7EC44284C89F40808948F9@OS0PR01MB5716.jpnprd01.prod.outlook.com
* Add new predefined role pg_create_subscription.Robert Haas2023-03-302-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This role can be granted to non-superusers to allow them to issue CREATE SUBSCRIPTION. The non-superuser must additionally have CREATE permissions on the database in which the subscription is to be created. Most forms of ALTER SUBSCRIPTION, including ALTER SUBSCRIPTION .. SKIP, now require only that the role performing the operation own the subscription, or inherit the privileges of the owner. However, to use ALTER SUBSCRIPTION ... RENAME or ALTER SUBSCRIPTION ... OWNER TO, you also need CREATE permission on the database. This is similar to what we do for schemas. To change the owner of a schema, you must also have permission to SET ROLE to the new owner, similar to what we do for other object types. Non-superusers are required to specify a password for authentication and the remote side must use the password, similar to what is required for postgres_fdw and dblink. A superuser who wants a non-superuser to own a subscription that does not rely on password authentication may set the new password_required=false property on that subscription. A non-superuser may not set password_required=false and may not modify a subscription that already has password_required=false. This new password_required subscription property works much like the eponymous postgres_fdw property. In both cases, the actual semantics are that a password is not required if either (1) the property is set to false or (2) the relevant user is the superuser. Patch by me, reviewed by Andres Freund, Jeff Davis, Mark Dilger, and Stephen Frost (but some of those people did not fully endorse all of the decisions that the patch makes). Discussion: http://postgr.es/m/CA+TgmoaDH=0Xj7OBiQnsHTKcF2c4L+=gzPBUKSJLh8zed2_+Dg@mail.gmail.com
* Allow logical replication to copy tables in binary format.Amit Kapila2023-03-231-1/+16
| | | | | | | | | | | | | | | This patch allows copying tables in the binary format during table synchronization when the binary option for a subscription is enabled. Previously, tables are copied in text format even if the subscription is created with the binary option enabled. Copying tables in binary format may reduce the time spent depending on column types. A binary copy for initial table synchronization is supported only when both publisher and subscriber are v16 or later. Author: Melih Mutlu Reviewed-by: Peter Smith, Shi yu, Euler Taveira, Vignesh C, Kuroda Hayato, Osumi Takamichi, Bharath Rupireddy, Hou Zhijie Discussion: https://postgr.es/m/CAGPVpCQvAziCLknEnygY0v1-KBtg%2BOm-9JHJYZOnNPKFJPompw%40mail.gmail.com
* Add macros for ReorderBufferTXN toptxn.Amit Kapila2023-03-171-28/+18
| | | | | | | | | | Currently, there are quite a few places in reorderbuffer.c that tries to access top-transaction for a subtransaction. This makes the code to access top-transaction consistent and easier to follow. Author: Peter Smith Reviewed-by: Vignesh C, Sawada Masahiko Discussion: https://postgr.es/m/CAHut+PuCznOyTqBQwjRUu-ibG-=KHyCv-0FTcWQtZUdR88umfg@mail.gmail.com
* Allow the use of indexes other than PK and REPLICA IDENTITY on the subscriber.Amit Kapila2023-03-152-35/+233
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Using REPLICA IDENTITY FULL on the publisher can lead to a full table scan per tuple change on the subscription when REPLICA IDENTITY or PK index is not available. This makes REPLICA IDENTITY FULL impractical to use apart from some small number of use cases. This patch allows using indexes other than PRIMARY KEY or REPLICA IDENTITY on the subscriber during apply of update/delete. The index that can be used must be a btree index, not a partial index, and it must have at least one column reference (i.e. cannot consist of only expressions). We can uplift these restrictions in the future. There is no smart mechanism to pick the index. If there is more than one index that satisfies these requirements, we just pick the first one. We discussed using some of the optimizer's low-level APIs for this but ruled it out as that can be a maintenance burden in the long run. This patch improves the performance in the vast majority of cases and the improvement is proportional to the amount of data in the table. However, there could be some regression in a small number of cases where the indexes have a lot of duplicate and dead rows. It was discussed that those are mostly impractical cases but we can provide a table or subscription level option to disable this feature if required. Author: Onder Kalaci, Amit Kapila Reviewed-by: Peter Smith, Shi yu, Hou Zhijie, Vignesh C, Kuroda Hayato, Amit Kapila Discussion: https://postgr.es/m/CACawEhVLqmAAyPXdHEPv1ssU2c=dqOniiGz7G73HfyS7+nGV4w@mail.gmail.com
* Fill EState.es_rteperminfos more systematically.Tom Lane2023-03-061-2/+4
| | | | | | | | | | | | | | | | | | While testing a fix for bug #17823, I discovered that EvalPlanQualStart failed to copy es_rteperminfos from the parent EState, resulting in failure if anything in EPQ execution wanted to consult that information. This led me to conclude that commit a61b1f748 had been too haphazard about where to fill es_rteperminfos, and that we need to be sure that that happens exactly where es_range_table gets filled. So I changed the signature of ExecInitRangeTable to help ensure that this new requirement doesn't get missed. (Indeed, pgoutput.c was also failing to fill it. Maybe we don't ever need it there, but I wouldn't bet on that.) No test case yet; one will arrive with the fix for #17823. But that needs to be back-patched, while this fix is HEAD-only. Discussion: https://postgr.es/m/17823-b64909cf7d63de84@postgresql.org
* Deduplicate handling of binary and text modes in logicalrep_read_tuple().Amit Kapila2023-03-061-12/+6
| | | | | | Author: Bharath Rupireddy Reviewed-by: Peter Smith Discussion: https://postgr.es/m/CALj2ACXdbq7kW_+bRrSGMsR6nefCvwbHBJ5J51mr3gFf7QysTA@mail.gmail.com
* Remove bms_first_member().Tom Lane2023-03-021-1/+2
| | | | | | | | | | | | | | This function has been semi-deprecated ever since we invented bms_next_member(). Its habit of scribbling on the input bitmapset isn't great, plus for sufficiently large bitmapsets it would take O(N^2) time to complete a loop. Now we have the additional problem that reducing the input to empty while leaving it still accessible would violate a planned invariant. So let's just get rid of it, after updating the few extant callers to use bms_next_member(). Patch by me; thanks to Nathan Bossart and Richard Guo for review. Discussion: https://postgr.es/m/1159933.1677621588@sss.pgh.pa.us
* Fix snapshot handling in logicalmsg_decodeTomas Vondra2023-02-222-2/+22
| | | | | | | | | | | | | | | | | Whe decoding a transactional logical message, logicalmsg_decode called SnapBuildGetOrBuildSnapshot. But we may not have a consistent snapshot yet at that point. We don't actually need the snapshot in this case (during replay we'll have the snapshot from the transaction), so in practice this is harmless. But in assert-enabled build this crashes. Fixed by requesting the snapshot only in non-transactional case, where we are guaranteed to have SNAPBUILD_CONSISTENT. Backpatch to 11. The issue exists since 9.6. Backpatch-through: 11 Reviewed-by: Andres Freund Discussion: https://postgr.es/m/84d60912-6eab-9b84-5de3-41765a5449e8@enterprisedb.com
* Add a new wait state and use it when sending data in the apply worker.Amit Kapila2023-02-161-1/+2
| | | | | | | | | | | | | d9d7fe68d3 made use of an existing wait event when sending data from the apply worker, but we should have invented a new wait event since this is a new place to wait. This patch corrects the mistake by using a new wait event "LogicalApplySendData". Author: Hou Zhijie Reviewed-by: Peter Smith Discussion: https://postgr.es/m/CA+TgmobWzbr9H3yN3dLVckviEZKemPwd+XyCFKEgyZQZhgP66Q@mail.gmail.com
* Fix various typos in code and testsMichael Paquier2023-02-091-1/+1
| | | | | | | | Most of these are recent, and the documentation portions are new as of v16 so there is no need for a backpatch. Author: Justin Pryzby Discussion: https://postgr.es/m/20230208155644.GM1653@telsasoft.com
* Fix the logical replication timeout during large DDLs.Amit Kapila2023-02-082-0/+70
| | | | | | | | | | | | | | | | | | | The DDLs like Refresh Materialized views that generate lots of temporary data due to rewrite rules may not be processed by output plugins (for example pgoutput). So, we won't send keep-alive messages for a long time while processing such commands and that can lead the subscriber side to timeout. We have previously fixed a similar case for large transactions in commit f95d53eded where the output plugin filters all or most of the changes but missed to handle the DDLs. We decided not to backpatch this as this adds a new callback in the existing exposed structure and moreover, users can increase the wal_sender_timeout and wal_receiver_timeout to avoid this problem. Author: Wang wei, Hou Zhijie Reviewed-by: Peter Smith, Ashutosh Bapat, Shi yu, Amit Kapila Discussion: https://postgr.es/m/OS3PR01MB6275478E5D29E4A563302D3D9E2B9@OS3PR01MB6275.jpnprd01.prod.outlook.com Discussion: https://postgr.es/m/CAA5-nLARN7-3SLU_QUxfy510pmrYK6JJb=bk3hcgemAM_pAv+w@mail.gmail.com
* Use appropriate wait event when sending data in the apply worker.Amit Kapila2023-02-071-2/+1
| | | | | | | | | | | | | | | Currently, we reuse WAIT_EVENT_LOGICAL_PARALLEL_APPLY_STATE_CHANGE in the apply worker while sending data to the parallel apply worker via a shared memory queue. This is not appropriate as one won't be able to distinguish whether the worker is waiting for sending data or for the state change. To patch instead uses the wait event WAIT_EVENT_MQ_SEND which has been already used in blocking mode while sending data via a shared memory queue. Author: Hou Zhijie Reviewed-by: Kuroda Hayato, Amit Kapila Discussion: https://postgr.es/m/OS0PR01MB57161C680B22E4C591628EE994DA9@OS0PR01MB5716.jpnprd01.prod.outlook.com
* Remove useless casts to (void *) in hash_search() callsPeter Eisentraut2023-02-062-29/+11
| | | | | | | | | | | Some of these appear to be leftovers from when hash_search() took a char * argument (changed in 5999e78fc45dcb91784b64b6e9ae43f4e4f68ca2). Since after this there is some more horizontal space available, do some light reformatting where suitable. Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/fd9adf5d-b1aa-e82f-e4c7-263c30145807%40enterprisedb.com
* Optimize the origin drop functionality.Amit Kapila2023-02-031-29/+33
| | | | | | | | | | | | | | To interlock against concurrent drops, we use to hold ExclusiveLock on pg_replication_origin till xact commit. This blocks even concurrent drops of different origins by tablesync workers. So, instead, lock the specific origin to interlock against concurrent drops. This reduces the test time variability in src/test/subscription where multiple tables are being synced. Author: Vignesh C Reviewed-by: Hou Zhijie, Amit Kapila Discussion: https://postgr.es/m/1412708.1674417574@sss.pgh.pa.us
* Allow the logical_replication_mode to be used on the subscriber.Amit Kapila2023-02-021-5/+11
| | | | | | | | | | | | | | | | | Extend the existing developer option 'logical_replication_mode' to help test the parallel apply of large transactions on the subscriber. When set to 'buffered', the leader sends changes to parallel apply workers via a shared memory queue. When set to 'immediate', the leader serializes all changes to files and notifies the parallel apply workers to read and apply them at the end of the transaction. This helps in adding tests to cover the serialization code path in parallel streaming mode. Author: Hou Zhijie Reviewed-by: Peter Smith, Kuroda Hayato, Sawada Masahiko, Amit Kapila Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
* Rename GUC logical_decoding_mode to logical_replication_mode.Amit Kapila2023-01-301-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | Rename the developer option 'logical_decoding_mode' to the more flexible name 'logical_replication_mode' because doing so will make it easier to extend this option in the future to help test other areas of logical replication. Currently, it is used on the publisher side to allow streaming or serializing each change in logical decoding. In the upcoming patch, we are planning to use it on the subscriber. On the subscriber, it will allow serializing the changes to file and notifies the parallel apply workers to read and apply them at the end of the transaction. We discussed exposing this parameter as a subscription option but it did not seem advisable since it is primarily used for testing/debugging and there is no other such parameter. We also discussed having separate GUCs for publisher and subscriber but for current testing/debugging requirements, one GUC is sufficient. Author: Hou Zhijie Reviewed-by: Peter Smith, Kuroda Hayato, Sawada Masahiko, Amit Kapila Discussion: https://postgr.es/m/CAD21AoAy2c=Mx=FTCs+EwUsf2kQL5MmU3N18X84k0EmCXntK4g@mail.gmail.com Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
* Avoid type cheats for invalid dsa_handles and dshash_table_handles.Tom Lane2023-01-251-4/+4
| | | | | | | | | | | | | | Invent separate macros for "invalid" values of these types, so that we needn't embed knowledge of their representations into calling code. These are all zeroes anyway ATM, so this is not fixing any live bug, but it makes the code cleaner and more future-proof. I (tgl) also chose to move DSM_HANDLE_INVALID into dsm_impl.h, since it seems like it should live beside the typedef for dsm_handle. Hou Zhijie, Nathan Bossart, Kyotaro Horiguchi, Tom Lane Discussion: https://postgr.es/m/OS0PR01MB5716860B1454C34E5B179B6694C99@OS0PR01MB5716.jpnprd01.prod.outlook.com
* Fix the Drop Database hang.Amit Kapila2023-01-241-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The drop database command waits for the logical replication sync worker to accept ProcSignalBarrier and the worker's slot creation waits for the drop database to finish which leads to a deadlock. This happens because the tablesync worker holds interrupts while creating a slot. We prevent cancel/die interrupts while creating a slot in the table sync worker because it is possible that before the server finishes this command, a concurrent drop subscription happens which would complete without removing this slot and that leads to the slot existing until the end of walsender. However, the slot will eventually get dropped at the walsender exit time, so there is no danger of the dangling slot. This patch reallows cancel/die interrupts while creating a slot and modifies the test to wait for slots to become zero to prevent finding an ephemeral slot. The reported hang doesn't happen in PG14 as the drop database starts to wait for ProcSignalBarrier with PG15 (commits 4eb2176318 and e2f65f4255) but it is good to backpatch this till PG14 as it is not a good idea to prevent interrupts during a network call that could block indefinitely. Reported-by: Lakshmi Narayanan Sreethar Diagnosed-by: Andres Freund Author: Hou Zhijie Reviewed-by: Vignesh C, Amit Kapila Backpatch-through: 14, where it was introduced in commit 6b67d72b60 Discussion: https://postgr.es/m/CA+kvmZELXQ4ZD3U=XCXuG3KvFgkuPoN1QrEj8c-rMRodrLOnsg@mail.gmail.com
* Track logrep apply workers' last start times to avoid useless waits.Tom Lane2023-01-223-49/+211
| | | | | | | | | | | | | | Enforce wal_retrieve_retry_interval on a per-subscription basis, rather than globally, and arrange to skip that delay in case of an intentional worker exit. This probably makes little difference in the field, where apply workers wouldn't be restarted often; but it has a significant impact on the runtime of our logical replication regression tests (even though those tests use artificially-small wal_retrieve_retry_interval settings already). Nathan Bossart, with mostly-cosmetic editorialization by me Discussion: https://postgr.es/m/20221122004119.GA132961@nathanxps13
* Display the leader apply worker's PID for parallel apply workers.Amit Kapila2023-01-182-20/+50
| | | | | | | | | | | | | | | | | Add leader_pid to pg_stat_subscription. leader_pid is the process ID of the leader apply worker if this process is a parallel apply worker. If this field is NULL, it indicates that the process is a leader apply worker or a synchronization worker. The new column makes it easier to distinguish parallel apply workers from other kinds of workers and helps to identify the leader for the parallel workers corresponding to a particular subscription. Additionally, update the leader_pid column in pg_stat_activity as well to display the PID of the leader apply worker for parallel apply workers. Author: Hou Zhijie Reviewed-by: Peter Smith, Sawada Masahiko, Amit Kapila, Shveta Mallik Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
* Improve the code to decide and process the apply action.Amit Kapila2023-01-171-26/+47
| | | | | | | | | | | | | | | | | | | | | The code that decides the apply action missed to handle non-transactional messages and we didn't catch it in our testing as currently such messages are simply ignored by the apply worker. This was introduced by changes in commit 216a784829. While testing this, I noticed that we forgot to reset stream_xid after processing the stream stop message which could also result in the wrong apply action after the fix for non-transactional messages. In passing, change assert to elog for unexpected apply action in some of the routines so as to catch the problems in the production environment, if any. Reported-by: Tomas Vondra Author: Amit Kapila Reviewed-by: Tomas Vondra, Sawada Masahiko, Hou Zhijie Discussion: https://postgr.es/m/984ff689-adde-9977-affe-cd6029e850be@enterprisedb.com Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
* Add BufFileRead variants with short read and EOF detectionPeter Eisentraut2023-01-161-28/+4
| | | | | | | | | | | | | | | Most callers of BufFileRead() want to check whether they read the full specified length. Checking this at every call site is very tedious. This patch provides additional variants BufFileReadExact() and BufFileReadMaybeEOF() that include the length checks. I considered changing BufFileRead() itself, but this function is also used in extensions, and so changing the behavior like this would create a lot of problems there. The new names are analogous to the existing LogicalTapeReadExact(). Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/f3501945-c591-8cc3-5ef0-b72a2e0eaa9c@enterprisedb.com
* Fix some BufFileRead() error reportingPeter Eisentraut2023-01-161-14/+19
| | | | | | | | | | | | Remove "%m" from error messages where errno would be bogus. Add short read byte counts where appropriate. This is equivalent to what was done in 7897e3bb902c557412645b82120f4d95f7474906, but some code was apparently developed concurrently to that and not updated accordingly. Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/f3501945-c591-8cc3-5ef0-b72a2e0eaa9c@enterprisedb.com
* Avoid creating parallel apply state hash table unless required.Amit Kapila2023-01-131-4/+4
| | | | | | | | | | This hash table is used to cache the state of streaming transactions being applied by the parallel apply workers. So, this should be created only when we are successful in launching at least one worker. This avoids rare case memory leak when we are never able to launch any worker. Author: Ted Yu Discussion: https://postgr.es/m/CALte62wg0rBR3Vj2beV=HiWo2qG9L0hzKcX=yULNER0wmf4aEw@mail.gmail.com
* Acquire spinlock when updating 2PC slot data during logical decoding creationMichael Paquier2023-01-121-0/+2
| | | | | | | | | | | | | | The creation of a logical decoding context in CreateDecodingContext() updates some data of its slot for two-phase transactions if enabled by the caller, but the code forgot to acquire a spinlock when updating these fields like any other code paths. This could lead to the read of inconsistent data. Oversight in a8fd13c. Author: Sawada Masahiko Discussion: https://postgr.es/m/CAD21AoAD8_fp47191LKuecjDd3DYhoQ4TaucFco1_TEr_jQ-Zw@mail.gmail.com Backpatch-through: 15
* Fix the file mode of worker.c changed by the commit 216a784829.Amit Kapila2023-01-091-0/+0
| | | | | Reported-by: Japin Li Discussion: https://postgr.es/m/MEYP282MB166970D1559B7CC74D3E339BB6FE9@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
* Perform apply of large transactions by parallel workers.Amit Kapila2023-01-0910-291/+3013
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, for large transactions, the publisher sends the data in multiple streams (changes divided into chunks depending upon logical_decoding_work_mem), and then on the subscriber-side, the apply worker writes the changes into temporary files and once it receives the commit, it reads from those files and applies the entire transaction. To improve the performance of such transactions, we can instead allow them to be applied via parallel workers. In this approach, we assign a new parallel apply worker (if available) as soon as the xact's first stream is received and the leader apply worker will send changes to this new worker via shared memory. The parallel apply worker will directly apply the change instead of writing it to temporary files. However, if the leader apply worker times out while attempting to send a message to the parallel apply worker, it will switch to "partial serialize" mode - in this mode, the leader serializes all remaining changes to a file and notifies the parallel apply workers to read and apply them at the end of the transaction. We use a non-blocking way to send the messages from the leader apply worker to the parallel apply to avoid deadlocks. We keep this parallel apply assigned till the transaction commit is received and also wait for the worker to finish at commit. This preserves commit ordering and avoid writing to and reading from files in most cases. We still need to spill if there is no worker available. This patch also extends the SUBSCRIPTION 'streaming' parameter so that the user can control whether to apply the streaming transaction in a parallel apply worker or spill the change to disk. The user can set the streaming parameter to 'on/off', or 'parallel'. The parameter value 'parallel' means the streaming will be applied via a parallel apply worker, if available. The parameter value 'on' means the streaming transaction will be spilled to disk. The default value is 'off' (same as current behaviour). In addition, the patch extends the logical replication STREAM_ABORT message so that abort_lsn and abort_time can also be sent which can be used to update the replication origin in parallel apply worker when the streaming transaction is aborted. Because this message extension is needed to support parallel streaming, parallel streaming is not supported for publications on servers < PG16. Author: Hou Zhijie, Wang wei, Amit Kapila with design inputs from Sawada Masahiko Reviewed-by: Sawada Masahiko, Peter Smith, Dilip Kumar, Shi yu, Kuroda Hayato, Shveta Mallik Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
* Remove the streaming files for incomplete xacts after restart.Amit Kapila2023-01-071-0/+4
| | | | | | | | | | | | | | After restart, we try to stream the changes for large transactions that were not sent before server crash and restart. However, we forget to send the abort message for such transactions. This leads to spurious streaming files on the subscriber which won't be cleaned till the apply worker or the subscriber server restarts. Reported-by: Dilip Kumar Author: Hou Zhijie Reviewed-by: Dilip Kumar and Amit Kapila Backpatch-through: 14 Discussion: https://postgr.es/m/OS0PR01MB5716A773F46768A1B75BE24394FB9@OS0PR01MB5716.jpnprd01.prod.outlook.com
* Wake up a subscription's replication worker processes after DDL.Tom Lane2023-01-061-0/+52
| | | | | | | | | | | | | | | | Waken related worker processes immediately at commit of a transaction that has performed ALTER SUBSCRIPTION (including the RENAME and OWNER variants). This reduces the response time for such operations. In the real world that might not be worth much, but it shaves several seconds off the runtime for the subscription test suite. In the case of PREPARE, we just throw away this notification state; it doesn't seem worth the work to preserve it. The workers will still react after the eventual COMMIT PREPARED, but not as quickly. Nathan Bossart Discussion: https://postgr.es/m/20221122004119.GA132961@nathanxps13
* Check for two_phase change at end of process_syncing_tables_for_apply.Tom Lane2023-01-061-22/+29
| | | | | | | | | | | | | | Previously this function checked to see if we were ready to switch to two_phase mode at its start, but that's silly: we should check at the end, after we've done the work that might make us ready. This simple change removes one sleep cycle from the time needed to switch to two_phase mode. In the real world that might not be worth much, but it shaves a few seconds off the runtime for the subscription test suite. Nathan Bossart Discussion: https://postgr.es/m/20221122004119.GA132961@nathanxps13
* Fix calculation of which GENERATED columns need to be updated.Tom Lane2023-01-051-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We were identifying the updatable generated columns of inheritance children by transposing the calculation made for their parent. However, there's nothing that says a traditional-inheritance child can't have generated columns that aren't there in its parent, or that have different dependencies than are in the parent's expression. (At present it seems that we don't enforce that for partitioning either, which is likely wrong to some degree or other; but the case clearly needs to be handled with traditional inheritance.) Hence, drop the very-klugy-anyway "extraUpdatedCols" RTE field in favor of identifying which generated columns depend on updated columns during executor startup. In HEAD we can remove extraUpdatedCols altogether; in back branches, it's still there but always empty. Another difference between the HEAD and back-branch versions of this patch is that in HEAD we can add the new bitmap field to ResultRelInfo, but that would cause an ABI break in back branches. Like 4b3e37993, add a List field at the end of struct EState instead. Back-patch to v13. The bogus calculation is also being made in v12, but it doesn't have the same visible effect because we don't use it to decide which generated columns to recalculate; as a consequence of which the patch doesn't apply easily. I think that there might still be a demonstrable bug associated with trigger firing conditions, but that's such a weird corner-case usage that I'm content to leave it unfixed in v12. Amit Langote and Tom Lane Discussion: https://postgr.es/m/CA+HiwqFshLKNvQUd1DgwJ-7tsTp=dwv7KZqXC4j2wYBV1aCDUA@mail.gmail.com Discussion: https://postgr.es/m/2793383.1672944799@sss.pgh.pa.us
* Update copyright for 2023Bruce Momjian2023-01-0213-13/+13
| | | | Backpatch-through: 11
* Add 'logical_decoding_mode' GUC.Amit Kapila2022-12-261-10/+23
| | | | | | | | | | | | | | | | This enables streaming or serializing changes immediately in logical decoding. This parameter is intended to be used to test logical decoding and replication of large transactions for which otherwise we need to generate the changes till logical_decoding_work_mem is reached. This helps in reducing the timing of existing tests related to logical replication of in-progress transactions and will help in writing tests for for the upcoming feature for parallelly applying large in-progress transactions. Author: Shi yu Reviewed-by: Sawada Masahiko, Shveta Mallik, Amit Kapila, Dilip Kumar, Kuroda Hayato, Kyotaro Horiguchi Discussion: https://postgr.es/m/OSZPR01MB63104E7449DBE41932DB19F1FD1B9@OSZPR01MB6310.jpnprd01.prod.outlook.com
* Add copyright notices to meson filesAndrew Dunstan2022-12-201-0/+2
| | | | Discussion: https://postgr.es/m/222b43a5-2fb3-2c1b-9cd0-375d376c8246@dunslane.net
* Avoid unnecessary streaming of transactions during logical replication.Amit Kapila2022-12-081-13/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After restart, we don't perform streaming of an in-progress transaction if it was previously decoded and confirmed by the client. To achieve that we were comparing the END location of the WAL record being decoded with the WAL location we have already decoded and confirmed by the client. While decoding the commit record, to decide whether to process and send the complete transaction, we compare its START location with the WAL location we have already decoded and confirmed by the client. Now, if we need to queue some change in the transaction while decoding the commit record (e.g. snapshot), it is possible that we decide to stream the transaction but later commit processing decides to skip it. In such a case, we would needlessly send the changes and later when we decide to skip it, we will send stream abort. We also sometimes decide to stream the changes when we actually just need to process them locally like a change for invalidations. This will lead us to send empty streams. To avoid this, while queuing each change for decoding, we remember whether the transaction has any change that actually needs to be sent downstream and use that information later to decide whether to stream the transaction or not. Note, we can't avoid all cases where we have to send empty streams like the case where the plugin later decides that the change is not publishable. However, we will no longer need to send stream_abort when we skip sending a particular transaction. Author: Dilip Kumar Reviewed-by: Hou Zhijie, Ashutosh Bapat, Shi yu, Amit Kapila Discussion: https://postgr.es/m/CAFiTN-tHK=7LzfrPs8fbT2ksrOJGQbzywcgXst2bM9-rJJAAUg@mail.gmail.com
* Rework query relation permission checkingAlvaro Herrera2022-12-061-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, information about the permissions to be checked on relations mentioned in a query is stored in their range table entries. So the executor must scan the entire range table looking for relations that need to have permissions checked. This can make the permission checking part of the executor initialization needlessly expensive when many inheritance children are present in the range range. While the permissions need not be checked on the individual child relations, the executor still must visit every range table entry to filter them out. This commit moves the permission checking information out of the range table entries into a new plan node called RTEPermissionInfo. Every top-level (inheritance "root") RTE_RELATION entry in the range table gets one and a list of those is maintained alongside the range table. This new list is initialized by the parser when initializing the range table. The rewriter can add more entries to it as rules/views are expanded. Finally, the planner combines the lists of the individual subqueries into one flat list that is passed to the executor for checking. To make it quick to find the RTEPermissionInfo entry belonging to a given relation, RangeTblEntry gets a new Index field 'perminfoindex' that stores the corresponding RTEPermissionInfo's index in the query's list of the latter. ExecutorCheckPerms_hook has gained another List * argument; the signature is now: typedef bool (*ExecutorCheckPerms_hook_type) (List *rangeTable, List *rtePermInfos, bool ereport_on_violation); The first argument is no longer used by any in-core uses of the hook, but we leave it in place because there may be other implementations that do. Implementations should likely scan the rtePermInfos list to determine which operations to allow or deny. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CA+HiwqGjJDmUhDSfv-U2qhKJjt9ST7Xh9JXC_irsAQ1TAUsJYg@mail.gmail.com
* Generalize ri_RootToPartitionMap to use for non-partition childrenAlvaro Herrera2022-12-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | ri_RootToPartitionMap is currently only initialized for tuple routing target partitions, though a future commit will need the ability to use it even for the non-partition child tables, so make adjustments to the decouple it from the partitioning code. Also, make it lazily initialized via ExecGetRootToChildMap(), making that function its preferred access path. Existing third-party code accessing it directly should no longer do so; consequently, it's been renamed to ri_RootToChildMap, which also makes it consistent with ri_ChildToRootMap. ExecGetRootToChildMap() houses the logic of setting the map appropriately depending on whether a given child relation is partition or not. To support this, also add a separate entry point for TupleConversionMap creation that receives an AttrMap. No new code here, just split an existing function in two. Author: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/CA+HiwqEYUhDXSK5BTvG_xk=eaAEJCD4GS3C6uH7ybBvv+Z_Tmg@mail.gmail.com