path: root/doc/build
author    Mike Bayer <mike_mp@zzzcomputing.com>  2020-04-17 10:55:08 -0400
committer Mike Bayer <mike_mp@zzzcomputing.com>  2020-06-27 21:30:37 -0400
commit    08c46eea924d23a234bf3feea1a928eb8ae8a00a (patch)
tree      3795e1d04fa0e35c1e93080320b43c8fe0ed792e /doc/build
parent    2d9387354f11da322c516412eb5dfe937163c90b (diff)
download  sqlalchemy-08c46eea924d23a234bf3feea1a928eb8ae8a00a.tar.gz
ORM executemany returning
Build on #5401 to allow the ORM to take advantage of executemany INSERT + RETURNING.

- Implemented the feature; updated tests to support INSERT DEFAULT VALUES, and
  came up with a new compiler syntax, INSERT INTO table (anycol) VALUES (DEFAULT),
  which can then be iterated out for executemany.
- Added graceful degrade to plain executemany for PostgreSQL <= 8.2.
- Renamed EXECUTEMANY_DEFAULT to EXECUTEMANY_PLAIN.
- Fixed issue where unicode identifiers or parameter names wouldn't work with
  execute_values() under Py2K, because we have to encode the statement and
  therefore have to encode the insert_single_values_expr too.
- Corrected issue from #5401 to support executemany + return_defaults for a PK
  that is explicitly pre-generated, meaning we aren't actually getting RETURNING
  but need to return it from compiled_parameters.

Fixes: #5263
Change-Id: Id68e5c158c4f9ebc33b61c06a448907921c2a657
Diffstat (limited to 'doc/build')
-rw-r--r-- doc/build/changelog/migration_14.rst        | 74
-rw-r--r-- doc/build/changelog/unreleased_14/5263.rst   | 18
2 files changed, 91 insertions(+), 1 deletion(-)
diff --git a/doc/build/changelog/migration_14.rst b/doc/build/changelog/migration_14.rst
index 1ee52b86a..dd1c3db59 100644
--- a/doc/build/changelog/migration_14.rst
+++ b/doc/build/changelog/migration_14.rst
@@ -594,6 +594,75 @@ as was present previously.
:ticket:`4826`
+.. _change_5263:
+
+ORM Batch inserts with psycopg2 now batch statements with RETURNING in most cases
+---------------------------------------------------------------------------------
+
+The change in :ref:`change_5401` adds support for "executemany" + "RETURNING"
+at the same time in Core, which is now enabled for the psycopg2 dialect
+by default using the psycopg2 ``execute_values()`` extension. The ORM flush
+process now makes use of this feature such that the retrieval of newly generated
+primary key values and server defaults can be achieved while not losing the
+performance benefits of being able to batch INSERT statements together. Additionally,
+psycopg2's ``execute_values()`` extension itself provides a five-fold performance
+improvement over psycopg2's default "executemany" implementation, by rewriting
+an INSERT statement to include many "VALUES" expressions all in one statement
+rather than invoking the same statement repeatedly, as psycopg2 lacks the ability
+to PREPARE the statement ahead of time as would normally be expected for this
+approach to be performant.
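The fold-into-one-statement idea can be sketched in pure Python. The ``fold_values`` helper below is hypothetical and purely illustrative; psycopg2's real ``execute_values()`` additionally handles proper value escaping and paging:

```python
# Illustrative sketch of the VALUES-folding idea behind psycopg2's
# execute_values(): one INSERT carrying many VALUES groups replaces
# repeated execution of a single-row INSERT.
def fold_values(template, rows):
    """Expand a statement containing a single '%s' placeholder into a
    multi-row VALUES list.  'rows' holds tuples of already-quoted SQL
    literals; real code must escape values (psycopg2 does this internally).
    """
    groups = ", ".join("(" + ", ".join(row) + ")" for row in rows)
    return template % (groups,)

stmt = fold_values(
    "INSERT INTO a (data) VALUES %s RETURNING a.id",
    [("'data 1'",), ("'data 2'",), ("'data 3'",)],
)
```

Because the server receives a single statement per batch, the cost of re-parsing the same un-prepared INSERT is paid once per batch rather than once per row.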
+
+SQLAlchemy includes a :ref:`performance suite <examples_performance>` within
+its examples, where we can compare the times generated for the "bulk_inserts"
+runner between the 1.3 and 1.4 releases, revealing a 3x-5x speedup for most
+flavors of batch insert::
+
+ # 1.3
+ $ python -m examples.performance bulk_inserts --dburl postgresql://scott:tiger@localhost/test
+ test_flush_no_pk : (100000 iterations); total time 14.051527 sec
+ test_bulk_save_return_pks : (100000 iterations); total time 15.002470 sec
+ test_flush_pk_given : (100000 iterations); total time 7.863680 sec
+ test_bulk_save : (100000 iterations); total time 6.780378 sec
+ test_bulk_insert_mappings : (100000 iterations); total time 5.363070 sec
+ test_core_insert : (100000 iterations); total time 5.362647 sec
+
+ # 1.4 with enhancement
+ $ python -m examples.performance bulk_inserts --dburl postgresql://scott:tiger@localhost/test
+ test_flush_no_pk : (100000 iterations); total time 3.820807 sec
+ test_bulk_save_return_pks : (100000 iterations); total time 3.176378 sec
+ test_flush_pk_given : (100000 iterations); total time 4.037789 sec
+ test_bulk_save : (100000 iterations); total time 2.604446 sec
+ test_bulk_insert_mappings : (100000 iterations); total time 1.204897 sec
+ test_core_insert : (100000 iterations); total time 0.958976 sec
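The ratios behind the "3x-5x" figure can be checked directly from the timings above (plain arithmetic on the printed numbers, not part of the performance suite):

```python
# Speedup ratios computed from the 1.3 and 1.4 timings printed above.
timings_13 = {
    "test_flush_no_pk": 14.051527,
    "test_bulk_save_return_pks": 15.002470,
    "test_bulk_insert_mappings": 5.363070,
    "test_core_insert": 5.362647,
}
timings_14 = {
    "test_flush_no_pk": 3.820807,
    "test_bulk_save_return_pks": 3.176378,
    "test_bulk_insert_mappings": 1.204897,
    "test_core_insert": 0.958976,
}
speedup = {name: timings_13[name] / timings_14[name] for name in timings_13}
```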
+
+Note that the ``execute_values()`` extension modifies the INSERT statement in the psycopg2
+layer, **after** it has been logged by SQLAlchemy. With SQL logging enabled, one will see the
+parameter sets batched together, but the joining of multiple VALUES groups will not be visible
+on the application side::
+
+ 2020-06-27 19:08:18,166 INFO sqlalchemy.engine.Engine INSERT INTO a (data) VALUES (%(data)s) RETURNING a.id
+ 2020-06-27 19:08:18,166 INFO sqlalchemy.engine.Engine [generated in 0.00698s] ({'data': 'data 1'}, {'data': 'data 2'}, {'data': 'data 3'}, {'data': 'data 4'}, {'data': 'data 5'}, {'data': 'data 6'}, {'data': 'data 7'}, {'data': 'data 8'} ... displaying 10 of 4999 total bound parameter sets ... {'data': 'data 4998'}, {'data': 'data 4999'})
+ 2020-06-27 19:08:18,254 INFO sqlalchemy.engine.Engine COMMIT
+
+The ultimate INSERT statement can be seen by enabling statement logging on the PostgreSQL side::
+
+ 2020-06-27 19:08:18.169 EDT [26960] LOG: statement: INSERT INTO a (data)
+ VALUES ('data 1'),('data 2'),('data 3'),('data 4'),('data 5'),('data 6'),('data
+ 7'),('data 8'),('data 9'),('data 10'),('data 11'),('data 12'),
+ ... ('data 999'),('data 1000') RETURNING a.id
+
+ 2020-06-27 19:08:18.175 EDT
+ [26960] LOG: statement: INSERT INTO a (data) VALUES ('data 1001'),('data
+ 1002'),('data 1003'),('data 1004'),('data 1005 '),('data 1006'),('data
+ 1007'),('data 1008'),('data 1009'),('data 1010'),('data 1011'), ...
+
+The feature batches rows into groups of 1000 by default, which can be controlled
+using the ``executemany_values_page_size`` argument documented at
+:ref:`psycopg2_executemany_mode`.
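The 1000-row paging behavior can be pictured with a short sketch (the ``paginate`` helper is hypothetical and for illustration only; the actual paging happens inside the psycopg2 dialect):

```python
# Illustrative sketch: split parameter sets into pages, as the psycopg2
# dialect does before handing each page to execute_values().
def paginate(param_sets, page_size=1000):
    """Yield successive slices of at most page_size parameter sets."""
    for start in range(0, len(param_sets), page_size):
        yield param_sets[start : start + page_size]

# 4999 parameter sets, as in the logging example above, yield five pages:
# four full pages of 1000 rows and a final page of 999.
pages = list(paginate([{"data": "data %d" % i} for i in range(1, 5000)]))
```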
+
+:ticket:`5263`
+
+
.. _change_orm_update_returning_14:
ORM Bulk Update and Delete use RETURNING for "fetch" strategy when available
@@ -1591,7 +1660,10 @@ psycopg2 dialect features "execute_values" with RETURNING for INSERT statements
The first half of a significant performance enhancement for PostgreSQL when
using both Core and ORM, the psycopg2 dialect now uses
``psycopg2.extras.execute_values()`` by default for compiled INSERT statements
-and also implements RETURNING support in this mode.
+and also implements RETURNING support in this mode. The other half of this
+change is :ref:`change_5263` which allows the ORM to take advantage of
+RETURNING with executemany (i.e. batching of INSERT statements) so that ORM
+bulk inserts with psycopg2 are up to 400% faster depending on specifics.
This extension method allows many rows to be INSERTed within a single
statement, using an extended VALUES clause for the statement. While
diff --git a/doc/build/changelog/unreleased_14/5263.rst b/doc/build/changelog/unreleased_14/5263.rst
new file mode 100644
index 000000000..4b8af30b8
--- /dev/null
+++ b/doc/build/changelog/unreleased_14/5263.rst
@@ -0,0 +1,18 @@
+.. change::
+ :tags: orm, performance, postgresql
+ :tickets: 5263
+
+ Implemented support for the psycopg2 ``execute_values()`` extension
+ within the ORM flush process via the enhancements to Core made
+ in :ticket:`5401`, so that this extension is used both to batch
+ INSERT statements together and so that RETURNING may be used across
+ multiple parameter sets to retrieve newly generated primary key
+ values in batch. This allows nearly all INSERT statements emitted
+ by the ORM on behalf of PostgreSQL to be submitted in batch via the
+ ``execute_values()`` extension, which benchmarks at five times
+ faster than plain executemany() for this particular backend.
+
+ .. seealso::
+
+ :ref:`change_5263`