author | Mike Bayer <mike_mp@zzzcomputing.com> | 2020-06-28 11:59:34 -0400 |
---|---|---|
committer | Mike Bayer <mike_mp@zzzcomputing.com> | 2020-08-05 22:19:46 -0400 |
commit | 30885744142d89740d459f4dae670ba4775d1d8c (patch) | |
tree | 0fdd30c0c96778029a48414195f95c5b1fdba30c | |
parent | c7b489b25802f7a25ef78d0731411295c611cc1c (diff) | |
download | sqlalchemy-30885744142d89740d459f4dae670ba4775d1d8c.tar.gz |
Documentation updates for 1.4
* major additions to 1.4 migration doc; removed additional
verbosity regarding caching methodology and reorganized the
doc to present itself more as a "what's changed" guide
* as we now have a path for asyncio, update that doc so that
we aren't spreading obsolete information
* updates to the 2.0 migration guide with latest info, however
this is still an architecture doc and not a migration guide
yet, will need further rework.
* start really talking about 1.x vs. 2.0 style everywhere. Querying
is most of the docs so this is going to be a prominent
theme, start getting it to fit in
* Add introductory documentation for ORM example sections as these
are too sparse
* new documentation for do_orm_execute(), many separate sections,
adding deprecation notes to before_compile() and similar
* new example suites to illustrate do_orm_execute(),
with_loader_criteria()
* modernized horizontal sharding examples and added a separate
example to distinguish between multiple databases and single
database w/ multiple tables use case
* introducing DEEP ALCHEMY, will use zzzeeksphinx 1.1.6
* no name for the alchemist yet however the dragon's name
is Flambé
Change-Id: Id6b5c03b1ce9ddb7b280f66792212a0ef0a1c541
35 files changed, 2240 insertions, 950 deletions
diff --git a/doc/build/changelog/migration_14.rst b/doc/build/changelog/migration_14.rst index 08ff190b8..4207e4e81 100644 --- a/doc/build/changelog/migration_14.rst +++ b/doc/build/changelog/migration_14.rst @@ -23,6 +23,118 @@ What's New in SQLAlchemy 1.4? Behavioral Changes - General ============================ +.. _change_5159: + +ORM Query is internally unified with select, update, delete; 2.0 style execution available +------------------------------------------------------------------------------------------ + +The biggest conceptual change to SQLAlchemy for version 2.0 and essentially +in 1.4 as well is that the great separation between the :class:`_sql.Select` +construct in Core and the :class:`_orm.Query` object in the ORM has been removed, +as well as between the :meth:`_orm.Query.update` and :meth:`_orm.Query.delete` +methods in how they relate to :class:`_dml.Update` and :class:`_dml.Delete`. + +With regards to :class:`_sql.Select` and :class:`_orm.Query`, these two objects +have for many versions had similar, largely overlapping APIs and even some +ability to change between one and the other, while remaining very different in +their usage patterns and behaviors. The historical background for this was +that the :class:`_orm.Query` object was introduced to overcome shortcomings in +the :class:`_sql.Select` object which used to be at the core of how ORM objects +were queried, except that they had to be queried in terms of +:class:`_schema.Table` metadata only. However :class:`_orm.Query` had only a +simplistic interface for loading objects, and only over the course of many +major releases did it eventually gain most of the flexibility of the +:class:`_sql.Select` object, which then led to the ongoing awkwardness that +these two objects became highly similar yet still largely incompatible with +each other. 
+ +In version 1.4, all Core and ORM SELECT statements are rendered from a +:class:`_sql.Select` object directly; when the :class:`_orm.Query` object +is used, at statement invocation time it copies its state to a :class:`_sql.Select` +which is then invoked internally using :term:`2.0 style` execution. Going forward, +the :class:`_orm.Query` object will become legacy only, and applications will +be encouraged to move to :term:`2.0 style` execution which allows Core constructs +to be used freely against ORM entities:: + + with Session(engine, future=True) as sess: + + stmt = select(User).where( + User.name == 'sandy' + ).join(User.addresses).where(Address.email_address.like("%gmail%")) + + result = sess.execute(stmt) + + for user in result.scalars(): + print(user) + +Things to note about the above example: + +* The :class:`_orm.Session` and :class:`_orm.sessionmaker` objects now feature + full context manager (i.e. the ``with:`` statement) capability; + see the revised documentation at :ref:`session_getting` for an example. + +* Within the 1.4 series, all :term:`2.0 style` ORM invocation uses a + :class:`_orm.Session` that includes the :paramref:`_orm.Session.future` + flag set to ``True``; this flag indicates the :class:`_orm.Session` should + have 2.0-style behaviors, which include that ORM queries can be invoked + from :class:`_orm.Session.execute` as well as some changes in transactional + features. In version 2.0 this flag will always be ``True``. + +* The :func:`_sql.select` construct no longer needs brackets around the + columns clause; see :ref:`change_5284` for background on this improvement. + +* The :func:`_sql.select` / :class:`_sql.Select` object has a :meth:`_sql.Select.join` + method that acts like that of the :class:`_orm.Query` and even accommodates + an ORM relationship attribute (without breaking the separation between + Core and ORM!) - see :ref:`change_select_join` for background on this. 
+ +* Statements that work with ORM entities and are expected to return ORM + results are invoked using :meth:`.orm.Session.execute`. See + :ref:`session_querying_20` for a primer. + +* a :class:`_engine.Result` object is returned, rather than a plain list, which + itself is a much more sophisticated version of the previous ``ResultProxy`` + object; this object is now used both for Core and ORM results. See + :ref:`change_result_14_core`, + :ref:`change_4710_core`, and :ref:`change_4710_orm` for information on this. + +Throughout SQLAlchemy's documentation, there will be many references to +:term:`1.x style` and :term:`2.0 style` execution. This is to distinguish +between the two querying styles and to attempt to forwards-document the new +calling style going forward. In SQLAlchemy 2.0, while the :class:`_orm.Query` +object may remain as a legacy construct, it will no longer be featured in +most documentation. + +Similar adjustments have been made to "bulk updates and deletes" such that +Core :func:`_sql.update` and :func:`_sql.delete` can be used for bulk +operations. A bulk update like the following:: + + session.query(User).filter(User.name == 'sandy').update({"password": "foobar"}, synchronize_session="fetch") + +can now be achieved in :term:`2.0 style` (and indeed the above runs internally +in this way) as follows:: + + with Session(engine, future=True) as sess: + stmt = update(User).where( + User.name == 'sandy' + ).values(password="foobar").execution_options( + synchronize_session="fetch" + ) + + sess.execute(stmt) + +Note the use of the :meth:`_sql.Executable.execution_options` method to pass +ORM-related options. The use of "execution options" is now much more prevalent +within both Core and ORM, and many ORM-related methods from :class:`_orm.Query` +are now implemented as execution options (see :meth:`_orm.Query.execution_options` +for some examples). + +.. seealso:: + + :ref:`migration_20_toplevel` + +:ticket:`5159` + .. 
_change_4639: Transparent SQL Compilation Caching added to All DQL, DML Statements in Core, ORM @@ -34,7 +146,7 @@ systems from the base of Core all the way through ORM now allows the majority of Python computation involved producing SQL strings and related statement metadata from a user-constructed statement to be cached in memory, such that subsequent invocations of an identical statement construct will use -35-60% fewer resources. +35-60% fewer CPU resources. This caching goes beyond the construction of the SQL string to also include the construction of result fetching structures that link the SQL construct to the @@ -66,23 +178,17 @@ In 1.4, the code above without modification completes:: This first test indicates that regular ORM queries when using caching can run over many iterations in the range of **30% faster**. -"Baked Query" style construction now available for all Core and ORM Queries -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The "Baked Query" extension has been in SQLAlchemy for several years and -provides a caching system that is based on defining segments of SQL statements -within Python functions, so that the functions both serve as cache keys -(since they uniquely and persistently identify a specific line in the -source code) as well as that they allow the construction of a statement -to be deferred so that it only need to be invoked once, rather than every -time the query is rendered. The functionality of "Baked Query" is now a native -part of the new caching system, which is available by simply using Python -functions, typically lambda expressions, either inside of a statement, -or on the outside using the ``lambda_stmt()`` function that works just -like a Baked Query. +A second variant of the feature is the optional use of Python lambdas to defer +the construction of the query itself. 
This is a more sophisticated variant of +the approach used by the "Baked Query" extension, which was introduced in +version 1.0.0. The "lambda" feature may be used in a style very similar to +that of baked queries, except that it is available in an ad-hoc way for any SQL +construct. It additionally includes the ability to scan each invocation of the +lambda for bound literal values that change on every invocation, as well as +changes to other constructs, such as querying from a different entity or column +each time, while still not having to run the actual code each time. -Making use of the newer 2.0 style of using ``select()`` and adding the use -of **optional** lambdas to defer the computation:: +Using this API looks as follows:: session = Session(bind=engine) for id_ in random.sample(ids, n): @@ -96,127 +202,17 @@ The code above completes:: This test indicates that using the newer "select()" style of ORM querying, in conjunction with a full "baked" style invocation that caches the entire -construction, can run over many iterations in the range of **60% faster**. -This performance is roughly the same as what the Baked Query extension -provides as well. The new approach effectively supersedes the Baked Query -extension. +construction, can run over many iterations in the range of **60% faster**, with +performance roughly the same as that of the baked query system, which is now superseded +by the native caching system. -For comparison, a Baked Query looks like the following:: +The new system makes use of the existing +:paramref:`_engine.Connection.execution_options.compiled_cache` execution +option and also adds a cache to the :class:`_engine.Engine` directly, which is +configured using the :paramref:`_engine.Engine.query_cache_size` parameter.
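The lambda style described above can be sketched as a self-contained example; the following is a hypothetical illustration (in-memory SQLite, an invented ``Customer`` model) of the pattern, not the benchmark code excerpted above:

```python
from sqlalchemy import Column, Integer, String, create_engine, lambda_stmt, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Customer(Base):
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all(Customer(id=i, name=f"c{i}") for i in range(1, 4))
    session.commit()

    for id_ in [1, 2, 3]:
        # the lambdas run only on the first invocation; on subsequent
        # invocations, only the changed closure value ``id_`` is extracted
        # and applied as a new bound parameter to the cached statement
        stmt = lambda_stmt(lambda: select(Customer))
        stmt += lambda s: s.where(Customer.id == id_)
        customer = session.execute(stmt).scalar_one()
        print(customer.name)
```

Note that, unlike a Baked Query, no ``bindparam()`` is needed for ``id_``; the closure of each lambda is scanned for literal values on every call.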
- bakery = baked.bakery() - s = Session(bind=engine) - for id_ in random.sample(ids, n): - q = bakery(lambda s: s.query(Customer)) - q += lambda q: q.filter(Customer.id == bindparam("id")) - q(s).params(id=id_).one() - -The new API allows the same very fast "baked query" approach of building up a -statement with lambdas, but does not require any other syntactical changes from -regular statements. It also no longer requires that "bindparam()" is used for -literal values that may change; the "closure" of the Python function is scanned -on every call to extract Python literal values that should be turned into -parameters. - -Methodology Overview -^^^^^^^^^^^^^^^^^^^^ - -SQLAlchemy has also for many years included a "compiled_cache" option that is -used internally by the ORM flush process as well as the Baked Query extension, -which caches a SQL expression object based on the identity of the object -itself. That is, if you create a particular select() object and make use of -the compiled cache feature, if you pass the same select() object each time, the -SQL compilation would be cached. This feature was of limited use since -SQLAlchemy's programming paradigm is based on the continuous construction of -new SQL expression objects each time one is required. - -The new caching feature uses the same "compiled_cache", however instead of -using the statement object itself as the cache key, a separate tuple-oriented -cache key is generated which represents the complete structure of the -statement. Two SQL constructs that are composed in exactly the same way will -produce the same cache key, independent of the bound parameter values that are -bundled with the statement; these are collected separately from each statement -and are used when the cached SQL is executed. The ORM ``Query`` integrates by -producing a ``select()`` object from itself that is interpreted as an -ORM-enabled SELECT within the SQL compilation process that occurs beyond the -cache boundary. 
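The cache-key behavior described above can be observed directly; note that ``_generate_cache_key()`` is a private method, used here purely as an illustrative sketch of what the caching layer does internally:

```python
from sqlalchemy import column, select, table

t = table("x", column("q"), column("z"))

# two statements with identical structure but different literal values
stmt1 = select(t.c.q).where(t.c.z > 5)
stmt2 = select(t.c.q).where(t.c.z > 10)

# the keys compare equal: the literal values are extracted as bound
# parameters and collected separately from the cache key itself
key1 = stmt1._generate_cache_key()
key2 = stmt2._generate_cache_key()
print(key1 == key2)

# a structural difference (``<`` instead of ``>``) changes the key
stmt3 = select(t.c.q).where(t.c.z < 5)
print(key1 == stmt3._generate_cache_key())
```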
- -A general listing of architectural changes needed to support this feature: - -* The system by which arguments passed to SQL constructs are type-checked and - coerced into their desired form was rewritten from an ad-hoc and disorganized - system into the ``sqlalchemy.sql.roles`` and - ``sqlalchemy.sql.coercions`` modules which provide a type-based approach - to the task of composing SQL expression objects, error handling, coercion - of objects such as turning SELECT statements into subqueries, as well as - integrating with a new "plugin" system that allows SQL constructs to include - ORM functionality. - -* The system by which clause expressions constructs are iterated and compared - from an object structure point of view was also - rewritten from one which was ad-hoc and inconsistent into a complete system - within the new ``sqlalchemy.sql.traversals`` module. A test suite was added - which ensures that all SQL construction objects include fully consistent - comparison and iteration behavior. This work began with :ticket:`4336`. - -* The new iteration system naturally gave rise to the cache-key creation - system, which also uses a performance-optimized version of the - ``sqlalchemy.sql.traversals`` module to generate a deterministic cache key - for any SQL expression based on its structure. Two instances of a SQL - expression that represent the same SQL structure, such as ``select(table('x', - column('q'))).where(column('z') > 5)``, are guaranteed to produce the same - cache key, independent of the bound parameters which for this statement would - be one parameter with the value "5". Two instances of a SQL expression - where any elements are different will produce different cache keys. When - the cache key is generated, the parameters are also collected which will be - used to formulate the final parameter list. This work was completed over - many merges and was overall related to :ticket:`4639`. 
- -* The mechanism by which statements such as ``select()`` generate expensive - collections and datamembers that are only used for SQL compilation, such - as the list of columns and their labels, were organized into a new - decoupled system called ``CompileState``. - -* All elements of queries that needed to be made compatible with the concept of - deterministic SQL compilation were updated, including an expansion of the - "postcompile" concept used to render individual parameters inside of "IN" - expressions first included in 1.3 as well as alterations to how dialects like - the SQL Server dialect render LIMIT / OFFSET expressions that are not - compatible with bound parameters. - -* The ORM ``Query`` object was fully refactored such that all of the intense - computation which would previously occur whenever methods of ``Query`` were - called, such as the construction of the ``Query`` itself, when methods - ``filter()`` or ``join()`` would be called, etc., was completely reorganized - to take place within the ``CompileState`` architecture, meaning the ORM - process that generates a Core ``select()`` to render now takes place - **within** the SQL compilation process, beyond the caching boundary. More - detail on this change is at - :ref:`change_deferred_construction`. - -* The ``Query`` object was unified with the ``select()`` object, such that - these two objects now have cross-compatible internal state. The ``Query`` - can turn itself into a ``select()`` that generates ORM queries by copying its - ``__dict__`` into a new ``Select`` object. - -* The 2.0-style :class:`.Result` object as well as the "future" version of - :class:`_engine.Engine` were developed and integrated into Core and later - the ORM also integrated on top of :class:`.Result`. 
- -* The Core and ORM execution models were completely reworked to integrate the - new cache key system, and in particular the ORM ``Query`` was reworked such - that its execution model now produces a ``Select`` which is passed to - ``Session.execute()``, which then invokes the 2.0-style execution model that - allows the ``Select`` to be processed as an ORM query beyond the caching - boundary. - -* Other systems such as ``Query`` bulk updates and deletes, the horizontal - sharding extension, the Baked Query extension, and the dogpile caching - example were updated to integrate with the new execution model and a new - event hook :meth:`.SessionEvents.do_orm_execute` has been added. - -* The caching has been enabled via the :paramref:`.create_engine.query_cache_size` - parameter, new logging features were added, and the "lambda" argument - construction module was added. +A significant portion of API and behavioral changes throughout 1.4 were +driven in order to support this new feature. .. seealso:: @@ -232,8 +228,8 @@ A general listing of architectural changes needed to support this feature: .. _change_deferred_construction: -Many Core and ORM statement objects now perform much of their validation in the compile phase ---------------------------------------------------------------------------------------------- +Many Core and ORM statement objects now perform much of their construction and validation in the compile phase +-------------------------------------------------------------------------------------------------------------- A major initiative in the 1.4 series is to approach the model of both Core SQL statements as well as the ORM Query to allow for an efficient, cacheable model @@ -241,12 +237,17 @@ of statement creation and compilation, where the compilation step would be cached, based on a cache key generated by the created statement object, which itself is newly created for each use. 
Towards this goal, much of the Python computation which occurs within the construction of statements, particularly -the ORM :class:`_query.Query`, is being moved to occur later, when the -statement is actually compiled, and additionally that it will only occur if the -compiled form of the statement is not already cached. This means that some of -the error messages which can arise based on arguments passed to the object will -no longer be raised immediately, and instead will occur only when the statement -is invoked and its compiled form is not yet cached. +that of the ORM :class:`_query.Query` as well as the :func:`_sql.select` +construct when used to invoke ORM queries, is being moved to occur within +the compilation phase of the statement which only occurs after the statement +has been invoked, and only if the statement's compiled form was not yet +cached. + +From an end-user perspective, this means that some of the error messages which +can arise based on arguments passed to the object will no longer be raised +immediately, and instead will occur only when the statement is invoked for +the first time. These conditions are always structural and not data driven, +so there is no risk of such a condition being missed due to a cached statement. Error conditions which fall under this category include: @@ -286,8 +287,8 @@ instead. :ticket:`4689` -API Changes - Core -================== +API and Behavioral Changes - Core +================================== .. _change_4617: @@ -318,93 +319,70 @@ Raising:: got <...Select object ...>. To create a FROM clause from a <class 'sqlalchemy.sql.selectable.Select'> object, use the .subquery() method. 
-The correct calling form is instead:: +The correct calling form is instead (noting also that :ref:`brackets are no +longer required for select() <change_5284>`):: - sq1 = select([user.c.id, user.c.name]).subquery() - stmt2 = select([addresses, sq1]).select_from(addresses.join(sq1)) + sq1 = select(user.c.id, user.c.name).subquery() + stmt2 = select(addresses, sq1).select_from(addresses.join(sq1)) Noting above that the :meth:`_expression.SelectBase.subquery` method is essentially equivalent to using the :meth:`_expression.SelectBase.alias` method. -The above calling form is typically required in any case as the call to -:meth:`_expression.SelectBase.subquery` or :meth:`_expression.SelectBase.alias` is needed to -ensure the subquery has a name. The MySQL and PostgreSQL databases do not -accept unnamed subqueries in the FROM clause and they are of limited use -on other platforms; this is described further below. - -Along with the above change, the general capability of :func:`_expression.select` and -related constructs to create unnamed subqueries, which means a FROM subquery -that renders without any name i.e. "AS somename", has been removed, and the -ability of the :func:`_expression.select` construct to implicitly create subqueries -without explicit calling code to do so is mostly deprecated. 
In the above -example, as has always been the case, using the :meth:`_expression.SelectBase.alias` -method as well as the new :meth:`_expression.SelectBase.subquery` method without passing a -name will generate a so-called "anonymous" name, which is the familiar -``anon_1`` name we see in SQLAlchemy queries:: - SELECT - addresses.id, addresses.email, addresses.user_id, - anon_1.id, anon_1.name - FROM - addresses JOIN - (SELECT users.id AS id, users.name AS name FROM users) AS anon_1 - ON addresses.user_id = anon_1.id - -Unnamed subqueries in the FROM clause (which note are different from -so-called "scalar subqueries" which take the place of a column expression -in the columns clause or WHERE clause) are of extremely limited use in SQL, -and their production in SQLAlchemy has mostly presented itself as an -undesirable behavior that needs to be worked around. For example, -both the MySQL and PostgreSQL outright reject the usage of unnamed subqueries:: - - # MySQL / MariaDB: - - MariaDB [(none)]> select * from (select 1); - ERROR 1248 (42000): Every derived table must have its own alias - - - # PostgreSQL: - - test=> select * from (select 1); - ERROR: subquery in FROM must have an alias - LINE 1: select * from (select 1); - ^ - HINT: For example, FROM (SELECT ...) [AS] foo. 
- -A database like SQLite accepts them, however it is still often the case that -the names produced from such a subquery are too ambiguous to be useful:: - - sqlite> CREATE TABLE a(id integer); - sqlite> CREATE TABLE b(id integer); - sqlite> SELECT * FROM a JOIN (SELECT * FROM b) ON a.id=id; - Error: ambiguous column name: id - sqlite> SELECT * FROM a JOIN (SELECT * FROM b) ON a.id=b.id; - Error: no such column: b.id - - # use a name - sqlite> SELECT * FROM a JOIN (SELECT * FROM b) AS anon_1 ON a.id=anon_1.id; - -Due to the above limitations, there are very few places in SQLAlchemy where -such a query form was valid; the one exception was within the Oracle dialect -where they were used to create OFFSET / LIMIT subqueries as Oracle does not -support these keywords directly; this implementation has been replaced by -one which uses anonymous subqueries. Throughout the ORM, exception cases -that detect where a SELECT statement would be SELECTed from either encourage -the user to, or implicitly create, an anonymously named subquery; it is hoped -by moving to an all-explicit subquery much of the complexity incurred by -these areas can be removed. - -As :class:`_expression.SelectBase` objects are no longer :class:`_expression.FromClause` objects, -attributes like the ``.c`` attribute as well as methods like ``.select()``, -``.join()``, and ``.outerjoin()`` upon :class:`_expression.SelectBase` are now -deprecated, as these methods all imply implicit production of a subquery. -Instead, as is already what the vast majority of applications have to do -in any case, invoking :meth:`_expression.SelectBase.alias` or :meth:`_expression.SelectBase.subquery` -will provide for a :class:`.Subquery` object that provides all these attributes, -as it is part of the :class:`_expression.FromClause` hierarchy. 
In the interim, these -methods are still available, however they now produce an anonymously named -subquery rather than an unnamed one, and this subquery is distinct from the -:class:`_expression.SelectBase` construct itself. +The rationale for this change is based on the following: + +* In order to support the unification of :class:`_sql.Select` with + :class:`_orm.Query`, the :class:`_sql.Select` object needs to have + :meth:`_sql.Select.join` and :meth:`_sql.Select.outerjoin` methods that + actually add JOIN criteria to the existing FROM clause, which is what users + have always expected these methods to do in any case. The previous behavior, having to + align with what a :class:`.FromClause` would do, was that it would generate + an unnamed subquery and then JOIN to it, which was a completely useless + feature that only confused those users unfortunate enough to try this. This + change is discussed at :ref:`change_select_join`. + +* The behavior of including a SELECT in the FROM clause of another SELECT + without first creating an alias or subquery would be that it creates an + unnamed subquery. While standard SQL does support this syntax, in practice + it is rejected by most databases. For example, both MySQL and PostgreSQL + outright reject the usage of unnamed subqueries:: + + # MySQL / MariaDB: + + MariaDB [(none)]> select * from (select 1); + ERROR 1248 (42000): Every derived table must have its own alias + + + # PostgreSQL: + + test=> select * from (select 1); + ERROR: subquery in FROM must have an alias + LINE 1: select * from (select 1); + ^ + HINT: For example, FROM (SELECT ...) [AS] foo.
+ + A database like SQLite accepts them, however it is still often the case that + the names produced from such a subquery are too ambiguous to be useful:: + + sqlite> CREATE TABLE a(id integer); + sqlite> CREATE TABLE b(id integer); + sqlite> SELECT * FROM a JOIN (SELECT * FROM b) ON a.id=id; + Error: ambiguous column name: id + sqlite> SELECT * FROM a JOIN (SELECT * FROM b) ON a.id=b.id; + Error: no such column: b.id + + # use a name + sqlite> SELECT * FROM a JOIN (SELECT * FROM b) AS anon_1 ON a.id=anon_1.id; + + .. + +As :class:`_expression.SelectBase` objects are no longer +:class:`_expression.FromClause` objects, attributes like the ``.c`` attribute +as well as methods like ``.select()`` are now deprecated, as they imply implicit +production of a subquery. The ``.join()`` and ``.outerjoin()`` methods are now +:ref:`repurposed to append JOIN criteria to the existing query <change_select_join>` in a similar +way as that of :meth:`_orm.Query.join`, which is what users have always +expected these methods to do in any case. In place of the ``.c`` attribute, a new attribute :attr:`_expression.SelectBase.selected_columns` is added. This attribute resolves to a column collection that is what most @@ -427,36 +405,100 @@ present in the ``users.c`` collection:: stmt = select([users]) stmt = stmt.where(stmt.selected_columns.name == 'foo') -There is of course the notion that perhaps ``.c`` on :class:`_expression.SelectBase` could -simply act the way :attr:`_expression.SelectBase.selected_columns` does above, however in -light of the fact that ``.c`` is strongly associated with the :class:`_expression.FromClause` -hierarchy, meaning that it is a set of columns that can be directly in the -FROM clause of another SELECT, it's better that a column collection that -serves an entirely different purpose have a new name.
- -In the bigger picture, the reason this change is being made now is towards the -goal of unifying the ORM :class:`_query.Query` object into the :class:`_expression.SelectBase` -hierarchy in SQLAlchemy 2.0, so that the ORM will have a "``select()``" -construct that extends directly from the existing :func:`_expression.select` object, -having the same methods and behaviors except that it will have additional ORM -functionality. All statement objects in Core will also be fully cacheable -using a new system that resembles "baked queries" except that it will work -transparently for all statements across Core and ORM. In order to achieve -this, the Core class hierarchy needs to be refined to behave in such a way that -is more easily compatible with the ORM, and the ORM class hierarchy needs to be -refined so that it is more compatible with Core. - :ticket:`4617` +.. _change_select_join: + +select().join() and outerjoin() add JOIN criteria to the current query, rather than creating a subquery +------------------------------------------------------------------------------------------------------- + +Towards the goal of unifying :class:`_orm.Query` and :class:`_sql.Select`, +particularly for :term:`2.0 style` use of :class:`_sql.Select`, it was critical +that there be a working :meth:`_sql.Select.join` method that behaves like the +:meth:`_orm.Query.join` method, adding additional entries to the FROM clause of +the existing SELECT and then returning the new :class:`_sql.Select` object for +further modification, instead of wrapping the object inside of an unnamed +subquery and returning a JOIN from that subquery, a behavior that has always +been virtually useless and completely misleading to users. 
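As a minimal Core sketch of the behavior this section describes (hypothetical ``user`` / ``address`` tables, rendering against the default dialect), the JOIN is appended to the SELECT's own FROM clause rather than wrapping the statement in a subquery:

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table, select

metadata = MetaData()
user = Table(
    "user",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)
address = Table(
    "address",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", ForeignKey("user.id")),
    Column("email_address", String(50)),
)

# join() extends the FROM clause of this same SELECT; the ON clause is
# derived from the ForeignKey when not stated explicitly
stmt = select(user.c.name).join(address)
print(stmt)
```

The rendered string contains a single SELECT with a JOIN in its FROM clause; no subquery is generated.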
+ +To allow this to be the case, :ref:`change_4617` was first implemented which +splits off :class:`_sql.Select` from having to be a :class:`_sql.FromClause`; +this removed the requirement that :meth:`_sql.Select.join` would need to +return a :class:`_sql.Join` object rather than a new version of that +:class:`_sql.Select` object that includes a new JOIN in its FROM clause. + +From that point on, as the :meth:`_sql.Select.join` and :meth:`_sql.Select.outerjoin` +did have an existing behavior, the original plan was that these +methods would be deprecated, and the new "useful" version of +the methods would be available on an alternate, "future" :class:`_sql.Select` +object available as a separate import. + +However, after some time working with this particular codebase, it was decided +that having two different kinds of :class:`_sql.Select` objects floating +around, each with 95% the same behavior except for some subtle difference +in how some of the methods behave was going to be more misleading and inconvenient +than simply making a hard change in how these two methods behave, given +that the existing behavior of :meth:`_sql.Select.join` and :meth:`_sql.Select.outerjoin` +is essentially never used and only causes confusion. + +So it was decided, given how very useless the current behavior is, and how +extremely useful and important the new behavior would be, to make a +**hard behavioral change** in this one area, rather than waiting another year +and having a more awkward API in the interim.
SQLAlchemy developers do not +take it lightly to make a completely breaking change like this, however this is +a very special case and it is extremely unlikely that the previous +implementation of these methods was being used; as noted in +:ref:`change_4617`, major databases such as MySQL and PostgreSQL don't allow +for unnamed subqueries in any case and from a syntactical point of view it's +nearly impossible for a JOIN from an unnamed subquery to be useful since it's +very difficult to refer to the columns within it unambiguously. + +With the new implementation, :meth:`_sql.Select.join` and +:meth:`_sql.Select.outerjoin` now behave very similarly to that of +:meth:`_orm.Query.join`, adding JOIN criteria to the existing statement by +matching to the left entity:: + + stmt = select(user_table).join(addresses_table, user_table.c.id == addresses_table.c.user_id) + +producing:: + + SELECT user.id, user.name FROM user JOIN address ON user.id = address.user_id + +As is the case for :class:`_sql.Join`, the ON clause is automatically determined +if feasible:: + + stmt = select(user_table).join(addresses_table) + +When ORM entities are used in the statement, this is essentially how ORM +queries are built up using :term:`2.0 style` invocation. ORM entities will +assign a "plugin" to the statement internally such that ORM-related compilation +rules will take place when the statement is compiled into a SQL string. More +directly, the :meth:`_sql.Select.join` method can accommodate ORM +relationships, without breaking the hard separation between Core and ORM +internals:: + + stmt = select(User).join(User.addresses) + +Another new method :meth:`_sql.Select.join_from` is also added, which +allows easier specification of the left and right side of a join at once:: + + stmt = select(Address.email_address, User.name).join_from(User, Address) + +producing:: + + SELECT address.email_address, user.name FROM user JOIN address ON user.id = address.user_id + + ..
_change_5284: -select() now accepts positional expressions -------------------------------------------- +select(), case() now accept positional expressions +--------------------------------------------------- -The :func:`.select` construct will now accept "columns clause" -arguments positionally:: +As it may be seen elsewhere in this document, the :func:`_sql.select` construct will +now accept "columns clause" arguments positionally, rather than requiring they +be passed as a list:: # new way, supports 2.0 stmt = select(table.c.col1, table.c.col2, ...) @@ -472,7 +514,8 @@ to function, which passes the list of columns or other expressions as a list:: stmt = select([table.c.col1, table.c.col2, ...]) The above legacy calling style also accepts the old keyword arguments that have -since been removed from most narrative documentation:: +since been removed from most narrative documentation. The existence of these +keyword arguments is why the columns clause was passed as a list in the first place:: # very much the old way, but still works in 1.4 stmt = select([table.c.col1, table.c.col2, ...], whereclause=table.c.col1 == 5) @@ -489,12 +532,28 @@ As part of this change, the :class:`.Select` construct also gains the 2.0-style "future" API which includes an updated :meth:`.Select.join` method as well as methods like :meth:`.Select.filter_by` and :meth:`.Select.join_from`. +In a related change, the :func:`_sql.case` construct has also been modified +to accept its list of WHEN clauses positionally, with a similar deprecation +track for the old calling style:: + + stmt = select(users_table).where( + case( + (users_table.c.name == 'wendy', 'W'), + (users_table.c.name == 'jack', 'J'), + else_='E' + ) + ) + +The convention for SQLAlchemy constructs accepting ``*args`` vs. a list of +values, as is the latter case for a construct like +:meth:`_sql.ColumnOperators.in_`, is that **positional arguments are used for +structural specification, lists are used for data specification**. 
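As an illustration of the *args-vs-list convention described above, the following toy renderers (hypothetical names, not SQLAlchemy's implementation) show positional arguments carrying structural specification while a single list argument carries data values:

```python
# Toy illustration of the convention: positional *args describe the
# structure of a CASE expression, while IN takes one list of data values.
# These helpers are illustrative only, not SQLAlchemy API.

def case_sql(*whens, else_=None):
    """Render a CASE expression from positional (condition, value) pairs."""
    parts = ["CASE"]
    for condition, value in whens:
        parts.append(f"WHEN {condition} THEN '{value}'")
    if else_ is not None:
        parts.append(f"ELSE '{else_}'")
    parts.append("END")
    return " ".join(parts)

def in_sql(column, values):
    """Render an IN expression from a single list of data values."""
    rendered = ", ".join(repr(v) for v in values)
    return f"{column} IN ({rendered})"

print(case_sql(("name = 'wendy'", "W"), ("name = 'jack'", "J"), else_="E"))
# -> CASE WHEN name = 'wendy' THEN 'W' WHEN name = 'jack' THEN 'J' ELSE 'E' END
print(in_sql("name", ["wendy", "jack"]))
# -> name IN ('wendy', 'jack')
```

The structural elements of a statement are known at coding time, so spelling them positionally reads naturally; the data values often arrive at runtime as a collection, which is why a list fits there.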
+ + .. seealso:: :ref:`error_c9ae` - :ref:`migration_20_toplevel` - :ticket:`5284` @@ -606,6 +665,134 @@ details. :ticket:`4645` +.. _change_4737: + + +Built-in FROM linting will warn for any potential cartesian products in a SELECT statement +------------------------------------------------------------------------------------------ + +As the Core expression language as well as the ORM are built on an "implicit +FROMs" model where a particular FROM clause is automatically added if any part +of the query refers to it, a common issue is the case where a SELECT statement, +either a top level statement or an embedded subquery, contains FROM elements +that are not joined to the rest of the FROM elements in the query, causing +what's referred to as a "cartesian product" in the result set, i.e. every +possible combination of rows from each FROM element not otherwise joined. In +relational databases, this is nearly always an undesirable outcome as it +produces an enormous result set full of duplicated, uncorrelated data. + +SQLAlchemy, for all of its great features, is particularly prone to this sort +of issue happening as a SELECT statement will have elements added to its FROM +clause automatically from any table seen in the other clauses. A typical +scenario looks like the following, where two tables are JOINed together, +however an additional entry in the WHERE clause that perhaps inadvertently does +not line up with these two tables will create an additional FROM entry:: + + address_alias = aliased(Address) + + q = session.query(User).\ + join(address_alias, User.addresses).\ + filter(Address.email_address == 'foo') + +The above query selects from a JOIN of ``User`` and ``address_alias``, the +latter of which is an alias of the ``Address`` entity. 
However, the
+``Address`` entity is used within the WHERE clause directly, so the above would
+result in the SQL::
+
+    SELECT
+        users.id AS users_id, users.name AS users_name,
+        users.fullname AS users_fullname,
+        users.nickname AS users_nickname
+    FROM addresses, users JOIN addresses AS addresses_1 ON users.id = addresses_1.user_id
+    WHERE addresses.email_address = :email_address_1
+
+In the above SQL, we can see what SQLAlchemy developers term "the dreaded
+comma", as we see "FROM addresses, users JOIN addresses" in the FROM clause,
+which is the classic sign of a cartesian product; the query is making use
+of JOIN in order to join FROM clauses together, however because one of them is
+not joined, it uses a comma.  The above query will return a full set of
+rows that join the "user" and "addresses" tables together on the "id / user_id"
+column, and will then apply all those rows into a cartesian product against
+every row in the "addresses" table directly.  That is, if there are ten user
+rows and 100 address rows, the JOIN portion will return roughly 100 rows, as
+every address row is matched to its user; each of those rows is then combined
+with every one of the 100 rows in the un-joined "addresses" FROM entry, so that
+the total result size would be roughly 10000 rows.
+
+The "table1, table2 JOIN table3" pattern is one that also occurs quite
+frequently within the SQLAlchemy ORM, due either to subtle mis-application of
+ORM features, particularly those related to joined eager loading or joined table
+inheritance, or as a result of SQLAlchemy ORM bugs within those same
+systems.  Similar issues apply to SELECT statements that use "implicit joins",
+where the JOIN keyword is not used and instead each FROM element is linked with
+another one via the WHERE clause.
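The detection described here amounts to a graph-connectivity check over the FROM elements: each FROM element is a node, each join condition (whether from a JOIN or the WHERE clause) is an edge, and more than one connected component signals a potential cartesian product. A minimal pure-Python sketch of that idea (an illustration only, not SQLAlchemy's SQLCompiler code):

```python
# Sketch of the connectivity check behind FROM-clause linting:
# union-find over FROM elements, with one edge per join condition.
# More than one connected component means a potential cartesian product.

def find(parent, x):
    """Find the representative of x's component, with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def from_components(froms, links):
    """Group FROM elements into connected components via the join links."""
    parent = {f: f for f in froms}
    for a, b in links:
        parent[find(parent, a)] = find(parent, b)
    groups = {}
    for f in froms:
        groups.setdefault(find(parent, f), set()).add(f)
    return list(groups.values())

# "users JOIN addresses_1" is linked; the plain "addresses" entry is not.
components = from_components(
    ["users", "addresses_1", "addresses"],
    [("users", "addresses_1")],   # ON users.id = addresses_1.user_id
)
if len(components) > 1:
    print(f"cartesian product between FROM element groups: {components}")
```

Adding a second link such as `("addresses", "addresses_1")`, analogous to the extra WHERE criteria shown later in this section, collapses the graph to a single component and silences the warning.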
+
+For some years there has been a recipe on the Wiki that applies a graph
+algorithm to a :func:`_expression.select` construct at query execution time and inspects
+the structure of the query for these un-linked FROM clauses, parsing through
+the WHERE clause and all JOIN clauses to determine how FROM elements are linked
+together and ensuring that all the FROM elements are connected in a single
+graph.  This recipe has now been adapted to be part of the :class:`.SQLCompiler`
+itself, where it now optionally emits a warning for a statement if this
+condition is detected.  The warning is enabled using the
+:paramref:`_sa.create_engine.enable_from_linting` flag and is enabled by default.
+The computational overhead of the linter is very low, and additionally it only
+occurs during statement compilation, which means for a cached SQL statement it
+only occurs once.
+
+Using this feature, our ORM query above will emit a warning::
+
+    >>> q.all()
+    SAWarning: SELECT statement has a cartesian product between FROM
+    element(s) "addresses_1", "users" and FROM element "addresses".
+    Apply join condition(s) between each element to resolve.
+
+The linter accommodates not just tables linked together through the
+JOIN clauses but also through the WHERE clause.  Above, we can add a WHERE
+clause to link the new ``Address`` entity with the previous ``address_alias``
+entity and that will remove the warning::
+
+    q = session.query(User).\
+        join(address_alias, User.addresses).\
+        filter(Address.email_address == 'foo').\
+        filter(Address.id == address_alias.id)  # resolve cartesian products,
+                                                # will no longer warn
+
+The cartesian product warning considers **any** kind of link between two
+FROM clauses to be a resolution, even if the end result set is still
+wasteful, as the linter is intended only to detect the common case of a
+FROM clause that is completely unexpected. 
If the FROM clause is referred
+to explicitly elsewhere and linked to the other FROMs, no warning is emitted::
+
+    q = session.query(User).\
+        join(address_alias, User.addresses).\
+        filter(Address.email_address == 'foo').\
+        filter(Address.id > address_alias.id)  # will generate a lot of rows,
+                                               # but no warning
+
+Full cartesian products are also allowed if they are explicitly stated; if we
+wanted for example the cartesian product of ``User`` and ``Address``, we can
+JOIN on :func:`.true` so that every row will match with every other; the
+following query will return all rows and produce no warnings::
+
+    from sqlalchemy import true
+
+    # intentional cartesian product
+    q = session.query(User).join(Address, true())
+
+The warning is only generated by default when the statement is compiled by the
+:class:`_engine.Connection` for execution; calling the :meth:`_expression.ClauseElement.compile`
+method will not emit a warning unless the linting flag is supplied::
+
+    >>> from sqlalchemy.sql import FROM_LINTING
+    >>> print(q.statement.compile(linting=FROM_LINTING))
+    SAWarning: SELECT statement has a cartesian product between FROM element(s) "addresses" and FROM element "users".  Apply join condition(s) between each element to resolve.
+    SELECT users.id, users.name, users.fullname, users.nickname
+    FROM addresses, users JOIN addresses AS addresses_1 ON users.id = addresses_1.user_id
+    WHERE addresses.email_address = :email_address_1
+
+:ticket:`4737`
+
+
 .. _change_result_14_core:
 
 New Result object
@@ -782,6 +969,289 @@ There are many reasons why the above assumptions do not hold:
 
 :ticket:`4710`
 
+.. 
_change_4753:
+
+SELECT objects and derived FROM clauses allow for duplicate columns and column labels
+-------------------------------------------------------------------------------------
+
+With this change, the :func:`_expression.select` construct now allows for duplicate
+column labels as well as duplicate column objects themselves, so that result
+tuples are organized and ordered in the identical way in which the columns were
+selected.  The ORM :class:`_query.Query` already works this way, so this change
+allows for greater cross-compatibility between the two, which is a key goal of
+the 2.0 transition::
+
+    >>> from sqlalchemy import column, select
+    >>> c1, c2, c3, c4 = column('c1'), column('c2'), column('c3'), column('c4')
+    >>> stmt = select([c1, c2, c3.label('c2'), c2, c4])
+    >>> print(stmt)
+    SELECT c1, c2, c3 AS c2, c2, c4
+
+To support this change, the :class:`_expression.ColumnCollection` used by
+:class:`_expression.SelectBase` as well as for derived FROM clauses such as subqueries
+also supports duplicate columns; this includes the new
+:attr:`_expression.SelectBase.selected_columns` attribute, the deprecated ``SelectBase.c``
+attribute, as well as the :attr:`_expression.FromClause.c` attribute seen on constructs
+such as :class:`.Subquery` and :class:`_expression.Alias`::
+
+    >>> list(stmt.selected_columns)
+    [
+        <sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcca20; c1>,
+        <sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcc9e8; c2>,
+        <sqlalchemy.sql.elements.Label object at 0x7fa540b3e2e8>,
+        <sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcc9e8; c2>,
+        <sqlalchemy.sql.elements.ColumnClause at 0x7fa540897048; c4>
+    ]
+
+    >>> print(stmt.subquery().select())
+    SELECT anon_1.c1, anon_1.c2, anon_1.c2, anon_1.c2, anon_1.c4
+    FROM (SELECT c1, c2, c3 AS c2, c2, c4) AS anon_1
+
+:class:`_expression.ColumnCollection` also allows access by integer index to
+support cases where the string "key" is ambiguous::
+
+    >>> stmt.selected_columns[2]
+    
<sqlalchemy.sql.elements.Label object at 0x7fa540b3e2e8> + +To suit the use of :class:`_expression.ColumnCollection` in objects such as +:class:`_schema.Table` and :class:`.PrimaryKeyConstraint`, the old "deduplicating" +behavior which is more critical for these objects is preserved in a new class +:class:`.DedupeColumnCollection`. + +The change includes that the familiar warning ``"Column %r on table %r being +replaced by %r, which has the same key. Consider use_labels for select() +statements."`` is **removed**; the :meth:`_expression.Select.apply_labels` is still +available and is still used by the ORM for all SELECT operations, however it +does not imply deduplication of column objects, although it does imply +deduplication of implicitly generated labels:: + + >>> from sqlalchemy import table + >>> user = table('user', column('id'), column('name')) + >>> stmt = select([user.c.id, user.c.name, user.c.id]).apply_labels() + >>> print(stmt) + SELECT "user".id AS user_id, "user".name AS user_name, "user".id AS id_1 + FROM "user" + +Finally, the change makes it easier to create UNION and other +:class:`_selectable.CompoundSelect` objects, by ensuring that the number and position +of columns in a SELECT statement mirrors what was given, in a use case such +as:: + + >>> s1 = select([user, user.c.id]) + >>> s2 = select([c1, c2, c3]) + >>> from sqlalchemy import union + >>> u = union(s1, s2) + >>> print(u) + SELECT "user".id, "user".name, "user".id + FROM "user" UNION SELECT c1, c2, c3 + + + +:ticket:`4753` + + + +.. 
_change_4449:
+
+Improved column labeling for simple column expressions using CAST or similar
+----------------------------------------------------------------------------
+
+A user pointed out that the PostgreSQL database has a convenient behavior when
+using functions like CAST against a named column, in that the result column
+is named the same as the inner expression::
+
+    test=> SELECT CAST(data AS VARCHAR) FROM foo;
+
+    data
+    ------
+     5
+    (1 row)
+
+This allows one to apply CAST to table columns while not losing the column
+name (above using the name ``"data"``) in the result row.  Compare to
+databases such as MySQL/MariaDB, as well as most others, where the column
+name is taken from the full SQL expression and is not very portable::
+
+    MariaDB [test]> SELECT CAST(data AS CHAR) FROM foo;
+    +--------------------+
+    | CAST(data AS CHAR) |
+    +--------------------+
+    | 5                  |
+    +--------------------+
+    1 row in set (0.003 sec)
+
+
+In SQLAlchemy Core expressions, we never deal with a raw generated name like
+the above, as SQLAlchemy applies auto-labeling to expressions like these, which
+have up until now always been so-called "anonymous" expressions::
+
+    >>> print(select([cast(foo.c.data, String)]))
+    SELECT CAST(foo.data AS VARCHAR) AS anon_1     # old behavior
+    FROM foo
+
+These anonymous expressions were necessary as SQLAlchemy's
+:class:`_engine.ResultProxy` made heavy use of result column names in order to match
+up datatypes, such as the :class:`.String` datatype which used to have
+result-row-processing behavior, to the correct column, so most importantly the
+names had to be both easy to determine in a database-agnostic manner as well as
+unique in all cases. 
In SQLAlchemy 1.0 as part of :ticket:`918`, this +reliance on named columns in result rows (specifically the +``cursor.description`` element of the PEP-249 cursor) was scaled back to not be +necessary for most Core SELECT constructs; in release 1.4, the system overall +is becoming more comfortable with SELECT statements that have duplicate column +or label names such as in :ref:`change_4753`. So we now emulate PostgreSQL's +reasonable behavior for simple modifications to a single column, most +prominently with CAST:: + + >>> print(select([cast(foo.c.data, String)])) + SELECT CAST(foo.data AS VARCHAR) AS data + FROM foo + +For CAST against expressions that don't have a name, the previous logic is used +to generate the usual "anonymous" labels:: + + >>> print(select([cast('hi there,' + foo.c.data, String)])) + SELECT CAST(:data_1 + foo.data AS VARCHAR) AS anon_1 + FROM foo + +A :func:`.cast` against a :class:`.Label`, despite having to omit the label +expression as these don't render inside of a CAST, will nonetheless make use of +the given name:: + + >>> print(select([cast(('hi there,' + foo.c.data).label('hello_data'), String)])) + SELECT CAST(:data_1 + foo.data AS VARCHAR) AS hello_data + FROM foo + +And of course as was always the case, :class:`.Label` can be applied to the +expression on the outside to apply an "AS <name>" label directly:: + + >>> print(select([cast(('hi there,' + foo.c.data), String).label('hello_data')])) + SELECT CAST(:data_1 + foo.data AS VARCHAR) AS hello_data + FROM foo + + +:ticket:`4449` + +.. 
_change_4808: + +New "post compile" bound parameters used for LIMIT/OFFSET in Oracle, SQL Server +------------------------------------------------------------------------------- + +A major goal of the 1.4 series is to establish that all Core SQL constructs +are completely cacheable, meaning that a particular :class:`.Compiled` +structure will produce an identical SQL string regardless of any SQL parameters +used with it, which notably includes those used to specify the LIMIT and +OFFSET values, typically used for pagination and "top N" style results. + +While SQLAlchemy has used bound parameters for LIMIT/OFFSET schemes for many +years, a few outliers remained where such parameters were not allowed, including +a SQL Server "TOP N" statement, such as:: + + SELECT TOP 5 mytable.id, mytable.data FROM mytable + +as well as with Oracle, where the FIRST_ROWS() hint (which SQLAlchemy will +use if the ``optimize_limits=True`` parameter is passed to +:func:`_sa.create_engine` with an Oracle URL) does not allow them, +but also that using bound parameters with ROWNUM comparisons has been reported +as producing slower query plans:: + + SELECT anon_1.id, anon_1.data FROM ( + SELECT /*+ FIRST_ROWS(5) */ + anon_2.id AS id, + anon_2.data AS data, + ROWNUM AS ora_rn FROM ( + SELECT mytable.id, mytable.data FROM mytable + ) anon_2 + WHERE ROWNUM <= :param_1 + ) anon_1 WHERE ora_rn > :param_2 + +In order to allow for all statements to be unconditionally cacheable at the +compilation level, a new form of bound parameter called a "post compile" +parameter has been added, which makes use of the same mechanism as that +of "expanding IN parameters". This is a :func:`.bindparam` that behaves +identically to any other bound parameter except that parameter value will +be rendered literally into the SQL string before sending it to the DBAPI +``cursor.execute()`` method. 
The new parameter is used internally by the +SQL Server and Oracle dialects, so that the drivers receive the literal +rendered value but the rest of SQLAlchemy can still consider this as a +bound parameter. The above two statements when stringified using +``str(statement.compile(dialect=<dialect>))`` now look like:: + + SELECT TOP [POSTCOMPILE_param_1] mytable.id, mytable.data FROM mytable + +and:: + + SELECT anon_1.id, anon_1.data FROM ( + SELECT /*+ FIRST_ROWS([POSTCOMPILE__ora_frow_1]) */ + anon_2.id AS id, + anon_2.data AS data, + ROWNUM AS ora_rn FROM ( + SELECT mytable.id, mytable.data FROM mytable + ) anon_2 + WHERE ROWNUM <= [POSTCOMPILE_param_1] + ) anon_1 WHERE ora_rn > [POSTCOMPILE_param_2] + +The ``[POSTCOMPILE_<param>]`` format is also what is seen when an +"expanding IN" is used. + +When viewing the SQL logging output, the final form of the statement will +be seen:: + + SELECT anon_1.id, anon_1.data FROM ( + SELECT /*+ FIRST_ROWS(5) */ + anon_2.id AS id, + anon_2.data AS data, + ROWNUM AS ora_rn FROM ( + SELECT mytable.id AS id, mytable.data AS data FROM mytable + ) anon_2 + WHERE ROWNUM <= 8 + ) anon_1 WHERE ora_rn > 3 + + +The "post compile parameter" feature is exposed as public API through the +:paramref:`.bindparam.literal_execute` parameter, however is currently not +intended for general use. The literal values are rendered using the +:meth:`.TypeEngine.literal_processor` of the underlying datatype, which in +SQLAlchemy has **extremely limited** scope, supporting only integers and simple +string values. + +:ticket:`4808` + +.. _change_4712: + +Connection-level transactions can now be inactive based on subtransaction +------------------------------------------------------------------------- + +A :class:`_engine.Connection` now includes the behavior where a :class:`.Transaction` +can be made inactive due to a rollback on an inner transaction, however the +:class:`.Transaction` will not clear until it is itself rolled back. 
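The behavior just described can be modeled as a small state machine. The following toy sketch (hypothetical classes, not SQLAlchemy's actual ``Connection``/``Transaction`` implementation) shows how a rollback of an inner transaction deactivates the enclosing one, which must then be rolled back itself before statements may proceed:

```python
# Toy model of the "inactive transaction" condition: an inner rollback
# deactivates the outer Transaction, and the Connection refuses further
# statements until the outer Transaction is itself rolled back.

class InvalidRequestError(Exception):
    pass

class Transaction:
    def __init__(self, connection, parent=None):
        self.connection = connection
        self.parent = parent
        self.is_active = True

    def rollback(self):
        self.is_active = False
        if self.parent is not None:
            # inner rollback deactivates the enclosing transaction ...
            self.parent.is_active = False
        else:
            # ... which only clears once it is itself rolled back
            self.connection._root = None

class Connection:
    def __init__(self):
        self._root = None

    def begin(self):
        if self._root is None:
            self._root = Transaction(self)
            return self._root
        return Transaction(self, parent=self._root)

    def execute(self, statement):
        if self._root is not None and not self._root.is_active:
            raise InvalidRequestError(
                "This connection is on an inactive transaction; "
                "roll back the enclosing transaction first")
        return "executed: " + statement

conn = Connection()
outer = conn.begin()
inner = conn.begin()
inner.rollback()              # outer is now inactive
try:
    conn.execute("SELECT 1")  # raises InvalidRequestError
except InvalidRequestError as err:
    print(err)
outer.rollback()              # clears the invalid transaction
print(conn.execute("SELECT 1"))
```

This mirrors the Session-like pattern the text describes: the error surfaces the fact that the DBAPI transaction is already gone, rather than silently running subsequent statements on a new one.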
+
+This is essentially a new error condition which will disallow statement
+executions from proceeding on a :class:`_engine.Connection` if an inner "sub" transaction
+has been rolled back.  The behavior works very similarly to that of the
+ORM :class:`.Session`, where if an outer transaction has been begun, it needs
+to be rolled back to clear the invalid transaction; this behavior is described
+in :ref:`faq_session_rollback`.
+
+While the :class:`_engine.Connection` has had a less strict behavioral pattern than
+the :class:`.Session`, this change was made because it helps to identify the case
+where a subtransaction has rolled back the DBAPI transaction, but the external
+code, unaware of this, attempts to proceed, which in fact runs operations on a
+new transaction.  The "test harness" pattern described at
+:ref:`session_external_transaction` is a common place for this to occur.
+
+The "subtransaction" feature of Core and ORM is itself deprecated and will
+no longer be present in version 2.0.  As a result, this new error condition
+is itself temporary as it will no longer apply once subtransactions are removed.
+
+In order to work with the 2.0 style behavior that does not include
+subtransactions, use the :paramref:`_sa.create_engine.future` parameter
+on :func:`_sa.create_engine`.
+
+The error message is described in the errors page at :ref:`error_8s2a`.
+
+
+
 New Features - ORM
 ==================
@@ -1133,6 +1603,53 @@ now the default behavior.
 
 :ticket:`5237`
 
+.. _change_5150:
+
+cascade_backrefs behavior deprecated for removal in 2.0
+-------------------------------------------------------
+
+SQLAlchemy has long had a behavior of cascading objects into the
+:class:`_orm.Session` based on backref assignment. 
Given ``User`` below
+already in a :class:`_orm.Session`, assigning it to the ``Address.user``
+attribute of an ``Address`` object, assuming a bidirectional relationship
+is set up, would mean that the ``Address`` also gets put into the
+:class:`_orm.Session` at that point::
+
+    u1 = User()
+    session.add(u1)
+
+    a1 = Address()
+    a1.user = u1  # <--- adds "a1" to the Session
+
+The above behavior was an unintended side effect of backref behavior, in that
+since ``a1.user`` implies ``u1.addresses.append(a1)``, ``a1`` would get
+cascaded into the :class:`_orm.Session`.  This remains the default behavior
+throughout 1.4.  At some point, a new flag :paramref:`_orm.relationship.cascade_backrefs`
+was added to disable the above behavior, as it can be surprising and also gets in
+the way of some operations where the object would be placed in the :class:`_orm.Session`
+too early and get prematurely flushed.
+
+In 2.0, the default behavior will be that "cascade_backrefs" is False, and
+additionally there will be no "True" behavior as this is not generally a desirable
+behavior.  When 2.0 deprecation warnings are enabled, a warning will be emitted
+when a "backref cascade" actually takes place.  To get the new behavior, either
+set :paramref:`_orm.relationship.cascade_backrefs` to ``False`` on the target
+relationship, as is already supported in 1.3 and earlier, or alternatively make
+use of the :paramref:`_orm.Session.future` flag to enable :term:`2.0-style` mode::
+
+    Session = sessionmaker(engine, future=True)
+
+    with Session() as session:
+        u1 = User()
+        session.add(u1)
+
+        a1 = Address()
+        a1.user = u1  # <--- will not add "a1" to the Session
+
+
+
+:ticket:`5150`
+
 .. _change_1763:
 
 Eager loaders emit during unexpire operations
@@ -1396,53 +1913,6 @@ configured to raise an exception using the Python warnings filter.
 
 :ticket:`4662`
 
-.. 
_change_5150: - -cascade_backrefs behavior deprecated for removal in 2.0 -------------------------------------------------------- - -SQLAlchemy has long had a behavior of cascading objects into the -:class:`_orm.Session` based on backref assignment. Given ``User`` below -already in a :class:`_orm.Session`, assigning it to the ``Address.user`` -attribute of an ``Address`` object, assuming a bidrectional relationship -is set up, would mean that the ``Address`` also gets put into the -:class:`_orm.Session` at that point:: - - u1 = User() - session.add(u1) - - a1 = Address() - a1.user = u1 # <--- adds "a1" to the Session - -The above behavior was an unintended side effect of backref behavior, in that -since ``a1.user`` implies ``u1.addresses.append(a1)``, ``a1`` would get -cascaded into the :class:`_orm.Session`. This remains the default behavior -throughout 1.4. At some point, a new flag :paramref:`_orm.relationship.cascade_backrefs` -was added to disable to above behavior, as it can be surprising and also gets in -the way of some operations where the object would be placed in the :class:`_orm.Session` -too early and get prematurely flushed. - -In 2.0, the default behavior will be that "cascade_backrefs" is False, and -additionally there will be no "True" behavior as this is not generally a desirable -behavior. When 2.0 deprecation warnings are enabled, a warning will be emitted -when a "backref cascade" actually takes place. To get the new behavior, either -set :paramref:`_orm.relationship.cascade_backrefs` to ``False`` on the target -relationship, as is already supported in 1.3 and earlier, or alternatively make -use of the :paramref:`_orm.Session.future` flag to :term:`2.0-style` mode:: - - Session = sessionmaker(engine, future=True) - - with Session() as session: - u1 = User() - session.add(u1) - - a1 = Address() - a1.user = u1 # <--- will not add "a1" to the Session - - - -:ticket:`5150` - .. 
_change_4994: Persistence-related cascade operations disallowed with viewonly=True @@ -1592,416 +2062,6 @@ discriminator column:: :ticket:`5122` - -New Features - Core -==================== - -.. _change_4737: - - -Built-in FROM linting will warn for any potential cartesian products in a SELECT statement ------------------------------------------------------------------------------------------- - -As the Core expression language as well as the ORM are built on an "implicit -FROMs" model where a particular FROM clause is automatically added if any part -of the query refers to it, a common issue is the case where a SELECT statement, -either a top level statement or an embedded subquery, contains FROM elements -that are not joined to the rest of the FROM elements in the query, causing -what's referred to as a "cartesian product" in the result set, i.e. every -possible combination of rows from each FROM element not otherwise joined. In -relational databases, this is nearly always an undesirable outcome as it -produces an enormous result set full of duplicated, uncorrelated data. - -SQLAlchemy, for all of its great features, is particularly prone to this sort -of issue happening as a SELECT statement will have elements added to its FROM -clause automatically from any table seen in the other clauses. A typical -scenario looks like the following, where two tables are JOINed together, -however an additional entry in the WHERE clause that perhaps inadvertently does -not line up with these two tables will create an additional FROM entry:: - - address_alias = aliased(Address) - - q = session.query(User).\ - join(address_alias, User.addresses).\ - filter(Address.email_address == 'foo') - -The above query selects from a JOIN of ``User`` and ``address_alias``, the -latter of which is an alias of the ``Address`` entity. 
However, the -``Address`` entity is used within the WHERE clause directly, so the above would -result in the SQL:: - - SELECT - users.id AS users_id, users.name AS users_name, - users.fullname AS users_fullname, - users.nickname AS users_nickname - FROM addresses, users JOIN addresses AS addresses_1 ON users.id = addresses_1.user_id - WHERE addresses.email_address = :email_address_1 - -In the above SQL, we can see what SQLAlchemy developers term "the dreaded -comma", as we see "FROM addresses, users JOIN addresses" in the FROM clause -which is the classic sign of a cartesian product; where a query is making use -of JOIN in order to join FROM clauses together, however because one of them is -not joined, it uses a comma. The above query will return a full set of -rows that join the "user" and "addresses" table together on the "id / user_id" -column, and will then apply all those rows into a cartesian product against -every row in the "addresses" table directly. That is, if there are ten user -rows and 100 rows in addresses, the above query will return its expected result -rows, likely to be 100 as all address rows would be selected, multiplied by 100 -again, so that the total result size would be 10000 rows. - -The "table1, table2 JOIN table3" pattern is one that also occurs quite -frequently within the SQLAlchemy ORM due to either subtle mis-application of -ORM features particularly those related to joined eager loading or joined table -inheritance, as well as a result of SQLAlchemy ORM bugs within those same -systems. Similar issues apply to SELECT statements that use "implicit joins", -where the JOIN keyword is not used and instead each FROM element is linked with -another one via the WHERE clause. 
- -For some years there has been a recipe on the Wiki that applies a graph -algorithm to a :func:`_expression.select` construct at query execution time and inspects -the structure of the query for these un-linked FROM clauses, parsing through -the WHERE clause and all JOIN clauses to determine how FROM elements are linked -together and ensuring that all the FROM elements are connected in a single -graph. This recipe has now been adapted to be part of the :class:`.SQLCompiler` -itself where it now optionally emits a warning for a statement if this -condition is detected. The warning is enabled using the -:paramref:`_sa.create_engine.enable_from_linting` flag and is enabled by default. -The computational overhead of the linter is very low, and additionally it only -occurs during statement compilation which means for a cached SQL statement it -only occurs once. - -Using this feature, our ORM query above will emit a warning:: - - >>> q.all() - SAWarning: SELECT statement has a cartesian product between FROM - element(s) "addresses_1", "users" and FROM element "addresses". - Apply join condition(s) between each element to resolve. - -The linter feature accommodates not just for tables linked together through the -JOIN clauses but also through the WHERE clause Above, we can add a WHERE -clause to link the new ``Address`` entity with the previous ``address_alias`` -entity and that will remove the warning:: - - q = session.query(User).\ - join(address_alias, User.addresses).\ - filter(Address.email_address == 'foo').\ - filter(Address.id == address_alias.id) # resolve cartesian products, - # will no longer warn - -The cartesian product warning considers **any** kind of link between two -FROM clauses to be a resolution, even if the end result set is still -wasteful, as the linter is intended only to detect the common case of a -FROM clause that is completely unexpected. 
If the FROM clause is referred -to explicitly elsewhere and linked to the other FROMs, no warning is emitted:: - - q = session.query(User).\ - join(address_alias, User.addresses).\ - filter(Address.email_address == 'foo').\ - filter(Address.id > address_alias.id) # will generate a lot of rows, - # but no warning - -Full cartesian products are also allowed if they are explicitly stated; if we -wanted for example the cartesian product of ``User`` and ``Address``, we can -JOIN on :func:`.true` so that every row will match with every other; the -following query will return all rows and produce no warnings:: - - from sqlalchemy import true - - # intentional cartesian product - q = session.query(User).join(Address, true()) # intentional cartesian product - -The warning is only generated by default when the statement is compiled by the -:class:`_engine.Connection` for execution; calling the :meth:`_expression.ClauseElement.compile` -method will not emit a warning unless the linting flag is supplied:: - - >>> from sqlalchemy.sql import FROM_LINTING - >>> print(q.statement.compile(linting=FROM_LINTING)) - SAWarning: SELECT statement has a cartesian product between FROM element(s) "addresses" and FROM element "users". Apply join condition(s) between each element to resolve. - SELECT users.id, users.name, users.fullname, users.nickname - FROM addresses, users JOIN addresses AS addresses_1 ON users.id = addresses_1.user_id - WHERE addresses.email_address = :email_address_1 - -:ticket:`4737` - - - -Behavior Changes - Core -======================== - -.. 
_change_4753: - -SELECT objects and derived FROM clauses allow for duplicate columns and column labels ------------------------------------------------------------------------------------- - -With this change, the :func:`_expression.select` construct now allows for duplicate -column labels as well as duplicate column objects themselves, so that result -tuples are organized and ordered in the identical way in which the columns were -selected. The ORM :class:`_query.Query` already works this way, so this change -allows for greater cross-compatibility between the two, which is a key goal of -the 2.0 transition:: - - >>> from sqlalchemy import column, select - >>> c1, c2, c3, c4 = column('c1'), column('c2'), column('c3'), column('c4') - >>> stmt = select([c1, c2, c3.label('c2'), c2, c4]) - >>> print(stmt) - SELECT c1, c2, c3 AS c2, c2, c4 - -To support this change, the :class:`_expression.ColumnCollection` used by -:class:`_expression.SelectBase` as well as by derived FROM clauses such as subqueries -also supports duplicate columns; this includes the new -:attr:`_expression.SelectBase.selected_columns` attribute, the deprecated ``SelectBase.c`` -attribute, as well as the :attr:`_expression.FromClause.c` attribute seen on constructs -such as :class:`.Subquery` and :class:`_expression.Alias`:: - - >>> list(stmt.selected_columns) - [ - <sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcca20; c1>, - <sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcc9e8; c2>, - <sqlalchemy.sql.elements.Label object at 0x7fa540b3e2e8>, - <sqlalchemy.sql.elements.ColumnClause at 0x7fa540bcc9e8; c2>, - <sqlalchemy.sql.elements.ColumnClause at 0x7fa540897048; c4> - ] - - >>> print(stmt.subquery().select()) - SELECT anon_1.c1, anon_1.c2, anon_1.c2, anon_1.c2, anon_1.c4 - FROM (SELECT c1, c2, c3 AS c2, c2, c4) AS anon_1 - -:class:`_expression.ColumnCollection` also allows access by integer index to support -cases when the string "key" is ambiguous:: - - >>> stmt.selected_columns[2] -
<sqlalchemy.sql.elements.Label object at 0x7fa540b3e2e8> - -To suit the use of :class:`_expression.ColumnCollection` in objects such as -:class:`_schema.Table` and :class:`.PrimaryKeyConstraint`, the old "deduplicating" -behavior which is more critical for these objects is preserved in a new class -:class:`.DedupeColumnCollection`. - -The change includes that the familiar warning ``"Column %r on table %r being -replaced by %r, which has the same key. Consider use_labels for select() -statements."`` is **removed**; the :meth:`_expression.Select.apply_labels` is still -available and is still used by the ORM for all SELECT operations, however it -does not imply deduplication of column objects, although it does imply -deduplication of implicitly generated labels:: - - >>> from sqlalchemy import table - >>> user = table('user', column('id'), column('name')) - >>> stmt = select([user.c.id, user.c.name, user.c.id]).apply_labels() - >>> print(stmt) - SELECT "user".id AS user_id, "user".name AS user_name, "user".id AS id_1 - FROM "user" - -Finally, the change makes it easier to create UNION and other -:class:`_selectable.CompoundSelect` objects, by ensuring that the number and position -of columns in a SELECT statement mirrors what was given, in a use case such -as:: - - >>> s1 = select([user, user.c.id]) - >>> s2 = select([c1, c2, c3]) - >>> from sqlalchemy import union - >>> u = union(s1, s2) - >>> print(u) - SELECT "user".id, "user".name, "user".id - FROM "user" UNION SELECT c1, c2, c3 - - - -:ticket:`4753` - - - -.. 
_change_4449: - -Improved column labeling for simple column expressions using CAST or similar ----------------------------------------------------------------------------- - -A user pointed out that the PostgreSQL database has a convenient behavior when -using functions like CAST against a named column, in that the result column -is named the same as the inner expression:: - - test=> SELECT CAST(data AS VARCHAR) FROM foo; - - data - ------ - 5 - (1 row) - -This allows one to apply CAST to table columns while not losing the column -name (above using the name ``"data"``) in the result row. Compare to -databases such as MySQL/MariaDB, as well as most others, where the column -name is taken from the full SQL expression and is not very portable:: - - MariaDB [test]> SELECT CAST(data AS CHAR) FROM foo; - +--------------------+ - | CAST(data AS CHAR) | - +--------------------+ - | 5 | - +--------------------+ - 1 row in set (0.003 sec) - - -In SQLAlchemy Core expressions, we never deal with a raw generated name like -the above, as SQLAlchemy applies auto-labeling to expressions like these, which -up until now were always a so-called "anonymous" expression:: - - >>> print(select([cast(foo.c.data, String)])) - SELECT CAST(foo.data AS VARCHAR) AS anon_1 # old behavior - FROM foo - -These anonymous expressions were necessary as SQLAlchemy's -:class:`_engine.ResultProxy` made heavy use of result column names in order to match -up datatypes, such as the :class:`.String` datatype which used to have -result-row-processing behavior, to the correct column, so most importantly the -names had to be both easy to determine in a database-agnostic manner as well as -unique in all cases.
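The improved labeling named in this section's title can be sketched with the 1.4+ ``select()`` calling style; ``foo`` here is a hypothetical table:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, cast, select

metadata = MetaData()
foo = Table("foo", metadata, Column("data", Integer))

# a CAST of a plain named column now propagates the column's name
named = str(select(cast(foo.c.data, String)))

# a CAST of an unnamed expression still receives an anonymous label
anonymous = str(select(cast(foo.c.data + 1, String)))
```

Stringifying ``named`` on the default dialect is expected to render ``AS data`` rather than an ``anon_1`` label, while ``anonymous`` keeps the old anonymous-label behavior.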
In SQLAlchemy 1.0 as part of :ticket:`918`, this -reliance on named columns in result rows (specifically the -``cursor.description`` element of the PEP-249 cursor) was scaled back to not be -necessary for most Core SELECT constructs; in release 1.4, the system overall -is becoming more comfortable with SELECT statements that have duplicate column -or label names such as in :ref:`change_4753`. So we now emulate PostgreSQL's -reasonable behavior for simple modifications to a single column, most -prominently with CAST:: - - >>> print(select([cast(foo.c.data, String)])) - SELECT CAST(foo.data AS VARCHAR) AS data - FROM foo - -For CAST against expressions that don't have a name, the previous logic is used -to generate the usual "anonymous" labels:: - - >>> print(select([cast('hi there,' + foo.c.data, String)])) - SELECT CAST(:data_1 + foo.data AS VARCHAR) AS anon_1 - FROM foo - -A :func:`.cast` against a :class:`.Label`, despite having to omit the label -expression as these don't render inside of a CAST, will nonetheless make use of -the given name:: - - >>> print(select([cast(('hi there,' + foo.c.data).label('hello_data'), String)])) - SELECT CAST(:data_1 + foo.data AS VARCHAR) AS hello_data - FROM foo - -And of course as was always the case, :class:`.Label` can be applied to the -expression on the outside to apply an "AS <name>" label directly:: - - >>> print(select([cast(('hi there,' + foo.c.data), String).label('hello_data')])) - SELECT CAST(:data_1 + foo.data AS VARCHAR) AS hello_data - FROM foo - - -:ticket:`4449` - -.. 
_change_4808: - -New "post compile" bound parameters used for LIMIT/OFFSET in Oracle, SQL Server -------------------------------------------------------------------------------- - -A major goal of the 1.4 series is to establish that all Core SQL constructs -are completely cacheable, meaning that a particular :class:`.Compiled` -structure will produce an identical SQL string regardless of any SQL parameters -used with it, which notably includes those used to specify the LIMIT and -OFFSET values, typically used for pagination and "top N" style results. - -While SQLAlchemy has used bound parameters for LIMIT/OFFSET schemes for many -years, a few outliers remained where such parameters were not allowed, including -a SQL Server "TOP N" statement, such as:: - - SELECT TOP 5 mytable.id, mytable.data FROM mytable - -as well as with Oracle, where the FIRST_ROWS() hint (which SQLAlchemy will -use if the ``optimize_limits=True`` parameter is passed to -:func:`_sa.create_engine` with an Oracle URL) does not allow them, -but also that using bound parameters with ROWNUM comparisons has been reported -as producing slower query plans:: - - SELECT anon_1.id, anon_1.data FROM ( - SELECT /*+ FIRST_ROWS(5) */ - anon_2.id AS id, - anon_2.data AS data, - ROWNUM AS ora_rn FROM ( - SELECT mytable.id, mytable.data FROM mytable - ) anon_2 - WHERE ROWNUM <= :param_1 - ) anon_1 WHERE ora_rn > :param_2 - -In order to allow for all statements to be unconditionally cacheable at the -compilation level, a new form of bound parameter called a "post compile" -parameter has been added, which makes use of the same mechanism as that -of "expanding IN parameters". This is a :func:`.bindparam` that behaves -identically to any other bound parameter except that parameter value will -be rendered literally into the SQL string before sending it to the DBAPI -``cursor.execute()`` method. 
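On the default dialect, the "post compile" rendering can be observed directly via :paramref:`.bindparam.literal_execute` — a sketch only, noting that the exact spelling of the ``POSTCOMPILE`` token in the stringified form varies slightly between versions:

```python
from sqlalchemy import bindparam, column, select, table

mytable = table("mytable", column("id"), column("data"))

stmt = select(mytable.c.id, mytable.c.data).where(
    mytable.c.id <= bindparam("val", value=5, literal_execute=True)
)

# stringified compilation shows the post-compile placeholder ...
placeholder_form = str(stmt.compile())

# ... while render_postcompile shows the literal form the driver receives
literal_form = str(stmt.compile(compile_kwargs={"render_postcompile": True}))
```

The first string contains a ``POSTCOMPILE`` token in place of the parameter; the second renders the value ``5`` inline, as it would be sent to ``cursor.execute()``.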
The new parameter is used internally by the -SQL Server and Oracle dialects, so that the drivers receive the literal -rendered value but the rest of SQLAlchemy can still consider this as a -bound parameter. The above two statements when stringified using -``str(statement.compile(dialect=<dialect>))`` now look like:: - - SELECT TOP [POSTCOMPILE_param_1] mytable.id, mytable.data FROM mytable - -and:: - - SELECT anon_1.id, anon_1.data FROM ( - SELECT /*+ FIRST_ROWS([POSTCOMPILE__ora_frow_1]) */ - anon_2.id AS id, - anon_2.data AS data, - ROWNUM AS ora_rn FROM ( - SELECT mytable.id, mytable.data FROM mytable - ) anon_2 - WHERE ROWNUM <= [POSTCOMPILE_param_1] - ) anon_1 WHERE ora_rn > [POSTCOMPILE_param_2] - -The ``[POSTCOMPILE_<param>]`` format is also what is seen when an -"expanding IN" is used. - -When viewing the SQL logging output, the final form of the statement will -be seen:: - - SELECT anon_1.id, anon_1.data FROM ( - SELECT /*+ FIRST_ROWS(5) */ - anon_2.id AS id, - anon_2.data AS data, - ROWNUM AS ora_rn FROM ( - SELECT mytable.id AS id, mytable.data AS data FROM mytable - ) anon_2 - WHERE ROWNUM <= 8 - ) anon_1 WHERE ora_rn > 3 - - -The "post compile parameter" feature is exposed as public API through the -:paramref:`.bindparam.literal_execute` parameter, however it is currently not -intended for general use. The literal values are rendered using the -:meth:`.TypeEngine.literal_processor` of the underlying datatype, which in -SQLAlchemy has **extremely limited** scope, supporting only integers and simple -string values. - -:ticket:`4808` - -.. _change_4712: - -Connection-level transactions can now be inactive based on subtransaction ------------------------------------------------------------------------- - -A :class:`_engine.Connection` now includes the behavior where a :class:`.Transaction` -can be made inactive due to a rollback on an inner transaction, however the -:class:`.Transaction` will not clear until it is itself rolled back.
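The analogous ORM :class:`.Session` behavior that this :class:`_engine.Connection` change mirrors can be sketched as follows (in-memory SQLite; ``User`` is a hypothetical mapping, and ``PendingRollbackError`` is the 1.4 name for this condition):

```python
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.exc import IntegrityError, PendingRollbackError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
session.add(User(name=None))  # violates the NOT NULL constraint
try:
    session.commit()
except IntegrityError:
    pass  # the flush failed; the transaction is now invalid

# further statement execution is blocked until rollback() is called
try:
    session.execute(select(User))
    blocked = False
except PendingRollbackError:
    blocked = True

session.rollback()  # clears the invalid transaction
rows = session.execute(select(User)).all()
```

After the explicit ``rollback()``, execution proceeds normally and the failed INSERT is gone.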
- -This is essentially a new error condition which will disallow statement -executions to proceed on a :class:`_engine.Connection` if an inner "sub" transaction -has been rolled back. The behavior works very similarly to that of the -ORM :class:`.Session`, where if an outer transaction has been begun, it needs -to be rolled back to clear the invalid transaction; this behavior is described -in :ref:`faq_session_rollback`. - -While the :class:`_engine.Connection` has had a less strict behavioral pattern than -the :class:`.Session`, this change was made because it helps to identify when -a subtransaction has rolled back the DBAPI transaction while external -code, unaware of this, continues to proceed and in fact -runs operations on a new transaction. The "test harness" pattern described -at :ref:`session_external_transaction` is the common place for this to occur. - -The new behavior is described in the errors page at :ref:`error_8s2a`. - - Dialect Changes =============== diff --git a/doc/build/changelog/migration_20.rst b/doc/build/changelog/migration_20.rst index ce26e7891..d6077462e 100644 --- a/doc/build/changelog/migration_20.rst +++ b/doc/build/changelog/migration_20.rst @@ -530,7 +530,7 @@ ResultProxy replaced with Result which has more refined methods and behaviors Review the new future API for result sets: - :class:`_future.Result` + :class:`_engine.Result` A major goal of SQLAlchemy 2.0 is to unify how "results" are handled between @@ -747,18 +747,22 @@ pattern which basically does the same thing. ORM Query Unified with Core Select ================================== -.. admonition:: Certainty: tentative - - Tentative overall, however there will almost definitely be - architectural changes in :class:`_query.Query` that move it closer to - :func:`_expression.select`. +.. admonition:: Certainty: definite - The ``session.query(<cls>)`` pattern itself will likely **not** be fully - removed.
As this pattern is extremely prevalent and numerous within any - individual application, and that it does not intrinsically suggest an - "antipattern" from a development standpoint, at the moment we are hoping - that a transition to 2.0 won't require a rewrite of every ``session.query()`` - call, however it will be a legacy pattern that may warn as such. + This is now implemented in 1.4. The :class:`_orm.Query` object now + generates a :class:`_sql.Select` object, which is then executed + via :meth:`_orm.Session.execute`. The API to instead use :class:`_sql.Select` + and :meth:`_orm.Session.execute` directly, foregoing the usage of + :class:`_orm.Query` altogether, is fully available in 1.4. Most internal + ORM systems for loading and refreshing objects have been transitioned to + use :class:`_sql.Select` directly. + + The ``session.query(<cls>)`` pattern itself will likely **not** be fully + removed. As this pattern is extremely prevalent and numerous within any + individual application, and that it does not intrinsically suggest an + "antipattern" from a development standpoint, at the moment we are hoping + that a transition to 2.0 won't require a rewrite of every ``session.query()`` + call, however it will be a legacy pattern that may warn as such. Ever wonder why SQLAlchemy :func:`_expression.select` uses :meth:`_expression.Select.where` to add a WHERE clause and :class:`_query.Query` uses :meth:`_query.Query.filter` ? Same here! @@ -1082,9 +1086,10 @@ The above query will disambiguate the ``.id`` column of ``User`` and Transparent Statement Compilation Caching replaces "Baked" queries, works in Core ================================================================================== -.. admonition:: Certainty: tentative +.. admonition:: Certainty: definite - Pending further architectural prototyping and performance testing + This is now implemented in 1.4. The migration notes at :ref:`change_4639` + detail the change.
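As a compact reference, the unified :term:`2.0 style` execution model described in the admonitions above can be sketched end-to-end (in-memory SQLite; ``User`` is a hypothetical mapping):

```python
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([User(name="spongebob"), User(name="sandy")])
    session.commit()

    # 1.x style: session.query(User).filter(User.name == "sandy").all()
    # 2.0 style: build a Select, then execute it on the Session
    stmt = select(User).where(User.name == "sandy")
    names = [u.name for u in session.execute(stmt).scalars()]
```

The same ``stmt`` object can also be inspected, cached, or passed around independently of any :class:`_orm.Session`, which is a large part of the motivation for the unification.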
A major restructuring of the Core internals as well as of that of the ORM :class:`_query.Query` will be reorganizing the major statement objects to have very @@ -1136,13 +1141,15 @@ it will be fully transparent. Applications that wish to reduce statement building latency even further to the levels currently offered by the "baked" system can opt to use the "lambda" constructs. -Uniquifying ORM Rows -==================== +ORM Rows not uniquified by default +=================================== -.. admonition:: Certainty: tentative +.. admonition:: Certainty: likely - However this is a widely requested behavior so - it's likely something will have to happen in this regard + This is now partially implemented for the :term:`2.0 style` use of ORM + queries, in that rows are not automatically uniquified unless unique() is + called. However we have yet to receive user feedback (or + complaints) on this change. ORM rows returned by ``session.execute(stmt)`` are no longer automatically "uniqued"; this must be called explicitly:: @@ -1171,10 +1178,10 @@ will now be the same. Tuples, Scalars, single-row results with ORM / Core results made consistent ============================================================================ -.. admonition:: Certainty: tentative +.. admonition:: Certainty: likely - Again this is an often requested behavior - at the ORM level so something will have to happen in this regard + This is also implemented for :term:`2.0 style` ORM use however we don't + have user feedback yet. The :meth:`.future.Result.all` method now delivers named-tuple results in all cases, even for an ORM select that is against a single entity. This @@ -1240,73 +1247,132 @@ The same pattern is needed for "dynamic" relationships:: user.addresses.where(Address.id > 10).execute().all() -What about asyncio??? +Asyncio Support ===================== -.. 
admonition:: Certainty: tentative - - Not much is really being proposed here except a willingness to continue - working with third-party extensions and contributors who want to work on - the problem, as well as hopefully making the task of integration a little - bit more straightforward. - -How can SQLAlchemy do a whole re-think for Python 3 only and not take into -account asyncio? The current thinking here is going to be mixed for fans -of asyncio-everything, here are the bulletpoints: - -* As is likely well known SQLAlchemy developers maintain that `asyncio with - SQL queries usually not that compelling of an - idea <https://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/>`_ - -* There's almost no actual advantage to having an "asyncio" version of - SQLAlchemy other than personal preference and arguably interoperability - with existing asyncio code (however thread executors remain probably a - better option). Database connections do not - usually fit the criteria of the kind of socket connection that benefits - by being accessed in a non-blocking way, since they are usually local, - fast services that are accessed on a connection-limited scale. This is - in complete contrast to the use case for non-blocking IO which is massively - scaled connections to sockets that are arbitrarily slow and/or sleepy. - -* Nevertheless, lots of Python programmers like the asyncio approach and feel - more comfortable working with requests in the inherently "callback" - style of event-based programming. SQLAlchemy has every desire for these - people to be happy. - -* Making things complicated is that Python doesn't have a `spec for an asyncio - DBAPI <https://discuss.python.org/t/asynchronous-dbapi/2206/>`_ as of yet, which - makes it pretty tough for DBAPIs to exist without them all being dramatically - different in how they work and would be integrated. 
- -* There are however a few DBAPIs for PostgreSQL that are truly non-blocking, - as well as at least one for MySQL that works with non-blocking IO. It's not - known if any such system exists for SQLite, Oracle, ODBC datasources, SQL - Server, etc. - -* There are (more than one?) extensions of SQLAlchemy right now which basically - pick and choose a few parts of the compilation APIs and then reimplement - their own engine implementation completely, such as `aiopg <https://github.com/aio-libs/aiopg/blob/master/aiopg/sa/connection.py>`_. - -* These implementations appear to be useful for users however they aren't able - to keep up with SQLAlchemy's own capabilities and they likely don't really - work for lots of existing use cases either. - -* Essentially, it is hoped that the re-architecting of :class:`_engine.Connection` - to no longer support things like "autocommit" and "connectionless" - execution, as well as the changes to how result fetching will work with the - ``Result`` which is hoped to be simpler in how it interacts with - the cursor, will make it **much easier** to build async versions of - SQLAlchemy's :class:`_engine.Connection`. The simplified model of - ``Connection.execute()`` and ``Session.execute()`` as the single point of - invocation of queries should also make things easier. - -* SQLAlchemy has always remained `fully open - <https://github.com/sqlalchemy/sqlalchemy/issues/3414>`_ to having a real - asyncio extension present as part of SQLAlchemy itself. However this would - require **dedicated, long term maintainers** in order for it to be a thing. - -* It's probably better that such approaches remain third party, however it - is hoped that architectural changes in SQLAlchemy will make such approaches - more straightforward to implement and track SQLAlchemy's capabilities. +.. admonition:: Certainty: definite + A surprising development will allow asyncio support including with the + ORM to be fully implemented. 
There will even be a **completely optional** + path to having lazy loading be available, for those willing to make use of + some "controversial" patterns. + +There was an entire section here detailing how asyncio is a nice-to-have, +but not really necessary; there are some approaches already, and maybe +third parties can keep doing it. + +What's changed is that there is now an approach to doing this in SQLAlchemy +directly that does not impact the existing library nor does it imply that an +entirely separate version of everything be maintained. What has *not* changed +is that asyncio is not very necessary for relational databases but **that's +fine, we will have asyncio, no more need to debate :) :) :)**. + +The proof of concept at https://gist.github.com/zzzeek/4e89ce6226826e7a8df13e1b573ad354 +illustrates how to write an asyncio application that makes use of a pure asyncio +driver (asyncpg), with part of the code **in between** remaining as sync code +without the use of any await/async keywords. The central technique involves +minimal use of a greenlet (e.g. stackless Python) to perform the necessary +context switches when an "await" occurs. The approach has been vetted +both with asyncio developers as well as greenlet developers, the latter +of which contributed a great degree of simplification to the already simple +recipe, such that it can context switch async coroutines with no decrease in +performance. + +The proof of concept has since been expanded to work within SQLAlchemy Core +and is presently in a Gerrit review. A SQLAlchemy dialect for the asyncpg +driver has been written and it passes most tests. + +Example ORM use will look similar to the following; this example is already +runnable with the in-review codebase:: + + import asyncio + + from sqlalchemy.asyncio import create_async_engine + from sqlalchemy.asyncio import AsyncSession + # ... other imports ...
+ + async def async_main(): + engine = create_async_engine( + "postgresql+asyncpg://scott:tiger@localhost/test", echo=True, + ) + + + # assume a typical ORM model with classes A and B + + session = AsyncSession(engine) + session.add_all( + [ + A(bs=[B(), B()], data="a1"), + A(bs=[B()], data="a2"), + A(bs=[B(), B()], data="a3"), + ] + ) + await session.commit() + stmt = select(A).options(selectinload(A.bs)) + result = await session.execute(stmt) + for a1 in result.scalars(): + print(a1) + for b1 in a1.bs: + print(b1) + + result = await session.execute(select(A).order_by(A.id)) + + a1 = result.scalars().first() + a1.data = "new data" + await session.commit() + + asyncio.run(async_main()) + +The "controversial" feature, if provided, would include that the "greenlet" +context would be supplied as front-facing API. This would allow an asyncio +application to spawn a greenlet that contains sync code, which could use the +Core and ORM in a fully traditional manner including that lazy loading +for columns and relationships would be present. This mode of use is +somewhat similar to running an application under an event-based +programming library such as gevent or eventlet, however the underlying +network calls would be within a pure asyncio context, i.e. like that of the +asyncpg driver. An example of this use, which is also runnable with +the in-review codebase, follows:: + + import asyncio + + from sqlalchemy.asyncio import greenlet_spawn + + from sqlalchemy import create_engine + from sqlalchemy.orm import Session + # ... other imports ... + + def main(): + # standard "sync" engine with the "async" driver.
+ engine = create_engine( + "postgresql+asyncpg://scott:tiger@localhost/test", echo=True, + ) + + # assume a typical ORM model with classes A and B + + session = Session(engine) + session.add_all( + [ + A(bs=[B(), B()], data="a1"), + A(bs=[B()], data="a2"), + A(bs=[B(), B()], data="a3"), + ] + ) + session.commit() + for a1 in session.query(A).all(): + print("a: %s" % a1) + print("bs: %s" % (a1.bs)) # emits a lazyload. + + asyncio.run(greenlet_spawn(main)) + + +Above, we see a ``main()`` function that contains within it a 100% normal +looking Python program using the SQLAlchemy ORM, using plain ORM imports and +basically absolutely nothing out of the ordinary. It just happens to be called +from inside of an ``asyncio.run()`` call rather than directly, and it uses a +DBAPI that is only compatible with asyncio. There is no "monkeypatching" or +anything else like that involved. Any asyncio program can opt +to place its database-related business methods into the above pattern, +if preferred, rather than using the asyncio SQLAlchemy API directly. Or not. +The greenlet technique, which is also being ported to other frameworks +such as Flask, has now made it so that it **no longer matters**. diff --git a/doc/build/changelog/unreleased_14/4472.rst b/doc/build/changelog/unreleased_14/4472.rst index 6de5058c1..35fa9204a 100644 --- a/doc/build/changelog/unreleased_14/4472.rst +++ b/doc/build/changelog/unreleased_14/4472.rst @@ -13,7 +13,6 @@ :ref:`loader_option_criteria` - :func:`_orm.with_loader_criteria` + :ref:`do_orm_execute_global_criteria` - .. TODO: add links to new examples section and session-related - documentation involving do_orm_execute event when merged
\ No newline at end of file + :func:`_orm.with_loader_criteria` diff --git a/doc/build/conf.py b/doc/build/conf.py index 710ab58fe..13d573296 100644 --- a/doc/build/conf.py +++ b/doc/build/conf.py @@ -37,7 +37,7 @@ extensions = [ "changelog", "sphinx_paramlinks", ] -needs_extensions = {"zzzeeksphinx": "1.1.5"} +needs_extensions = {"zzzeeksphinx": "1.1.6"} # Add any paths that contain templates here, relative to this directory. # not sure why abspath() is needed here, some users @@ -130,6 +130,7 @@ zzzeeksphinx_module_prefixes = { "_types": "sqlalchemy.types", "_expression": "sqlalchemy.sql.expression", "_sql": "sqlalchemy.sql.expression", + "_dml": "sqlalchemy.sql.expression", "_functions": "sqlalchemy.sql.functions", "_pool": "sqlalchemy.pool", "_event": "sqlalchemy.event", @@ -137,7 +138,7 @@ zzzeeksphinx_module_prefixes = { "_exc": "sqlalchemy.exc", "_reflection": "sqlalchemy.engine.reflection", "_orm": "sqlalchemy.orm", - "_query": "sqlalchemy.orm.query", + "_query": "sqlalchemy.orm", "_ormevent": "sqlalchemy.orm.event", "_ormexc": "sqlalchemy.orm.exc", "_baked": "sqlalchemy.ext.baked", diff --git a/doc/build/core/connections.rst b/doc/build/core/connections.rst index aa41a868a..0e0170d43 100644 --- a/doc/build/core/connections.rst +++ b/doc/build/core/connections.rst @@ -560,8 +560,8 @@ value as it uses bound parameters. Subsequent invocations of the above within the scope of the ``connection.execute()`` call for enhanced performance. .. note:: it is important to note that the SQL compilation cache is caching - the **SQL string that is passed to the database only**, and **not** the - results returned by a query. It is in no way a data cache and does not + the **SQL string that is passed to the database only**, and **not the data** + returned by a query. It is in no way a data cache and does not impact the results returned for a particular SQL statement nor does it imply any memory use linked to fetching of result rows. 
@@ -866,7 +866,7 @@ The cache can also be disabled with this argument by sending a value of Using Lambdas to add significant speed gains to statement production -------------------------------------------------------------------- -.. warning:: This technique is generally non-essential except in very performance +.. deepalchemy:: This technique is generally non-essential except in very performance intensive scenarios, and intended for experienced Python programmers. While fairly straightforward, it involves metaprogramming concepts that are not appropriate for novice Python developers. The lambda approach can be diff --git a/doc/build/glossary.rst b/doc/build/glossary.rst index fddc535ed..82e81aad9 100644 --- a/doc/build/glossary.rst +++ b/doc/build/glossary.rst @@ -16,7 +16,7 @@ Glossary These terms are new in SQLAlchemy 1.4 and refer to the SQLAlchemy 1.4-> 2.0 transition plan, described at :ref:`migration_20_toplevel`. The term "1.x style" refers to an API used in the way it's been documented - throughout the 1.x series of SQLAlhcemy and earlier (e.g. 1.3, 1.2, etc) + throughout the 1.x series of SQLAlchemy and earlier (e.g. 1.3, 1.2, etc) and the term "2.0 style" refers to the way an API will look in version 2.0. Version 1.4 implements nearly all of 2.0's API in so-called "transition mode". 
@@ -25,6 +25,52 @@ Glossary + :ref:`migration_20_toplevel` + **Enabling 2.0 style usage** + + When using code from a documentation example that indicates + :term:`2.0-style`, the :class:`_engine.Engine` as well as the + :class:`_orm.Session` in use should make use of "future" mode, + via the :paramref:`_sa.create_engine.future` and + :paramref:`_orm.Session.future` flags:: + + from sqlalchemy import create_engine + from sqlalchemy.orm import sessionmaker + + + engine = create_engine("mysql://user:pass@host/dbname", future=True) + Session = sessionmaker(bind=engine, future=True) + + **ORM Queries in 2.0 style** + + Besides the above changes to :class:`_engine.Engine` and + :class:`_orm.Session`, probably the biggest API change implied by + 1.x->2.0 is the migration from using the :class:`_orm.Query` object for + ORM SELECT statements and instead using the :func:`_sql.select` + construct in conjunction with the :meth:`_orm.Session.execute` method. + The general change looks like the following.
Given a + :class:`_orm.Session` and a :class:`_orm.Query` against that + :class:`_orm.Session`:: + + list_of_users = session.query(User).join(User.addresses).all() + + The new style constructs the query separately from the + :class:`_orm.Session` using the :func:`_sql.select` construct; when + populated with ORM entities like the ``User`` class from the :ref:`ORM + Tutorial <ormtutorial_toplevel>`, the resulting :class:`_sql.Select` + construct receives additional "plugin" state that allows it to work + like the :class:`_orm.Query`:: + + + from sqlalchemy import select + + stmt = select(User).join(User.addresses) + + session = Session() # make sure future=True is used for 1.4 + + result = session.execute(stmt) + + list_of_users = result.scalars().all() + relational relational algebra diff --git a/doc/build/index.rst b/doc/build/index.rst index cb7b8aa6d..6afef5083 100644 --- a/doc/build/index.rst +++ b/doc/build/index.rst @@ -44,8 +44,7 @@ of Python objects, proceed first to the tutorial. * **ORM Usage:** :doc:`Session Usage and Guidelines <orm/session>` | - :doc:`Loading Objects <orm/loading_objects>` | - :doc:`Cached Query Extension <orm/extensions/baked>` + :doc:`Loading Objects <orm/loading_objects>` * **Extending the ORM:** :doc:`ORM Events and Internals <orm/extending>` diff --git a/doc/build/orm/events.rst b/doc/build/orm/events.rst index ecf0cc65b..1db1137e0 100644 --- a/doc/build/orm/events.rst +++ b/doc/build/orm/events.rst @@ -10,36 +10,101 @@ For an introduction to the most commonly used ORM events, see the section at :ref:`event_toplevel`. Non-ORM events such as those regarding connections and low-level statement execution are described in :ref:`core_event_toplevel`. -.. _orm_attribute_events: +Session Events +-------------- -Attribute Events ----------------- +The most basic event hooks are available at the level of the ORM +:class:`_orm.Session` object. 
The types of things that are intercepted +here include: + +* **Persistence Operations** - the ORM flush process that sends changes to the + database can be extended using events that fire off at different parts of the + flush, to augment or modify the data being sent to the database or to allow + other things to happen when persistence occurs. Read more about persistence + events at :ref:`session_persistence_events`. -.. autoclass:: sqlalchemy.orm.events.AttributeEvents +* **Object lifecycle events** - hooks when objects are added, persisted, + deleted from sessions. Read more about these at + :ref:`session_lifecycle_events`. + +* **Execution Events** - Part of the :term:`2.0 style` execution model, all + SELECT statements against ORM entities emitted, as well as bulk UPDATE + and DELETE statements outside of the flush process, are intercepted + from the :meth:`_orm.Session.execute` method using the + :meth:`_orm.SessionEvents.do_orm_execute` method. Read more about this + event at :ref:`session_execute_events`. + +Be sure to read the :ref:`session_events_toplevel` chapter for context +on these events. + +.. autoclass:: sqlalchemy.orm.SessionEvents :members: Mapper Events ------------- -.. autoclass:: sqlalchemy.orm.events.MapperEvents +Mapper event hooks encompass things that happen as related to individual +or multiple :class:`_orm.Mapper` objects, which are the central configurational +object that maps a user-defined class to a :class:`_schema.Table` object. +Types of things which occur at the :class:`_orm.Mapper` level include: + +* **Per-object persistence operations** - the most popular mapper hooks are the + unit-of-work hooks such as :meth:`_orm.MapperEvents.before_insert`, + :meth:`_orm.MapperEvents.after_update`, etc. 
These events are contrasted to + the more coarse grained session-level events such as + :meth:`_orm.SessionEvents.before_flush` in that they occur within the flush + process on a per-object basis; while finer grained activity on an object is + more straightforward, availability of :class:`_orm.Session` features is + limited. + +* **Mapper configuration events** - the other major class of mapper hooks are + those which occur as a class is mapped, as a mapper is finalized, and when + sets of mappers are configured to refer to each other. These events include + :meth:`_orm.MapperEvents.instrument_class`, + :meth:`_orm.MapperEvents.before_mapper_configured` and + :meth:`_orm.MapperEvents.mapper_configured` at the individual + :class:`_orm.Mapper` level, and :meth:`_orm.MapperEvents.before_configured` + and :meth:`_orm.MapperEvents.after_configured` at the level of collections of + :class:`_orm.Mapper` objects. + +.. autoclass:: sqlalchemy.orm.MapperEvents :members: Instance Events --------------- -.. autoclass:: sqlalchemy.orm.events.InstanceEvents +Instance events are focused on the construction of ORM mapped instances, +including when they are instantiated as :term:`transient` objects, +when they are loaded from the database and become :term:`persistent` objects, +as well as when database refresh or expiration operations occur on the object. + +.. autoclass:: sqlalchemy.orm.InstanceEvents :members: -Session Events --------------- -.. autoclass:: sqlalchemy.orm.events.SessionEvents + +.. _orm_attribute_events: + +Attribute Events +---------------- + +Attribute events are triggered as things occur on individual attributes of +ORM mapped objects. These events form the basis for things like +:ref:`custom validation functions <simple_validators>` as well as +:ref:`backref handlers <relationships_backref>`. + +.. seealso:: + + :ref:`mapping_attributes_toplevel` + +.. autoclass:: sqlalchemy.orm.AttributeEvents :members: + Query Events ------------ -.. 
autoclass:: sqlalchemy.orm.events.QueryEvents +.. autoclass:: sqlalchemy.orm.QueryEvents :members: Instrumentation Events @@ -47,6 +112,6 @@ Instrumentation Events .. automodule:: sqlalchemy.orm.instrumentation -.. autoclass:: sqlalchemy.orm.events.InstrumentationEvents +.. autoclass:: sqlalchemy.orm.InstrumentationEvents :members: diff --git a/doc/build/orm/examples.rst b/doc/build/orm/examples.rst index e8bb894fd..7a79104b9 100644 --- a/doc/build/orm/examples.rst +++ b/doc/build/orm/examples.rst @@ -147,6 +147,13 @@ Horizontal Sharding Extending the ORM ================= +.. _examples_session_orm_events: + +ORM Query Events +----------------- + +.. automodule:: examples.extending_query + .. _examples_caching: Dogpile Caching diff --git a/doc/build/orm/extending.rst b/doc/build/orm/extending.rst index 31e543a85..04800ffc0 100644 --- a/doc/build/orm/extending.rst +++ b/doc/build/orm/extending.rst @@ -2,6 +2,11 @@ Events and Internals ==================== +The SQLAlchemy ORM as well as Core are extended generally through the use +of event hooks. Be sure to review the use of the event system in general +at :ref:`event_toplevel`. + + .. toctree:: :maxdepth: 2 diff --git a/doc/build/orm/extensions/baked.rst b/doc/build/orm/extensions/baked.rst index e8651dbaa..4751fef36 100644 --- a/doc/build/orm/extensions/baked.rst +++ b/doc/build/orm/extensions/baked.rst @@ -26,7 +26,7 @@ the caching of the SQL calls and result sets themselves is available in action taken by the user, using the system described at :ref:`sql_caching`. -.. note:: +.. deepalchemy:: The :mod:`sqlalchemy.ext.baked` extension is **not for beginners**. Using it correctly requires a good high level understanding of how SQLAlchemy, the diff --git a/doc/build/orm/internals.rst b/doc/build/orm/internals.rst index 08434d3bb..1a06b73b8 100644 --- a/doc/build/orm/internals.rst +++ b/doc/build/orm/internals.rst @@ -85,6 +85,10 @@ sections, are listed here. .. autodata:: sqlalchemy.orm.interfaces.NOT_EXTENSION +.. 
autofunction:: sqlalchemy.orm.loading.merge_result + +.. autofunction:: sqlalchemy.orm.loading.merge_frozen_result + .. autodata:: sqlalchemy.orm.interfaces.ONETOMANY diff --git a/doc/build/orm/mapped_attributes.rst b/doc/build/orm/mapped_attributes.rst index b8a0f89c9..a8711d2e6 100644 --- a/doc/build/orm/mapped_attributes.rst +++ b/doc/build/orm/mapped_attributes.rst @@ -1,3 +1,5 @@ +.. _mapping_attributes_toplevel: + .. currentmodule:: sqlalchemy.orm Changing Attribute Behavior diff --git a/doc/build/orm/query.rst b/doc/build/orm/query.rst index ed45a65e7..592004e86 100644 --- a/doc/build/orm/query.rst +++ b/doc/build/orm/query.rst @@ -18,16 +18,16 @@ The Query Object Following is the full interface for the :class:`_query.Query` object. -.. autoclass:: sqlalchemy.orm.query.Query +.. autoclass:: sqlalchemy.orm.Query :members: - .. automethod:: sqlalchemy.orm.query.Query.prefix_with + .. automethod:: sqlalchemy.orm.Query.prefix_with - .. automethod:: sqlalchemy.orm.query.Query.suffix_with + .. automethod:: sqlalchemy.orm.Query.suffix_with - .. automethod:: sqlalchemy.orm.query.Query.with_hint + .. automethod:: sqlalchemy.orm.Query.with_hint - .. automethod:: sqlalchemy.orm.query.Query.with_statement_hint + .. automethod:: sqlalchemy.orm.Query.with_statement_hint ORM-Specific Query Constructs ============================= diff --git a/doc/build/orm/session_api.rst b/doc/build/orm/session_api.rst index bad816967..ada035e95 100644 --- a/doc/build/orm/session_api.rst +++ b/doc/build/orm/session_api.rst @@ -35,6 +35,21 @@ Session and sessionmaker() arguments that will assist in determining amongst a set of database connections which one should be used to invoke this statement. + .. attribute:: local_execution_options + + Dictionary view of the execution options passed to the + :meth:`.Session.execute` method. This does not include options + that may be associated with the statement being invoked. + + .. 
seealso:: + + :attr:`_orm.ORMExecuteState.execution_options` + + .. attribute:: execution_options + + The complete dictionary of current execution options. + + This is a merge of the statement level options with the + locally passed execution options. .. autoclass:: Session :members: diff --git a/doc/build/orm/session_basics.rst b/doc/build/orm/session_basics.rst index afa9ae23d..f63b7abd0 100644 --- a/doc/build/orm/session_basics.rst +++ b/doc/build/orm/session_basics.rst @@ -41,7 +41,6 @@ another :class:`.Session` when you want to work with them again, so that they can resume their normal task of representing database state. - Basics of Using a Session ========================= @@ -49,8 +48,8 @@ The most basic :class:`.Session` use patterns are presented here. .. _session_getting: -Instantiating ------------- +Opening and Closing a Session +----------------------------- The :class:`_orm.Session` may be constructed on its own or by using the :class:`_orm.sessionmaker` class. It typically is passed a single @@ -119,6 +118,8 @@ can be used by any number of functions and threads simultaneously. :class:`_orm.Session` +.. _session_querying_1x: + Querying (1.x Style) -------------------- @@ -159,6 +160,8 @@ The :class:`_query.Query` object is introduced in great detail in :ref:`query_api_toplevel` +.. _session_querying_20: + Querying (2.0 style) -------------------- @@ -166,7 +169,7 @@ Querying (2.0 style) SQLAlchemy 2.0 will standardize the production of SELECT statements across both Core and ORM by making direct use of the :class:`_sql.Select` object within the -ORM, removing the need for there to be a separate :class:`_orm.query.Query` +ORM, removing the need for there to be a separate :class:`_orm.Query` object. This mode of operation is available in SQLAlchemy 1.4 right now to support applications that will be migrating to 2.0.
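The 1.x-versus-2.0 querying pattern described above can be sketched end to end. This is a minimal, self-contained illustration, assuming a hypothetical ``User`` mapping and an in-memory SQLite engine (neither is part of this changeset):

```python
# Sketch: the same query expressed in 1.x style (session.query) and
# 2.0 style (select() + session.execute).  The User mapping and the
# in-memory SQLite engine are assumptions for illustration only.
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class User(Base):
    __tablename__ = "user_account"
    id = Column(Integer, primary_key=True)
    name = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(engine)

with Session() as session:
    session.add_all([User(name="spongebob"), User(name="sandy")])
    session.commit()

    # 1.x style: the Query is constructed against, and run by, the Session
    legacy_names = [u.name for u in session.query(User).order_by(User.name)]

    # 2.0 style: the statement is built separately, then executed
    stmt = select(User).order_by(User.name)
    names_20 = [u.name for u in session.execute(stmt).scalars()]

assert legacy_names == names_20 == ["sandy", "spongebob"]
```

Both forms render the same SELECT; the 2.0 style simply separates statement construction from execution.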
The :class:`_orm.Session` must be instantiated with the diff --git a/doc/build/orm/session_events.rst b/doc/build/orm/session_events.rst index 066fe7c24..0040f6f47 100644 --- a/doc/build/orm/session_events.rst +++ b/doc/build/orm/session_events.rst @@ -1,7 +1,7 @@ .. _session_events_toplevel: -Tracking Object and Session Changes with Events -=============================================== +Tracking queries, object and Session Changes with Events +========================================================= SQLAlchemy features an extensive :ref:`Event Listening <event_toplevel>` system used throughout the Core and ORM. Within the ORM, there are a @@ -12,6 +12,236 @@ as some older events that aren't as relevant as they once were. This section will attempt to introduce the major event hooks and when they might be used. +.. _session_execute_events: + +Execute Events +--------------- + +.. versionadded:: 1.4 The :class:`_orm.Session` now features a single + comprehensive hook designed to intercept all SELECT statements made + on behalf of the ORM as well as bulk UPDATE and DELETE statements. + This hook supersedes the previous :meth:`_orm.QueryEvents.before_compile` + event as well as :meth:`_orm.QueryEvents.before_compile_update` and + :meth:`_orm.QueryEvents.before_compile_delete`. + +:class:`_orm.Session` features a comprehensive system by which all queries +invoked via the :meth:`_orm.Session.execute` method, which includes all +SELECT statements emitted by :class:`_orm.Query` as well as all SELECT +statements emitted on behalf of column and relationship loaders, may +be intercepted and modified. The system makes use of the +:meth:`_orm.SessionEvents.do_orm_execute` event hook as well as the +:class:`_orm.ORMExecuteState` object to represent the event state.
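The mechanics described above can be exercised in a minimal, runnable sketch; the ``User`` mapping and in-memory SQLite engine below are assumptions for illustration only:

```python
# Sketch: a single do_orm_execute listener registered on a sessionmaker
# observes every statement routed through Session.execute() on behalf
# of the ORM.  The User mapping and SQLite engine are assumed here.
from sqlalchemy import Column, Integer, String, create_engine, event, select
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class User(Base):
    __tablename__ = "user_account"
    id = Column(Integer, primary_key=True)
    name = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(engine)

intercepted = []


@event.listens_for(Session, "do_orm_execute")
def _intercept(orm_execute_state):
    # ORMExecuteState describes the statement currently in flight
    intercepted.append(orm_execute_state.is_select)


with Session() as session:
    session.add(User(name="sandy"))
    session.commit()  # unit-of-work flush does not pass through this hook
    session.execute(select(User)).scalars().all()

assert intercepted == [True]
```

Note that INSERT statements emitted by the flush process do not pass through this hook; only statements invoked via :meth:`_orm.Session.execute` do.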
+ + +Basic Query Interception +^^^^^^^^^^^^^^^^^^^^^^^^^ + +:meth:`_orm.SessionEvents.do_orm_execute` is firstly useful for any kind of +interception of a query, which includes those emitted by +:class:`_orm.Query` with :term:`1.x style` as well as when an ORM-enabled +:term:`2.0 style` :func:`_sql.select`, +:func:`_sql.update` or :func:`_sql.delete` construct is delivered to +:meth:`_orm.Session.execute`. The :class:`_orm.ORMExecuteState` construct +provides accessors to allow modifications to statements, parameters, and +options:: + + Session = sessionmaker(engine, future=True) + + @event.listens_for(Session, "do_orm_execute") + def _do_orm_execute(orm_execute_state): + if orm_execute_state.is_select: + # add populate_existing for all SELECT statements + + orm_execute_state.update_execution_options(populate_existing=True) + + # check if the SELECT is against a certain entity and add an + # ORDER BY if so + col_descriptions = orm_execute_state.statement.column_descriptions + + if col_descriptions[0]['entity'] is MyEntity: + orm_execute_state.statement = orm_execute_state.statement.order_by(MyEntity.name) + +The above example illustrates some simple modifications to SELECT statements. +At this level, the :meth:`_orm.SessionEvents.do_orm_execute` event hook intends +to replace the previous use of the :meth:`_orm.QueryEvents.before_compile` event, +which was not fired off consistently for various kinds of loaders; additionally, +the :meth:`_orm.QueryEvents.before_compile` only applies to :term:`1.x style` +use with :class:`_orm.Query` and not with :term:`2.0 style` use of +:meth:`_orm.Session.execute`. + + +.. _do_orm_execute_global_criteria: + +Adding global WHERE / ON criteria +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +One of the most requested query-extension features is the ability to add WHERE +criteria to all occurrences of an entity in all queries.
This is achievable +by making use of the :func:`_orm.with_loader_criteria` query option, which +may be used on its own, or is ideally suited to be used within the +:meth:`_orm.SessionEvents.do_orm_execute` event:: + + from sqlalchemy.orm import with_loader_criteria + + Session = sessionmaker(engine, future=True) + + @event.listens_for(Session, "do_orm_execute") + def _do_orm_execute(orm_execute_state): + if orm_execute_state.is_select: + orm_execute_state.statement = orm_execute_state.statement.options( + with_loader_criteria(MyEntity.public == True) + ) + +Above, an option is added to all SELECT statements that will limit all queries +against ``MyEntity`` to filter on ``public == True``. The criteria +will be applied to **all** loads of that class within the scope of the +immediate query as well as subsequent relationship loads, which includes +lazy loads, selectinloads, etc. + +For a series of classes that all feature some common column structure, +if the classes are composed using a :ref:`declarative mixin <declarative_mixins>`, +the mixin class itself may be used in conjunction with the :func:`_orm.with_loader_criteria` +option by making use of a Python lambda. The Python lambda will be invoked at +query compilation time against the specific entities which match the criteria. +Given a series of classes based on a mixin called ``HasTimestamp``:: + + import datetime + + class HasTimestamp(object): + timestamp = Column(DateTime, default=datetime.datetime.now) + + + class SomeEntity(HasTimestamp, Base): + __tablename__ = "some_entity" + id = Column(Integer, primary_key=True) + + class SomeOtherEntity(HasTimestamp, Base): + __tablename__ = "some_other_entity" + id = Column(Integer, primary_key=True) + + +The above classes ``SomeEntity`` and ``SomeOtherEntity`` will each have a column +``timestamp`` that defaults to the current date and time.
An event may be used +to intercept all objects that extend from ``HasTimestamp`` and filter their +``timestamp`` column on a date that is no older than one month ago:: + + @event.listens_for(Session, "do_orm_execute") + def _do_orm_execute(orm_execute_state): + if orm_execute_state.is_select: + one_month_ago = datetime.datetime.today() - datetime.timedelta(days=30) + + orm_execute_state.statement = orm_execute_state.statement.options( + with_loader_criteria( + HasTimestamp, + lambda cls: cls.timestamp >= one_month_ago, + include_aliases=True + ) + ) + +.. seealso:: + + :ref:`examples_session_orm_events` - includes working examples of the + above :func:`_orm.with_loader_criteria` recipes. + +.. _do_orm_execute_re_executing: + +Re-Executing Statements +^^^^^^^^^^^^^^^^^^^^^^^ + +.. deepalchemy:: the statement re-execution feature involves a slightly + intricate recursive sequence, and is intended to solve the fairly hard + problem of being able to re-route the execution of a SQL statement into + various non-SQL contexts. The twin examples of "dogpile caching" and + "horizontal sharding", linked below, should be used as a guide for when this + rather advanced feature is appropriate to be used. + +The :class:`_orm.ORMExecuteState` is capable of controlling the execution of +the given statement; this includes the ability to either not invoke the +statement at all, allowing a pre-constructed result set retrieved from a cache to +be returned instead, as well as the ability to invoke the same statement +repeatedly with different state, such as invoking it against multiple database +connections and then merging the results together in memory. Both of these +advanced patterns are demonstrated in SQLAlchemy's example suite as detailed +below.
+ +When inside the :meth:`_orm.SessionEvents.do_orm_execute` event hook, the +:meth:`_orm.ORMExecuteState.invoke_statement` method may be used to invoke +the statement using a new nested invocation of :meth:`_orm.Session.execute`, +which will then preempt the subsequent handling of the current execution +in progress and instead return the :class:`_engine.Result` returned by the +inner execution. The event handlers thus far invoked for the +:meth:`_orm.SessionEvents.do_orm_execute` hook within this process will +be skipped within this nested call as well. + +The :meth:`_orm.ORMExecuteState.invoke_statement` method returns a +:class:`_engine.Result` object; this object then features the ability for it to +be "frozen" into a cacheable format and "unfrozen" into a new +:class:`_engine.Result` object, as well as for its data to be merged with +that of other :class:`_engine.Result` objects. + +E.g., using :meth:`_orm.SessionEvents.do_orm_execute` to implement a cache:: + + from sqlalchemy.orm import loading + + cache = {} + + @event.listens_for(Session, "do_orm_execute") + def _do_orm_execute(orm_execute_state): + if "my_cache_key" in orm_execute_state.execution_options: + cache_key = orm_execute_state.execution_options["my_cache_key"] + + if cache_key in cache: + frozen_result = cache[cache_key] + else: + frozen_result = orm_execute_state.invoke_statement().freeze() + cache[cache_key] = frozen_result + + return loading.merge_frozen_result( + orm_execute_state.session, + orm_execute_state.statement, + frozen_result, + load=False, + ) + +With the above hook in place, an example of using the cache would look like:: + + stmt = select(User).where(User.name == 'sandy').execution_options(my_cache_key="key_sandy") + + result = session.execute(stmt) + +Above, a custom execution option is passed to +:meth:`_sql.Select.execution_options` in order to establish a "cache key" that +will then be intercepted by the :meth:`_orm.SessionEvents.do_orm_execute` hook. 
This +cache key is then matched to a :class:`_engine.FrozenResult` object that may be +present in the cache, and if present, the object is re-used. The recipe makes +use of the :meth:`_engine.Result.freeze` method to "freeze" a +:class:`_engine.Result` object, which above will contain ORM results, such that +it can be stored in a cache and used multiple times. In order to return a live +result from the "frozen" result, the :func:`_orm.loading.merge_frozen_result` +function is used to merge the "frozen" data from the result object into the +current session. + +The above recipe is implemented as a complete example in :ref:`examples_caching`. + +The :meth:`_orm.ORMExecuteState.invoke_statement` method may also be called +multiple times, passing along different information to the +:paramref:`_orm.ORMExecuteState.invoke_statement.bind_arguments` parameter such +that the :class:`_orm.Session` will make use of different +:class:`_engine.Engine` objects each time. This will return a different +:class:`_engine.Result` object each time; these results can be merged together +using the :meth:`_engine.Result.merge` method. This is the technique employed +by the :ref:`horizontal_sharding_toplevel` extension; see the source code to +familiarize yourself with the approach. + +.. seealso:: + + :ref:`examples_caching` + + :ref:`examples_sharding` + + + + .. _session_persistence_events: Persistence Events diff --git a/doc/build/orm/session_transaction.rst b/doc/build/orm/session_transaction.rst index c5f47697b..6bef0cee6 100644 --- a/doc/build/orm/session_transaction.rst +++ b/doc/build/orm/session_transaction.rst @@ -654,9 +654,9 @@ entire database interaction is rolled back. .. versionchanged:: 1.4 This section introduces a new version of the "join into an external transaction" recipe that will work equally well - for both "future" and "non-future" engines and sessions. The recipe - here from previous versions such as 1.3 will also continue to work for - "non-future" engines and sessions.
+ for both :term:`2.0 style` and :term:`1.x style` engines and sessions. + The recipe here from previous versions such as 1.3 will also continue to + work for 1.x engines and sessions. The recipe works by establishing a :class:`_engine.Connection` within a diff --git a/doc/build/orm/tutorial.rst b/doc/build/orm/tutorial.rst index a6ef57bed..dbad10b6f 100644 --- a/doc/build/orm/tutorial.rst +++ b/doc/build/orm/tutorial.rst @@ -1375,9 +1375,10 @@ and ``Address`` because there's only one foreign key between them. If there were no foreign keys, or several, :meth:`_query.Query.join` works better when one of the following forms are used:: - query.join(Address, User.id==Address.user_id) # explicit condition - query.join(User.addresses) # specify relationship from left to right - query.join(Address, User.addresses) # same, with explicit target + query.join(Address, User.id==Address.user_id) # explicit condition + query.join(User.addresses) # specify relationship from left to right + query.join(Address, User.addresses) # same, with explicit target + query.join(User.addresses.and_(Address.name != 'foo')) # use relationship + additional ON criteria As you would expect, the same idea is used for "outer" joins, using the :meth:`_query.Query.outerjoin` function:: diff --git a/examples/dogpile_caching/__init__.py b/examples/dogpile_caching/__init__.py index de4a339a7..e5ea52ea0 100644 --- a/examples/dogpile_caching/__init__.py +++ b/examples/dogpile_caching/__init__.py @@ -1,31 +1,37 @@ """ Illustrates how to embed -`dogpile.cache <https://dogpilecache.readthedocs.io/>`_ -functionality within the :class:`.Query` object, allowing full cache control +`dogpile.cache <https://dogpilecache.sqlalchemy.org/>`_ +functionality with ORM queries, allowing full cache control as well as the ability to pull "lazy loaded" attributes from long term cache.
In this demo, the following techniques are illustrated: -* Using custom subclasses of :class:`.Query` -* Basic technique of circumventing Query to pull from a +* Using the :meth:`_orm.SessionEvents.do_orm_execute` event hook +* Basic technique of circumventing :meth:`_orm.Session.execute` to pull from a custom cache source instead of the database. * Rudimental caching with dogpile.cache, using "regions" which allow global control over a fixed set of configurations. -* Using custom :class:`.MapperOption` objects to configure options on - a Query, including the ability to invoke the options - deep within an object graph when lazy loads occur. +* Using custom :class:`.UserDefinedOption` objects to configure options in + a statement object. + +.. seealso:: + + :ref:`do_orm_execute_re_executing` - includes a general example of the + technique presented here. E.g.:: # query for Person objects, specifying cache - q = Session.query(Person).options(FromCache("default")) + stmt = select(Person).options(FromCache("default")) # specify that each Person's "addresses" collection comes from # cache too - q = q.options(RelationshipCache(Person.addresses, "default")) + stmt = stmt.options(RelationshipCache(Person.addresses, "default")) + + # execute and fetch results + result = session.execute(stmt) - # query - print(q.all()) + print(result.scalars().all()) To run, both SQLAlchemy and dogpile.cache must be installed or on the current PYTHONPATH. The demo will create a local
+ +Examples include demonstrations of the :func:`_orm.with_loader_criteria` +option as well as the :meth:`_orm.SessionEvents.do_orm_execute` hook. + +As of SQLAlchemy 1.4, the :class:`_orm.Query` construct is unified +with the :class:`_expression.Select` construct, so that these two objects +are mostly the same. + + +.. autosource:: + +""" # noqa diff --git a/examples/extending_query/filter_public.py b/examples/extending_query/filter_public.py new file mode 100644 index 000000000..b6d4ec0db --- /dev/null +++ b/examples/extending_query/filter_public.py @@ -0,0 +1,200 @@ +"""Illustrates a global criteria applied to entities of a particular type. + +The example here is the "public" flag, a simple boolean that indicates +the rows are part of a publicly viewable subcategory. Rows that do not +include this flag are not shown unless a special option is passed to the +query. + +Uses for this kind of recipe include tables that have "soft deleted" rows +marked as "deleted" that should be skipped, rows that have access control rules +that should be applied on a per-request basis, etc. + + +""" + +from sqlalchemy import Boolean +from sqlalchemy import Column +from sqlalchemy import event +from sqlalchemy import orm +from sqlalchemy import true +from sqlalchemy.orm import Session + + +@event.listens_for(Session, "do_orm_execute") +def _add_filtering_criteria(execute_state): + """Intercept all ORM queries. Add a with_loader_criteria option to all + of them. + + This option applies to SELECT queries and adds a global WHERE criteria + (or as appropriate ON CLAUSE criteria for join targets) + to all objects of a certain class or superclass. + + """ + + # the with_loader_criteria automatically applies itself to + # relationship loads as well including lazy loads. So if this is + # a relationship load, assume the option was set up from the top level + # query. 
+ + if ( + not execute_state.is_relationship_load + and not execute_state.execution_options.get("include_private", False) + ): + execute_state.statement = execute_state.statement.options( + orm.with_loader_criteria( + HasPrivate, + lambda cls: cls.public == true(), + include_aliases=True, + ) + ) + + +class HasPrivate(object): + """Mixin that identifies a class as having private entities""" + + public = Column(Boolean, nullable=False) + + +if __name__ == "__main__": + + from sqlalchemy import Integer, Column, String, ForeignKey, Boolean + from sqlalchemy import select + from sqlalchemy import create_engine + from sqlalchemy.orm import relationship, sessionmaker + from sqlalchemy.ext.declarative import declarative_base + + Base = declarative_base() + + class User(HasPrivate, Base): + __tablename__ = "user" + + id = Column(Integer, primary_key=True) + name = Column(String) + addresses = relationship("Address", back_populates="user") + + class Address(HasPrivate, Base): + __tablename__ = "address" + + id = Column(Integer, primary_key=True) + email = Column(String) + user_id = Column(Integer, ForeignKey("user.id")) + + user = relationship("User", back_populates="addresses") + + engine = create_engine("sqlite://", echo=True) + Base.metadata.create_all(engine) + + Session = sessionmaker(bind=engine, future=True) + + sess = Session() + + sess.add_all( + [ + User( + name="u1", + public=True, + addresses=[ + Address(email="u1a1", public=True), + Address(email="u1a2", public=True), + ], + ), + User( + name="u2", + public=True, + addresses=[ + Address(email="u2a1", public=False), + Address(email="u2a2", public=True), + ], + ), + User( + name="u3", + public=False, + addresses=[ + Address(email="u3a1", public=False), + Address(email="u3a2", public=False), + ], + ), + User( + name="u4", + public=False, + addresses=[ + Address(email="u4a1", public=False), + Address(email="u4a2", public=True), + ], + ), + User( + name="u5", + public=True, + addresses=[ + Address(email="u5a1", 
public=True), + Address(email="u5a2", public=False), + ], + ), + ] + ) + + sess.commit() + + # now querying Address or User objects only gives us the public ones + for u1 in sess.query(User).options(orm.selectinload(User.addresses)): + assert u1.public + + # the addresses collection will also be "public only", which works + # for all relationship loaders including joinedload + for address in u1.addresses: + assert address.public + + # works for columns too + cols = ( + sess.query(User.id, Address.id) + .join(User.addresses) + .order_by(User.id, Address.id) + .all() + ) + assert cols == [(1, 1), (1, 2), (2, 4), (5, 9)] + + cols = ( + sess.query(User.id, Address.id) + .join(User.addresses) + .order_by(User.id, Address.id) + .execution_options(include_private=True) + .all() + ) + assert cols == [ + (1, 1), + (1, 2), + (2, 3), + (2, 4), + (3, 5), + (3, 6), + (4, 7), + (4, 8), + (5, 9), + (5, 10), + ] + + # count all public addresses + assert sess.query(Address).count() == 5 + + # count all addresses public and private + assert ( + sess.query(Address).execution_options(include_private=True).count() + == 10 + ) + + # load an Address that is public, but its parent User is private + # (2.0 style query) + a1 = sess.execute(select(Address).filter_by(email="u4a2")).scalar() + + # assuming the User isn't already in the Session, it returns None + assert a1.user is None + + # however, if that user is present in the session, then a many-to-one + # does a simple get() and it will be present + sess.expire(a1, ["user"]) + u1 = sess.execute( + select(User) + .filter_by(name="u4") + .execution_options(include_private=True) + ).scalar() + assert a1.user is u1 diff --git a/examples/extending_query/temporal_range.py b/examples/extending_query/temporal_range.py new file mode 100644 index 000000000..370687746 --- /dev/null +++ b/examples/extending_query/temporal_range.py @@ -0,0 +1,136 @@ +"""Illustrates a custom per-query criteria that will be applied +to selected entities. 
+ + +""" + +import datetime + +from sqlalchemy import Column +from sqlalchemy import DateTime +from sqlalchemy import orm + + +class HasTemporal(object): + """Mixin that identifies a class as having a timestamp column""" + + timestamp = Column( + DateTime, default=datetime.datetime.utcnow, nullable=False + ) + + +def temporal_range(range_lower, range_upper): + return orm.with_loader_criteria( + HasTemporal, + lambda cls: cls.timestamp.between(range_lower, range_upper), + include_aliases=True, + ) + + +if __name__ == "__main__": + + from sqlalchemy import Integer, Column, ForeignKey + from sqlalchemy import select + from sqlalchemy import create_engine + from sqlalchemy.orm import relationship, sessionmaker, selectinload + from sqlalchemy.ext.declarative import declarative_base + + Base = declarative_base() + + class Parent(HasTemporal, Base): + __tablename__ = "parent" + id = Column(Integer, primary_key=True) + children = relationship("Child") + + class Child(HasTemporal, Base): + __tablename__ = "child" + id = Column(Integer, primary_key=True) + parent_id = Column(Integer, ForeignKey("parent.id"), nullable=False) + + engine = create_engine("sqlite://", echo=True) + Base.metadata.create_all(engine) + + Session = sessionmaker(bind=engine, future=True) + + sess = Session() + + c1, c2, c3, c4, c5 = [ + Child(timestamp=datetime.datetime(2009, 10, 15, 12, 00, 00)), + Child(timestamp=datetime.datetime(2009, 10, 17, 12, 00, 00)), + Child(timestamp=datetime.datetime(2009, 10, 20, 12, 00, 00)), + Child(timestamp=datetime.datetime(2009, 10, 12, 12, 00, 00)), + Child(timestamp=datetime.datetime(2009, 10, 17, 12, 00, 00)), + ] + + p1 = Parent( + timestamp=datetime.datetime(2009, 10, 15, 12, 00, 00), + children=[c1, c2, c3], + ) + p2 = Parent( + timestamp=datetime.datetime(2009, 10, 17, 12, 00, 00), + children=[c4, c5], + ) + + sess.add_all([p1, p2]) + sess.commit() + + # use populate_existing() to ensure the range option takes + # place for elements already in the identity map 
+ + parents = ( + sess.query(Parent) + .populate_existing() + .options( + temporal_range( + datetime.datetime(2009, 10, 16, 12, 00, 00), + datetime.datetime(2009, 10, 18, 12, 00, 00), + ) + ) + .all() + ) + + assert parents[0] == p2 + assert parents[0].children == [c5] + + sess.expire_all() + + # try it with eager load + parents = ( + sess.query(Parent) + .options( + temporal_range( + datetime.datetime(2009, 10, 16, 12, 00, 00), + datetime.datetime(2009, 10, 18, 12, 00, 00), + ) + ) + .options(selectinload(Parent.children)) + .all() + ) + + assert parents[0] == p2 + assert parents[0].children == [c5] + + sess.expire_all() + + # illustrate a 2.0 style query + print("------------------") + parents = ( + sess.execute( + select(Parent) + .execution_options(populate_existing=True) + .options( + temporal_range( + datetime.datetime(2009, 10, 15, 11, 00, 00), + datetime.datetime(2009, 10, 18, 12, 00, 00), + ) + ) + .join(Parent.children) + .filter(Child.id == 2) + ) + .scalars() + .all() + ) + + assert parents[0] == p1 + print("-------------------") + assert parents[0].children == [c1, c2] diff --git a/examples/sharding/__init__.py b/examples/sharding/__init__.py index eb8e10686..90cf6cc6c 100644 --- a/examples/sharding/__init__.py +++ b/examples/sharding/__init__.py @@ -4,21 +4,28 @@ databases. The basic components of a "sharded" mapping are: -* multiple databases, each assigned a 'shard id' +* multiple :class:`_engine.Engine` instances, each assigned a "shard id". + These :class:`_engine.Engine` instances may refer to different databases, + or different schemas / accounts within the same database, or they can + even be differentiated only by options that will cause them to access + different schemas or tables when used. 
+ * a function which can return a single shard id, given an instance to be saved; this is called "shard_chooser" + * a function which can return a list of shard ids which apply to a particular instance identifier; this is called "id_chooser". If it returns all shard ids, all shards will be searched. + * a function which can return a list of shard ids to try, given a particular Query ("query_chooser"). If it returns all shard ids, all shards will be queried and the results joined together. -In this example, four sqlite databases will store information about weather -data on a database-per-continent basis. We provide example shard_chooser, -id_chooser and query_chooser functions. The query_chooser illustrates -inspection of the SQL expression element in order to attempt to determine a -single shard being requested. +In these examples, different kinds of shards are used against the same basic +example which accommodates weather data on a per-continent basis. We provide +example shard_chooser, id_chooser and query_chooser functions. The +query_chooser illustrates inspection of the SQL expression element in order to +attempt to determine a single shard being requested. The construction of generic sharding routines is an ambitious approach to the issue of organizing instances among multiple databases.
For a diff --git a/examples/sharding/attribute_shard.py b/examples/sharding/separate_databases.py index 7b8f87d90..95f12fa72 100644 --- a/examples/sharding/attribute_shard.py +++ b/examples/sharding/separate_databases.py @@ -1,3 +1,5 @@ +"""Illustrates sharding using distinct SQLite databases.""" + import datetime from sqlalchemy import Column @@ -7,6 +9,7 @@ from sqlalchemy import Float from sqlalchemy import ForeignKey from sqlalchemy import inspect from sqlalchemy import Integer +from sqlalchemy import select from sqlalchemy import String from sqlalchemy import Table from sqlalchemy.ext.declarative import declarative_base @@ -26,15 +29,15 @@ db4 = create_engine("sqlite://", echo=echo) # create session function. this binds the shard ids # to databases within a ShardedSession and returns it. -create_session = sessionmaker(class_=ShardedSession) - -create_session.configure( +Session = sessionmaker( + class_=ShardedSession, + future=True, shards={ "north_america": db1, "asia": db2, "europe": db3, "south_america": db4, - } + }, ) @@ -54,7 +57,7 @@ ids = Table("ids", Base.metadata, Column("nextid", Integer, nullable=False)) def id_generator(ctx): # in reality, might want to use a separate transaction for this. with db1.connect() as conn: - nextid = conn.scalar(ids.select(for_update=True)) + nextid = conn.scalar(ids.select().with_for_update()) conn.execute(ids.update(values={ids.c.nextid: ids.c.nextid + 1})) return nextid @@ -99,11 +102,11 @@ class Report(Base): # create tables for db in (db1, db2, db3, db4): - Base.metadata.drop_all(db) Base.metadata.create_all(db) # establish initial "id" in db1 -db1.execute(ids.insert(), nextid=1) +with db1.begin() as conn: + conn.execute(ids.insert(), nextid=1) # step 5. define sharding functions. @@ -199,18 +202,7 @@ def _get_query_comparisons(query): def visit_bindparam(bind): # visit a bind parameter. 
- # check in _params for it first - if bind.key in query._params: - value = query._params[bind.key] - elif bind.callable: - # some ORM functions (lazy loading) - # place the bind's value as a - # callable for deferred evaluation. - value = bind.callable() - else: - # just use .value - value = bind.value - + value = bind.effective_value binds[bind] = value def visit_column(column): @@ -230,9 +222,9 @@ def _get_query_comparisons(query): # here we will traverse through the query's criterion, searching # for SQL constructs. We will place simple column comparisons # into a list. - if query._criterion is not None: - visitors.traverse_depthfirst( - query._criterion, + if query.whereclause is not None: + visitors.traverse( + query.whereclause, {}, { "bindparam": visit_bindparam, @@ -244,7 +236,7 @@ def _get_query_comparisons(query): # further configure create_session to use these functions -create_session.configure( +Session.configure( shard_chooser=shard_chooser, id_chooser=id_chooser, query_chooser=query_chooser, @@ -264,36 +256,47 @@ tokyo.reports.append(Report(80.0)) newyork.reports.append(Report(75)) quito.reports.append(Report(85)) -sess = create_session() +with Session() as sess: -sess.add_all([tokyo, newyork, toronto, london, dublin, brasilia, quito]) + sess.add_all([tokyo, newyork, toronto, london, dublin, brasilia, quito]) -sess.commit() + sess.commit() -t = sess.query(WeatherLocation).get(tokyo.id) -assert t.city == tokyo.city -assert t.reports[0].temperature == 80.0 + t = sess.get(WeatherLocation, tokyo.id) + assert t.city == tokyo.city + assert t.reports[0].temperature == 80.0 -north_american_cities = sess.query(WeatherLocation).filter( - WeatherLocation.continent == "North America" -) -assert {c.city for c in north_american_cities} == {"New York", "Toronto"} + north_american_cities = sess.execute( + select(WeatherLocation).filter( + WeatherLocation.continent == "North America" + ) + ).scalars() -asia_and_europe = sess.query(WeatherLocation).filter( - 
WeatherLocation.continent.in_(["Europe", "Asia"]) -) -assert {c.city for c in asia_and_europe} == {"Tokyo", "London", "Dublin"} + assert {c.city for c in north_american_cities} == {"New York", "Toronto"} + + asia_and_europe = sess.execute( + select(WeatherLocation).filter( + WeatherLocation.continent.in_(["Europe", "Asia"]) + ) + ).scalars() -# the Report class uses a simple integer primary key. So across two databases, -# a primary key will be repeated. The "identity_token" tracks in memory -# that these two identical primary keys are local to different databases. -newyork_report = newyork.reports[0] -tokyo_report = tokyo.reports[0] + assert {c.city for c in asia_and_europe} == {"Tokyo", "London", "Dublin"} -assert inspect(newyork_report).identity_key == (Report, (1,), "north_america") -assert inspect(tokyo_report).identity_key == (Report, (1,), "asia") + # the Report class uses a simple integer primary key. So across two + # databases, a primary key will be repeated. The "identity_token" tracks + # in memory that these two identical primary keys are local to different + # databases. 
+ newyork_report = newyork.reports[0] + tokyo_report = tokyo.reports[0] + + assert inspect(newyork_report).identity_key == ( + Report, + (1,), + "north_america", + ) + assert inspect(tokyo_report).identity_key == (Report, (1,), "asia") -# the token representing the originating shard is also available directly + # the token representing the originating shard is also available directly -assert inspect(newyork_report).identity_token == "north_america" -assert inspect(tokyo_report).identity_token == "asia" + assert inspect(newyork_report).identity_token == "north_america" + assert inspect(tokyo_report).identity_token == "asia" diff --git a/examples/sharding/separate_tables.py b/examples/sharding/separate_tables.py new file mode 100644 index 000000000..f24dde288 --- /dev/null +++ b/examples/sharding/separate_tables.py @@ -0,0 +1,316 @@ +"""Illustrates sharding using a single SQLite database, that will however +have multiple tables using a naming convention.""" + +import datetime + +from sqlalchemy import Column +from sqlalchemy import create_engine +from sqlalchemy import DateTime +from sqlalchemy import event +from sqlalchemy import Float +from sqlalchemy import ForeignKey +from sqlalchemy import inspect +from sqlalchemy import Integer +from sqlalchemy import select +from sqlalchemy import String +from sqlalchemy import Table +from sqlalchemy.ext.declarative import declarative_base +from sqlalchemy.ext.horizontal_shard import ShardedSession +from sqlalchemy.orm import relationship +from sqlalchemy.orm import sessionmaker +from sqlalchemy.sql import operators +from sqlalchemy.sql import visitors + + +echo = True +engine = create_engine("sqlite://", echo=echo) + +db1 = engine.execution_options(table_prefix="north_america") +db2 = engine.execution_options(table_prefix="asia") +db3 = engine.execution_options(table_prefix="europe") +db4 = engine.execution_options(table_prefix="south_america") + + +@event.listens_for(engine, "before_cursor_execute", retval=True) +def 
before_cursor_execute( + conn, cursor, statement, parameters, context, executemany +): + table_prefix = context.execution_options.get("table_prefix", None) + if table_prefix: + statement = statement.replace("_prefix_", table_prefix) + return statement, parameters + + +# create session function. this binds the shard ids +# to databases within a ShardedSession and returns it. +Session = sessionmaker( + class_=ShardedSession, + future=True, + shards={ + "north_america": db1, + "asia": db2, + "europe": db3, + "south_america": db4, + }, +) + + +# mappings and tables +Base = declarative_base() + +# we need a way to create identifiers which are unique across all databases. +# one easy way would be to just use a composite primary key, where one value +# is the shard id. but here, we'll show something more "generic", an id +# generation function. we'll use a simplistic "id table" stored in database +# #1. Any other method will do just as well; UUID, hilo, application-specific, +# etc. + +ids = Table("ids", Base.metadata, Column("nextid", Integer, nullable=False)) + + +def id_generator(ctx): + # in reality, might want to use a separate transaction for this. + with engine.connect() as conn: + nextid = conn.scalar(ids.select().with_for_update()) + conn.execute(ids.update(values={ids.c.nextid: ids.c.nextid + 1})) + return nextid + + +# table setup. we'll store a lead table of continents/cities, and a secondary +# table storing locations. a particular row will be placed in the database +# whose shard id corresponds to the 'continent'. in this setup, secondary rows +# in 'weather_reports' will be placed in the same DB as that of the parent, but +# this can be changed if you're willing to write more complex sharding +# functions. 
+ + +class WeatherLocation(Base): + __tablename__ = "_prefix__weather_locations" + + id = Column(Integer, primary_key=True, default=id_generator) + continent = Column(String(30), nullable=False) + city = Column(String(50), nullable=False) + + reports = relationship("Report", backref="location") + + def __init__(self, continent, city): + self.continent = continent + self.city = city + + +class Report(Base): + __tablename__ = "_prefix__weather_reports" + + id = Column(Integer, primary_key=True) + location_id = Column( + "location_id", Integer, ForeignKey("_prefix__weather_locations.id") + ) + temperature = Column("temperature", Float) + report_time = Column( + "report_time", DateTime, default=datetime.datetime.now + ) + + def __init__(self, temperature): + self.temperature = temperature + + +# create tables +for db in (db1, db2, db3, db4): + Base.metadata.create_all(db) + +# establish initial "id" in db1 +with db1.begin() as conn: + conn.execute(ids.insert(), nextid=1) + + +# step 5. define sharding functions. + +# we'll use a straight mapping of a particular set of "country" +# attributes to shard id. +shard_lookup = { + "North America": "north_america", + "Asia": "asia", + "Europe": "europe", + "South America": "south_america", +} + + +def shard_chooser(mapper, instance, clause=None): + """shard chooser. + + looks at the given instance and returns a shard id + note that we need to define conditions for + the WeatherLocation class, as well as our secondary Report class which will + point back to its WeatherLocation via its 'location' attribute. + + """ + if isinstance(instance, WeatherLocation): + return shard_lookup[instance.continent] + else: + return shard_chooser(mapper, instance.location) + + +def id_chooser(query, ident): + """id chooser. + + given a primary key, returns a list of shards + to search. here, we don't have any particular information from a + pk so we just return all shard ids. 
often, you'd want to do some + kind of round-robin strategy here so that requests are evenly + distributed among DBs. + + """ + if query.lazy_loaded_from: + # if we are in a lazy load, we can look at the parent object + # and limit our search to that same shard, assuming that's how we've + # set things up. + return [query.lazy_loaded_from.identity_token] + else: + return ["north_america", "asia", "europe", "south_america"] + + +def query_chooser(query): + """query chooser. + + this also returns a list of shard ids, which can + just be all of them. but here we'll search into the Query in order + to try to narrow down the list of shards to query. + + """ + ids = [] + + # we'll grab continent names as we find them + # and convert to shard ids + for column, operator, value in _get_query_comparisons(query): + # "shares_lineage()" returns True if both columns refer to the same + # statement column, adjusting for any annotations present. + # (an annotation is an internal clone of a Column object + # and occur when using ORM-mapped attributes like + # "WeatherLocation.continent"). A simpler comparison, though less + # accurate, would be "column.key == 'continent'". + if column.shares_lineage(WeatherLocation.__table__.c.continent): + if operator == operators.eq: + ids.append(shard_lookup[value]) + elif operator == operators.in_op: + ids.extend(shard_lookup[v] for v in value) + + if len(ids) == 0: + return ["north_america", "asia", "europe", "south_america"] + else: + return ids + + +def _get_query_comparisons(query): + """Search an orm.Query object for binary expressions. + + Returns expressions which match a Column against one or more + literal values as a list of tuples of the form + (column, operator, values). "values" is a single value + or tuple of values depending on the operator. + + """ + binds = {} + clauses = set() + comparisons = [] + + def visit_bindparam(bind): + # visit a bind parameter. 
+ + value = bind.effective_value + binds[bind] = value + + def visit_column(column): + clauses.add(column) + + def visit_binary(binary): + if binary.left in clauses and binary.right in binds: + comparisons.append( + (binary.left, binary.operator, binds[binary.right]) + ) + + elif binary.left in binds and binary.right in clauses: + comparisons.append( + (binary.right, binary.operator, binds[binary.left]) + ) + + # here we will traverse through the query's criterion, searching + # for SQL constructs. We will place simple column comparisons + # into a list. + if query.whereclause is not None: + visitors.traverse( + query.whereclause, + {}, + { + "bindparam": visit_bindparam, + "binary": visit_binary, + "column": visit_column, + }, + ) + return comparisons + + +# further configure create_session to use these functions +Session.configure( + shard_chooser=shard_chooser, + id_chooser=id_chooser, + query_chooser=query_chooser, +) + +# save and load objects! + +tokyo = WeatherLocation("Asia", "Tokyo") +newyork = WeatherLocation("North America", "New York") +toronto = WeatherLocation("North America", "Toronto") +london = WeatherLocation("Europe", "London") +dublin = WeatherLocation("Europe", "Dublin") +brasilia = WeatherLocation("South America", "Brasila") +quito = WeatherLocation("South America", "Quito") + +tokyo.reports.append(Report(80.0)) +newyork.reports.append(Report(75)) +quito.reports.append(Report(85)) + +with Session() as sess: + + sess.add_all([tokyo, newyork, toronto, london, dublin, brasilia, quito]) + + sess.commit() + + t = sess.get(WeatherLocation, tokyo.id) + assert t.city == tokyo.city + assert t.reports[0].temperature == 80.0 + + north_american_cities = sess.execute( + select(WeatherLocation).filter( + WeatherLocation.continent == "North America" + ) + ).scalars() + + assert {c.city for c in north_american_cities} == {"New York", "Toronto"} + + asia_and_europe = sess.execute( + select(WeatherLocation).filter( + WeatherLocation.continent.in_(["Europe", 
"Asia"]) + ) + ).scalars() + + assert {c.city for c in asia_and_europe} == {"Tokyo", "London", "Dublin"} + + # the Report class uses a simple integer primary key. So across two + # databases, a primary key will be repeated. The "identity_token" tracks + # in memory that these two identical primary keys are local to different + # databases. + newyork_report = newyork.reports[0] + tokyo_report = tokyo.reports[0] + + assert inspect(newyork_report).identity_key == ( + Report, + (1,), + "north_america", + ) + assert inspect(tokyo_report).identity_key == (Report, (1,), "asia") + + # the token representing the originating shard is also available directly + + assert inspect(newyork_report).identity_token == "north_america" + assert inspect(tokyo_report).identity_token == "asia" diff --git a/lib/sqlalchemy/engine/interfaces.py b/lib/sqlalchemy/engine/interfaces.py index f925df6c5..1baef29af 100644 --- a/lib/sqlalchemy/engine/interfaces.py +++ b/lib/sqlalchemy/engine/interfaces.py @@ -953,7 +953,7 @@ class CreateEnginePlugin(object): engine = create_engine( "mysql+pymysql://scott:tiger@localhost/test?plugin=myplugin") - Alternatively, the :paramref:`.create_engine.plugins" argument may be + Alternatively, the :paramref:`_sa.create_engine.plugins" argument may be passed as a list to :func:`_sa.create_engine`:: engine = create_engine( diff --git a/lib/sqlalchemy/engine/result.py b/lib/sqlalchemy/engine/result.py index db546380e..621cba674 100644 --- a/lib/sqlalchemy/engine/result.py +++ b/lib/sqlalchemy/engine/result.py @@ -649,6 +649,11 @@ class Result(InPlaceGenerative): it will produce a new :class:`_engine.Result` object each time against its stored set of rows. + .. seealso:: + + :ref:`do_orm_execute_re_executing` - example usage within the + ORM to implement a result-set cache. 
+ """ return FrozenResult(self) @@ -1173,12 +1178,28 @@ class FrozenResult(object): frozen = result.freeze() - r1 = frozen() - r2 = frozen() + unfrozen_result_one = frozen() + + for row in unfrozen_result_one: + print(row) + + unfrozen_result_two = frozen() + rows = unfrozen_result_two.all() + # ... etc .. versionadded:: 1.4 + .. seealso:: + + .. seealso:: + + :ref:`do_orm_execute_re_executing` - example usage within the + ORM to implement a result-set cache. + + :func:`_orm.loading.merge_frozen_result` - ORM function to merge + a frozen result back into a :class:`_orm.Session`. + """ def __init__(self, result): diff --git a/lib/sqlalchemy/event/api.py b/lib/sqlalchemy/event/api.py index 6bd63ceca..b36c448ce 100644 --- a/lib/sqlalchemy/event/api.py +++ b/lib/sqlalchemy/event/api.py @@ -61,9 +61,6 @@ def listen(target, identifier, fn, *args, **kw): event.listen(Mapper, "before_configure", on_config, once=True) - .. versionadded:: 0.9.4 Added ``once=True`` to :func:`.event.listen` - and :func:`.event.listens_for`. - .. warning:: The ``once`` argument does not imply automatic de-registration of the listener function after it has been invoked a first time; a listener entry will remain associated with the target object. @@ -128,9 +125,6 @@ def listens_for(target, identifier, *args, **kw): do_config() - .. versionadded:: 0.9.4 Added ``once=True`` to :func:`.event.listen` - and :func:`.event.listens_for`. - .. warning:: The ``once`` argument does not imply automatic de-registration of the listener function after it has been invoked a first time; a listener entry will remain associated with the target object. @@ -173,8 +167,6 @@ def remove(target, identifier, fn): propagated to subclasses of ``SomeMappedClass``; the :func:`.remove` function will revert all of these operations. - .. versionadded:: 0.9.0 - .. 
note:: The :func:`.remove` function cannot be called at the same time @@ -206,8 +198,6 @@ def remove(target, identifier, fn): def contains(target, identifier, fn): """Return True if the given target/ident/fn is set up to listen. - .. versionadded:: 0.9.0 - """ return _event_key(target, identifier, fn).contains() diff --git a/lib/sqlalchemy/event/legacy.py b/lib/sqlalchemy/event/legacy.py index f63c7d101..14115d377 100644 --- a/lib/sqlalchemy/event/legacy.py +++ b/lib/sqlalchemy/event/legacy.py @@ -96,8 +96,7 @@ def _standard_listen_example(dispatch_collection, sample_target, fn): else: current_since = None text = ( - "from sqlalchemy import event\n\n" - "# standard decorator style%(current_since)s\n" + "from sqlalchemy import event\n\n\n" "@event.listens_for(%(sample_target)s, '%(event_name)s')\n" "def receive_%(event_name)s(" "%(named_event_arguments)s%(has_kw_arguments)s):\n" @@ -105,17 +104,6 @@ def _standard_listen_example(dispatch_collection, sample_target, fn): "\n # ... (event handling logic) ...\n" ) - if len(dispatch_collection.arg_names) > 3: - text += ( - "\n# named argument style (new in 0.9)\n" - "@event.listens_for(" - "%(sample_target)s, '%(event_name)s', named=True)\n" - "def receive_%(event_name)s(**kw):\n" - " \"listen for the '%(event_name)s' event\"\n" - "%(example_kw_arg)s\n" - "\n # ... 
(event handling logic) ...\n" - ) - text %= { "current_since": " (arguments as of %s)" % current_since if current_since diff --git a/lib/sqlalchemy/orm/__init__.py b/lib/sqlalchemy/orm/__init__.py index 458103838..199ae11e5 100644 --- a/lib/sqlalchemy/orm/__init__.py +++ b/lib/sqlalchemy/orm/__init__.py @@ -272,6 +272,14 @@ def eagerload(*args, **kwargs): contains_alias = public_factory(AliasOption, ".orm.contains_alias") +if True: + from .events import AttributeEvents # noqa + from .events import MapperEvents # noqa + from .events import InstanceEvents # noqa + from .events import InstrumentationEvents # noqa + from .events import QueryEvents # noqa + from .events import SessionEvents # noqa + def __go(lcls): global __all__ diff --git a/lib/sqlalchemy/orm/events.py b/lib/sqlalchemy/orm/events.py index 217aa76c7..ec907c63e 100644 --- a/lib/sqlalchemy/orm/events.py +++ b/lib/sqlalchemy/orm/events.py @@ -1427,7 +1427,27 @@ class SessionEvents(event.Events): .. seealso:: - :class:`.ORMExecuteState` + :ref:`session_execute_events` - top level documentation on how + to use :meth:`_orm.SessionEvents.do_orm_execute` + + :class:`.ORMExecuteState` - the object passed to the + :meth:`_orm.SessionEvents.do_orm_execute` event which contains + all information about the statement to be invoked. It also + provides an interface to extend the current statement, options, + and parameters as well as an option that allows programmatic + invocation of the statement at any point. + + :ref:`examples_session_orm_events` - includes examples of using + :meth:`_orm.SessionEvents.do_orm_execute` + + :ref:`examples_caching` - an example of how to integrate + Dogpile caching with the ORM :class:`_orm.Session` making use + of the :meth:`_orm.SessionEvents.do_orm_execute` event hook. 
+ + :ref:`examples_sharding` - the Horizontal Sharding example / + extension relies upon the + :meth:`_orm.SessionEvents.do_orm_execute` event hook to invoke a + SQL statement on multiple backends and return a merged result. .. versionadded:: 1.4 @@ -2585,12 +2605,8 @@ class QueryEvents(event.Events): """Represent events within the construction of a :class:`_query.Query` object. - The events here are intended to be used with an as-yet-unreleased - inspection system for :class:`_query.Query`. Some very basic operations - are possible now, however the inspection system is intended to allow - complex query manipulations to be automated. - - .. versionadded:: 1.0.0 + The :class:`_orm.QueryEvents` hooks are now superseded by the + :meth:`_orm.SessionEvents.do_orm_execute` event hook. """ @@ -2602,6 +2618,17 @@ class QueryEvents(event.Events): object before it is composed into a core :class:`_expression.Select` object. + .. deprecated:: 1.4 The :meth:`_orm.QueryEvents.before_compile` event + is superseded by the much more capable + :meth:`_orm.SessionEvents.do_orm_execute` hook. In version 1.4, + the :meth:`_orm.QueryEvents.before_compile` event is **no longer + used** for ORM-level attribute loads, such as loads of deferred + or expired attributes as well as relationship loaders. See the + new examples in :ref:`examples_session_orm_events` which + illustrate new ways of intercepting and modifying ORM queries + for the most common purpose of adding arbitrary filter criteria. + + This event is intended to allow changes to the query given:: @event.listens_for(Query, "before_compile", retval=True) @@ -2656,6 +2683,10 @@ class QueryEvents(event.Events): """Allow modifications to the :class:`_query.Query` object within :meth:`_query.Query.update`. + .. deprecated:: 1.4 The :meth:`_orm.QueryEvents.before_compile_update` + event is superseded by the much more capable + :meth:`_orm.SessionEvents.do_orm_execute` hook. 
+ Like the :meth:`.QueryEvents.before_compile` event, if the event is to be used to alter the :class:`_query.Query` object, it should be configured with ``retval=True``, and the modified @@ -2702,6 +2733,10 @@ class QueryEvents(event.Events): """Allow modifications to the :class:`_query.Query` object within :meth:`_query.Query.delete`. + .. deprecated:: 1.4 The :meth:`_orm.QueryEvents.before_compile_delete` + event is superseded by the much more capable + :meth:`_orm.SessionEvents.do_orm_execute` hook. + Like the :meth:`.QueryEvents.before_compile` event, this event should be configured with ``retval=True``, and the modified :class:`_query.Query` object returned, as in :: diff --git a/lib/sqlalchemy/orm/loading.py b/lib/sqlalchemy/orm/loading.py index 601393156..2eb3e1368 100644 --- a/lib/sqlalchemy/orm/loading.py +++ b/lib/sqlalchemy/orm/loading.py @@ -141,6 +141,21 @@ def instances(cursor, context): @util.preload_module("sqlalchemy.orm.context") def merge_frozen_result(session, statement, frozen_result, load=True): + """Merge a :class:`_engine.FrozenResult` back into a :class:`_orm.Session`, + returning a new :class:`_engine.Result` object with :term:`persistent` + objects. + + See the section :ref:`do_orm_execute_re_executing` for an example. + + .. 
seealso:: + + :ref:`do_orm_execute_re_executing` + + :meth:`_engine.Result.freeze` + + :class:`_engine.FrozenResult` + + """ querycontext = util.preloaded.orm_context if load: @@ -184,6 +199,11 @@ def merge_frozen_result(session, statement, frozen_result, load=True): session.autoflush = autoflush +@util.deprecated( + "2.0", + "The :func:`_orm.merge_result` method is superseded by the " + ":func:`_orm.merge_frozen_result` function.", +) @util.preload_module("sqlalchemy.orm.context") def merge_result(query, iterator, load=True): """Merge a result into this :class:`.Query` object's Session.""" diff --git a/lib/sqlalchemy/orm/session.py b/lib/sqlalchemy/orm/session.py index e9d4ac2c6..ed1af0a80 100644 --- a/lib/sqlalchemy/orm/session.py +++ b/lib/sqlalchemy/orm/session.py @@ -107,6 +107,10 @@ class ORMExecuteState(util.MemoizedSlots): .. versionadded:: 1.4 + .. seealso:: + + :ref:`session_execute_events` - top level documentation on how + to use :meth:`_orm.SessionEvents.do_orm_execute` """ @@ -158,14 +162,24 @@ class ORMExecuteState(util.MemoizedSlots): bind_arguments=None, ): """Execute the statement represented by this - :class:`.ORMExecuteState`, without re-invoking events. - - This method essentially performs a re-entrant execution of the - current statement for which the :meth:`.SessionEvents.do_orm_execute` - event is being currently invoked. The use case for this is - for event handlers that want to override how the ultimate results - object is returned, such as for schemes that retrieve results from - an offline cache or which concatenate results from multiple executions. + :class:`.ORMExecuteState`, without re-invoking events that have + already proceeded. + + This method essentially performs a re-entrant execution of the current + statement for which the :meth:`.SessionEvents.do_orm_execute` event is + being currently invoked. 
The use case for this is for event handlers + that want to override how the ultimate + :class:`_engine.Result` object is returned, such as for schemes that + retrieve results from an offline cache or which concatenate results + from multiple executions. + + When the :class:`_engine.Result` object is returned by the actual + handler function within :meth:`_orm.SessionEvents.do_orm_execute` and + is propagated to the calling + :meth:`_orm.Session.execute` method, the remainder of the + :meth:`_orm.Session.execute` method is preempted and the + :class:`_engine.Result` object is returned to the caller of + :meth:`_orm.Session.execute` immediately. :param statement: optional statement to be invoked, in place of the statement currently represented by :attr:`.ORMExecuteState.statement`. @@ -970,13 +984,30 @@ class Session(_SessionClassMethods): transaction will load from the most recent database state. :param future: if True, use 2.0 style behavior for the - :meth:`_orm.Session.execute` method. This includes that the - :class:`_engine.Result` object returned will return new-style - tuple rows, as well as that Core constructs such as - :class:`_sql.Select`, - :class:`_sql.Update` and :class:`_sql.Delete` will be interpreted - in an ORM context if they are made against ORM entities rather than - plain :class:`.Table` metadata objects. + :meth:`_orm.Session.execute` method. 
Future mode includes the + following behaviors: + + * The :class:`_engine.Result` object returned by the + :meth:`_orm.Session.execute` method will return new-style tuple + :class:`_engine.Row` objects + + * The :meth:`_orm.Session.execute` method will invoke ORM style + queries given objects like :class:`_sql.Select`, + :class:`_sql.Update` and :class:`_sql.Delete` against ORM entities + + * The :class:`_orm.Session` will not use "bound" metadata in order + to locate an :class:`_engine.Engine`; the engine or engines in use + must be specified to the constructor of :class:`_orm.Session` or + otherwise be configured against the :class:`_orm.sessionmaker` + in use + + * The "subtransactions" feature of :meth:`_orm.Session.begin` is + removed in version 2.0 and is disabled when the future flag is + set. + + * The behavior of the :paramref:`_orm.relationship.cascade_backrefs` + flag on a :func:`_orm.relationship` will always assume + "False" behavior. The "future" flag is also available on a per-execution basis using the :paramref:`_orm.Session.execute.future` flag. diff --git a/test/base/test_utils.py b/test/base/test_utils.py index bba0cc16c..d5dece9a6 100644 --- a/test/base/test_utils.py +++ b/test/base/test_utils.py @@ -187,6 +187,15 @@ class ImmutableDictTest(fixtures.TestBase): eq_(d, {1: 2, 3: 4}) eq_(d2, {1: 2, 3: 5, 7: 12}) + def _dont_test_union_kw(self): + d = util.immutabledict({"a": "b", "c": "d"}) + + d2 = d.union(e="f", g="h") + assert isinstance(d2, util.immutabledict) + + eq_(d, {"a": "b", "c": "d"}) + eq_(d2, {"a": "b", "c": "d", "e": "f", "g": "h"}) + def test_union_tuples(self): d = util.immutabledict({1: 2, 3: 4})
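The shard_chooser / id_chooser / query_chooser trio that the sharding examples in this commit configure all reduce to the same idea: map criteria found in a statement to a subset of shard ids, falling back to all shards when nothing usable is found. As a dependency-free sketch of the narrowing step that the examples' `query_chooser` performs (the names `choose_shards`, `SHARD_LOOKUP` and `ALL_SHARDS` are hypothetical illustrations, not part of the SQLAlchemy API; the real version extracts comparisons from the statement's WHERE clause via `visitors.traverse`):

```python
# Hypothetical, simplified model of the examples' query_chooser logic:
# comparisons are (column_name, operator, value) tuples that would be
# extracted from a query's WHERE clause.

ALL_SHARDS = ["north_america", "asia", "europe", "south_america"]

SHARD_LOOKUP = {
    "North America": "north_america",
    "Asia": "asia",
    "Europe": "europe",
    "South America": "south_america",
}


def choose_shards(comparisons):
    """Return the list of shard ids a query needs to visit."""
    ids = []
    for column, operator, value in comparisons:
        # only the sharding column lets us narrow the search
        if column != "continent":
            continue
        if operator == "eq":
            ids.append(SHARD_LOOKUP[value])
        elif operator == "in":
            ids.extend(SHARD_LOOKUP[v] for v in value)
    # no usable criteria: every shard must be queried and results merged
    return ids or ALL_SHARDS


# e.g. WeatherLocation.continent == "Asia" narrows to a single shard
print(choose_shards([("continent", "eq", "Asia")]))  # ['asia']
# a filter on some other column cannot narrow the search
print(choose_shards([("city", "eq", "Tokyo")]))
```

In the real examples this decision feeds `ShardedSession`, which then runs the statement once per chosen shard and concatenates the results; `identity_token` keeps otherwise-identical primary keys from different shards distinct in the identity map.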