| Field | Value | Date |
|---|---|---|
| author | Jordan Cook <jordan.cook@pioneer.com> | 2021-04-03 13:33:22 -0500 |
| committer | Jordan Cook <jordan.cook@pioneer.com> | 2021-04-03 15:30:01 -0500 |
| commit | 177e65644253667c2e0827ded6e3e16aa89a317d | |
| tree | 70bae7d8aa892f6e7b375dabcf677be43119bee5 | |
| parent | 8854ae6982aeca12349536bcecf16eb0a8973c45 | |
| download | requests-cache-177e65644253667c2e0827ded6e3e16aa89a317d.tar.gz | |

Make Readme more concise again, and split main usage docs into 'Quickstart' (Readme), 'User Guide', and 'Advanced Usage' sections
* Add more details and formatting to changelog
* Add some more reference links to classes, methods, and functions mentioned in docs

| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | CONTRIBUTING.md | 2 |
| -rw-r--r-- | HISTORY.md | 95 |
| -rw-r--r-- | README.md | 180 |
| -rw-r--r-- | docs/advanced_usage.rst | 233 |
| -rw-r--r-- | docs/api.rst | 6 |
| -rw-r--r-- | docs/contributing.rst | 2 |
| -rw-r--r-- | docs/index.rst | 11 |
| -rw-r--r-- | docs/related_projects.rst | 17 |
| -rw-r--r-- | docs/user_guide.rst | 281 |

9 files changed, 480 insertions, 347 deletions
```diff
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 8417d97..d1698af 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -11,7 +11,7 @@ If there is a new feature you would like to see, the best way to make that happe
 for it!
 
 ## Bug Reports & Feedback
-If you discover a bug, want to propose new feature, or have other feedback about requests-cache, please
+If you discover a bug, want to propose a new feature, or have other feedback about requests-cache, please
 [create an issue](https://github.com/reclosedev/requests-cache/issues/new/choose)!
 
 ## Project Discussion
diff --git a/HISTORY.md b/HISTORY.md
--- a/HISTORY.md
+++ b/HISTORY.md
@@ -3,23 +3,32 @@
 ## 0.6.0 (2021-04-TBD)
 [See all included issues and PRs](https://github.com/reclosedev/requests-cache/milestone/1?closed=1)
 
-### General
-* Drop support for python <= 3.5
-* Add `CacheMixin` class to make the features of `CachedSession` usable as a mixin class,
-  for compatibility with other `requests`-based libraries
-* Add `CachedResponse` class to wrapped cached `requests.Response` objects,
-  which makes additional cache information available to client code
+### Serialization
+**Note:** Due to the following changes, responses cached with previous versions of requests-cache
+will be invalid. These **old responses will be treated as expired**, and will be refreshed the
+next time they are requested. They can also be manually converted or removed, if needed (see notes below).
```
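The serialization note above says responses cached with older versions will be treated as expired rather than breaking reads. A minimal sketch of one way to get that behavior, assuming a hypothetical format version stored next to each pickled entry (this is not requests-cache's actual storage format; `dumps`, `loads`, and `SERIALIZER_VERSION` are illustrative names):

```python
import pickle
from typing import Any, Optional

# Hypothetical format tag; bump it whenever the stored format changes
SERIALIZER_VERSION = 2


def dumps(obj: Any, version: int = SERIALIZER_VERSION) -> bytes:
    """Pickle an object together with the format version it was written with"""
    return pickle.dumps((version, obj))


def loads(data: bytes) -> Optional[Any]:
    """Return the stored object, or None if it can't be used with the current format.

    Returning None lets the caller treat the entry like an expired response and
    refetch it, instead of failing on data it can no longer interpret.
    Note: only unpickle data from a trusted source.
    """
    try:
        version, obj = pickle.loads(data)
    except Exception:
        return None  # Not readable at all, e.g. written by an incompatible version
    if version != SERIALIZER_VERSION:
        return None  # Readable, but written in an older format
    return obj


old_entry = dumps({'url': 'https://httpbin.org/get'}, version=1)
new_entry = dumps({'url': 'https://httpbin.org/get'})
print(loads(old_entry))  # None
print(loads(new_entry))  # {'url': 'https://httpbin.org/get'}
```

A caller that gets `None` back can fall through to a normal request, which matches the "refreshed the next time they are requested" behavior the note describes.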
```diff
+
+* Add [example script](https://github.com/reclosedev/requests-cache/blob/master/examples/convert_cache.py)
+  to convert an existing cache from previous serialization format to new one
+* When running `remove_expired_responses()`, also remove responses that are invalid due to updated
+  serialization format
+* Add `CachedResponse` class to wrap cached `requests.Response` objects, which makes additional
+  cache information available to client code
+* Add `CachedHTTPResponse` class to wrap `urllib3.response.HTTPResponse` objects, available via `CachedResponse.raw`
+  * Re-construct the raw response on demand to avoid storing extra data in the cache
+  * Improve emulation of raw request behavior used for iteration, streaming requests, etc.
 * Add `BaseCache.urls` property to get all URLs persisted in the cache
 * Add optional support for `itsdangerous` for more secure serialization
-* Add `HEAD` to default `allowable_methods`
-* Remove invalid responses when running `remove_expired_responses()` (in case an update in
-  requests-cache or one of its dependencies breaks backwards-compatibility with old cache data)
-* Handle additional edge cases with request normalization for cache keys (to avoid duplicate cached responses)
 
 ### Cache Expiration
+* Cached responses are now stored with an absolute expiration time, so `CachedSession.expire_after`
+  no longer applies retroactively. To revalidate previously cached items with a new expiration time,
+  see below:
+* Add support for overriding original expiration (i.e., revalidating) in `CachedSession.remove_expired_responses()`
 * Add support for setting expiration for individual requests
 * Add support for setting expiration based on URL glob patterns
-* Add support for overriding original expiration (i.e., revalidating) in `CachedSession.remove_expired_responses()`
+* Add support for setting expiration as a `datetime`
+* Add support for explicitly disabling expiration with `-1` (Since `None` may be ambiguous in some cases)
 
 ### Backends
 * SQLite: Allow passing user paths (`~/path-to-cache`) to database file with `db_path` param
@@ -29,6 +38,8 @@
 ### Bugfixes
 * Fix caching requests with data specified in `json` parameter
 * Fix caching requests with `verify` parameter
+* Fix duplicate cached responses due to some unhandled variations in URL format
+  * To support this, the `url-normalize` library has been added to dependencies
 * Fix usage of backend-specific params when used in place of `cache_name`
 * Fix potential TypeError with `DbPickleDict` initialization
 * Fix usage of `CachedSession.cache_disabled` if used within another contextmanager
@@ -37,19 +48,28 @@ requests-cache is not installed
 * Update usage of deprecated MongoClient `save()` method
 
+### General
+* Drop support for python <= 3.5
+* Add `CacheMixin` class to make the features of `CachedSession` usable as a mixin class,
+  for [compatibility with other requests-based libraries](https://requests-cache.readthedocs.io/en/latest/advanced_usage.html#library-compatibility).
```
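The Cache Expiration entries above say responses are now stored with an absolute expiration time, and that expiration may be given in seconds, as a `timedelta`, as a `datetime`, or as `-1` to disable it. A minimal sketch of normalizing those values, using a hypothetical `to_absolute_expiration()` helper (the library's own conversion may differ, for example in how it handles time zones):

```python
from datetime import datetime, timedelta
from typing import Optional, Union

ExpirationTime = Union[None, int, float, timedelta, datetime]


def to_absolute_expiration(
    expire_after: ExpirationTime, now: Optional[datetime] = None
) -> Optional[datetime]:
    """Convert an expire_after value into an absolute expiration time.

    Returns None for "never expire" (None or -1).
    """
    now = now or datetime.now()  # Naive local time, for simplicity
    if expire_after is None or expire_after == -1:
        return None
    if isinstance(expire_after, datetime):
        return expire_after  # Already an absolute time
    if not isinstance(expire_after, timedelta):
        expire_after = timedelta(seconds=expire_after)
    return now + expire_after


now = datetime(2021, 4, 3, 12, 0)
print(to_absolute_expiration(30, now))                 # 2021-04-03 12:00:30
print(to_absolute_expiration(timedelta(days=1), now))  # 2021-04-04 12:00:00
print(to_absolute_expiration(-1, now))                 # None
```

Because the absolute time is computed once, when a response is saved, changing `CachedSession.expire_after` later has no effect on responses already in the cache; that is why the entry above points to revalidation instead.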
```diff
+* Add `HEAD` to default `allowable_methods`
+
 ### Docs & Tests
 * Add type annotations to main functions/methods in public API, and include in documentation on
   [readthedocs](https://requests-cache.readthedocs.io/en/latest/)
 * Add [Contributing Guide](https://requests-cache.readthedocs.io/en/latest/contributing.html),
   [Security](https://requests-cache.readthedocs.io/en/latest/security.html) info,
-  and more examples & detailed usage info in an
-  [Advanced Usage](https://requests-cache.readthedocs.io/en/latest/advanced_usage.html#) section.
-* Increased test coverage, and added containerized backends for both local and CI integration testing
-
-## 0.5.2 (2019-08-14)
+  and more examples & detailed usage info in
+  [User Guide](https://requests-cache.readthedocs.io/en/latest/user_guide.html) and
+  [Advanced Usage](https://requests-cache.readthedocs.io/en/latest/advanced_usage.html) sections.
+* Increase test coverage and rewrite most tests using pytest
+* Add containerized backends for both local and CI integration testing
+
+-----
+### 0.5.2 (2019-08-14)
 * Fix DeprecationWarning from collections #140
 
-## 0.5.1 (2019-08-13)
+### 0.5.1 (2019-08-13)
 * Remove Python 2.6 Testing from travis #133
 * Fix DeprecationWarning from collections #131
 * vacuum the sqlite database after clearing a table #134
@@ -65,52 +85,51 @@ Project is now added to [Code Shelter](https://www.codeshelter.co)
 * Fix remove_expired_responses missed in __init__.py #93
 * Fix deprecation warnings #122, thanks to mbarkhau
 
-## 0.4.13 (2016-12-23)
+-----
+### 0.4.13 (2016-12-23)
 * Support PyMongo3, thanks to @craigls #72
 * Fix streaming releate issue #68
 
-## 0.4.12 (2016-03-19)
+### 0.4.12 (2016-03-19)
 * Fix ability to pass backend instance in `install_cache` #61
-
-## 0.4.11 (2016-03-07)
+### 0.4.11 (2016-03-07)
 * `ignore_parameters` feature, thanks to @themiurgo and @YetAnotherNerd (#52, #55)
 * More informative message for missing backend dependencies, thanks to @Garrett-R (#60)
 
-## 0.4.10 (2015-04-28)
```
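The entry above adds `HEAD` to the default `allowable_methods`. Combined with `allowable_codes` and the `filter_fn` hook described in the advanced usage docs, the decision of whether to cache a response can be sketched as below. `FakeResponse`, `is_cacheable()`, and `under_1mb()` are hypothetical names, not requests-cache APIs; the defaults mirror the documented ones (GET and HEAD requests, 200 responses):

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class FakeResponse:
    """Hypothetical stand-in for requests.Response (the real one exposes the method via .request)"""
    method: str
    status_code: int
    content: bytes = b''


def is_cacheable(
    response: FakeResponse,
    allowable_methods: Tuple[str, ...] = ('GET', 'HEAD'),
    allowable_codes: Tuple[int, ...] = (200,),
    filter_fn: Optional[Callable[[FakeResponse], bool]] = None,
) -> bool:
    """Decide whether a response should be written to the cache"""
    if response.method.upper() not in allowable_methods:
        return False
    if response.status_code not in allowable_codes:
        return False
    # A custom filter has the final say, e.g. to skip very large responses
    return filter_fn(response) if filter_fn else True


def under_1mb(response: FakeResponse) -> bool:
    return len(response.content) < 1024 * 1024


print(is_cacheable(FakeResponse('HEAD', 200)))  # True
print(is_cacheable(FakeResponse('POST', 200)))  # False
print(is_cacheable(FakeResponse('GET', 404)))   # False
print(is_cacheable(FakeResponse('GET', 200, b'x' * 2_000_000), filter_fn=under_1mb))  # False
```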
````diff
+### 0.4.10 (2015-04-28)
 * Better transactional handling in sqlite #50, thanks to @rgant
 * Compatibility with streaming in requests >= 2.6.x
 
-## 0.4.9 (2015-01-17)
+### 0.4.9 (2015-01-17)
 * `expire_after` now also accepts `timedelta`, thanks to @femtotrader
 * Added Ability to include headers to cache key (`include_get_headers` option)
 * Added string representation for `CachedSession`
 
-## 0.4.8 (2014-12-13)
+### 0.4.8 (2014-12-13)
 * Fix bug in reading cached streaming response
 
-## 0.4.7 (2014-12-06)
+### 0.4.7 (2014-12-06)
 * Fix compatibility with Requests > 2.4.1 (json arg, response history)
 
-## 0.4.6 (2014-10-13)
+### 0.4.6 (2014-10-13)
 * Monkey patch now uses class instead lambda (compatibility with rauth)
 * Normalize (sort) parameters passed as builtin dict
 
-## 0.4.5 (2014-08-22)
+### 0.4.5 (2014-08-22)
 * Requests==2.3.0 compatibility, thanks to @gwillem
 
-## 0.4.4 (2013-10-31)
+### 0.4.4 (2013-10-31)
 * Check for backend availability in install_cache(), not at the first request
 * Default storage fallbacks to memory if `sqlite` is not available
 
-## 0.4.3 (2013-09-12)
+### 0.4.3 (2013-09-12)
 * Fix `response.from_cache` not set in hooks
 
-## 0.4.2 (2013-08-25)
+### 0.4.2 (2013-08-25)
 * Fix `UnpickleableError` for gzip responses
-
-## 0.4.1 (2013-08-19)
+### 0.4.1 (2013-08-19)
 * `requests_cache.enabled()` context manager
 * Compatibility with Requests 1.2.3 cookies handling
@@ -118,28 +137,30 @@ Project is now added to [Code Shelter](https://www.codeshelter.co)
 * Redis backend. Thanks to @michaelbeaumont
 * Fix for changes in Requests 1.2.0 hooks dispatching
-
+-----
 ## 0.3.0 (2013-02-24)
 * Support for `Requests` 1.x.x
 * `CachedSession`
 * Many backward incompatible changes
 
-## 0.2.1 (2013-01-13)
+-----
+### 0.2.1 (2013-01-13)
 * Fix broken PyPi package
 
 ## 0.2.0 (2013-01-12)
 * Last backward compatible version for `Requests` 0.14.2
 
-## 0.1.3 (2012-05-04)
+-----
+### 0.1.3 (2012-05-04)
 * Thread safety for default `sqlite` backend
 * Take into account the POST parameters when cache is configured with 'POST' in `allowable_methods`
 
-## 0.1.2 (2012-05-02)
+### 0.1.2 (2012-05-02)
 * Reduce number of `sqlite` database write operations
 * `fast_save` option for `sqlite` backend
 
-## 0.1.1 (2012-04-11)
+### 0.1.1 (2012-04-11)
 * Fix: restore responses from response.history
 * Internal refactoring (`MemoryCache` -> `BaseCache`, `reduce_response` and `restore_response` moved to `BaseCache`)
diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -8,164 +8,70 @@
 [![Code Shelter](https://www.codeshelter.co/static/badges/badge-flat.svg)](https://www.codeshelter.co/)
 
 ## Summary
-**requests-cache** is a transparent persistent HTTP cache for the python [requests](http://python-requests.org)
-library. It is especially useful for web scraping, consuming REST APIs, slow or rate-limited
-sites, or any other scenario in which you're making lots of requests that are likely to be sent
-more than once.
-
-Several storage backends are included: **SQLite**, **Redis**, **MongoDB**, and **DynamoDB**.
+**requests-cache** is a transparent, persistent HTTP cache for the python [requests](http://python-requests.org)
+library. It's a convenient tool to use with web scraping, consuming REST APIs, slow or rate-limited
+sites, or any other scenario in which you're making lots of requests that are expensive and/or
+likely to be sent more than once.
````
````diff
 
 See full project documentation at: https://requests-cache.readthedocs.io
 
-## Installation
-Install with pip:
+## Features
+* **Ease of use:** Use as a [drop-in replacement](https://requests-cache.readthedocs.io/en/latest/api.html#sessions)
+  for `requests.Session`, or [install globally](https://requests-cache.readthedocs.io/en/latest/user_guide.html#patching)
+  to add caching to all `requests` functions
+* **Customization:** Works out of the box with zero config, but with plenty of options available
+  for customizing cache
+  [expiration](https://requests-cache.readthedocs.io/en/latest/user_guide.html#cache-expiration)
+  and other [behavior](https://requests-cache.readthedocs.io/en/latest/user_guide.html#cache-options)
+* **Persistence:** Includes several [storage backends](https://requests-cache.readthedocs.io/en/latest/user_guide.html#cache-backends):
+  SQLite, Redis, MongoDB, and DynamoDB.
+* **Compatibility:** Can be used alongside
+  [other popular libraries based on requests](https://requests-cache.readthedocs.io/en/latest/advanced_usage.html#library-compatibility)
+
+# Quickstart
+First, install with pip:
 ```bash
 pip install requests-cache
 ```
 
-**Requirements:**
-* Requires python 3.6+.
-* You may need additional dependencies depending on which backend you want to use. To install with
-  extra dependencies for all supported backends:
-
-  ```bash
-  pip install requests-cache[backends]
-  ```
-
-**Optional Setup Steps:**
-* See [Security](https://requests-cache.readthedocs.io/en/latest/security.html) for recommended
-  setup steps for more secure cache serialization.
-* See [Contributing Guide](https://requests-cache.readthedocs.io/en/latest/contributing.html)
-  for setup info for local development.
````
````diff
-
-## General Usage
-There are two main ways of using requests-cache:
-* [Sessions](https://requests-cache.readthedocs.io/en/latest/api.html#sessions):
-  Use `requests_cache.CachedSession` in place of
-  [requests.Session](https://requests.readthedocs.io/en/master/user/advanced/#session-objects) (recommended)
-* [Patching](https://requests-cache.readthedocs.io/en/latest/api.html#patching):
-  Globally patch `requests` using `requests_cache.install_cache()`.
-
-### Sessions
-The `CachedSession` class is a drop-in replacement for `requests.Session` that adds caching features.
-
-Basic example:
-```python
-from requests_cache import CachedSession
-
-session = CachedSession('demo_cache', backend='sqlite')
-for i in range(100):
-    session.get('http://httpbin.org/delay/1')
-```
-The URL in this example adds a delay of 1 second, but all 100 requests will complete in just over 1
-second. The response will be fetched once, saved to `demo_cache.sqlite`, and subsequent requests
-will return the cached response near-instantly.
+Next, use [requests_cache.CachedSession](https://requests-cache.readthedocs.io/en/latest/api.html#sessions)
+to send and cache requests. To quickly demonstrate how to use it:
 
-### Patching
-Using `requests_cache.install_cache()` will add caching to all `requests` functions:
+**This takes ~1 minute:**
 ```python
 import requests
-import requests_cache
 
-requests_cache.install_cache()
-requests.get('http://httpbin.org/get')
 session = requests.Session()
-session.get('http://httpbin.org/get')
-```
-
-`install_cache()` takes all the same parameters as `CachedSession`. It can be temporarily disabled
-with `disabled()`, and completely removed with `uninstall_cache()`:
-```python
-# Neither of these requests will use the cache
-with requests_cache.disabled():
-    requests.get('http://httpbin.org/get')
-
-requests_cache.uninstall_cache()
-requests.get('http://httpbin.org/get')
-```
-
-**Limitations:**
-
-Like any other utility that uses global patching, there are some scenarios where you won't want to
-use this:
-* In a multi-threaded or multiprocess application
-* In an application that uses other packages that extend or modify `requests.Session`
-* In a package that will be used by other packages or applications
-
-### Cache Backends
-Several [cache backends](https://requests-cache.readthedocs.io/en/latest/modules/requests_cache.backends.html)
-are included, which can be selected with the `backend` parameter to `CachedSession` or `install_cache()`:
-
-* `'memory'` : Not persistent, just stores responses with an in-memory dict
-* `'sqlite'` : [SQLite](https://www.sqlite.org) database (**default**)
-* `'redis'` : [Redis](https://redis.io/) cache (requires `redis`)
-* `'mongodb'` : [MongoDB](https://www.mongodb.com/) database (requires `pymongo`)
-* `'dynamodb'` : [Amazon DynamoDB](https://aws.amazon.com/dynamodb/) database (requires `boto3`)
-
-### Cache Expiration
-By default, cached responses will be stored indefinitely. There are a number of ways you can handle
-cache expiration. The simplest is using the `expire_after` param with a value in seconds:
-```python
-# Expire after 30 seconds
-session = CachedSession(expire_after=30)
+for i in range(60):
+    session.get('http://httpbin.org/delay/1')
 ```
 
-Or a `timedelta`:
+**This takes ~1 second:**
 ```python
-from datetime import timedelta
-
-# Expire after 30 days
-session = CachedSession(expire_after=timedelta(days=30))
-```
+import requests_cache
 
-You can also set expiration on a per-request basis, which will override any session settings:
-```python
-# Expire after 6 minutes
-session.get('http://httpbin.org/get', expire_after=360)
+session = requests_cache.CachedSession('demo_cache')
+for i in range(60):
+    session.get('http://httpbin.org/delay/1')
 ```
 
-If a per-session expiration is set but you want to temporarily disable it, use `-1`:
-```python
-# Never expire
-session.get('http://httpbin.org/get', expire_after=-1)
-```
+The URL in this example adds a delay of 1 second, simulating a slow or rate-limited website.
+With caching, the response will be fetched once, saved to `demo_cache.sqlite`, and subsequent
+requests will return the cached response near-instantly.
 
-For better performance, expired responses won't be removed immediately, but will be removed
-(or replaced) the next time they are accessed. To manually clear all expired responses:
+If you don't want to manage a session object, requests-cache can also be installed globally:
 ```python
-session.remove_expired_responses()
-```
-Or, when using patching:
-```python
-requests_cache.remove_expired_responses()
+requests_cache.install_cache('demo_cache')
+requests.get('http://httpbin.org/delay/1')
 ```
 
-Or, to revalidate the cache with a new expiration:
-```python
-session.remove_expired_responses(expire_after=360)
-```
+## Next Steps
+To find out more about what you can do with requests-cache, see:
 
-## More Features & Examples
-* You can find a working example at Real Python:
+* The
+  [User Guide](https://requests-cache.readthedocs.io/en/latest/user_guide.html) and
+  [Advanced Usage](https://requests-cache.readthedocs.io/en/latest/advanced_usage.html) sections
+* A working example at Real Python:
   [Caching External API Requests](https://realpython.com/blog/python/caching-external-api-requests)
-* There are some additional examples in the [examples/](https://github.com/reclosedev/requests-cache/tree/master/examples) folder
-* See [Advanced Usage](https://requests-cache.readthedocs.io/en/latest/advanced_usage.html) for
-  details on customizing cache behavior and other features beyond the basics.
-
-## Related Projects
-If `requests-cache` isn't quite what you need, you can help make it better! See the
-[Contributing Guide](https://requests-cache.readthedocs.io/en/latest/contributing.html)
-for details.
````
````diff
-
-You can also check out these other python cache projects:
-
-* [CacheControl](https://github.com/ionrock/cachecontrol): An HTTP cache for `requests` that caches
-  according to HTTP headers
-* [diskcache](https://github.com/grantjenks/python-diskcache): A general-purpose (not HTTP-specific)
-  file-based cache built on SQLite
-* [aiohttp-client-cache](https://github.com/JWCook/aiohttp-client-cache): An async HTTP cache for
-  `aiohttp`, based on `requests-cache`
-* [aiohttp-cache](https://github.com/cr0hn/aiohttp-cache): A server-side async HTTP cache for the
-  `aiohttp` web server
-* [aiocache](https://github.com/aio-libs/aiocache): General-purpose (not HTTP-specific) async cache
-  backends
+* More examples in the
+  [examples/](https://github.com/reclosedev/requests-cache/tree/master/examples) folder
diff --git a/docs/advanced_usage.rst b/docs/advanced_usage.rst
index ba25c67..310d7e3 100644
--- a/docs/advanced_usage.rst
+++ b/docs/advanced_usage.rst
@@ -2,52 +2,17 @@
 Advanced Usage
 ==============
+This section covers some more advanced and use-case-specific features.
+
 .. contents::
     :local:
 
-CachedSession Options
----------------------
-See :py:class:`.CachedSession` for a full list of parameters.
-
-Cache Name
-~~~~~~~~~~
-The ``cache_name`` parameter will be used as follows depending on the backend:
-
-* ``sqlite``: Cache filename, e.g ``my_cache.sqlite``
-* ``dynamodb``: Table name
-* ``mongodb`` and ``gridfs``: Database name
-* ``redis``: Namespace, meaning all keys will be prefixed with ``'cache_name:'``
-
-Cache Keys
-~~~~~~~~~~
-The cache key is a hash created from request information, and is used as an index for cached
-responses. There are a couple ways you can customize what information is used to create this key:
-
-* Use ``include_get_headers`` if you want headers to be included in the cache key. In other
-  words, this will create separate cache items for responses with different headers.
````
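The Cache Keys text above describes the key as a hash of request information that can optionally include headers (`include_get_headers`) and skip specific parameters (`ignored_parameters`). A minimal sketch of such a key, using a hypothetical `create_key()` that hashes the method, a normalized URL, and optional headers with SHA-256; requests-cache's real key function and normalization rules are different, and the URLs below are placeholders:

```python
import hashlib
from typing import Iterable, Mapping, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def create_key(
    method: str,
    url: str,
    headers: Optional[Mapping[str, str]] = None,
    include_headers: bool = False,
    ignored_parameters: Iterable[str] = (),
) -> str:
    """Hash the parts of a request that identify its response"""
    ignored = set(ignored_parameters)
    scheme, netloc, path, query, _fragment = urlsplit(url)
    # Drop ignored params and sort the rest, so parameter order doesn't change the key
    params = sorted(
        (name, value)
        for name, value in parse_qsl(query, keep_blank_values=True)
        if name not in ignored
    )
    normalized_url = urlunsplit((scheme.lower(), netloc.lower(), path, urlencode(params), ''))

    parts = [method.upper(), normalized_url]
    if include_headers and headers:
        parts += sorted(f'{name.lower()}={value}' for name, value in headers.items())
    return hashlib.sha256('\n'.join(parts).encode('utf-8')).hexdigest()


# Same resource requested with different access tokens: one cache entry
key_1 = create_key('GET', 'https://api.example.com/items?page=1&token=abc', ignored_parameters=['token'])
key_2 = create_key('get', 'https://API.example.com/items?token=xyz&page=1', ignored_parameters=['token'])
print(key_1 == key_2)  # True

# Different Accept headers: separate entries when headers are part of the key
key_3 = create_key('GET', 'https://api.example.com/items?page=1', headers={'Accept': 'text/csv'}, include_headers=True)
key_4 = create_key('GET', 'https://api.example.com/items?page=1', headers={'Accept': 'application/json'}, include_headers=True)
print(key_3 == key_4)  # False
```

In this sketch, sorting the remaining parameters is what keeps `?a=1&b=2` and `?b=2&a=1` from producing duplicate entries, the same class of problem the HISTORY entry about URL format variations addresses.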
```diff
-* Use ``ignored_parameters`` to exclude specific request params from the cache key. This is
-  useful, for example, if you request the same resource with different credentials or access
-  tokens.
-
-HTTP methods and status codes
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-You can choose which request HTTP methods and response status codes you want to cache using the
-parameters ``allowable_methods`` and ``allowable_codes``, respectively. By default, only GET and HEAD
-requests and 200 responses are cached. Example:
-
-    >>> from requests_cache import CachedSession
-    >>>
-    >>> session = CachedSession(
-    >>>     allowable_methods=('GET', 'POST'),
-    >>>     allowable_codes=(200, 418),
-    >>> )
-
-Custom response filter
-~~~~~~~~~~~~~~~~~~~~~~
-If you need more advanced behaviour for determining what to cache, you can provide a custom filtering
-function via the ``filter_fn`` param. This function that takes a :py:class:`requests.Response` object
-and returns a boolean indicating whether or not that response should be cached. It will be applied to
-both new responses (on write) and previously cached responses (on read). Example:
+Custom Response Filtering
+-------------------------
+If you need more advanced behavior for determining what to cache, you can provide a custom filtering
+function via the ``filter_fn`` param. This can be any function that takes a :py:class:`requests.Response`
+object and returns a boolean indicating whether or not that response should be cached. It will be applied
+to both new responses (on write) and previously cached responses (on read). Example:
 
     >>> from sys import getsizeof
     >>> from requests_cache import CachedSession
@@ -58,48 +23,6 @@ both new responses (on write) and previously cached responses (on read). Example
     >>>
     >>> session = CachedSession(filter_fn=filter_by_size)
 
-Cache Expiration
-~~~~~~~~~~~~~~~~
-Use ``expire_after`` to specify how long responses will be cached. This can be:
-
-* A positive number (in seconds)
-* ``-1`` (to never expire)
-* A :py:class:`~datetime.timedelta`
-* A :py:class:`~datetime.datetime`
-
-This will only apply to responses cached in the current session; to apply a different expiration
-to previously cached responses, see :py:meth:`remove_expired_responses`.
-
-Expiration can also be set on a per-URL or per request basis. The following order of precedence
-is used:
-
-1. Per-request expiration (``expire_after`` argument for :py:meth:`.CachedSession.request`)
-2. Per-URL expiration (``urls_expire_after`` argument for ``CachedSession``)
-3. Per-session expiration (``expire_after`` argument for ``CachedSession``)
-
-URL Patterns
-~~~~~~~~~~~~
-You can use ``urls_expire_after`` to set different expiration times for different requests, based on
-URL glob patterns. This allows you to customize caching based on what you know about the resources
-you're requesting. For example, you might request one resource that gets updated frequently, another
-that changes infrequently, and another that never changes. Example:
-
-    >>> urls_expire_after = {
-    >>>     '*.site_1.com': 30,
-    >>>     'site_2.com/resource_1': 60 * 2,
-    >>>     'site_2.com/resource_2': 60 * 60 * 24,
-    >>>     'site_2.com/static': -1,
-    >>> }
-
-**Notes:**
-
-* ``urls_expire_after`` should be a dict in the format ``{'pattern': expire_after}``
-* ``expire_after`` accepts the same types as ``CachedSession.expire_after``
-* Patterns will match request **base URLs**, so the pattern ``site.com/resource/`` is equivalent to
-  ``http*://site.com/resource/**``
-* If there is more than one match, the first match will be used in the order they are defined
-* If no patterns match a request, ``expire_after`` will be used as a default.
```
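The expiration docs above describe an order of precedence (per-request, then per-URL, then per-session) and glob patterns matched against base URLs, where `site.com/resource/` behaves like `http*://site.com/resource/**`. A minimal sketch of that lookup with the standard library's `fnmatch` module; `url_matches()` and `resolve_expire_after()` are hypothetical names, not requests-cache functions:

```python
from datetime import timedelta
from fnmatch import fnmatchcase
from typing import Dict, Optional, Union

ExpireAfter = Union[int, float, timedelta]


def url_matches(url: str, pattern: str) -> bool:
    """Match a URL against a pattern that describes a base URL (scheme optional)"""
    # 'site.com/resource/' is treated like 'http*://site.com/resource/*';
    # fnmatch's '*' also matches '/', so one trailing '*' covers the '**' in the docs
    if '://' not in pattern:
        pattern = f'http*://{pattern}'
    return fnmatchcase(url, pattern.rstrip('*') + '*')


def resolve_expire_after(
    url: str,
    request_expire_after: Optional[ExpireAfter] = None,
    urls_expire_after: Optional[Dict[str, ExpireAfter]] = None,
    session_expire_after: Optional[ExpireAfter] = None,
) -> Optional[ExpireAfter]:
    """Pick the expiration that applies to a request, by order of precedence"""
    # 1. A per-request value wins if given
    if request_expire_after is not None:
        return request_expire_after
    # 2. Otherwise, the first matching URL pattern, in the order the dict was defined
    for pattern, expire_after in (urls_expire_after or {}).items():
        if url_matches(url, pattern):
            return expire_after
    # 3. Otherwise, the session default
    return session_expire_after


urls_expire_after = {
    '*.site_1.com': 30,
    'site_2.com/resource_1': 60 * 2,
    'site_2.com/resource_2': 60 * 60 * 24,
    'site_2.com/static': -1,
}
print(resolve_expire_after('https://api.site_1.com/data', None, urls_expire_after, 3600))        # 30
print(resolve_expire_after('https://site_2.com/static/logo.png', None, urls_expire_after, 3600))  # -1
print(resolve_expire_after('https://site_3.com/other', None, urls_expire_after, 3600))            # 3600
print(resolve_expire_after('https://site_2.com/resource_1', 5, urls_expire_after, 3600))          # 5
```

`fnmatchcase` is used rather than `fnmatch` so that matching doesn't depend on the platform's filename case rules.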
```diff
-
 Cache Inspection
 ----------------
 Here are some ways to get additional information out of the cache session, backend, and responses:
@@ -108,8 +31,8 @@ Response Attributes
 ~~~~~~~~~~~~~~~~~~~
 The following attributes are available on responses:
 * ``from_cache``: indicates if the response came from the cache
-* ``created_at``: ``datetime`` of when the cached response was created or last updated
-* ``expires``: ``datetime`` after which the cached response will expire
+* ``created_at``: :py:class:`~datetime.datetime` of when the cached response was created or last updated
+* ``expires``: :py:class:`~datetime.datetime` after which the cached response will expire
 * ``is_expired``: indicates if the cached response is expired (if an old response was returned due to a request error)
 
 Examples:
@@ -151,8 +74,8 @@ responses they redirect to.
 
 Custom Backends
 ---------------
-If the built-in :py:mod:`Cache Backends <requests_cache.backends>` don't suit your needs and you want to create your own, you can create
-subclasses of :py:class:`.BaseCache` and :py:class:`.BaseStorage`:
+If the built-in :py:mod:`Cache Backends <requests_cache.backends>` don't suit your needs, you can create your own by
+making subclasses of :py:class:`.BaseCache` and :py:class:`.BaseStorage`:
 
     >>> from requests_cache import CachedSession
     >>> from requests_cache.backends import BaseCache, BaseStorage
@@ -167,7 +90,7 @@ subclasses of :py:class:`.BaseCache` and :py:class:`.BaseStorage`:
     >>> class MyStorage(BaseStorage):
     >>>     """Lower-level backend storage operations"""
 
-You can then use your custom backend in a ``CachedSession`` with the ``backend`` parameter:
+You can then use your custom backend in a :py:class:`.CachedSession` with the ``backend`` parameter:
 
     >>> session = CachedSession(backend=MyCache())
 
@@ -204,62 +127,62 @@ Streaming Requests
 If you use `streaming requests <https://2.python-requests.org/en/master/user/advanced/#id9>`_, you can use the same code to iterate over both cached and non-cached requests. A cached request will, of course, have already been read, but will use a file-like object containing the content.
 
-Example::
-
-    from requests_cache import CachedSession
+Example:
 
-    session = CachedSession()
-    for i in range(2):
-        r = session.get('https://httpbin.org/stream/20', stream=True)
-        for chunk in r.iter_lines():
-            print(chunk.decode('utf-8'))
+    >>> from requests_cache import CachedSession
+    >>>
+    >>> session = CachedSession()
+    >>> for i in range(2):
+    ...     r = session.get('https://httpbin.org/stream/20', stream=True)
+    ...     for chunk in r.iter_lines():
+    ...         print(chunk.decode('utf-8'))
 
 .. _library_compatibility:
 
 Usage with other requests-based libraries
 -----------------------------------------
-This library works by patching and/or extending ``requests.Session``. Many other libraries out there
+This library works by patching and/or extending :py:class:`requests.Session`. Many other libraries out there
 do the same thing, making it potentially difficult to combine them. For that scenario, a mixin class
-is provided, so you can create a custom class with behavior from multiple Session-modifying libraries::
+is provided, so you can create a custom class with behavior from multiple Session-modifying libraries:
 
-    from requests import Session
-    from requests_cache import CacheMixin
-    from some_other_lib import SomeOtherMixin
-
-    class CustomSession(CacheMixin, SomeOtherMixin ClientSession):
-        """Session class with features from both requests-html and requests-cache"""
+    >>> from requests import Session
+    >>> from requests_cache import CacheMixin
+    >>> from some_other_lib import SomeOtherMixin
+    >>>
+    >>> class CustomSession(CacheMixin, SomeOtherMixin, Session):
+    ...     """Session class with features from both some_other_lib and requests-cache"""
 
 Requests-HTML
 ~~~~~~~~~~~~~
-Example with `requests-html <https://github.com/psf/requests-html>`_::
-
-    import requests
-    from requests_cache import CacheMixin, install_cache
-    from requests_html import HTMLSession
-
-    class CachedHTMLSession(CacheMixin, HTMLSession):
-        """Session with features from both CachedSession and HTMLSession"""
+Example with `requests-html <https://github.com/psf/requests-html>`_:
 
-    session = CachedHTMLSession()
-    r = session.get("https://github.com/")
-    print(r.from_cache, r.html.links)
+    >>> import requests
+    >>> from requests_cache import CacheMixin, install_cache
+    >>> from requests_html import HTMLSession
+    >>>
+    >>> class CachedHTMLSession(CacheMixin, HTMLSession):
+    ...     """Session with features from both CachedSession and HTMLSession"""
+    >>>
+    >>> session = CachedHTMLSession()
+    >>> r = session.get('https://github.com/')
+    >>> print(r.from_cache, r.html.links)
 
-Or, using the monkey-patch method::
+Or, using the monkey-patch method:
 
-    install_cache(session_factory=CachedHTMLSession)
-    r = requests.get("https://github.com/")
-    print(r.from_cache, r.html.links)
+    >>> install_cache(session_factory=CachedHTMLSession)
+    >>> r = requests.get('https://github.com/')
+    >>> print(r.from_cache, r.html.links)
 
-The same approach can be used with other libraries that subclass ``requests.Session``.
+The same approach can be used with other libraries that subclass :py:class:`requests.Session`.
 
 Requests-futures
 ~~~~~~~~~~~~~~~~
 Example with `requests-futures <https://github.com/ross/requests-futures>`_:
 
-Some libraries, including `requests-futures`, support wrapping an existing session object.
```
```diff
+Some libraries, including ``requests-futures``, support wrapping an existing session object:
+
-    session = FutureSession(session=CachedSession())
+    >>> session = FutureSession(session=CachedSession())
 
 In this case, ``FutureSession`` must wrap ``CachedSession`` rather than the other way around, since ``FutureSession`` returns (as you might expect) futures rather than response objects.
@@ -271,44 +194,36 @@ Example with `requests-mock <https://github.com/jamielennox/requests-mock>`_:
 
 Requests-mock works a bit differently. It has multiple methods of mocking requests, and the method
 most compatible with requests-cache is attaching its
-`adapter <https://requests-mock.readthedocs.io/en/latest/adapter.html>`_ to a CachedSession::
-
-    import requests
-    from requests_mock import Adapter
-    from requests_cache import CachedSession
-
-    # Set up a CachedSession that will make mock requests where it would normally make real requests
-    adapter = Adapter()
-    adapter.register_uri(
-        'GET',
-        'mock://some_test_url',
-        headers={'Content-Type': 'text/plain'},
-        text='mock response',
-        status_code=200,
-    )
-    session = CachedSession()
-    session.mount('mock://', adapter)
-
-    session.get('mock://some_test_url', text='mock_response')
-    response = session.get('mock://some_test_url')
-    print(response.text)
+`adapter <https://requests-mock.readthedocs.io/en/latest/adapter.html>`_ to a CachedSession:
+
+    >>> import requests
+    >>> from requests_mock import Adapter
+    >>> from requests_cache import CachedSession
+    >>>
+    >>> # Set up a CachedSession that will make mock requests where it would normally make real requests
+    >>> adapter = Adapter()
+    >>> adapter.register_uri(
+    ...     'GET',
+    ...     'mock://some_test_url',
+    ...     headers={'Content-Type': 'text/plain'},
+    ...     text='mock response',
+    ...     status_code=200,
+    ... )
+    >>> session = CachedSession()
+    >>> session.mount('mock://', adapter)
+    >>>
+    >>> session.get('mock://some_test_url')
+    >>> response = session.get('mock://some_test_url')
+    >>> print(response.text)
 
 Internet Archive
 ~~~~~~~~~~~~~~~~
 Example with `internetarchive <https://github.com/jjjake/internetarchive>`_:
 
-Usage is the same as other libraries that subclass `requests.Session`::
-
-    from requests_cache import CacheMixin
-    from internetarchive.session import ArchiveSession
+Usage is the same as other libraries that subclass `requests.Session`:
 
-    class CachedArchiveSession(CacheMixin, ArchiveSession):
-        """Session with features from both CachedSession and ArchiveSession"""
-
-Potential Issues
-----------------
-* Version updates of ``requests``, ``urllib3`` or ``requests-cache`` itself may not be compatible with
-  previously cached data (see issues `#56 <https://github.com/reclosedev/requests-cache/issues/56>`_
-  and `#102 <https://github.com/reclosedev/requests-cache/issues/102>`_).
-  The best way to prevent this is to use a virtualenv and pin your dependency versions.
-* See :ref:`security` for notes on serialization security
+    >>> from requests_cache import CacheMixin
+    >>> from internetarchive.session import ArchiveSession
+    >>>
+    >>> class CachedArchiveSession(CacheMixin, ArchiveSession):
+    ...     """Session with features from both CachedSession and ArchiveSession"""
diff --git a/docs/api.rst b/docs/api.rst
index 2602292..b45c73f 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -1,9 +1,9 @@
 .. Note: backend module docs are auto-generated with apidoc; the remaining
    modules are manually added here for more custom formatting.
 
-API
-===
-This section covers all the public interfaces of ``requests-cache``
+API Reference
+=============
+This section covers all the public interfaces of requests-cache.
 
 .. contents:: Table of Contents
    :depth: 2
diff --git a/docs/contributing.rst b/docs/contributing.rst
index 4fc5016..a0ad0a8 100644
--- a/docs/contributing.rst
+++ b/docs/contributing.rst
@@ -1 +1,3 @@
+.. _contributing:
+
 .. mdinclude:: ../CONTRIBUTING.md
diff --git a/docs/index.rst b/docs/index.rst
index 2b49f96..892ff8a 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -2,22 +2,23 @@
 .. Include Readme contents (except for the link to readthedocs, since we're already on readthedocs!)
 .. mdinclude:: ../README.md
-   :end-line: 17
-
-.. _general-usage:
+   :end-line: 15
 
 .. mdinclude:: ../README.md
-   :start-line: 19
+   :start-line: 17
 
+Contents
+========
 .. toctree::
-   :caption: Contents
    :maxdepth: 2
 
+   user_guide
    advanced_usage
    security
    api
    contributing
    contributors
+   related_projects
    history
 
 Indices and tables
diff --git a/docs/related_projects.rst b/docs/related_projects.rst
new file mode 100644
index 0000000..5ceb81b
--- /dev/null
+++ b/docs/related_projects.rst
@@ -0,0 +1,17 @@
+Related Projects
+================
+If requests-cache isn't quite what you need, you can help make it better! See the
+:ref:`Contributing Guide <contributing>` for details.
```
+ +You can also check out these other Python cache projects: + +* `CacheControl <https://github.com/ionrock/cachecontrol>`_: An HTTP cache for ``requests`` that caches + according to HTTP headers +* `diskcache <https://github.com/grantjenks/python-diskcache>`_: A general-purpose (not HTTP-specific) + file-based cache built on SQLite +* `aiohttp-client-cache <https://github.com/JWCook/aiohttp-client-cache>`_: An async HTTP cache for + ``aiohttp``, based on requests-cache +* `aiohttp-cache <https://github.com/cr0hn/aiohttp-cache>`_: A server-side async HTTP cache for the + ``aiohttp`` web server +* `aiocache <https://github.com/aio-libs/aiocache>`_: General-purpose (not HTTP-specific) async cache + backends diff --git a/docs/user_guide.rst b/docs/user_guide.rst index f556a63..9f539cd 100644 --- a/docs/user_guide.rst +++ b/docs/user_guide.rst @@ -1,7 +1,278 @@ -:orphan: - -.. This is just a placeholder in case anyone had this page bookmarked - User Guide ========== -This page has moved. Please see :ref:`index:general usage` and :ref:`advanced_usage` sections. +This section covers the main features of requests-cache. + +.. contents:: + :local: + :depth: 2 + +Installation +------------ +Install with pip:: + + $ pip install requests-cache + +Requirements +~~~~~~~~~~~~ +* Requires Python 3.6+. +* You may need additional dependencies depending on which backend you want to use. To install with + extra dependencies for all supported backends:: + + $ pip install requests-cache[backends] + +Optional Setup Steps +~~~~~~~~~~~~~~~~~~~~ +* See :ref:`security` for recommended setup steps for more secure cache serialization. +* See :ref:`Contributing Guide <contributing:dev installation>` for setup steps for local development.
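Since the extra dependencies you need depend on which backend you choose, it may help to check what is importable in the current environment first. The following is a minimal sketch, not part of requests-cache: the ``BACKEND_MODULES`` mapping and the ``available_backends()`` helper are hypothetical, and the module names are taken from the dependency notes in the Cache Backends section (``redis``, ``pymongo``, ``boto3``).

```python
import importlib.util
from typing import Dict, Optional

# Hypothetical mapping (not part of requests-cache) from backend name to the
# module it needs, based on the dependency notes in the "Cache Backends" section.
BACKEND_MODULES: Dict[str, Optional[str]] = {
    'sqlite': 'sqlite3',  # standard library
    'redis': 'redis',
    'mongodb': 'pymongo',
    'gridfs': 'pymongo',
    'dynamodb': 'boto3',
    'memory': None,  # no extra dependency
}


def available_backends() -> Dict[str, bool]:
    """Report whether each backend's dependency can be imported in this environment."""
    return {
        name: module is None or importlib.util.find_spec(module) is not None
        for name, module in BACKEND_MODULES.items()
    }


if __name__ == '__main__':
    for name, ok in available_backends().items():
        status = 'ok' if ok else 'missing dependency (try: pip install requests-cache[backends])'
        print(f'{name}: {status}')
```

Running it as a script prints one line per backend. ``find_spec`` only locates the module without importing it, so the check stays cheap; a backend reported as missing needs its package installed, for example via the ``backends`` extra shown above.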
+ +General Usage +------------- +There are two main ways of using requests-cache: + +* **Sessions:** (recommended) Use :py:class:`.CachedSession` to send your requests +* **Patching:** Globally patch ``requests`` using :py:func:`.install_cache()` + +Sessions +~~~~~~~~ +:py:class:`.CachedSession` can be used as a drop-in replacement for :py:class:`requests.Session`. +Basic usage looks like this: + + >>> from requests_cache import CachedSession + >>> + >>> session = CachedSession() + >>> for i in range(60): + ... session.get('http://httpbin.org/delay/1') + +Any :py:class:`requests.Session` method can be used (but see the :ref:`user_guide:http methods` section +below for config details): + + >>> session.request('GET', 'http://httpbin.org/get') + >>> session.head('http://httpbin.org/get') + +Caching can be temporarily disabled with :py:meth:`.CachedSession.cache_disabled`: + + >>> with session.cache_disabled(): + ... session.get('http://httpbin.org/get') + +The best way to clean up your cache is through :ref:`user_guide:cache expiration`, but you can also +clear out everything with :py:meth:`.BaseCache.clear`: + + >>> session.cache.clear() + +Patching +~~~~~~~~ +In some situations, it may not be possible or convenient to manage your own session object. In those +cases, you can use :py:func:`.install_cache` to add caching to all ``requests`` functions: + + >>> import requests + >>> import requests_cache + >>> + >>> requests_cache.install_cache() + >>> requests.get('http://httpbin.org/get') + +It also adds caching to session methods: + + >>> session = requests.Session() + >>> session.get('http://httpbin.org/get') + +:py:func:`.install_cache` accepts all the same parameters as :py:class:`.CachedSession`: + + >>> requests_cache.install_cache(expire_after=360, allowable_methods=('GET', 'POST')) + +It can be temporarily :py:func:`.enabled`: + + >>> with requests_cache.enabled(): + ...
requests.get('http://httpbin.org/get') # Will be cached + +Or temporarily :py:func:`.disabled`: + + >>> requests_cache.install_cache() + >>> with requests_cache.disabled(): + ... requests.get('http://httpbin.org/get') # Will not be cached + +Or completely removed with :py:func:`.uninstall_cache`: + + >>> requests_cache.uninstall_cache() + >>> requests.get('http://httpbin.org/get') + +You can also clear out all responses in the cache with :py:func:`.clear`, and check if +requests-cache is currently installed with :py:func:`.is_installed`. + +Limitations +^^^^^^^^^^^ +Like any other utility that uses global patching, there are some scenarios where you won't want to +use :py:func:`.install_cache`: + +* In a multithreaded or multiprocess application +* In an application that uses other packages that extend or modify :py:class:`requests.Session` +* In a package that will be used by other packages or applications + +Cache Backends +-------------- +Several cache backends are included, which can be selected with +the ``backend`` parameter for either :py:class:`.CachedSession` or :py:func:`.install_cache`: + +* ``'sqlite'``: `SQLite <https://www.sqlite.org>`_ database (**default**) +* ``'redis'``: `Redis <https://redis.io>`_ cache (requires ``redis``) +* ``'mongodb'``: `MongoDB <https://www.mongodb.com>`_ database (requires ``pymongo``) +* ``'gridfs'``: `GridFS <https://docs.mongodb.com/manual/core/gridfs/>`_ collections on a MongoDB database (requires ``pymongo``) +* ``'dynamodb'``: `Amazon DynamoDB <https://aws.amazon.com/dynamodb>`_ database (requires ``boto3``) +* ``'memory'``: A non-persistent cache that just stores responses in memory + +A backend can be specified either by name, class, or instance: + + >>> from requests_cache.backends import RedisCache + >>> from requests_cache import CachedSession + >>> + >>> # Backend name + >>> session = CachedSession(backend='redis', namespace='my-cache') + + >>> # Backend class + >>> session = CachedSession(backend=RedisCache,
namespace='my-cache') + + >>> # Backend instance + >>> session = CachedSession(backend=RedisCache(namespace='my-cache')) + +See :py:mod:`requests_cache.backends` for more backend-specific usage details, and see +:ref:`advanced_usage:custom backends` for details on creating your own implementation. + +Cache Name +~~~~~~~~~~ +The ``cache_name`` parameter will be used as follows depending on the backend: + +* ``sqlite``: Database path, e.g., ``~/.cache/my_cache.sqlite`` +* ``dynamodb``: Table name +* ``mongodb`` and ``gridfs``: Database name +* ``redis``: Namespace, meaning all keys will be prefixed with ``'<cache_name>:'`` + +Cache Options +------------- +A number of options are available to modify which responses are cached and how they are cached. + +HTTP Methods +~~~~~~~~~~~~ +By default, only GET and HEAD requests are cached. To cache additional HTTP methods, specify them +with ``allowable_methods``. For example, caching POST requests can be used to ensure you don't send +the same data multiple times: + + >>> session = CachedSession(allowable_methods=('GET', 'POST')) + >>> session.post('http://httpbin.org/post', json={'param': 'value'}) + +Status Codes +~~~~~~~~~~~~ +By default, only responses with a 200 status code are cached. To cache additional status codes, +specify them with ``allowable_codes``: + + >>> session = CachedSession(allowable_codes=(200, 418)) + >>> session.get('http://httpbin.org/teapot') + +Request Parameters +~~~~~~~~~~~~~~~~~~ +By default, all request parameters are taken into account when caching responses. In some cases, +there may be request parameters that don't affect the response data, for example authentication tokens +or credentials.
If you want to ignore specific parameters, specify them with ``ignored_parameters``: + + >>> session = CachedSession(ignored_parameters=['auth-token']) + >>> # Only the first request will be sent + >>> session.get('http://httpbin.org/get', params={'auth-token': '2F63E5DF4F44'}) + >>> session.get('http://httpbin.org/get', params={'auth-token': 'D9FAEB3449D3'}) + +Request Headers +~~~~~~~~~~~~~~~ +By default, request headers are not taken into account when caching responses. In some cases, +different headers may result in different response data, so you may want to cache them separately. +To enable this, use ``include_get_headers``: + + >>> session = CachedSession(include_get_headers=True) + >>> # Both of these requests will be sent and cached separately + >>> session.get('http://httpbin.org/headers', headers={'Accept': 'text/plain'}) + >>> session.get('http://httpbin.org/headers', headers={'Accept': 'application/json'}) + +Cache Expiration +---------------- +By default, cached responses will be stored indefinitely. You can initialize the cache with an +``expire_after`` value to specify how long responses will be cached. + +Expiration Types +~~~~~~~~~~~~~~~~ +``expire_after`` can be any of the following: + +* ``-1`` (to never expire) +* A positive number (in seconds) +* A :py:class:`~datetime.timedelta` +* A :py:class:`~datetime.datetime` + +Examples: + + >>> # Set expiration for the session using a value in seconds + >>> session = CachedSession(expire_after=360) + + >>> # To specify a different unit of time, use a timedelta + >>> from datetime import timedelta + >>> session = CachedSession(expire_after=timedelta(days=30)) + + >>> # Update an existing session to disable expiration (i.e., store indefinitely) + >>> session.expire_after = -1 + +Expiration Scopes +~~~~~~~~~~~~~~~~~ +Passing ``expire_after`` to :py:class:`.CachedSession` will set the expiration for the duration of that session. +Expiration can also be set on a per-URL or per-request basis.
The following order of precedence +is used: + +1. Per-request expiration (``expire_after`` argument for :py:meth:`.CachedSession.request`) +2. Per-URL expiration (``urls_expire_after`` argument for :py:class:`.CachedSession`) +3. Per-session expiration (``expire_after`` argument for :py:class:`.CachedSession`) + +To set expiration for a single request: + + >>> session.get('http://httpbin.org/get', expire_after=360) + +URL Patterns +~~~~~~~~~~~~ +You can use ``urls_expire_after`` to set different expiration values for different requests, based on +URL glob patterns. This allows you to customize caching based on what you know about the resources +you're requesting. For example, you might request one resource that gets updated frequently, another +that changes infrequently, and another that never changes. Example: + + >>> urls_expire_after = { + ... '*.site_1.com': 30, + ... 'site_2.com/resource_1': 60 * 2, + ... 'site_2.com/resource_2': 60 * 60 * 24, + ... 'site_2.com/static': -1, + ... } + >>> session = CachedSession(urls_expire_after=urls_expire_after) + +**Notes:** + +* ``urls_expire_after`` should be a dict in the format ``{'pattern': expire_after}`` +* ``expire_after`` accepts the same types as ``CachedSession.expire_after`` +* Patterns will match request **base URLs**, so the pattern ``site.com/resource/`` is equivalent to + ``http*://site.com/resource/**`` +* If there is more than one match, the first match will be used in the order they are defined +* If no patterns match a request, ``CachedSession.expire_after`` will be used as a default. + +Removing Expired Responses +~~~~~~~~~~~~~~~~~~~~~~~~~~ +For better performance, expired responses won't be removed immediately, but will be removed +(or replaced) the next time they are requested. 
To manually clear all expired responses, use +:py:meth:`.CachedSession.remove_expired_responses`: + + >>> session.remove_expired_responses() + +Or, when using patching: + + >>> requests_cache.remove_expired_responses() + +You can also apply a different ``expire_after`` to previously cached responses, which will +revalidate the cache with the new expiration time: + + >>> session.remove_expired_responses(expire_after=timedelta(days=30)) + +Potential Issues +---------------- +* Version updates of ``requests``, ``urllib3`` or ``requests-cache`` itself may not be compatible with + previously cached data (see issues `#56 <https://github.com/reclosedev/requests-cache/issues/56>`_ + and `#102 <https://github.com/reclosedev/requests-cache/issues/102>`_). + The best way to prevent this is to use a virtualenv and pin your dependency versions. +* See :ref:`security` for notes on serialization security
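Pinning versions in a virtualenv is the recommended fix for the compatibility issue above; as an extra safeguard, an application could detect when ``requests``, ``urllib3``, or ``requests-cache`` has changed since the cache was last used. The following is a minimal sketch under stated assumptions, not part of requests-cache: ``WATCHED_PACKAGES``, ``installed_versions()``, ``versions_changed()``, and the JSON stamp file are all hypothetical, and it assumes Python 3.8+ for ``importlib.metadata``.

```python
import json
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
from pathlib import Path
from typing import Dict, Optional

# Hypothetical list of packages whose upgrades may invalidate previously cached data
WATCHED_PACKAGES = ('requests', 'urllib3', 'requests-cache')


def installed_versions() -> Dict[str, Optional[str]]:
    """Return the installed version of each watched package, or None if it is not installed."""
    versions: Dict[str, Optional[str]] = {}
    for package in WATCHED_PACKAGES:
        try:
            versions[package] = version(package)
        except PackageNotFoundError:
            versions[package] = None
    return versions


def versions_changed(stamp_path: Path) -> bool:
    """Compare current versions with those saved in stamp_path, then save the current ones.

    Returns True if there was no readable stamp file or if any version differs.
    """
    current = installed_versions()
    try:
        previous = json.loads(stamp_path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        previous = None
    stamp_path.write_text(json.dumps(current))
    return previous != current
```

For example, an application could call ``versions_changed(Path('http_cache.versions.json'))`` at startup (the file name is arbitrary) and, when it returns ``True``, call ``session.cache.clear()`` to discard responses cached under the old versions.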