diff options
author | Jordan Cook <jordan.cook@pioneer.com> | 2021-07-07 15:04:41 -0500 |
---|---|---|
committer | Jordan Cook <jordan.cook@pioneer.com> | 2021-07-09 20:07:25 -0500 |
commit | 7be287ec477e38b3611e35ab23123d8b00d10593 (patch) | |
tree | 3aa10e3957d7065f8c23cb7d5a750e326f247071 /docs/user_guide.md | |
parent | b4766ff2c2b5c6cf637cac163b39d7adc499faaa (diff) | |
download | requests-cache-7be287ec477e38b3611e35ab23123d8b00d10593.tar.gz |
Convert docs from rST to MyST, and switch to Furo theme
Diffstat (limited to 'docs/user_guide.md')
-rw-r--r-- | docs/user_guide.md | 381 |
1 files changed, 381 insertions, 0 deletions
diff --git a/docs/user_guide.md b/docs/user_guide.md new file mode 100644 index 0000000..68cd2a1 --- /dev/null +++ b/docs/user_guide.md @@ -0,0 +1,381 @@ +# User Guide +This section covers the main features of requests-cache. + +## Installation +Install with pip: +``` +$ pip install requests-cache +``` + +### Requirements +- Requires python 3.6+. +- You may need additional dependencies depending on which backend you want to use. To install with + extra dependencies for all supported {ref}`user_guide:cache backends`: + ``` + $ pip install requests-cache\[backends\] + ``` +### Optional Setup Steps +- See {ref}`security` for recommended setup steps for more secure cache serialization. +- See {ref}`Contributing Guide <contributing:dev installation>` for setup steps for local development. + +## General Usage +There are two main ways of using requests-cache: +- **Sessions:** (recommended) Use {py:class}`.CachedSession` to send your requests +- **Patching:** Globally patch `requests` using {py:func}`.install_cache()` + +### Sessions +{py:class}`.CachedSession` can be used as a drop-in replacement for {py:class}`requests.Session`. +Basic usage looks like this: +```python +>>> from requests_cache import CachedSession +>>> +>>> session = CachedSession() +>>> session.get('http://httpbin.org/get') +``` + +Any {py:class}`requests.Session` method can be used (but see {ref}`user_guide:http methods` section +below for config details): +```python +>>> session.request('GET', 'http://httpbin.org/get') +>>> session.head('http://httpbin.org/get') +``` + +Caching can be temporarily disabled with {py:meth}`.CachedSession.cache_disabled`: +```python +>>> with session.cache_disabled(): +... session.get('http://httpbin.org/get') +``` + +The best way to clean up your cache is through {ref}`user_guide:cache expiration`, but you can also +clear out everything at once with {py:meth}`.BaseCache.clear`: +```python +>>> session.cache.clear() +``` + +### Patching +In some situations, it may not be possible or convenient to manage your own session object. In those +cases, you can use {py:func}`.install_cache` to add caching to all `requests` functions: +```python +>>> import requests +>>> import requests_cache +>>> +>>> requests_cache.install_cache() +>>> requests.get('http://httpbin.org/get') +``` + +As well as session methods: +```python +>>> session = requests.Session() +>>> session.get('http://httpbin.org/get') +``` + +{py:func}`.install_cache` accepts all the same parameters as {py:class}`.CachedSession`: +```python +>>> requests_cache.install_cache(expire_after=360, allowable_methods=('GET', 'POST')) +``` + +It can be temporarily {py:func}`.enabled`: +```python +>>> with requests_cache.enabled(): +... requests.get('http://httpbin.org/get') # Will be cached +``` + +Or temporarily {py:func}`.disabled`: +```python +>>> requests_cache.install_cache() +>>> with requests_cache.disabled(): +... requests.get('http://httpbin.org/get') # Will not be cached +``` + +Or completely removed with {py:func}`.uninstall_cache`: +```python +>>> requests_cache.uninstall_cache() +>>> requests.get('http://httpbin.org/get') +``` + +You can also clear out all responses in the cache with {py:func}`.clear`, and check if +requests-cache is currently installed with {py:func}`.is_installed`. + +#### Limitations +Like any other utility that uses global patching, there are some scenarios where you won't want to +use {py:func}`.install_cache`: +- In a multi-threaded or multiprocess application +- In an application that uses other packages that extend or modify {py:class}`requests.Session` +- In a package that will be used by other packages or applications + +## Cache Backends +Several cache backends are included, which can be selected with +the `backend` parameter for either {py:class}`.CachedSession` or {py:func}`.install_cache`: + +- `'sqlite'`: [SQLite](https://www.sqlite.org) database (**default**) +- `'redis'`: [Redis](https://redis.io) cache (requires `redis`) +- `'mongodb'`: [MongoDB](https://www.mongodb.com) database (requires `pymongo`) +- `'gridfs'`: [GridFS](https://docs.mongodb.com/manual/core/gridfs/) collections on a MongoDB database (requires `pymongo`) +- `'dynamodb'`: [Amazon DynamoDB](https://aws.amazon.com/dynamodb) database (requires `boto3`) +- `'filesystem'`: Stores responses as files on the local filesystem +- `'memory'` : A non-persistent cache that just stores responses in memory + +A backend can be specified either by name, class or instance: +```python +>>> from requests_cache.backends import RedisCache +>>> from requests_cache import CachedSession + +>>> # Backend name +>>> session = CachedSession(backend='redis', namespace='my-cache') + +>>> # Backend class +>>> session = CachedSession(backend=RedisCache, namespace='my-cache') + +>>> # Backend instance +>>> session = CachedSession(backend=RedisCache(namespace='my-cache')) +``` +See {py:mod}`requests_cache.backends` for more backend-specific usage details, and see +{ref}`advanced_usage:custom backends` for details on creating your own implementation. + +### Cache Name +The `cache_name` parameter will be used as follows depending on the backend: +- `sqlite`: Database path, e.g `~/.cache/my_cache.sqlite` +- `dynamodb`: Table name +- `mongodb` and `gridfs`: Database name +- `redis`: Namespace, meaning all keys will be prefixed with `'<cache_name>:'` +- `filesystem`: Cache directory + +## Cache Options +A number of options are available to modify which responses are cached and how they are cached. + +### HTTP Methods +By default, only GET and HEAD requests are cached. To cache additional HTTP methods, specify them +with `allowable_methods`. For example, caching POST requests can be used to ensure you don't send +the same data multiple times: +```python +>>> session = CachedSession(allowable_methods=('GET', 'POST')) +>>> session.post('http://httpbin.org/post', json={'param': 'value'}) +``` + +### Status Codes +By default, only responses with a 200 status code are cached. To cache additional status codes, +specify them with `allowable_codes`" +```python +>>> session = CachedSession(allowable_codes=(200, 418)) +>>> session.get('http://httpbin.org/teapot') +``` + +### Request Parameters +By default, all request parameters are taken into account when caching responses. In some cases, +there may be request parameters that don't affect the response data, for example authentication tokens +or credentials. If you want to ignore specific parameters, specify them with `ignored_parameters`: +```python +>>> session = CachedSession(ignored_parameters=\['auth-token'\]) +>>> # Only the first request will be sent +>>> session.get('http://httpbin.org/get', params={'auth-token': '2F63E5DF4F44'}) +>>> session.get('http://httpbin.org/get', params={'auth-token': 'D9FAEB3449D3'}) +``` + +In addition to allowing the cache to ignore these parameters when fetching cached results, these +parameters will also be removed from the cache data, including in the request headers. +This makes `ignored_parameters` a good way to prevent key material or other secrets from being +saved in the cache backend. + +### Request Headers +In some cases, different headers may result in different response data, so you may want to cache +them separately. To enable this, use `include_get_headers`: +```python +>>> session = CachedSession(include_get_headers=True) +>>> # Both of these requests will be sent and cached separately +>>> session.get('http://httpbin.org/headers', {'Accept': 'text/plain'}) +>>> session.get('http://httpbin.org/headers', {'Accept': 'application/json'}) +``` + +## Cache Expiration +By default, cached responses will be stored indefinitely. There are a number of options for +specifying how long to store responses. The simplest option is to initialize the cache with an +`expire_after` value: +```python +>>> # Set expiration for the session using a value in seconds +>>> session = CachedSession(expire_after=360) +``` + +### Expiration Precedence +Expiration can be set on a per-session, per-URL, or per-request basis, in addition to cache +headers (see sections below for usage details). When there are multiple values provided for a given +request, the following order of precedence is used: +1. Cache-Control request headers (if enabled) +2. Cache-Control response headers (if enabled) +3. Per-request expiration (`expire_after` argument for {py:meth}`.CachedSession.request`) +4. Per-URL expiration (`urls_expire_after` argument for {py:class}`.CachedSession`) +5. Per-session expiration (`expire_after` argument for {py:class}`.CacheBackend`) + +### Expiration Values +`expire_after` can be any of the following: +- `-1` (to never expire) +- `0` (to "expire immediately," e.g. bypass the cache) +- A positive number (in seconds) +- A {py:class}`~datetime.timedelta` +- A {py:class}`~datetime.datetime` + +Examples: +```python +> >>> # To specify a unit of time other than seconds, use a timedelta +> >>> from datetime import timedelta +> >>> session = CachedSession(expire_after=timedelta(days=30)) +> +> >>> # Update an existing session to disable expiration (i.e., store indefinitely) +> >>> session.expire_after = -1 +> +> >>> # Disable caching by default, unless enabled by other settings +> >>> session = CachedSession(expire_after=0) +``` + +### URL Patterns +You can use `urls_expire_after` to set different expiration values for different requests, based on +URL glob patterns. This allows you to customize caching based on what you know about the resources +you're requesting. For example, you might request one resource that gets updated frequently, another +that changes infrequently, and another that never changes. Example: +```python +>>> urls_expire_after = { +... '\*.site_1.com': 30, +... 'site_2.com/resource_1': 60 * 2, +... 'site_2.com/resource_2': 60 * 60 * 24, +... 'site_2.com/static': -1, +... } +>>> session = CachedSession(urls_expire_after=urls_expire_after) +``` + +You can also use this to define a cache whitelist, so only the patterns you define will be cached: +```python +>>> urls_expire_after = { +... '\*.site_1.com': 30, +... 'site_2.com/static': -1, +... '\*': 0, # Every other non-matching URL: do not cache +... } +``` + +**Notes:** +- `urls_expire_after` should be a dict in the format `{'pattern': expire_after}` +- `expire_after` accepts the same types as `CachedSession.expire_after` +- Patterns will match request **base URLs**, so the pattern `site.com/resource/` is equivalent to + `http*://site.com/resource/**` +- If there is more than one match, the first match will be used in the order they are defined +- If no patterns match a request, `CachedSession.expire_after` will be used as a default. + +### Cache-Control +:::{warning} +This is **not** intended to be a thorough or strict implementation of header-based HTTP caching, +e.g. according to RFC 2616. +::: + +Optional support is included for a simplified subset of +[Cache-Control](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control) +and other cache headers in both requests and responses. To enable this behavior, use the +`cache_control` option: +```python +>>> session = CachedSession(cache_control=True) +``` + +**Supported request headers:** +- `Cache-Control: max-age`: Used as the expiration time in seconds +- `Cache-Control: no-cache`: Skips reading response data from the cache +- `Cache-Control: no-store`: Skips reading and writing response data from/to the cache + +**Supported response headers:** +- `Cache-Control: max-age`: Used as the expiration time in seconds +- `Cache-Control: no-store` Skips writing response data to the cache +- `Expires`: Used as an absolute expiration time + +**Notes:** +- Unlike a browser or proxy cache, `max-age=0` does not currently clear previously cached responses. +- If enabled, Cache-Control directives will take priority over any other `expire_after` value. + See {ref}`user_guide:expiration precedence` for the full order of precedence. + +### Removing Expired Responses +For better performance, expired responses won't be removed immediately, but will be removed +(or replaced) the next time they are requested. To manually clear all expired responses, use +{py:meth}`.CachedSession.remove_expired_responses`: +```python +>>> session.remove_expired_responses() +``` + +Or, when using patching: +```python +>>> requests_cache.remove_expired_responses() +``` + +You can also apply a different `expire_after` to previously cached responses, which will +revalidate the cache with the new expiration time: +```python +>>> session.remove_expired_responses(expire_after=timedelta(days=30)) +``` + +(serializers)= +## Serializers +By default, responses are serialized using {py:mod}`pickle`. Some other options are also available: + +:::{note} +These features require python 3.7+ and additional dependencies +::: + +### JSON Serializer +Storing responses as JSON gives you the benefit of making them human-readable, in exchange for a +slight reduction in performance. This can be especially useful in combination with the filesystem +backend. + +:::{admonition} Example JSON-serialized Response +:class: toggle +```{literalinclude} sample_response.json +:language: JSON +``` +::: + +You can install the extra dependencies for this serializer with: +```bash +pip install requests-cache[json] +``` + +### BSON Serializer +[BSON](https://www.mongodb.com/json-and-bson) is a serialization format originally created for +MongoDB, but it can also be used independently. Compared to JSON, it has better performance +(although still not as fast as `pickle`), and adds support for additional data types. It is not +human-readable, but some tools support reading and editing it directly +(for example, [bson-converter](https://atom.io/packages/bson-converter) for Atom). + +You can install the extra dependencies for this serializer with: +```bash +pip install requests-cache[mongo] +``` + +Or if you would like to use the standalone BSON codec for a different backend, without installing +MongoDB dependencies: +```bash +pip install requests-cache[bson] +``` + +## Error Handling +In some cases, you might cache a response, have it expire, but then encounter an error when +retrieving a new response. If you would like to use expired response data in these cases, use the +`old_data_on_error` option: +```python +>>> # Cache a test response that will expire immediately +>>> session = CachedSession(old_data_on_error=True) +>>> session.get('https://httpbin.org/get', expire_after=0.001) +>>> time.sleep(0.001) +``` + +Afterward, let's say the page has moved and you get a 404, or the site is experiencing downtime and +you get a 500. You will then get the expired cache data instead: +```python +>>> response = session.get('https://httpbin.org/get') +>>> print(response.from_cache, response.is_expired) +True, True +``` + +In addition to error codes, `old_data_on_error` also applies to exceptions (typically a +{py:exc}`~requests.RequestException`). See requests documentation on +[Errors and Exceptions](https://2.python-requests.org/en/master/user/quickstart/#errors-and-exceptions) +for more details on request errors in general. + +## Potential Issues +- Version updates of `requests`, `urllib3` or `requests-cache` itself may not be compatible with + previously cached data (see issues [#56](https://github.com/reclosedev/requests-cache/issues/56) + and [#102](https://github.com/reclosedev/requests-cache/issues/102)). + The best way to prevent this is to use a virtualenv and pin your dependency versions. +- See {ref}`security` for notes on serialization security |