author Jordan Cook <jordan.cook@pioneer.com> 2021-07-07 15:04:41 -0500
committer Jordan Cook <jordan.cook@pioneer.com> 2021-07-09 20:07:25 -0500
commit 7be287ec477e38b3611e35ab23123d8b00d10593
tree 3aa10e3957d7065f8c23cb7d5a750e326f247071 /docs/user_guide.md
parent b4766ff2c2b5c6cf637cac163b39d7adc499faaa
Convert docs from rST to MyST, and switch to Furo theme
Diffstat (limited to 'docs/user_guide.md')
-rw-r--r-- docs/user_guide.md | 381
1 files changed, 381 insertions, 0 deletions
diff --git a/docs/user_guide.md b/docs/user_guide.md
new file mode 100644
index 0000000..68cd2a1
--- /dev/null
+++ b/docs/user_guide.md
@@ -0,0 +1,381 @@
+# User Guide
+This section covers the main features of requests-cache.
+
+## Installation
+Install with pip:
+```
+$ pip install requests-cache
+```
+
+### Requirements
+- Requires Python 3.6+.
+- You may need additional dependencies depending on which backend you want to use. To install with
+ extra dependencies for all supported {ref}`user_guide:cache backends`:
+ ```
+ $ pip install requests-cache\[backends\]
+ ```
+### Optional Setup Steps
+- See {ref}`security` for recommended setup steps for more secure cache serialization.
+- See {ref}`Contributing Guide <contributing:dev installation>` for setup steps for local development.
+
+## General Usage
+There are two main ways of using requests-cache:
+- **Sessions:** (recommended) Use {py:class}`.CachedSession` to send your requests
+- **Patching:** Globally patch `requests` using {py:func}`.install_cache()`
+
+### Sessions
+{py:class}`.CachedSession` can be used as a drop-in replacement for {py:class}`requests.Session`.
+Basic usage looks like this:
+```python
+>>> from requests_cache import CachedSession
+>>>
+>>> session = CachedSession()
+>>> session.get('http://httpbin.org/get')
+```
+
+Any {py:class}`requests.Session` method can be used (but see {ref}`user_guide:http methods` section
+below for config details):
+```python
+>>> session.request('GET', 'http://httpbin.org/get')
+>>> session.head('http://httpbin.org/get')
+```
+
+Caching can be temporarily disabled with {py:meth}`.CachedSession.cache_disabled`:
+```python
+>>> with session.cache_disabled():
+...     session.get('http://httpbin.org/get')
+```
+
+The best way to clean up your cache is through {ref}`user_guide:cache expiration`, but you can also
+clear out everything at once with {py:meth}`.BaseCache.clear`:
+```python
+>>> session.cache.clear()
+```
+
+### Patching
+In some situations, it may not be possible or convenient to manage your own session object. In those
+cases, you can use {py:func}`.install_cache` to add caching to all `requests` functions:
+```python
+>>> import requests
+>>> import requests_cache
+>>>
+>>> requests_cache.install_cache()
+>>> requests.get('http://httpbin.org/get')
+```
+
+Caching also applies to session methods:
+```python
+>>> session = requests.Session()
+>>> session.get('http://httpbin.org/get')
+```
+
+{py:func}`.install_cache` accepts all the same parameters as {py:class}`.CachedSession`:
+```python
+>>> requests_cache.install_cache(expire_after=360, allowable_methods=('GET', 'POST'))
+```
+
+It can be temporarily {py:func}`.enabled`:
+```python
+>>> with requests_cache.enabled():
+...     requests.get('http://httpbin.org/get')  # Will be cached
+```
+
+Or temporarily {py:func}`.disabled`:
+```python
+>>> requests_cache.install_cache()
+>>> with requests_cache.disabled():
+...     requests.get('http://httpbin.org/get')  # Will not be cached
+```
+
+Or completely removed with {py:func}`.uninstall_cache`:
+```python
+>>> requests_cache.uninstall_cache()
+>>> requests.get('http://httpbin.org/get')
+```
+
+You can also clear out all responses in the cache with {py:func}`.clear`, and check if
+requests-cache is currently installed with {py:func}`.is_installed`.
+
+#### Limitations
+Like any other utility that uses global patching, there are some scenarios where you won't want to
+use {py:func}`.install_cache`:
+- In a multi-threaded or multiprocess application
+- In an application that uses other packages that extend or modify {py:class}`requests.Session`
+- In a package that will be used by other packages or applications
+
+## Cache Backends
+Several cache backends are included, which can be selected with
+the `backend` parameter for either {py:class}`.CachedSession` or {py:func}`.install_cache`:
+
+- `'sqlite'`: [SQLite](https://www.sqlite.org) database (**default**)
+- `'redis'`: [Redis](https://redis.io) cache (requires `redis`)
+- `'mongodb'`: [MongoDB](https://www.mongodb.com) database (requires `pymongo`)
+- `'gridfs'`: [GridFS](https://docs.mongodb.com/manual/core/gridfs/) collections on a MongoDB database (requires `pymongo`)
+- `'dynamodb'`: [Amazon DynamoDB](https://aws.amazon.com/dynamodb) database (requires `boto3`)
+- `'filesystem'`: Stores responses as files on the local filesystem
+- `'memory'`: A non-persistent cache that just stores responses in memory
+
+A backend can be specified either by name, class or instance:
+```python
+>>> from requests_cache.backends import RedisCache
+>>> from requests_cache import CachedSession
+
+>>> # Backend name
+>>> session = CachedSession(backend='redis', namespace='my-cache')
+
+>>> # Backend class
+>>> session = CachedSession(backend=RedisCache, namespace='my-cache')
+
+>>> # Backend instance
+>>> session = CachedSession(backend=RedisCache(namespace='my-cache'))
+```
+See {py:mod}`requests_cache.backends` for more backend-specific usage details, and see
+{ref}`advanced_usage:custom backends` for details on creating your own implementation.
+
+### Cache Name
+The `cache_name` parameter will be used as follows depending on the backend:
+- `sqlite`: Database path, e.g. `~/.cache/my_cache.sqlite`
+- `dynamodb`: Table name
+- `mongodb` and `gridfs`: Database name
+- `redis`: Namespace, meaning all keys will be prefixed with `'<cache_name>:'`
+- `filesystem`: Cache directory
+
+## Cache Options
+A number of options are available to modify which responses are cached and how they are cached.
+
+### HTTP Methods
+By default, only GET and HEAD requests are cached. To cache additional HTTP methods, specify them
+with `allowable_methods`. For example, caching POST requests can be used to ensure you don't send
+the same data multiple times:
+```python
+>>> session = CachedSession(allowable_methods=('GET', 'POST'))
+>>> session.post('http://httpbin.org/post', json={'param': 'value'})
+```
+
+### Status Codes
+By default, only responses with a 200 status code are cached. To cache additional status codes,
+specify them with `allowable_codes`:
+```python
+>>> session = CachedSession(allowable_codes=(200, 418))
+>>> session.get('http://httpbin.org/teapot')
+```
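
The method and status code filters described above can be pictured as a single check applied before a response is stored. The following is only a minimal sketch: `is_cacheable` is a hypothetical helper written for illustration, not part of the requests-cache API, and its defaults simply mirror the defaults stated in this guide.

```python
def is_cacheable(method, status_code, allowable_methods=('GET', 'HEAD'), allowable_codes=(200,)):
    """Hypothetical helper: check a response against both filters."""
    return method.upper() in allowable_methods and status_code in allowable_codes


print(is_cacheable('GET', 200))   # True
print(is_cacheable('POST', 200))  # False
print(is_cacheable('GET', 418, allowable_codes=(200, 418)))  # True
```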
+
+### Request Parameters
+By default, all request parameters are taken into account when caching responses. In some cases,
+there may be request parameters that don't affect the response data, for example authentication tokens
+or credentials. If you want to ignore specific parameters, specify them with `ignored_parameters`:
+```python
+>>> session = CachedSession(ignored_parameters=['auth-token'])
+>>> # Only the first request will be sent
+>>> session.get('http://httpbin.org/get', params={'auth-token': '2F63E5DF4F44'})
+>>> session.get('http://httpbin.org/get', params={'auth-token': 'D9FAEB3449D3'})
+```
+
+In addition to allowing the cache to ignore these parameters when fetching cached results, these
+parameters will also be removed from the cache data, including in the request headers.
+This makes `ignored_parameters` a good way to prevent key material or other secrets from being
+saved in the cache backend.
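
To see why two requests that differ only in an ignored parameter can share one cached response, it may help to sketch how a cache key could be derived. This is an illustration under assumptions, not the library's actual key function: `make_cache_key` is a hypothetical name, and the choice of SHA-256 over the method and normalized URL is arbitrary.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def make_cache_key(method, url, ignored_parameters=()):
    """Hypothetical helper: build a cache key that skips ignored query parameters."""
    parts = urlsplit(url)
    # Keep only the query parameters that are not ignored, in a stable order
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored_parameters
    )
    normalized_url = urlunsplit(parts._replace(query=urlencode(kept)))
    return hashlib.sha256(f'{method.upper()} {normalized_url}'.encode()).hexdigest()


key_1 = make_cache_key('GET', 'http://httpbin.org/get?auth-token=2F63E5DF4F44', ['auth-token'])
key_2 = make_cache_key('GET', 'http://httpbin.org/get?auth-token=D9FAEB3449D3', ['auth-token'])
print(key_1 == key_2)  # True
```

Because the ignored parameter is dropped before hashing, the token never becomes part of the key.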
+
+### Request Headers
+In some cases, different headers may result in different response data, so you may want to cache
+them separately. To enable this, use `include_get_headers`:
+```python
+>>> session = CachedSession(include_get_headers=True)
+>>> # Both of these requests will be sent and cached separately
+>>> session.get('http://httpbin.org/headers', headers={'Accept': 'text/plain'})
+>>> session.get('http://httpbin.org/headers', headers={'Accept': 'application/json'})
+```
+
+## Cache Expiration
+By default, cached responses will be stored indefinitely. There are a number of options for
+specifying how long to store responses. The simplest option is to initialize the cache with an
+`expire_after` value:
+```python
+>>> # Set expiration for the session using a value in seconds
+>>> session = CachedSession(expire_after=360)
+```
+
+### Expiration Precedence
+Expiration can be set on a per-session, per-URL, or per-request basis, in addition to cache
+headers (see sections below for usage details). When there are multiple values provided for a given
+request, the following order of precedence is used:
+1. Cache-Control request headers (if enabled)
+2. Cache-Control response headers (if enabled)
+3. Per-request expiration (`expire_after` argument for {py:meth}`.CachedSession.request`)
+4. Per-URL expiration (`urls_expire_after` argument for {py:class}`.CachedSession`)
+5. Per-session expiration (`expire_after` argument for {py:class}`.CachedSession`)
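
The order of precedence above amounts to "use the first value that was actually provided." A minimal sketch of that idea follows; `resolve_expiration` is a hypothetical helper, and representing "not provided" as `None` is an assumption made for this example.

```python
def resolve_expiration(*candidates):
    """Return the first value that was actually provided (is not None)."""
    for value in candidates:
        if value is not None:
            return value
    return None


expire_after = resolve_expiration(
    None,  # 1. Cache-Control request header (not sent)
    None,  # 2. Cache-Control response header (not sent)
    None,  # 3. Per-request expire_after (not passed)
    60,    # 4. Per-URL expiration (a pattern matched)
    360,   # 5. Per-session expiration
)
print(expire_after)  # 60
```

The check uses `is not None` rather than truthiness so that `0` still counts as a provided value.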
+
+### Expiration Values
+`expire_after` can be any of the following:
+- `-1` (to never expire)
+- `0` (to "expire immediately," i.e., bypass the cache)
+- A positive number (in seconds)
+- A {py:class}`~datetime.timedelta`
+- A {py:class}`~datetime.datetime`
+
+Examples:
+```python
+>>> # To specify a unit of time other than seconds, use a timedelta
+>>> from datetime import timedelta
+>>> session = CachedSession(expire_after=timedelta(days=30))
+
+>>> # Update an existing session to disable expiration (i.e., store indefinitely)
+>>> session.expire_after = -1
+
+>>> # Disable caching by default, unless enabled by other settings
+>>> session = CachedSession(expire_after=0)
+```
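
One way to reason about the accepted value types is to reduce each of them to an absolute expiration time. The sketch below does that with the standard library only; `get_expiration_datetime` is a hypothetical helper for illustration and is not claimed to match how requests-cache handles these values internally.

```python
from datetime import datetime, timedelta


def get_expiration_datetime(expire_after, now=None):
    """Hypothetical helper: convert an expire_after value to an absolute datetime (or None)."""
    now = now or datetime.now()
    if expire_after is None or expire_after == -1:
        return None  # Never expires
    if isinstance(expire_after, datetime):
        return expire_after  # Already an absolute time
    if not isinstance(expire_after, timedelta):
        expire_after = timedelta(seconds=expire_after)  # Plain number of seconds
    return now + expire_after


now = datetime(2021, 7, 7, 12, 0, 0)
print(get_expiration_datetime(360, now))  # 2021-07-07 12:06:00
print(get_expiration_datetime(-1, now))   # None
```

With this view, `0` resolves to the current time, so the response is already expired by the time it could be read back.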
+
+### URL Patterns
+You can use `urls_expire_after` to set different expiration values for different requests, based on
+URL glob patterns. This allows you to customize caching based on what you know about the resources
+you're requesting. For example, you might request one resource that gets updated frequently, another
+that changes infrequently, and another that never changes. Example:
+```python
+>>> urls_expire_after = {
+...     '*.site_1.com': 30,
+...     'site_2.com/resource_1': 60 * 2,
+...     'site_2.com/resource_2': 60 * 60 * 24,
+...     'site_2.com/static': -1,
+... }
+>>> session = CachedSession(urls_expire_after=urls_expire_after)
+```
+
+You can also use this to define a cache whitelist, so only the patterns you define will be cached:
+```python
+>>> urls_expire_after = {
+...     '*.site_1.com': 30,
+...     'site_2.com/static': -1,
+...     '*': 0,  # Every other non-matching URL: do not cache
+... }
+```
+
+**Notes:**
+- `urls_expire_after` should be a dict in the format `{'pattern': expire_after}`
+- `expire_after` accepts the same types as `CachedSession.expire_after`
+- Patterns will match request **base URLs**, so the pattern `site.com/resource/` is equivalent to
+ `http*://site.com/resource/**`
+- If there is more than one match, the first match will be used in the order they are defined
+- If no patterns match a request, `CachedSession.expire_after` will be used as a default.
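
The matching rules in these notes (base URL matching, first match wins, fall back to a default) can be approximated with the standard library's `fnmatch` module. This is a rough sketch under assumptions: `match_url_expiration` is a hypothetical helper, and treating every pattern as a prefix by appending `*` is a simplification of the behavior described above.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def match_url_expiration(url, urls_expire_after, default=-1):
    """Hypothetical helper: return the expire_after value of the first matching pattern."""
    parts = urlsplit(url)
    base_url = parts.netloc + parts.path  # Drop the scheme and query string
    for pattern, expire_after in urls_expire_after.items():
        # Treat each pattern as a prefix by appending a wildcard
        if fnmatchcase(base_url, pattern.rstrip('*') + '*'):
            return expire_after
    return default


patterns = {'*.site_1.com': 30, 'site_2.com/static': -1, '*': 0}
print(match_url_expiration('https://api.site_1.com/data', patterns))       # 30
print(match_url_expiration('http://site_2.com/static/logo.png', patterns)) # -1
print(match_url_expiration('http://example.com/', patterns))               # 0
```

Since dicts preserve insertion order, a catch-all `'*'` pattern only behaves as a fallback when it is defined last.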
+
+### Cache-Control
+:::{warning}
+This is **not** intended to be a thorough or strict implementation of header-based HTTP caching,
+e.g. according to RFC 2616.
+:::
+
+Optional support is included for a simplified subset of
+[Cache-Control](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control)
+and other cache headers in both requests and responses. To enable this behavior, use the
+`cache_control` option:
+```python
+>>> session = CachedSession(cache_control=True)
+```
+
+**Supported request headers:**
+- `Cache-Control: max-age`: Used as the expiration time in seconds
+- `Cache-Control: no-cache`: Skips reading response data from the cache
+- `Cache-Control: no-store`: Skips reading and writing response data from/to the cache
+
+**Supported response headers:**
+- `Cache-Control: max-age`: Used as the expiration time in seconds
+- `Cache-Control: no-store`: Skips writing response data to the cache
+- `Expires`: Used as an absolute expiration time
+
+**Notes:**
+- Unlike a browser or proxy cache, `max-age=0` does not currently clear previously cached responses.
+- If enabled, Cache-Control directives will take priority over any other `expire_after` value.
+ See {ref}`user_guide:expiration precedence` for the full order of precedence.
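
For readers unfamiliar with the header format, the directives listed above arrive as a comma-separated string. A minimal parsing sketch is shown here; `parse_cache_control` is a hypothetical helper, and it deliberately handles only simple `name` and `name=value` directives rather than the full HTTP grammar.

```python
def parse_cache_control(header_value):
    """Hypothetical helper: split a Cache-Control header into a dict of directives."""
    directives = {}
    for item in header_value.split(','):
        name, _, value = item.strip().partition('=')
        if name:
            # Directives without a value (e.g., no-store) are stored as True
            directives[name.lower()] = int(value) if value.isdigit() else (value or True)
    return directives


print(parse_cache_control('max-age=360, no-store'))
# {'max-age': 360, 'no-store': True}
```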
+
+### Removing Expired Responses
+For better performance, expired responses won't be removed immediately, but will be removed
+(or replaced) the next time they are requested. To manually clear all expired responses, use
+{py:meth}`.CachedSession.remove_expired_responses`:
+```python
+>>> session.remove_expired_responses()
+```
+
+Or, when using patching:
+```python
+>>> requests_cache.remove_expired_responses()
+```
+
+You can also apply a different `expire_after` to previously cached responses, which will
+revalidate the cache with the new expiration time:
+```python
+>>> session.remove_expired_responses(expire_after=timedelta(days=30))
+```
+
+(serializers)=
+## Serializers
+By default, responses are serialized using {py:mod}`pickle`. Some other options are also available:
+
+:::{note}
+These features require Python 3.7+ and additional dependencies.
+:::
+
+### JSON Serializer
+Storing responses as JSON gives you the benefit of making them human-readable, in exchange for a
+slight reduction in performance. This can be especially useful in combination with the filesystem
+backend.
+
+:::{admonition} Example JSON-serialized Response
+:class: toggle
+```{literalinclude} sample_response.json
+:language: JSON
+```
+:::
+
+You can install the extra dependencies for this serializer with:
+```bash
+pip install requests-cache[json]
+```
+
+### BSON Serializer
+[BSON](https://www.mongodb.com/json-and-bson) is a serialization format originally created for
+MongoDB, but it can also be used independently. Compared to JSON, it has better performance
+(although still not as fast as `pickle`), and adds support for additional data types. It is not
+human-readable, but some tools support reading and editing it directly
+(for example, [bson-converter](https://atom.io/packages/bson-converter) for Atom).
+
+You can install the extra dependencies for this serializer with:
+```bash
+pip install requests-cache[mongo]
+```
+
+Or if you would like to use the standalone BSON codec for a different backend, without installing
+MongoDB dependencies:
+```bash
+pip install requests-cache[bson]
+```
+
+## Error Handling
+In some cases, you might cache a response, have it expire, but then encounter an error when
+retrieving a new response. If you would like to use expired response data in these cases, use the
+`old_data_on_error` option:
+```python
+>>> import time
+>>>
+>>> # Cache a test response that will expire immediately
+>>> session = CachedSession(old_data_on_error=True)
+>>> session.get('https://httpbin.org/get', expire_after=0.001)
+>>> time.sleep(0.001)
+```
+
+Afterward, let's say the page has moved and you get a 404, or the site is experiencing downtime and
+you get a 500. You will then get the expired cache data instead:
+```python
+>>> response = session.get('https://httpbin.org/get')
+>>> print(response.from_cache, response.is_expired)
+True True
+```
+
+In addition to error codes, `old_data_on_error` also applies to exceptions (typically a
+{py:exc}`~requests.RequestException`). See requests documentation on
+[Errors and Exceptions](https://2.python-requests.org/en/master/user/quickstart/#errors-and-exceptions)
+for more details on request errors in general.
+
+## Potential Issues
+- Version updates of `requests`, `urllib3` or `requests-cache` itself may not be compatible with
+ previously cached data (see issues [#56](https://github.com/reclosedev/requests-cache/issues/56)
+ and [#102](https://github.com/reclosedev/requests-cache/issues/102)).
+ The best way to prevent this is to use a virtualenv and pin your dependency versions.
+- See {ref}`security` for notes on serialization security