path: root/docs/user_guide
author    Jordan Cook <jordan.cook@pioneer.com>  2022-04-16 21:54:47 -0500
committer Jordan Cook <jordan.cook@pioneer.com>  2022-04-17 13:42:21 -0500
commit    613de4e1bb379d922cb7bd6c703fc81762f5d3bc (patch)
tree      b25b7dcc56b80fa6e53a496c2ca1c4a543211629 /docs/user_guide
parent    d6ee9143965d53dae44ca3a98802b2cc7ad6eeb7 (diff)
download  requests-cache-613de4e1bb379d922cb7bd6c703fc81762f5d3bc.tar.gz
Move backend docs to user guide, separate from API reference docs
Diffstat (limited to 'docs/user_guide')
-rw-r--r--  docs/user_guide/backends.md             61
-rw-r--r--  docs/user_guide/backends/dynamodb.md    48
-rw-r--r--  docs/user_guide/backends/filesystem.md  56
-rw-r--r--  docs/user_guide/backends/gridfs.md      23
-rw-r--r--  docs/user_guide/backends/mongodb.md    104
-rw-r--r--  docs/user_guide/backends/redis.md       66
-rw-r--r--  docs/user_guide/backends/sqlite.md      84
-rw-r--r--  docs/user_guide/installation.md          2
-rw-r--r--docs/user_guide/installation.md2
8 files changed, 408 insertions, 36 deletions
diff --git a/docs/user_guide/backends.md b/docs/user_guide/backends.md
index 2740af1..fa27ae9 100644
--- a/docs/user_guide/backends.md
+++ b/docs/user_guide/backends.md
@@ -1,41 +1,38 @@
(backends)=
# {fa}`database` Backends
-![](../_static/sqlite_32px.png)
-![](../_static/redis_32px.png)
-![](../_static/mongodb_32px.png)
-![](../_static/dynamodb_32px.png)
-![](../_static/files-json_32px.png)
-
This page contains general information about the cache backends supported by requests-cache.
-See {py:mod}`.requests_cache.backends` for additional details on each individual backend.
-The default backend is SQLite, since it's simple to use, requires no extra dependencies or
-configuration, and has the best all-around performance for the majority of use cases.
+The default backend is SQLite, since it requires no extra dependencies or configuration, and has
+great all-around performance for the most common use cases.
-```{note}
-In environments where SQLite is explicitly disabled, a non-persistent in-memory cache is used by
-default.
-```
+Here is a full list of backends available, and any extra dependencies required:
-## Backend Dependencies
-Most of the other backends require some extra dependencies, listed below.
+Backend | Class | Alias | Dependencies
+------------------------------------------------------|----------------------------|----------------|----------------------------------------------------------
+![](../_static/sqlite_32px.png) {ref}`sqlite` | {py:class}`.SQLiteCache` | `'sqlite'` |
+![](../_static/redis_32px.png) {ref}`redis` | {py:class}`.RedisCache` | `'redis'` | [redis-py](https://github.com/andymccurdy/redis-py)
+![](../_static/mongodb_32px.png) {ref}`mongodb` | {py:class}`.MongoCache` | `'mongodb'` | [pymongo](https://github.com/mongodb/mongo-python-driver)
+![](../_static/mongodb_32px.png) {ref}`gridfs` | {py:class}`.GridFSCache` | `'gridfs'` | [pymongo](https://github.com/mongodb/mongo-python-driver)
+![](../_static/dynamodb_32px.png) {ref}`dynamodb` | {py:class}`.DynamoDbCache` | `'dynamodb'` | [boto3](https://github.com/boto/boto3)
+![](../_static/files-json_32px.png) {ref}`filesystem` | {py:class}`.FileCache` | `'filesystem'` |
+![](../_static/memory_32px.png) Memory | {py:class}`.BaseCache` | `'memory'` |
-Backend | Class | Alias | Dependencies
--------------------------------------------------------|----------------------------|----------------|-------------
-[SQLite](https://www.sqlite.org) | {py:class}`.SQLiteCache` | `'sqlite'` |
-[Redis](https://redis.io) | {py:class}`.RedisCache` | `'redis'` | [redis-py](https://github.com/andymccurdy/redis-py)
-[MongoDB](https://www.mongodb.com) | {py:class}`.MongoCache` | `'mongodb'` | [pymongo](https://github.com/mongodb/mongo-python-driver)
-[GridFS](https://docs.mongodb.com/manual/core/gridfs/) | {py:class}`.GridFSCache` | `'gridfs'` | [pymongo](https://github.com/mongodb/mongo-python-driver)
-[DynamoDB](https://aws.amazon.com/dynamodb) | {py:class}`.DynamoDbCache` | `'dynamodb'` | [boto3](https://github.com/boto/boto3)
-Filesystem | {py:class}`.FileCache` | `'filesystem'` |
-Memory | {py:class}`.BaseCache` | `'memory'` |
+<!-- Hidden ToC tree to add pages to sidebar ToC -->
+```{toctree}
+:hidden:
+:glob: true
+
+backends/*
+```
## Choosing a Backend
Here are some general notes on choosing a backend:
* All of the backends perform well enough that they usually won't become a bottleneck until you
- start hitting around **700-1000 requests per second**.
-* It's recommended to start with SQLite until you have a specific reason to switch.
-* If/when you outgrow SQLite, the next logical choice would usually be Redis.
+ start hitting around **700-1000 requests per second**
+* It's recommended to start with SQLite until you have a specific reason to switch
+* If/when you encounter limitations with SQLite, the next logical choice is usually Redis
+* Each backend has some unique features that make them well suited for specific use cases; see
+ individual backend docs for more details
Here are some specific situations where you may want to choose one of the other backends:
* Your application is distributed across multiple machines, without access to a common filesystem
@@ -46,9 +43,6 @@ Here are some specific situations where you may want to choose one of the other
* You want to reuse your cached response data outside of requests-cache
* You want to use a specific feature available in one of the other backends
-Docs for {py:mod}`backend modules <requests_cache.backends>` contain more details on use cases
-for each one.
-
## Specifying a Backend
You can specify which backend to use with the `backend` parameter for either {py:class}`.CachedSession`
or {py:func}`.install_cache`. You can specify one by name, using the aliases listed above:
@@ -56,7 +50,7 @@ or {py:func}`.install_cache`. You can specify one by name, using the aliases lis
>>> session = CachedSession('my_cache', backend='redis')
```
-Or by instance:
+Or by instance, which is preferable if you want to pass additional backend-specific options:
```python
>>> backend = RedisCache(host='192.168.1.63', port=6379)
>>> session = CachedSession('my_cache', backend=backend)
@@ -74,10 +68,7 @@ DynamoDB | Table name
Filesystem | Cache directory
Each backend class also accepts optional parameters for the underlying connection. For example,
-{py:class}`.SQLiteCache` accepts parameters for {py:func}`sqlite3.connect`:
-```python
->>> session = CachedSession('my_cache', backend='sqlite', timeout=30)
-```
+the {ref}`sqlite` backend accepts parameters for {py:func}`sqlite3.connect`.
## Testing Backends
If you just want to quickly try out all of the available backends for comparison,
diff --git a/docs/user_guide/backends/dynamodb.md b/docs/user_guide/backends/dynamodb.md
new file mode 100644
index 0000000..ebd2e01
--- /dev/null
+++ b/docs/user_guide/backends/dynamodb.md
@@ -0,0 +1,48 @@
+(dynamodb)=
+# DynamoDB
+```{image} ../../_static/dynamodb.png
+```
+
+[DynamoDB](https://aws.amazon.com/dynamodb) is a fully managed, highly scalable NoSQL document
+database hosted on [Amazon Web Services](https://aws.amazon.com).
+
+## Use Cases
+In terms of features, DynamoDB is roughly comparable to MongoDB and other NoSQL databases. Since
+it's a managed service, no server setup or maintenance is required, and it's very convenient to use
+if your application is already on AWS. It is an especially good fit for serverless applications
+running on [AWS Lambda](https://aws.amazon.com/lambda).
+
+```{warning}
+DynamoDB item sizes are limited to 400KB. If you need to cache larger responses, consider
+using a different backend.
+```
+
+## Usage Example
+Initialize with a {py:class}`.DynamoDbCache` instance:
+```python
+>>> from requests_cache import CachedSession, DynamoDbCache
+>>> session = CachedSession(backend=DynamoDbCache())
+```
+
+Or by alias:
+```python
+>>> session = CachedSession(backend='dynamodb')
+```
+
+## Connection Options
+This backend accepts any keyword arguments for {py:meth}`boto3.session.Session.resource`:
+```python
+>>> backend = DynamoDbCache(region_name='us-west-2')
+>>> session = CachedSession(backend=backend)
+```
+
+## Creating Tables
+Tables will be automatically created if they don't already exist. This is convenient if you just
+want to quickly test out DynamoDB as a cache backend, but in a production environment you will
+likely want to create the tables yourself, for example with [CloudFormation](https://aws.amazon.com/cloudformation/) or [Terraform](https://www.terraform.io/). Here are the
+details you'll need:
+
+- Tables: two tables, named `responses` and `redirects`
+- Partition key (aka namespace): `namespace`
+- Range key (aka sort key): `key`
+- Attributes: `namespace` (string) and `key` (string)
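
If you'd rather create the tables programmatically, the same schema can be sketched with boto3. The table names, key names, and attribute types come from the list above; the region and billing mode are illustrative assumptions, not requests-cache defaults:

```python
# Sketch: create the tables requests-cache expects, per the schema above.
def table_definition(table_name: str) -> dict:
    """Build create_table arguments for one requests-cache table."""
    return {
        'TableName': table_name,
        'KeySchema': [
            {'AttributeName': 'namespace', 'KeyType': 'HASH'},  # partition key
            {'AttributeName': 'key', 'KeyType': 'RANGE'},       # range (sort) key
        ],
        'AttributeDefinitions': [
            {'AttributeName': 'namespace', 'AttributeType': 'S'},
            {'AttributeName': 'key', 'AttributeType': 'S'},
        ],
        'BillingMode': 'PAY_PER_REQUEST',  # assumption: on-demand capacity
    }

def create_cache_tables(region_name: str = 'us-west-2'):
    """Create both tables; requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the schema helper works without boto3 installed
    dynamodb = boto3.resource('dynamodb', region_name=region_name)
    for name in ('responses', 'redirects'):
        dynamodb.create_table(**table_definition(name))
```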
diff --git a/docs/user_guide/backends/filesystem.md b/docs/user_guide/backends/filesystem.md
new file mode 100644
index 0000000..9fdee7c
--- /dev/null
+++ b/docs/user_guide/backends/filesystem.md
@@ -0,0 +1,56 @@
+(filesystem)=
+# Filesystem
+```{image} ../../_static/files-generic.png
+```
+
+This backend stores responses in files on the local filesystem, with one file per response.
+
+## Use Cases
+This backend is useful if you would like to use your cached response data outside of requests-cache,
+for example:
+
+- Manually viewing cached responses without the need for extra tools (e.g., with a simple text editor)
+- Using cached responses as sample data for automated tests
+- Reading cached responses directly from another application or library, without depending on requests-cache
+
+## Usage Example
+Initialize with a {py:class}`.FileCache` instance:
+```python
+>>> from requests_cache import CachedSession, FileCache
+>>> session = CachedSession(backend=FileCache())
+```
+
+Or by alias:
+```python
+>>> session = CachedSession(backend='filesystem')
+```
+
+## File Formats
+By default, responses are saved as pickle files. If you want to save responses in a human-readable
+format, you can use one of the other available {ref}`serializers`. For example, to save responses as
+JSON files:
+```python
+>>> session = CachedSession('~/http_cache', backend='filesystem', serializer='json')
+>>> session.get('https://httpbin.org/get')
+>>> print(list(session.cache.paths()))
+> ['/home/user/http_cache/4dc151d95200ec.json']
+```
+
+Or as YAML (requires `pyyaml`):
+```python
+>>> session = CachedSession('~/http_cache', backend='filesystem', serializer='yaml')
+>>> session.get('https://httpbin.org/get')
+>>> print(list(session.cache.paths()))
+> ['/home/user/http_cache/4dc151d95200ec.yaml']
+```
+
+## Cache Files
+- See {ref}`files` for general info on specifying cache paths
+- The path for a given response will be in the format `<cache_name>/<cache_key>`
+- Redirects are stored in a separate SQLite database, located at `<cache_name>/redirects.sqlite`
+- Use {py:meth}`.FileCache.paths` to get a list of all cached response paths
+
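Because each response is a plain file, cached data can be read back without requests-cache at all. A minimal sketch, assuming the JSON serializer and the path layout above (the directory and key names are illustrative, and the exact JSON structure of a cached response is an assumption):

```python
# Sketch: load every JSON response file from a filesystem cache directory,
# keyed by cache key (the filename stem), with no requests-cache dependency.
import json
from pathlib import Path

def read_cached_responses(cache_dir):
    """Return {cache_key: response_data} for all JSON files in cache_dir."""
    responses = {}
    for path in Path(cache_dir).glob('*.json'):
        with open(path) as f:
            responses[path.stem] = json.load(f)
    return responses
```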
+## Performance and Limitations
+- Write performance will vary based on the serializer used, in the range of roughly 1-3ms per write.
+- This backend stores response files in a single directory, and does not currently implement fan-out. This means that on most filesystems, storing a very large number of responses will result in reduced performance.
+- This backend currently uses a simple threading lock rather than a file lock system, so it is not an ideal choice for highly parallel applications.
diff --git a/docs/user_guide/backends/gridfs.md b/docs/user_guide/backends/gridfs.md
new file mode 100644
index 0000000..568d5b9
--- /dev/null
+++ b/docs/user_guide/backends/gridfs.md
@@ -0,0 +1,23 @@
+(gridfs)=
+# GridFS
+```{image} ../../_static/mongodb.png
+```
+
+[GridFS](https://docs.mongodb.com/manual/core/gridfs/) is a specification for storing large files
+in MongoDB.
+
+## Use Cases
+Use this backend if you are using MongoDB and expect to store responses **larger than 16MB**. See
+{py:mod}`~requests_cache.backends.mongodb` for more general info.
+
+## Usage Example
+Initialize with a {py:class}`.GridFSCache` instance:
+```python
+>>> from requests_cache import CachedSession, GridFSCache
+>>> session = CachedSession(backend=GridFSCache())
+```
+
+Or by alias:
+```python
+>>> session = CachedSession(backend='gridfs')
+```
diff --git a/docs/user_guide/backends/mongodb.md b/docs/user_guide/backends/mongodb.md
new file mode 100644
index 0000000..11201b2
--- /dev/null
+++ b/docs/user_guide/backends/mongodb.md
@@ -0,0 +1,104 @@
+(mongodb)=
+# MongoDB
+```{image} ../../_static/mongodb.png
+```
+
+[MongoDB](https://www.mongodb.com) is a NoSQL document database. It stores data in collections
+of documents, which are more flexible and less strictly structured than tables in a relational
+database.
+
+## Use Cases
+MongoDB scales well and is a good option for larger applications. For raw caching performance, it is
+not quite as fast as {py:mod}`~requests_cache.backends.redis`, but may be preferable if you already
+have an instance running, or if it has a specific feature you want to use. See sections below for
+some relevant examples.
+
+## Usage Example
+Initialize with a {py:class}`.MongoCache` instance:
+```python
+>>> from requests_cache import CachedSession, MongoCache
+>>> session = CachedSession(backend=MongoCache())
+```
+
+Or by alias:
+```python
+>>> session = CachedSession(backend='mongodb')
+```
+
+## Connection Options
+This backend accepts any keyword arguments for {py:class}`pymongo.mongo_client.MongoClient`:
+```python
+>>> backend = MongoCache(host='192.168.1.63', port=27017)
+>>> session = CachedSession('http_cache', backend=backend)
+```
+
+## Viewing Responses
+Unlike most of the other backends, response data can be easily viewed via the
+[MongoDB shell](https://www.mongodb.com/docs/mongodb-shell/#mongodb-binary-bin.mongosh),
+[Compass](https://www.mongodb.com/products/compass), or any other interface for MongoDB. This is
+possible because its internal document format ([BSON](https://www.mongodb.com/json-and-bson))
+supports all the types needed to store a response as a plain document rather than a fully serialized
+blob.
+
+Here is an example response viewed in
+[MongoDB for VSCode](https://code.visualstudio.com/docs/azure/mongodb):
+
+:::{admonition} Screenshot
+:class: toggle
+
+```{image} ../../_static/mongodb_vscode.png
+```
+:::
+
+## Expiration
+MongoDB [natively supports TTL](https://www.mongodb.com/docs/v4.0/core/index-ttl), and can
+automatically remove expired responses from the cache.
+
+**Notes:**
+- TTL is set for a whole collection, and cannot be set on a per-document basis.
+- The TTL index persists until it is explicitly removed or overwritten, or until the collection is
+  deleted.
+- Expired items are
+ [not guaranteed to be removed immediately](https://www.mongodb.com/docs/v4.0/core/index-ttl/#timing-of-the-delete-operation).
+ Typically it happens within 60 seconds.
+- If you want, you can rely entirely on MongoDB TTL instead of requests-cache
+ {ref}`expiration settings <expiration>`.
+- Or you can set both values, to be certain that you don't get an expired response before MongoDB
+ removes it.
+- If you intend to reuse expired responses, e.g. with {ref}`conditional-requests` or `stale_if_error`,
+ you can set TTL to a larger value than your session `expire_after`, or disable it altogether.
+
+**Examples:**
+Create a TTL index:
+```python
+>>> backend = MongoCache()
+>>> backend.set_ttl(3600)
+```
+
+Overwrite it with a new value:
+```python
+>>> backend = MongoCache()
+>>> backend.set_ttl(timedelta(days=1), overwrite=True)
+```
+
+Remove the TTL index:
+```python
+>>> backend = MongoCache()
+>>> backend.set_ttl(None, overwrite=True)
+```
+
+Use both MongoDB TTL and requests-cache expiration:
+```python
+>>> ttl = timedelta(days=1)
+>>> backend = MongoCache()
+>>> backend.set_ttl(ttl)
+>>> session = CachedSession(backend=backend, expire_after=ttl)
+```
+
+**Recommended:** Set MongoDB TTL to a longer value than your {py:class}`.CachedSession` expiration.
+This allows expired responses to be eventually cleaned up, but still be reused for conditional
+requests for some period of time:
+```python
+>>> backend = MongoCache()
+>>> backend.set_ttl(timedelta(days=7))
+>>> session = CachedSession(backend=backend, expire_after=timedelta(days=1))
+```
diff --git a/docs/user_guide/backends/redis.md b/docs/user_guide/backends/redis.md
new file mode 100644
index 0000000..cfe1898
--- /dev/null
+++ b/docs/user_guide/backends/redis.md
@@ -0,0 +1,66 @@
+(redis)=
+# Redis
+```{image} ../../_static/redis.png
+```
+
+[Redis](https://redis.io) is an in-memory data store with on-disk persistence.
+
+## Use Cases
+Redis offers a high-performance cache that scales exceptionally well, making it an ideal choice for
+larger applications, especially those that make a large volume of concurrent requests.
+
+## Usage Example
+Initialize your session with a {py:class}`.RedisCache` instance:
+```python
+>>> from requests_cache import CachedSession, RedisCache
+>>> session = CachedSession(backend=RedisCache())
+```
+
+Or by alias:
+```python
+>>> session = CachedSession(backend='redis')
+```
+
+## Connection Options
+This backend accepts any keyword arguments for {py:class}`redis.client.Redis`:
+```python
+>>> backend = RedisCache(host='192.168.1.63', port=6379)
+>>> session = CachedSession('http_cache', backend=backend)
+```
+
+Or you can pass an existing `Redis` object:
+```python
+>>> from redis import Redis
+
+>>> connection = Redis(host='192.168.1.63', port=6379)
+>>> backend = RedisCache(connection=connection)
+>>> session = CachedSession('http_cache', backend=backend)
+```
+
+## Persistence
+Redis operates on data in memory, and by default also persists data to snapshots on disk. This is
+optimized for performance, with a minor risk of data loss, and is usually the best configuration
+for a cache. If you need different behavior, the frequency and type of persistence can be customized
+or disabled entirely. See [Redis Persistence](https://redis.io/topics/persistence) for details.
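
For example, snapshot frequency and the append-only log are controlled by standard directives in `redis.conf` (these are Redis server settings, not requests-cache options); a minimal sketch of the alternatives:

```
# RDB snapshots: save if at least 1 key changed within 900 seconds
save 900 1
# Stronger durability: log every write to an append-only file
appendonly yes
# Or disable snapshots entirely for a purely in-memory cache
save ""
```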
+
+## Expiration
+Redis natively supports TTL on a per-key basis, and can automatically remove expired responses from
+the cache. This will be set by default, according to normal {ref}`expiration settings <expiration>`.
+
+If you intend to reuse expired responses, e.g. with {ref}`conditional-requests` or `stale_if_error`,
+you can disable this behavior with the `ttl` argument:
+```python
+>>> backend = RedisCache(ttl=False)
+```
+
+## Redislite
+If you can't easily set up your own Redis server, another option is
+[redislite](https://github.com/yahoo/redislite). It contains its own lightweight, embedded Redis
+database, and can be used as a drop-in replacement for redis-py. Usage example:
+```python
+>>> from redislite import Redis
+>>> from requests_cache import CachedSession, RedisCache
+
+>>> backend = RedisCache(connection=Redis())
+>>> session = CachedSession(backend=backend)
+```
diff --git a/docs/user_guide/backends/sqlite.md b/docs/user_guide/backends/sqlite.md
new file mode 100644
index 0000000..c4e0744
--- /dev/null
+++ b/docs/user_guide/backends/sqlite.md
@@ -0,0 +1,84 @@
+(sqlite)=
+# SQLite
+```{image} ../../_static/sqlite.png
+```
+[SQLite](https://www.sqlite.org/) is a fast and lightweight SQL database engine that stores data
+either in memory or in a single file on disk.
+
+## Use Cases
+Despite its simplicity, SQLite is a powerful tool. For example, it's the primary storage system for
+a number of common applications including Firefox, Chrome, and many components of both Android and
+iOS. It's well suited for caching, and requires no extra configuration or dependencies, which is why
+it's used as the default backend for requests-cache.
+
+## Usage Example
+SQLite is the default backend, but if you want to pass extra connection options or just want to be
+explicit, initialize your session with a {py:class}`.SQLiteCache` instance:
+```python
+>>> from requests_cache import CachedSession, SQLiteCache
+>>> session = CachedSession(backend=SQLiteCache())
+```
+
+Or by alias:
+```python
+>>> session = CachedSession(backend='sqlite')
+```
+
+## Connection Options
+This backend accepts any keyword arguments for {py:func}`sqlite3.connect`:
+```python
+>>> backend = SQLiteCache('http_cache', timeout=30)
+>>> session = CachedSession(backend=backend)
+```
+
+## Cache Files
+- See {ref}`files` for general info on specifying cache paths
+- If you specify a name without an extension, the default extension `.sqlite` will be used
+
+### In-Memory Caching
+SQLite also supports [in-memory databases](https://www.sqlite.org/inmemorydb.html).
+You can enable this (in "shared" memory mode) with the `use_memory` option:
+```python
+>>> session = CachedSession('http_cache', use_memory=True)
+```
+
+Or specify a memory URI with additional options:
+```python
+>>> session = CachedSession('file:memdb1?mode=memory&cache=shared')
+```
+
+Or just `:memory:`, if you are only using the cache from a single thread:
+```python
+>>> session = CachedSession(':memory:')
+```
+
+## Performance
+When working with average-sized HTTP responses (\< 1MB) and using a modern SSD for file storage, you
+can expect speeds of around:
+- Write: 2-8ms
+- Read: 0.2-0.6ms
+
+Of course, this will vary based on hardware specs, response size, and other factors.
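
To check these numbers on your own hardware, you can time raw writes with the stdlib `sqlite3` module. This is only a rough proxy: it measures the database engine alone, without requests-cache's serialization overhead (the table schema below is illustrative, not requests-cache's actual schema):

```python
# Sketch: average seconds per single-row INSERT + commit in SQLite.
import sqlite3
import time

def time_sqlite_writes(db_path=':memory:', n=100):
    """Return average write latency in seconds over n committed inserts."""
    conn = sqlite3.connect(db_path)
    conn.execute('CREATE TABLE responses (key TEXT, value BLOB)')
    start = time.perf_counter()
    for i in range(n):
        conn.execute('INSERT INTO responses VALUES (?, ?)', (str(i), b'x' * 1024))
        conn.commit()  # commit each write, like a cache would
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed / n
```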
+
+## Concurrency
+SQLite supports concurrent access, so it is safe to use from a multi-threaded and/or multi-process
+application. It supports unlimited concurrent reads. Writes, however, are queued and run in serial,
+so if you need to make large volumes of concurrent requests, you may want to consider a different
+backend that's specifically made for that kind of workload, like {py:class}`.RedisCache`.
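
The read side of this behavior can be sketched with plain `sqlite3`: each thread opens its own connection (the safe pattern with the sqlite3 module) and all reads proceed concurrently. The table name here is illustrative, not requests-cache's actual schema:

```python
# Sketch: concurrent reads from one SQLite database across several threads.
import sqlite3
import threading

def concurrent_reads(db_path, n_threads=4):
    """Read the row count from n_threads threads at once; return all results."""
    results = []
    lock = threading.Lock()

    def reader():
        conn = sqlite3.connect(db_path)  # one connection per thread
        (count,) = conn.execute('SELECT COUNT(*) FROM responses').fetchone()
        with lock:
            results.append(count)
        conn.close()

    threads = [threading.Thread(target=reader) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```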
+
+## Hosting Services and Filesystem Compatibility
+There are some caveats to using SQLite with some hosting services, based on what kind of storage is
+available:
+
+- NFS:
+  - SQLite may be used on an NFS mount, but is usually only safe to use from a single process at a time.
+ See the [SQLite FAQ](https://www.sqlite.org/faq.html#q5) for details.
+ - PythonAnywhere is one example of a host that uses NFS-backed storage. Using SQLite from a
+ multiprocess application will likely result in `sqlite3.OperationalError: database is locked`.
+- Ephemeral storage:
+ - Heroku [explicitly disables SQLite](https://devcenter.heroku.com/articles/sqlite3) on its dynos.
+ - AWS [EC2](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html),
+ [Lambda (depending on configuration)](https://aws.amazon.com/blogs/compute/choosing-between-aws-lambda-data-storage-options-in-web-apps/),
+ and some other AWS services use ephemeral storage that only persists for the lifetime of the
+   instance. This is fine for short-term caching. For longer-term persistence, you can use an
+ [attached EBS volume](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-attaching-volume.html).
diff --git a/docs/user_guide/installation.md b/docs/user_guide/installation.md
index 174baf3..a5d4176 100644
--- a/docs/user_guide/installation.md
+++ b/docs/user_guide/installation.md
@@ -39,4 +39,4 @@ of python, here are the latest compatible versions and their documentation pages
* **python 2.7:** [requests-cache 0.5.2](https://requests-cache.readthedocs.io/en/v0.5.0)
* **python 3.4:** [requests-cache 0.5.2](https://requests-cache.readthedocs.io/en/v0.5.0)
* **python 3.5:** [requests-cache 0.5.2](https://requests-cache.readthedocs.io/en/v0.5.0)
-* **python 3.6:** [requests-cache 0.7.4](https://requests-cache.readthedocs.io/en/v0.7.4)
+* **python 3.6:** [requests-cache 0.7.5](https://requests-cache.readthedocs.io/en/v0.7.5)