author | Jordan Cook <jordan.cook@pioneer.com> | 2022-04-16 21:54:47 -0500
committer | Jordan Cook <jordan.cook@pioneer.com> | 2022-04-17 13:42:21 -0500
commit | 613de4e1bb379d922cb7bd6c703fc81762f5d3bc (patch)
tree | b25b7dcc56b80fa6e53a496c2ca1c4a543211629 /docs/user_guide
parent | d6ee9143965d53dae44ca3a98802b2cc7ad6eeb7 (diff)
download | requests-cache-613de4e1bb379d922cb7bd6c703fc81762f5d3bc.tar.gz
Move backend docs to user guide, separate from API reference docs
Diffstat (limited to 'docs/user_guide')
-rw-r--r-- | docs/user_guide/backends.md | 61
-rw-r--r-- | docs/user_guide/backends/dynamodb.md | 48
-rw-r--r-- | docs/user_guide/backends/filesystem.md | 56
-rw-r--r-- | docs/user_guide/backends/gridfs.md | 23
-rw-r--r-- | docs/user_guide/backends/mongodb.md | 104
-rw-r--r-- | docs/user_guide/backends/redis.md | 66
-rw-r--r-- | docs/user_guide/backends/sqlite.md | 84
-rw-r--r-- | docs/user_guide/installation.md | 2 |
8 files changed, 408 insertions, 36 deletions
diff --git a/docs/user_guide/backends.md b/docs/user_guide/backends.md index 2740af1..fa27ae9 100644 --- a/docs/user_guide/backends.md +++ b/docs/user_guide/backends.md @@ -1,41 +1,38 @@ (backends)= # {fa}`database` Backends -![](../_static/sqlite_32px.png) -![](../_static/redis_32px.png) -![](../_static/mongodb_32px.png) -![](../_static/dynamodb_32px.png) -![](../_static/files-json_32px.png) - This page contains general information about the cache backends supported by requests-cache. -See {py:mod}`.requests_cache.backends` for additional details on each individual backend. -The default backend is SQLite, since it's simple to use, requires no extra dependencies or -configuration, and has the best all-around performance for the majority of use cases. +The default backend is SQLite, since it requires no extra dependencies or configuration, and has +great all-around performance for the most common use cases. -```{note} -In environments where SQLite is explicitly disabled, a non-persistent in-memory cache is used by -default. -``` +Here is a full list of backends available, and any extra dependencies required: -## Backend Dependencies -Most of the other backends require some extra dependencies, listed below. 
+Backend | Class | Alias | Dependencies +------------------------------------------------------|----------------------------|----------------|---------------------------------------------------------- +![](../_static/sqlite_32px.png) {ref}`sqlite` | {py:class}`.SQLiteCache` | `'sqlite'` | +![](../_static/redis_32px.png) {ref}`redis` | {py:class}`.RedisCache` | `'redis'` | [redis-py](https://github.com/andymccurdy/redis-py) +![](../_static/mongodb_32px.png) {ref}`mongodb` | {py:class}`.MongoCache` | `'mongodb'` | [pymongo](https://github.com/mongodb/mongo-python-driver) +![](../_static/mongodb_32px.png) {ref}`gridfs` | {py:class}`.GridFSCache` | `'gridfs'` | [pymongo](https://github.com/mongodb/mongo-python-driver) +![](../_static/dynamodb_32px.png) {ref}`dynamodb` | {py:class}`.DynamoDbCache` | `'dynamodb'` | [boto3](https://github.com/boto/boto3) +![](../_static/files-json_32px.png) {ref}`filesystem` | {py:class}`.FileCache` | `'filesystem'` | +![](../_static/memory_32px.png) Memory | {py:class}`.BaseCache` | `'memory'` | -Backend | Class | Alias | Dependencies --------------------------------------------------------|----------------------------|----------------|------------- -[SQLite](https://www.sqlite.org) | {py:class}`.SQLiteCache` | `'sqlite'` | -[Redis](https://redis.io) | {py:class}`.RedisCache` | `'redis'` | [redis-py](https://github.com/andymccurdy/redis-py) -[MongoDB](https://www.mongodb.com) | {py:class}`.MongoCache` | `'mongodb'` | [pymongo](https://github.com/mongodb/mongo-python-driver) -[GridFS](https://docs.mongodb.com/manual/core/gridfs/) | {py:class}`.GridFSCache` | `'gridfs'` | [pymongo](https://github.com/mongodb/mongo-python-driver) -[DynamoDB](https://aws.amazon.com/dynamodb) | {py:class}`.DynamoDbCache` | `'dynamodb'` | [boto3](https://github.com/boto/boto3) -Filesystem | {py:class}`.FileCache` | `'filesystem'` | -Memory | {py:class}`.BaseCache` | `'memory'` | +<!-- Hidden ToC tree to add pages to sidebar ToC --> +```{toctree} +:hidden: 
+:glob: true + +backends/* +``` ## Choosing a Backend Here are some general notes on choosing a backend: * All of the backends perform well enough that they usually won't become a bottleneck until you - start hitting around **700-1000 requests per second**. -* It's recommended to start with SQLite until you have a specific reason to switch. -* If/when you outgrow SQLite, the next logical choice would usually be Redis. + start hitting around **700-1000 requests per second** +* It's recommended to start with SQLite until you have a specific reason to switch +* If/when you encounter limitations with SQLite, the next logical choice is usually Redis +* Each backend has some unique features that make them well suited for specific use cases; see + individual backend docs for more details Here are some specific situations where you may want to choose one of the other backends: * Your application is distributed across multiple machines, without access to a common filesystem @@ -46,9 +43,6 @@ Here are some specific situations where you may want to choose one of the other * You want to reuse your cached response data outside of requests-cache * You want to use a specific feature available in one of the other backends -Docs for {py:mod}`backend modules <requests_cache.backends>` contain more details on use cases -for each one. - ## Specifying a Backend You can specify which backend to use with the `backend` parameter for either {py:class}`.CachedSession` or {py:func}`.install_cache`. You can specify one by name, using the aliases listed above: @@ -56,7 +50,7 @@ or {py:func}`.install_cache`. 
You can specify one by name, using the aliases lis >>> session = CachedSession('my_cache', backend='redis') ``` -Or by instance: +Or by instance, which is preferable if you want to pass additional backend-specific options: ```python >>> backend = RedisCache(host='192.168.1.63', port=6379) >>> session = CachedSession('my_cache', backend=backend) @@ -74,10 +68,7 @@ DynamoDB | Table name Filesystem | Cache directory Each backend class also accepts optional parameters for the underlying connection. For example, -{py:class}`.SQLiteCache` accepts parameters for {py:func}`sqlite3.connect`: -```python ->>> session = CachedSession('my_cache', backend='sqlite', timeout=30) -``` +the {ref}`sqlite` backend accepts parameters for {py:func}`sqlite3.connect`. ## Testing Backends If you just want to quickly try out all of the available backends for comparison, diff --git a/docs/user_guide/backends/dynamodb.md b/docs/user_guide/backends/dynamodb.md new file mode 100644 index 0000000..ebd2e01 --- /dev/null +++ b/docs/user_guide/backends/dynamodb.md @@ -0,0 +1,48 @@ +(dynamodb)= +# DynamoDB +```{image} ../../_static/dynamodb.png +``` + +[DynamoDB](https://aws.amazon.com/dynamodb) is a fully managed, highly scalable NoSQL document +database hosted on [Amazon Web Services](https://aws.amazon.com). + +## Use Cases +In terms of features, DynamoDB is roughly comparable to MongoDB and other NoSQL databases. Since +it's a managed service, no server setup or maintenance is required, and it's very convenient to use +if your application is already on AWS. It is an especially good fit for serverless applications +running on [AWS Lambda](https://aws.amazon.com/lambda). + +```{warning} +DynamoDB item sizes are limited to 400KB. If you need to cache larger responses, consider +using a different backend. 
+``` + +## Usage Example +Initialize with a {py:class}`.DynamoDbCache` instance: +```python +>>> from requests_cache import CachedSession, DynamoDbCache +>>> session = CachedSession(backend=DynamoDbCache()) +``` + +Or by alias: +```python +>>> session = CachedSession(backend='dynamodb') +``` + +## Connection Options +This backend accepts any keyword arguments for {py:meth}`boto3.session.Session.resource`: +```python +>>> backend = DynamoDbCache(region_name='us-west-2') +>>> session = CachedSession(backend=backend) +``` + +## Creating Tables +Tables will be automatically created if they don't already exist. This is convenient if you just +want to quickly test out DynamoDB as a cache backend, but in a production environment you will +likely want to create the tables yourself, for example with [CloudFormation](https://aws.amazon.com/cloudformation/) or [Terraform](https://www.terraform.io/). Here are the +details you'll need: + +- Tables: two tables, named `responses` and `redirects` +- Partition key (aka namespace): `namespace` +- Range key (aka sort key): `key` +- Attributes: `namespace` (string) and `key` (string) diff --git a/docs/user_guide/backends/filesystem.md b/docs/user_guide/backends/filesystem.md new file mode 100644 index 0000000..9fdee7c --- /dev/null +++ b/docs/user_guide/backends/filesystem.md @@ -0,0 +1,56 @@ +(filesystem)= +# Filesystem +```{image} ../../_static/files-generic.png +``` + +This backend stores responses in files on the local filesystem, with one file per response.
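To make the one-file-per-response layout concrete, here is a minimal stdlib-only sketch of what inspecting such a cache directory might look like when the JSON serializer is used. The directory, filename, and JSON fields below are illustrative assumptions, not the exact requests-cache schema:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sketch: simulate one JSON-serialized cache entry on disk.
# The filename and field names are illustrative, not the exact schema.
cache_dir = Path(tempfile.mkdtemp())
entry = {"url": "https://httpbin.org/get", "status_code": 200, "headers": {}}
(cache_dir / "4dc151d95200ec.json").write_text(json.dumps(entry))

# Any external tool or script can now read the cached response directly,
# without depending on requests-cache
loaded = json.loads((cache_dir / "4dc151d95200ec.json").read_text())
print(loaded["url"])  # prints: https://httpbin.org/get
```

This is the property that makes the filesystem backend convenient for test fixtures and ad-hoc inspection: the cache is just ordinary files in a serializer-determined format.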
+ +## Use Cases +This backend is useful if you would like to use your cached response data outside of requests-cache, +for example: + +- Manually viewing cached responses without the need for extra tools (e.g., with a simple text editor) +- Using cached responses as sample data for automated tests +- Reading cached responses directly from another application or library, without depending on requests-cache + +## Usage Example +Initialize with a {py:class}`.FileCache` instance: +```python +>>> from requests_cache import CachedSession, FileCache +>>> session = CachedSession(backend=FileCache()) +``` + +Or by alias: +```python +>>> session = CachedSession(backend='filesystem') +``` + +## File Formats +By default, responses are saved as pickle files. If you want to save responses in a human-readable +format, you can use one of the other available {ref}`serializers`. For example, to save responses as +JSON files: +```python +>>> session = CachedSession('~/http_cache', backend='filesystem', serializer='json') +>>> session.get('https://httpbin.org/get') +>>> print(list(session.cache.paths())) +> ['/home/user/http_cache/4dc151d95200ec.json'] +``` + +Or as YAML (requires `pyyaml`): +```python +>>> session = CachedSession('~/http_cache', backend='filesystem', serializer='yaml') +>>> session.get('https://httpbin.org/get') +>>> print(list(session.cache.paths())) +> ['/home/user/http_cache/4dc151d95200ec.yaml'] +``` + +## Cache Files +- See {ref}`files` for general info on specifying cache paths +- The path for a given response will be in the format `<cache_name>/<cache_key>` +- Redirects are stored in a separate SQLite database, located at `<cache_name>/redirects.sqlite` +- Use {py:meth}`.FileCache.paths` to get a list of all cached response paths + +## Performance and Limitations +- Write performance will vary based on the serializer used, in the range of roughly 1-3ms per write. 
+- This backend stores response files in a single directory, and does not currently implement fan-out. This means that on most filesystems, storing a very large number of responses will result in reduced performance. +- This backend currently uses a simple threading lock rather than a file lock system, so it is not an ideal choice for highly parallel applications. diff --git a/docs/user_guide/backends/gridfs.md b/docs/user_guide/backends/gridfs.md new file mode 100644 index 0000000..568d5b9 --- /dev/null +++ b/docs/user_guide/backends/gridfs.md @@ -0,0 +1,23 @@ +(gridfs)= +# GridFS +```{image} ../../_static/mongodb.png +``` + +[GridFS](https://docs.mongodb.com/manual/core/gridfs/) is a specification for storing large files +in MongoDB. + +## Use Cases +Use this backend if you are using MongoDB and expect to store responses **larger than 16MB**. See +{py:mod}`~requests_cache.backends.mongodb` for more general info. + +## Usage Example +Initialize with a {py:class}`.GridFSCache` instance: +```python +>>> from requests_cache import CachedSession, GridFSCache +>>> session = CachedSession(backend=GridFSCache()) +``` + +Or by alias: +```python +>>> session = CachedSession(backend='gridfs') +``` diff --git a/docs/user_guide/backends/mongodb.md b/docs/user_guide/backends/mongodb.md new file mode 100644 index 0000000..11201b2 --- /dev/null +++ b/docs/user_guide/backends/mongodb.md @@ -0,0 +1,104 @@ +(mongodb)= +# MongoDB +```{image} ../../_static/mongodb.png +``` + +[MongoDB](https://www.mongodb.com) is a NoSQL document database. It stores data in collections +of documents, which are more flexible and less strictly structured than tables in a relational +database. + +## Use Cases +MongoDB scales well and is a good option for larger applications. For raw caching performance, it is +not quite as fast as {py:mod}`~requests_cache.backends.redis`, but may be preferable if you already +have an instance running, or if it has a specific feature you want to use. 
See sections below for +some relevant examples. + +## Usage Example +Initialize with a {py:class}`.MongoCache` instance: +```python +>>> from requests_cache import CachedSession, MongoCache +>>> session = CachedSession(backend=MongoCache()) +``` + +Or by alias: +```python +>>> session = CachedSession(backend='mongodb') +``` + +## Connection Options +This backend accepts any keyword arguments for {py:class}`pymongo.mongo_client.MongoClient`: +```python +>>> backend = MongoCache(host='192.168.1.63', port=27017) +>>> session = CachedSession('http_cache', backend=backend) +``` + +## Viewing Responses +Unlike most of the other backends, response data can be easily viewed via the +[MongoDB shell](https://www.mongodb.com/docs/mongodb-shell/#mongodb-binary-bin.mongosh), +[Compass](https://www.mongodb.com/products/compass), or any other interface for MongoDB. This is +possible because its internal document format ([BSON](https://www.mongodb.com/json-and-bson)) +supports all the types needed to store a response as a plain document rather than a fully serialized +blob. + +Here is an example response viewed in +[MongoDB for VSCode](https://code.visualstudio.com/docs/azure/mongodb): + +:::{admonition} Screenshot +:class: toggle + +```{image} ../../_static/mongodb_vscode.png +``` +::: + +## Expiration +MongoDB [natively supports TTL](https://www.mongodb.com/docs/v4.0/core/index-ttl), and can +automatically remove expired responses from the cache. + +**Notes:** +- TTL is set for a whole collection, and cannot be set on a per-document basis. +- It will persist until explicitly removed or overwritten, or if the collection is deleted. +- Expired items are + [not guaranteed to be removed immediately](https://www.mongodb.com/docs/v4.0/core/index-ttl/#timing-of-the-delete-operation). + Typically it happens within 60 seconds. +- If you want, you can rely entirely on MongoDB TTL instead of requests-cache + {ref}`expiration settings <expiration>`. 
+- Or you can set both values, to be certain that you don't get an expired response before MongoDB + removes it. +- If you intend to reuse expired responses, e.g. with {ref}`conditional-requests` or `stale_if_error`, + you can set TTL to a larger value than your session `expire_after`, or disable it altogether. + +**Examples:** +Create a TTL index: +```python +>>> backend = MongoCache() +>>> backend.set_ttl(3600) +``` + +Overwrite it with a new value: +```python +>>> backend = MongoCache() +>>> backend.set_ttl(timedelta(days=1), overwrite=True) +``` + +Remove the TTL index: +```python +>>> backend = MongoCache() +>>> backend.set_ttl(None, overwrite=True) +``` + +Use both MongoDB TTL and requests-cache expiration: +```python +>>> ttl = timedelta(days=1) +>>> backend = MongoCache() +>>> backend.set_ttl(ttl) +>>> session = CachedSession(backend=backend, expire_after=ttl) +``` + +**Recommended:** Set MongoDB TTL to a longer value than your {py:class}`.CachedSession` expiration. +This allows expired responses to be eventually cleaned up, but still be reused for conditional +requests for some period of time: +```python +>>> backend = MongoCache() +>>> backend.set_ttl(timedelta(days=7)) +>>> session = CachedSession(backend=backend, expire_after=timedelta(days=1)) +``` diff --git a/docs/user_guide/backends/redis.md b/docs/user_guide/backends/redis.md new file mode 100644 index 0000000..cfe1898 --- /dev/null +++ b/docs/user_guide/backends/redis.md @@ -0,0 +1,66 @@ +(redis)= +# Redis +```{image} ../../_static/redis.png +``` + +[Redis](https://redis.io) is an in-memory data store with on-disk persistence. + +## Use Cases +Redis offers a high-performance cache that scales exceptionally well, making it an ideal choice for +larger applications, especially those that make a large volume of concurrent requests.
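As a rough, stdlib-only illustration of the high-concurrency access pattern described above (no Redis server required): many threads share a single cache, and the in-process locking done here stands in for the coordination a Redis server performs across clients. All names and URLs are illustrative:

```python
import threading

# Sketch: a thread-safe in-process cache standing in for an external cache
# server. Each worker checks the cache, then stores its "response" on a miss.
cache = {}
lock = threading.Lock()

def fetch(url):
    with lock:
        if url in cache:              # cache hit
            return cache[url]
    body = f"response for {url}"      # stand-in for a real HTTP request
    with lock:
        cache[url] = body             # cache miss: store the new response
    return body

threads = [
    threading.Thread(target=fetch, args=(f"https://example.com/{i}",))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(cache))  # 8
```

With Redis, this shared state lives in the server rather than one process, which is what lets many machines or workers share a single cache safely.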
+ +## Usage Example +Initialize your session with a {py:class}`.RedisCache` instance: +```python +>>> from requests_cache import CachedSession, RedisCache +>>> session = CachedSession(backend=RedisCache()) +``` + +Or by alias: +```python +>>> session = CachedSession(backend='redis') +``` + +## Connection Options +This backend accepts any keyword arguments for {py:class}`redis.client.Redis`: +```python +>>> backend = RedisCache(host='192.168.1.63', port=6379) +>>> session = CachedSession('http_cache', backend=backend) +``` + +Or you can pass an existing `Redis` object: +```python +>>> from redis import Redis + +>>> connection = Redis(host='192.168.1.63', port=6379) +>>> backend = RedisCache(connection=connection) +>>> session = CachedSession('http_cache', backend=backend) +``` + +## Persistence +Redis operates on data in memory, and by default also persists data to snapshots on disk. This is +optimized for performance, with a minor risk of data loss, and is usually the best configuration +for a cache. If you need different behavior, the frequency and type of persistence can be customized +or disabled entirely. See [Redis Persistence](https://redis.io/topics/persistence) for details. + +## Expiration +Redis natively supports TTL on a per-key basis, and can automatically remove expired responses from +the cache. This will be set by default, according to normal {ref}`expiration settings <expiration>`. + +If you intend to reuse expired responses, e.g. with {ref}`conditional-requests` or `stale_if_error`, +you can disable this behavior with the `ttl` argument: +```python +>>> backend = RedisCache(ttl=False) +``` + +## Redislite +If you can't easily set up your own Redis server, another option is +[redislite](https://github.com/yahoo/redislite). It contains its own lightweight, embedded Redis +database, and can be used as a drop-in replacement for redis-py.
Usage example: +```python +>>> from redislite import Redis +>>> from requests_cache import CachedSession, RedisCache + +>>> backend = RedisCache(connection=Redis()) +>>> session = CachedSession(backend=backend) +``` diff --git a/docs/user_guide/backends/sqlite.md b/docs/user_guide/backends/sqlite.md new file mode 100644 index 0000000..c4e0744 --- /dev/null +++ b/docs/user_guide/backends/sqlite.md @@ -0,0 +1,84 @@ +(sqlite)= +# SQLite +```{image} ../../_static/sqlite.png +``` +[SQLite](https://www.sqlite.org/) is a fast and lightweight SQL database engine that stores data +either in memory or in a single file on disk. + +## Use Cases +Despite its simplicity, SQLite is a powerful tool. For example, it's the primary storage system for +a number of common applications including Firefox, Chrome, and many components of both Android and +iOS. It's well suited for caching, and requires no extra configuration or dependencies, which is why +it's used as the default backend for requests-cache. + +## Usage Example +SQLite is the default backend, but if you want to pass extra connection options or just want to be +explicit, initialize your session with a {py:class}`.SQLiteCache` instance: +```python +>>> from requests_cache import CachedSession, SQLiteCache +>>> session = CachedSession(backend=SQLiteCache()) +``` + +Or by alias: +```python +>>> session = CachedSession(backend='sqlite') +``` + +## Connection Options +This backend accepts any keyword arguments for {py:func}`sqlite3.connect`: +```python +>>> backend = SQLiteCache('http_cache', timeout=30) +>>> session = CachedSession(backend=backend) +``` + +## Cache Files +- See {ref}`files` for general info on specifying cache paths +- If you specify a name without an extension, the default extension `.sqlite` will be used + +### In-Memory Caching +SQLite also supports [in-memory databases](https://www.sqlite.org/inmemorydb.html).
+You can enable this (in "shared" memory mode) with the `use_memory` option: +```python +>>> session = CachedSession('http_cache', use_memory=True) +``` + +Or specify a memory URI with additional options: +```python +>>> session = CachedSession('file:memdb1?mode=memory') +``` + +Or just `:memory:`, if you are only using the cache from a single thread: +```python +>>> session = CachedSession(':memory:') +``` + +## Performance +When working with average-sized HTTP responses (\< 1MB) and using a modern SSD for file storage, you +can expect speeds of around: +- Write: 2-8ms +- Read: 0.2-0.6ms + +Of course, this will vary based on hardware specs, response size, and other factors. + +## Concurrency +SQLite supports concurrent access, so it is safe to use from a multi-threaded and/or multi-process +application. It supports unlimited concurrent reads. Writes, however, are queued and run in serial, +so if you need to make large volumes of concurrent requests, you may want to consider a different +backend that's specifically made for that kind of workload, like {py:class}`.RedisCache`. + +## Hosting Services and Filesystem Compatibility +There are some caveats to using SQLite with some hosting services, based on what kind of storage is +available: + +- NFS: + - SQLite may be used on an NFS, but is usually only safe to use from a single process at a time. + See the [SQLite FAQ](https://www.sqlite.org/faq.html#q5) for details. + - PythonAnywhere is one example of a host that uses NFS-backed storage. Using SQLite from a + multiprocess application will likely result in `sqlite3.OperationalError: database is locked`. +- Ephemeral storage: + - Heroku [explicitly disables SQLite](https://devcenter.heroku.com/articles/sqlite3) on its dynos.
+ - AWS [EC2](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html), + [Lambda (depending on configuration)](https://aws.amazon.com/blogs/compute/choosing-between-aws-lambda-data-storage-options-in-web-apps/), + and some other AWS services use ephemeral storage that only persists for the lifetime of the + instance. This is fine for short-term caching. For longer-term persistence, you can use an + [attached EBS volume](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-attaching-volume.html). diff --git a/docs/user_guide/installation.md b/docs/user_guide/installation.md index 174baf3..a5d4176 100644 --- a/docs/user_guide/installation.md +++ b/docs/user_guide/installation.md @@ -39,4 +39,4 @@ of python, here are the latest compatible versions and their documentation pages * **python 2.7:** [requests-cache 0.5.2](https://requests-cache.readthedocs.io/en/v0.5.0) * **python 3.4:** [requests-cache 0.5.2](https://requests-cache.readthedocs.io/en/v0.5.0) * **python 3.5:** [requests-cache 0.5.2](https://requests-cache.readthedocs.io/en/v0.5.0) -* **python 3.6:** [requests-cache 0.7.4](https://requests-cache.readthedocs.io/en/v0.7.4) +* **python 3.6:** [requests-cache 0.7.5](https://requests-cache.readthedocs.io/en/v0.7.5)
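The SQLite concurrency model described in the sqlite.md section above (unlimited concurrent reads, serialized writes) can be sketched with the stdlib `sqlite3` module alone. This is not the requests-cache API, just an illustration of the underlying behavior the docs rely on:

```python
import os
import sqlite3
import tempfile
import threading

# One writer populates the database file, then many threads read it
# concurrently. Table and key names here are illustrative.
path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE responses (key TEXT PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO responses VALUES ('abc', 'cached body')")
conn.commit()
conn.close()

results = []

def read():
    # Each thread opens its own connection; concurrent reads don't block each other
    c = sqlite3.connect(path)
    row = c.execute("SELECT body FROM responses WHERE key = 'abc'").fetchone()
    results.append(row[0])
    c.close()

threads = [threading.Thread(target=read) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # ['cached body', 'cached body', 'cached body', 'cached body']
```

Writes from multiple threads would instead be queued and run in serial, which is why the docs suggest a server-based backend like Redis for write-heavy concurrent workloads.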