# Chrome's Blob Storage System Design

This document describes the design of the blob storage system in Chrome.

# What are blobs?

Please see the [FileAPI Spec](https://www.w3.org/TR/FileAPI/) for the full
specification for Blobs, or [Mozilla's Blob documentation](
https://developer.mozilla.org/en-US/docs/Web/API/Blob) for a description of how
Blobs are used in the Web Platform in general. For the purposes of this
document, the important aspects of blobs are:

1. Blobs are immutable.
2. Blobs can be made from one or more of: bytes, files, or other blobs.
3. Blobs can be ['sliced'](
https://developer.mozilla.org/en-US/docs/Web/API/Blob/slice), which creates a
blob that is a subsection of another blob.
4. Reading blobs is asynchronous.
5. Reading blob metadata (like size) is synchronous.
6. Blobs can be passed to other browsing contexts, such as JavaScript workers
or other tabs.

In Chrome, after blob creation the actual blob 'data' gets transported to and
lives in the browser process. The renderer just holds a reference -
specifically a string UUID - to the blob, which it can use to read the blob or
pass it to other processes.

# Summary & Terminology

Blobs are created in a renderer process, where their data is temporarily held
for the browser (while JavaScript execution can continue). When the browser has
enough memory quota for the blob, it requests the data from the renderer. All
blob data is transported from the renderer to the browser. Once complete, any
pending reads for the blob are allowed to complete. Blobs can be huge (GBs), so
quota is necessary.

If the in-memory space for blobs is getting full, or a new blob is too large to
be in-memory, then the blob system uses the disk. This can either be paging old
blobs to disk, or saving the new too-large blob straight to disk.

Blob reading goes through the network layer, where the renderer dispatches a
network request for the blob and the browser responds with the
`BlobURLRequestJob`.

General Chrome terminology:

* **Renderer, Browser, and IPCs**: See the [Multi-Process Architecture](
https://www.chromium.org/developers/design-documents/multi-process-architecture)
document to learn about these concepts.
* **Shared Memory**: Memory that both the browser and renderer process can read
and write. It is created only between two processes.

Blob system terminology:

* **Blob**: This is a blob object, which can consist of bytes or files, as
described above.
* **BlobItem** or **[DataElement](
https://cs.chromium.org/chromium/src/storage/common/data_element.h)**:
This is a primitive element that can basically be a File, Bytes, or another
Blob. It also stores an offset and size, so it can represent part of a file.
(It can also represent a "future" file or "future" bytes, which signifies a
bytes or file item that has not been transported yet.)
* **dependent blobs**: These are the blobs that our blob depends on in order to
be constructed. A blob can be constructed with a dependency on another blob
(maybe it is a slice, or the other blob was passed to our constructor), and
before the new blob can be constructed it might need to wait for the
"dependent" blobs to complete. (This can sound backwards, but it's how the code
refers to them. Think "I am dependent on these other blobs".)
* **transportation strategy**: a method for sending the data in a BlobItem from
a renderer to the browser. The system currently implements three strategies:
IPC, Shared Memory, and Files.
* **blob description**: the initial data synchronously sent to the browser that
describes the items (content and sizes) of the new blob. This can
optimistically include the blob data if the size is less than the maximum IPC
size.

# Blob Storage Limits

We calculate the storage limits [here](
https://cs.chromium.org/chromium/src/storage/browser/blob/blob_memory_controller.cc?q=CalculateBlobStorageLimitsImpl&sq=package:chromium).

**In-Memory Storage Limit**

* If the architecture is x64 and NOT Chrome OS or Android: `2GB`
* Otherwise: `total_physical_memory / 5`

**Disk Storage Limit**

* If Chrome OS: `disk_size / 2`
* If Android: `disk_size / 20`
* Else: `disk_size / 10`

Note: Chrome OS's disk is part of the user partition, which is separate from the
system partition.

**Minimum Disk Availability**

We cap the disk limit so that a minimum amount of disk space stays available.
The equation we use is:

`min_disk_availability = in_memory_limit * 2`

## Example Limits

(All sizes in GB)

| Device | RAM | In-Memory Limit | Disk | Disk Limit | Min Disk Availability |
| --- | --- | --- | --- | --- | --- |
| Cast | 0.5 | 0.1 | 0 | 0 | 0 |
| Android Minimal | 0.5 | 0.1 | 8 | 0.4 | 0.2 |
| Android Fat | 2 | 0.4 | 32 | 1.6 | 0.8 |
| CrOS | 2 | 0.4 | 8 | 4 | 0.8 |
| Desktop 32 | 3 | 0.6 | 500 | 50 | 1.2 |
| Desktop 64 | 4 | 2 | 500 | 50 | 4 |
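
As a rough illustration, the rules above can be written as a small standalone
function. This is a sketch, not the code in `blob_memory_controller.cc`: the
`Platform` enum, the `BlobLimits` struct, `CalculateLimits`, and the choice to
treat 1 GB as 2^30 bytes are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical platform tag used only by this sketch.
enum class Platform { kDesktop, kChromeOS, kAndroid };

// All sizes are in bytes.
struct BlobLimits {
  std::uint64_t in_memory_limit;
  std::uint64_t disk_limit;
  std::uint64_t min_disk_availability;
};

constexpr std::uint64_t kGiB = 1ull << 30;  // Assumption: "GB" means 2^30 bytes.

// Applies the three rules listed in this section.
BlobLimits CalculateLimits(Platform platform, bool is_x64,
                           std::uint64_t physical_memory,
                           std::uint64_t disk_size) {
  BlobLimits limits{};
  // In-memory limit: fixed on x64 desktop, a fifth of RAM everywhere else.
  if (is_x64 && platform == Platform::kDesktop) {
    limits.in_memory_limit = 2 * kGiB;
  } else {
    limits.in_memory_limit = physical_memory / 5;
  }
  // Disk limit: a platform-dependent fraction of the disk size.
  switch (platform) {
    case Platform::kChromeOS: limits.disk_limit = disk_size / 2; break;
    case Platform::kAndroid:  limits.disk_limit = disk_size / 20; break;
    case Platform::kDesktop:  limits.disk_limit = disk_size / 10; break;
  }
  limits.min_disk_availability = limits.in_memory_limit * 2;
  return limits;
}

// Usage: reproduces the "Desktop 64" row of the table above.
inline void CheckDesktop64Row() {
  const BlobLimits limits =
      CalculateLimits(Platform::kDesktop, /*is_x64=*/true, 4 * kGiB, 500 * kGiB);
  assert(limits.in_memory_limit == 2 * kGiB);
  assert(limits.disk_limit == 50 * kGiB);
  assert(limits.min_disk_availability == 4 * kGiB);
}
```

The sketch only computes the three values. It does not model how the disk
limit is reduced at runtime when free disk space drops below the minimum.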

# Common Pitfalls

## Creating Large Blobs Too Fast

Creating a lot of blobs, especially if they are very large blobs, can cause
the renderer memory to grow too fast and result in an OOM on the renderer side.
This is because the renderer temporarily stores the blob data while it waits
for the browser to request it. Meanwhile, JavaScript can continue executing.
Transferring the data can take a lot of time if the blob is large enough to be
saved directly to a file, as this means we need to wait for disk operations
before the renderer can get rid of the data.

## Leaking Blob References

If the blob object in JavaScript is kept around, then the data will never be
cleaned up in the backend. This will unnecessarily use memory, so make sure to
dereference blob objects if they are no longer needed.

Similarly, if a URL is created for a blob, this will keep the blob data around
until the URL is revoked (and the blob object is dereferenced). However, the
URL is automatically revoked when the browser context is destroyed.

# How to use Blobs (Browser-side)

## Building

All blob interaction should go through the `BlobStorageContext`. Blobs are
built using a `BlobDataBuilder` to populate the data and then calling
`BlobStorageContext::AddFinishedBlob` or `::BuildBlob`. This returns a
`BlobDataHandle`, which manages reading, lifetime, and metadata access for the
new blob.

If you have known data that is not available yet, you can still create the blob
reference. To facilitate this construction, see the documentation for the
`BlobDataBuilder::AppendFuture*` and `::Populate*` methods on the builder, the
callback usage on `BlobStorageContext::BuildBlob`, and
`BlobStorageContext::NotifyTransportComplete`.

## Accessing / Reading

All blob information should come from the `BlobDataHandle` returned on
construction. This handle is cheap to copy. Once all instances of handles for
a blob are destructed, the blob is destroyed.
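
The lifetime rule above behaves much like reference counting. The following is
a toy model of that rule only, assuming a hypothetical `ToyBlobRegistry` and
`ToyBlobHandle`. It does not use or mirror the real `BlobStorageContext` or
`BlobDataHandle` classes.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the browser-side blob store.
struct ToyBlobRegistry {
  std::map<std::string, std::string> blobs;  // uuid -> data
};

// Hypothetical handle: copies share one reference count, and the blob is
// erased from the registry when the last copy is destroyed.
class ToyBlobHandle {
 public:
  // The registry must outlive every handle created from it.
  static ToyBlobHandle Create(ToyBlobRegistry* registry,
                              const std::string& uuid, std::string data) {
    assert(registry != nullptr);
    registry->blobs[uuid] = std::move(data);
    // The custom deleter runs once, when the last handle copy goes away.
    std::shared_ptr<std::string> ref(
        new std::string(uuid), [registry](std::string* id) {
          registry->blobs.erase(*id);
          delete id;
        });
    return ToyBlobHandle(std::move(ref));
  }

  const std::string& uuid() const { return *ref_; }

 private:
  explicit ToyBlobHandle(std::shared_ptr<std::string> ref)
      : ref_(std::move(ref)) {}

  std::shared_ptr<std::string> ref_;  // Copying this is cheap.
};
```

Copying a `ToyBlobHandle` only bumps a reference count, which is the sense in
which a handle is cheap to copy.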

`BlobDataHandle::RunOnConstructionComplete` will notify you when the blob is
constructed or broken (construction failed due to not enough space, filesystem
error, etc).

The `BlobReader` class is for reading blobs, and is accessible off of the
`BlobDataHandle` at any time.

# Blob Creation & Transportation (Renderer)

**This process is outlined with diagrams and illustrations [here](
https://docs.google.com/presentation/d/1MOm-8kacXAon1L2tF6VthesNjXgx0fp5AP17L7XDPSM/edit#slide=id.g75c319281_0_681).**

This outlines the renderer-side responsibilities of the blob system. The
renderer needs to:

 1. Consolidate small bytes items into larger chunks (avoiding a huge array of
 1 byte items).
 2. Communicate the blob description to the browser immediately on
 construction.
 3. Populate shared memory or files sent from the browser with the consolidated
 blob data items.
 4. Hold the blob data until the browser is finished requesting it.

The meat of blob construction starts in the [WebBlobRegistryImpl](
https://cs.chromium.org/chromium/src/content/child/blob_storage/webblobregistry_impl.h)'s
`createBuilder(uuid, content_type)`.

## Blob Data Consolidation

Since blobs are often constructed from arrays of single bytes, we try to
consolidate all **adjacent** memory blob items into one. This is done in
[BlobConsolidation](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_consolidation.h).
The implementation doesn't actually do any copying or allocating of new memory
buffers. Instead, it facilitates the transformation between the 'consolidated'
blob items and the underlying bytes items. This way we don't waste any memory.
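
A minimal sketch of the idea, assuming hypothetical `RendererItem`, `Segment`,
and `ConsolidatedItem` types (the real `BlobConsolidation` class has a
different interface): adjacent bytes items are grouped into one consolidated
item that only records pointers into the original buffers, so nothing is
copied until the data is actually read.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class ItemType { kBytes, kFile };

// Non-owning view of memory that the renderer already holds.
struct Segment {
  const char* data;
  std::size_t size;
};

// Hypothetical item as handed to the blob constructor.
struct RendererItem {
  ItemType type;
  Segment bytes;          // Used when type == kBytes.
  std::string file_path;  // Used when type == kFile.
};

struct ConsolidatedItem {
  ItemType type;
  std::vector<Segment> segments;  // Views into the original buffers.
  std::size_t length;             // Total bytes across all segments.
  std::string file_path;
};

// Merges runs of adjacent bytes items; file items are passed through as-is.
std::vector<ConsolidatedItem> Consolidate(
    const std::vector<RendererItem>& items) {
  std::vector<ConsolidatedItem> out;
  for (const RendererItem& item : items) {
    if (item.type == ItemType::kFile) {
      out.push_back(ConsolidatedItem{ItemType::kFile, {}, 0, item.file_path});
      continue;
    }
    // Start a new consolidated bytes item unless the previous one is bytes.
    if (out.empty() || out.back().type != ItemType::kBytes) {
      out.push_back(ConsolidatedItem{ItemType::kBytes, {}, 0, ""});
    }
    out.back().segments.push_back(item.bytes);
    out.back().length += item.bytes.size;
  }
  return out;
}

// Copies the bytes of one consolidated item out, as a transport step would.
std::string ReadAll(const ConsolidatedItem& item) {
  assert(item.type == ItemType::kBytes);
  std::string result;
  result.reserve(item.length);
  for (const Segment& segment : item.segments) {
    result.append(segment.data, segment.size);
  }
  return result;
}
```

Because a `Segment` does not own its memory, the original buffers must stay
alive for as long as the consolidated items are in use.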

## Blob Transportation (Renderer)

After the blob has been 'consolidated', it is given to the
[BlobTransportController](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.h).
This class:

1. Immediately communicates the blob description to the Browser. We also
[optimistically send](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=325)
the blob data if the total memory is less than our IPC threshold.
2. Stores the blob consolidation for data requests from the browser.
3. Answers requests from the browser to populate or send the blob data. The
browser can ask the renderer to:
  1. Send items and populate the data in IPC ([code](
https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?q="case+IPCBlobItemRequestStrategy::IPC")).
  2. Populate items in shared memory and notify the browser when population is
complete ([code](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?q="case+IPCBlobItemRequestStrategy::SHARED_MEMORY")).
  3. Populate items in files and notify the browser when population is complete
([code](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?q="case+IPCBlobItemRequestStrategy::FILE")).
4. Destroys the blob consolidation when the browser says it's done.

The transport controller also tries to keep the renderer alive while we are
sending blobs, because if the renderer is closed we would lose any pending blob
data. It does this by [incrementing and decrementing the process reference
count](https://cs.chromium.org/chromium/src/content/child/blob_storage/blob_transport_controller.cc?l=62),
which should prevent fast shutdown.

# Blob Transportation & Storage (Browser)

The browser side is a little more complicated. We need to answer the following
questions:

1. Do we have enough space for this blob?
2. Which transportation strategy should we pick for the blob's components?
3. Is there enough free memory to transport the blob right now? Or does older
blob data need to be paged to disk first?
4. Do we need to wait for files to be created?
5. Do we need to wait for dependent blobs?

## Summary

We follow this general flow for constructing a blob on the browser side:

1. Determine whether the blob fits and which transportation strategy should be
used.
2. Create our browser-side representation of the blob data, including the data
items from dependent blobs. We try to share items as much as possible to save
memory, and allow for the dependent blob items to be not populated yet.
3. Request memory and/or file quota from the BlobMemoryController, which
manages our blob storage limits. Quota is necessary for both transportation and
any copies we have to do from dependent blobs.
4. If transportation quota is needed, then once it is granted:
  1. Tell the BlobTransportHost to start asking for blob data given the earlier
  decision of strategy.
    * The BlobTransportHost populates the browser-side blob data item.
  2. When transportation is done, we notify the BlobStorageContext.
5. When transportation is done, copy quota is granted, and dependent blobs are
complete, we finish the blob.
  1. We perform any pending copies from dependent blobs.
  2. We notify any listeners that the blob has been completed.

Note: The transportation sections (steps 1, 2, 3) of this process are described
(without accounting for blob dependencies) with diagrams and details in [this
presentation](https://docs.google.com/presentation/d/1MOm-8kacXAon1L2tF6VthesNjXgx0fp5AP17L7XDPSM/edit#slide=id.g75d5729ce_0_105).

## BlobTransportHost

The `BlobTransportHost` is in charge of the actual transportation of the data
from the renderer to the browser. When the initial description of the blob is
sent to the browser, the BlobTransportHost asks the BlobMemoryController which
strategy (IPC, Shared Memory, or File) it should use to transport the blob.
Based on this strategy it can translate the memory items sent from the renderer
into a browser representation to facilitate the transportation. See [this](
https://docs.google.com/presentation/d/1MOm-8kacXAon1L2tF6VthesNjXgx0fp5AP17L7XDPSM/edit#slide=id.g75d5729ce_0_145)
slide, which illustrates how the browser might segment or split up the
renderer's memory into transportable chunks.
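
One way to picture the strategy decision and the chunking is the sketch below.
It is not the real `BlobMemoryController` logic: the `TransportLimits` fields,
the ordering of the checks, and both function names are assumptions chosen to
match the prose in this document (small blobs go over IPC, blobs that fit in
memory use shared memory, and everything else goes to files).

```cpp
#include <cassert>
#include <cstdint>

enum class TransportStrategy { kIpc, kSharedMemory, kFile, kTooLarge };

// Hypothetical inputs; in Chrome these come from the BlobMemoryController.
struct TransportLimits {
  std::uint64_t max_ipc_size;            // Largest payload sent inline.
  std::uint64_t max_shared_memory_size;  // Size of one shared memory chunk.
  std::uint64_t available_memory;        // Free in-memory blob quota.
  std::uint64_t available_disk;          // Free disk blob quota.
};

// Picks a strategy for a blob with |blob_size| bytes of memory items.
TransportStrategy ChooseStrategy(std::uint64_t blob_size,
                                 const TransportLimits& limits) {
  if (blob_size <= limits.available_memory) {
    return blob_size < limits.max_ipc_size ? TransportStrategy::kIpc
                                           : TransportStrategy::kSharedMemory;
  }
  if (blob_size <= limits.available_disk) {
    return TransportStrategy::kFile;
  }
  return TransportStrategy::kTooLarge;
}

// Number of fixed-size chunks needed to carry |blob_size| bytes.
std::uint64_t ChunkCount(std::uint64_t blob_size, std::uint64_t chunk_size) {
  assert(chunk_size > 0);
  return (blob_size + chunk_size - 1) / chunk_size;  // Round up.
}
```

For example, with a hypothetical 4096-byte shared memory chunk, a blob of 8193
bytes needs three chunks, and the last chunk carries a single byte.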

Once the transport host decides its strategy, it will create its own transport
state for the blob, including a `BlobDataBuilder` using the transport's data
segment representation. Then it will tell the `BlobStorageContext` that it is
ready to build the blob.

When the `BlobStorageContext` tells the transport host that it is ready to
transport the blob data, the transport host requests all of the data from the
renderer, populates the data in the `BlobDataBuilder`, and then signals the
storage context that it is done.

## BlobStorageContext

The `BlobStorageContext` is the hub of the blob storage system. It is
responsible for creating & managing all the state of constructing blobs, as
well as all blob handle generation and general blob status access.

When a `BlobDataBuilder` is given to the context, whether from the
`BlobTransportHost` or from elsewhere, the context will do the following:

1. Find all dependent blobs in the new blob (any blob reference in the blob
item list), and create a 'slice' of their items for the new blob.
2. Create the final blob item list representation: a new blob item list with
these 'slice' items inserted in place of the blob references. This is
'flattening' the blob.
3. Ask the `BlobMemoryController` for file or memory quota for the
transportation if necessary.
  * When the quota request is granted, notify the `BlobTransportHost` to
  begin transporting the data.
4. Ask the `BlobMemoryController` for memory quota for any copies necessary for
blob slicing.
5. Add completion callbacks to any blobs our blob depends on.

When all of the following conditions are met:

1. The `BlobTransportHost` tells us it has transported all the data (or we
don't need to transport data),
2. The `BlobMemoryController` approves our memory quota for slice copies (or we
don't need slice copies), and
3. All dependent blobs are completed (or we don't have dependent blobs),

the blob can finish constructing: any pending blob slice copies are
performed, and we set the status of the blob.
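
The three conditions above amount to a simple gate. The sketch below models
only that gate, using a hypothetical `BuildState` struct. The real
`BlobStorageContext` tracks this state differently.

```cpp
#include <cassert>

// Hypothetical per-blob bookkeeping while a blob is being built.
struct BuildState {
  bool needs_transport;
  bool transport_done;
  bool needs_copy_quota;
  bool copy_quota_granted;
  int pending_dependent_blobs;
};

// True once every condition in the list above holds. A condition that does
// not apply (no transport, no slice copies, no dependencies) counts as met.
bool CanFinish(const BuildState& state) {
  const bool transport_ok = !state.needs_transport || state.transport_done;
  const bool quota_ok = !state.needs_copy_quota || state.copy_quota_granted;
  return transport_ok && quota_ok && state.pending_dependent_blobs == 0;
}

// Would be called from the completion callback of each dependent blob.
// Returns true if this completion was the last thing the blob waited for.
bool OnDependentBlobComplete(BuildState& state) {
  assert(state.pending_dependent_blobs > 0);
  --state.pending_dependent_blobs;
  return CanFinish(state);
}
```

Each event (transport done, quota granted, dependency complete) can arrive in
any order, so the gate is re-evaluated after every one of them.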

### BlobStatus lifecycle

The `BlobStatus` tracks the construction procedure (specifically the transport
process). The copy memory quota and dependent blob steps are both encompassed
by `PENDING_INTERNALS`.

Once a blob is finished constructing, the status is set to `DONE` or any of
the `ERR_*` values.

### BlobSlice

During construction, slices are created for dependent blobs using the given
offset and size of the reference. This slice consists of the relevant blob
items, and metadata about possible copies from either end. If a blob item can
be used in its entirety by the new blob, then we just share the item between
the two blobs. But if only a 'slice' of the first or last item is needed, then
our resulting BlobSlice
representation will create a new bytes item for the new blob, and store
necessary copy data for later.
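
The share-or-copy decision can be sketched as follows. This is an illustration
under simplifying assumptions, not the real `BlobSlice`: every source item is
treated as a bytes item described only by its size, and `SliceEntry` and
`MakeSlice` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One entry of the resulting slice.
struct SliceEntry {
  std::size_t source_index;    // Index of the item in the source blob.
  std::size_t offset_in_item;  // Where the used range starts in that item.
  std::size_t length;          // Number of bytes used from that item.
  bool shared;                 // True if the whole item is reused as-is.
};

// Computes which source items a slice [offset, offset + size) touches.
// Fully covered items are shared; partially covered items (only possible at
// the first or last position) are marked for a later copy.
std::vector<SliceEntry> MakeSlice(const std::vector<std::size_t>& item_sizes,
                                  std::size_t offset, std::size_t size) {
  std::size_t total = 0;
  for (std::size_t item_size : item_sizes) total += item_size;
  assert(offset <= total && size <= total - offset);

  std::vector<SliceEntry> slice;
  const std::size_t end = offset + size;
  std::size_t item_start = 0;
  for (std::size_t i = 0; i < item_sizes.size() && item_start < end; ++i) {
    const std::size_t item_end = item_start + item_sizes[i];
    const std::size_t from = std::max(offset, item_start);
    const std::size_t to = std::min(end, item_end);
    if (to > from) {  // The item overlaps the requested range.
      slice.push_back(
          {i, from - item_start, to - from, to - from == item_sizes[i]});
    }
    item_start = item_end;
  }
  return slice;
}
```

For a source blob with items of 10, 20, and 30 bytes, slicing 40 bytes at
offset 5 yields three entries: a 5-byte copy from the end of the first item,
the whole second item shared, and a 15-byte copy from the start of the third.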

### BlobFlattener

The `BlobFlattener` takes the new blob description (including blob references),
creates blob slices for all the referenced blobs, and constructs a 'flat'
representation of the new blob, where all blob references are replaced with the
`BlobSlice` items. It also stores any copy data from the slices.
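
A simplified picture of flattening, assuming every blob reference covers the
whole referenced blob so that no partial copies are needed. `Entry`,
`FlatBlob`, and `Flatten` are hypothetical names for this sketch and do not
correspond to the real `BlobFlattener` interface.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// A shared, immutable bytes item. Sharing the pointer shares the data.
using ItemPtr = std::shared_ptr<const std::string>;

// A flattened blob is just a list of items with no blob references left.
using FlatBlob = std::vector<ItemPtr>;

// One entry of a new blob's description: either bytes or a blob reference.
struct Entry {
  bool is_blob_ref;
  std::string blob_uuid;  // Used when is_blob_ref is true.
  ItemPtr bytes;          // Used when is_blob_ref is false.
};

// Replaces each blob reference with the items of the referenced blob.
// Items are shared by pointer, so no blob data is copied.
FlatBlob Flatten(const std::vector<Entry>& description,
                 const std::map<std::string, FlatBlob>& existing_blobs) {
  FlatBlob flat;
  for (const Entry& entry : description) {
    if (!entry.is_blob_ref) {
      flat.push_back(entry.bytes);
      continue;
    }
    auto it = existing_blobs.find(entry.blob_uuid);
    assert(it != existing_blobs.end());  // The dependent blob must exist.
    flat.insert(flat.end(), it->second.begin(), it->second.end());
  }
  return flat;
}
```

After flattening, the new blob and the blob it depends on point at the same
item objects, which is how items get shared between blobs.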

## BlobMemoryController

The `BlobMemoryController` is responsible for:

1. Determining storage quota limits for files and memory, including restricting
file quota when disk space is low.
2. Determining whether a blob can fit and the transportation strategy to use.
3. Tracking memory quota.
4. Tracking file quota and creating files.
5. Accumulating old blob data and evicting it to files on disk.
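
Responsibilities 3 and 5 can be pictured with a toy model: memory quota is
tracked per item, and when a new item does not fit, the oldest items are paged
to disk until it does. `ToyMemoryController` is a hypothetical class written
for this example. The oldest-first eviction order is an assumption, and no
real files are written.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>

class ToyMemoryController {
 public:
  explicit ToyMemoryController(std::uint64_t memory_limit)
      : memory_limit_(memory_limit) {}

  // Reserves memory quota for a new item, paging the oldest items to "disk"
  // until it fits. Returns false if the item can never fit in memory.
  bool Reserve(const std::string& id, std::uint64_t bytes) {
    if (bytes > memory_limit_) return false;
    while (memory_used_ + bytes > memory_limit_) {
      assert(!in_memory_.empty());
      const std::uint64_t evicted = in_memory_.front().second;
      memory_used_ -= evicted;
      disk_used_ += evicted;
      in_memory_.pop_front();
    }
    in_memory_.push_back({id, bytes});
    memory_used_ += bytes;
    return true;
  }

  std::uint64_t memory_used() const { return memory_used_; }
  std::uint64_t disk_used() const { return disk_used_; }

 private:
  std::uint64_t memory_limit_;
  std::uint64_t memory_used_ = 0;
  std::uint64_t disk_used_ = 0;
  // Items currently in memory, oldest first: (id, size in bytes).
  std::deque<std::pair<std::string, std::uint64_t>> in_memory_;
};
```

With a limit of 100 bytes, reserving 60, then 30, then 50 bytes pages the
first item out, leaving 80 bytes in memory and 60 bytes on disk.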