# High-level overview of Save-Page-As code

This document describes code under `//content/browser/download`,
restricting the scope to code handling the Save-Page-As functionality
(i.e. leaving out other downloads-related code).
This document focuses on a high-level overview and on aspects of the code that
span multiple compilation units (hoping that individual compilation units
are described by their code comments or by their code structure).

## Classes overview

* SavePackage class
    * coordinates the overall Save-Page-As request
    * created and owned by `WebContents`
      (ref-counted today, but it is unnecessary - see https://crbug.com/596953)
    * UI-thread object
* SaveFileCreateInfo::SaveFileSource enum
    * classifies `SaveItem` and `SaveFile` processing into 2 flavours:
        * `SAVE_FILE_FROM_NET` (see `SaveFileResourceHandler`)
        * `SAVE_FILE_FROM_DOM` (see the "Complete HTML" section below)
* SaveItem class
    * tracks saving a single file
    * created and owned by `SavePackage`
    * UI-thread object
* SaveFileManager class
    * coordinates between the FILE and UI threads
    * gets requests from `SavePackage` and communicates results back to
      `SavePackage` on the UI thread
    * shepherds data (received from the network OR from the DOM) onto the
      FILE thread via `SaveFileManager::UpdateSaveProgress`
    * created and owned by `BrowserMainLoop`
      (ref-counted today, but it is unnecessary - see https://crbug.com/596953)
    * the global instance can be retrieved via the `Get` method
* SaveFile class
    * tracks saving a single file
    * created and owned by `SaveFileManager`
    * FILE-thread object
* SaveFileResourceHandler class
    * tracks network downloads and forwards their status into `SaveFileManager`
      (onto the FILE thread)
    * created by `ResourceDispatcherHostImpl::BeginSaveFile`
    * IO-thread object
* SaveFileCreateInfo POD struct
    * short-lived object holding data passed to callbacks handling the start
      of saving a file
* MHTMLGenerationManager class
    * singleton that manages progress of jobs responsible for saving individual
      MHTML files (represented by `MHTMLGenerationManager::Job`)

## Overview of the processing flow

Save-Page-As flow starts with `WebContents::OnSavePage`.
The flow is different depending on the save format chosen by the user
(each flow is described in a separate section below).

### Complete HTML

Very high-level flow of saving a page as "Complete HTML":

* Step 1: `SavePackage` asks all frames for "savable resources"
  and creates a `SaveItem` for each file that needs to be saved.
* Step 2: `SavePackage` first processes `SAVE_FILE_FROM_NET`
  `SaveItem`s and asks `SaveFileManager` to save them.
* Step 3: `SavePackage` handles the remaining `SAVE_FILE_FROM_DOM` `SaveItem`s
  and asks each frame to serialize its DOM/HTML (each frame gets from
  `SavePackage` a map covering local paths that need to be referenced by
  the frame). Responses from frames get forwarded to `SaveFileManager`
  to be written to disk.

### MHTML

Very high-level flow of saving a page as MHTML:

* Step 1: `WebContents::GenerateMHTML` is called by `SavePackage` (for the
  Save-Page-As UI), by extensions (via the `chrome.pageCapture` extensions
  API), or by an embedder of `WebContents` (since this is a public API of
  //content).
* Step 2: `MHTMLGenerationManager` creates a new instance of
  `MHTMLGenerationManager::Job` that coordinates generation of the MHTML file
  by sequentially (one at a time) asking each frame to write its portion of
  the MHTML to a file handle. Other classes (i.e. `SavePackage` and/or
  `SaveFileManager`) are not used at this step at all.
* Step 3: When done, `MHTMLGenerationManager` destroys the
  `MHTMLGenerationManager::Job` instance and calls a completion callback
  which, in the case of Save-Page-As, ends up in
  `SavePackage::OnMHTMLGenerated`.
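The "one at a time" property of Step 2 of the MHTML flow can be illustrated with a small, self-contained C++ model. This is a sketch under stated assumptions, not Chromium's implementation: `FakeFrame` and `FakeMhtmlJob` are hypothetical names, the asynchronous request to each renderer is replaced by a synchronous append to a `std::string` that stands in for the shared file handle, and the completion callback simply receives the resulting size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a frame that can write its portion of an MHTML
// document. This is NOT a real //content or Blink type.
struct FakeFrame {
  std::string name;
  std::string mhtml_part;  // What this frame would write to the file.
};

// Hypothetical, heavily simplified model of MHTMLGenerationManager::Job.
// Frames are asked one at a time: the next frame is only asked after the
// previous one has reported that it finished writing.
class FakeMhtmlJob {
 public:
  using CompletionCallback = std::function<void(int64_t file_size)>;

  FakeMhtmlJob(std::vector<FakeFrame> frames, CompletionCallback callback)
      : frames_(std::move(frames)), callback_(std::move(callback)) {}

  void Start() {
    assert(!started_ && "a job is started only once");
    started_ = true;
    AskNextFrame();
  }

  const std::string& file_contents() const { return file_; }
  const std::vector<std::string>& frames_asked() const { return frames_asked_; }

 private:
  void AskNextFrame() {
    if (next_frame_ == frames_.size()) {
      // All frames are done: report the result to whoever requested the MHTML.
      callback_(static_cast<int64_t>(file_.size()));
      return;
    }
    const FakeFrame& frame = frames_[next_frame_++];
    frames_asked_.push_back(frame.name);
    // Assumption: the real request to a renderer is asynchronous and passes a
    // file handle; here the "frame" appends to a string synchronously.
    file_ += frame.mhtml_part;
    OnFrameDone();
  }

  // Called when the current frame reports completion.
  void OnFrameDone() { AskNextFrame(); }

  std::vector<FakeFrame> frames_;
  CompletionCallback callback_;
  bool started_ = false;
  std::size_t next_frame_ = 0;
  std::string file_;  // Stands in for the shared file handle.
  std::vector<std::string> frames_asked_;
};
```

For example, a job built from three frames writes their parts in frame order and then hands the total size to the callback. The sketch deliberately leaves out Step 3's destruction of the job before the callback runs, as well as error handling.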
Note: the MHTML format is disabled by default in the Save-Page-As UI on
Windows, macOS and Linux (it is the default on Chrome OS), but for testing
this can easily be changed using the `--save-page-as-mhtml` command line switch.

### HTML Only

Very high-level flow of saving a page as "HTML Only":

* `SavePackage` creates only a single `SaveItem` (always `SAVE_FILE_FROM_NET`)
  and asks `SaveFileManager` to process it (as in the individual `SaveItem`
  handling of the Complete HTML flow above).

## Other relevant code

Pointers to related code outside of `//content/browser/download`:

* End-to-end tests:
    * `//chrome/browser/downloads/save_page_browsertest.cc`
    * `//chrome/test/data/save_page/...`
* Other tests:
    * `//content/browser/download/*test*.cc`
    * `//content/renderer/dom_serializer_browsertest.cc` - single process... :-/
* Elsewhere in `//content`:
    * `//content/renderer/savable_resources...`
* Blink:
    * `//third_party/WebKit/public/web/WebFrameSerializer...`
    * `//third_party/WebKit/Source/web/WebFrameSerializerImpl...`
      (used for Complete HTML today; should use `FrameSerializer` instead in
      the long term - see https://crbug.com/328354)
    * `//third_party/WebKit/Source/core/frame/FrameSerializer...`
      (used for MHTML today)
    * `//third_party/WebKit/Source/platform/mhtml/MHTMLArchive...`
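The `SaveItem` classification described in the "Complete HTML" and "HTML Only" sections can be modelled with a short C++ sketch. It is illustrative only: `FakeSaveFileSource`, `FakeSaveItem`, `FakeSaveType` and `PlanSaveItems` are hypothetical names, and the sketch assumes that in the Complete HTML flow the frame documents are the `SAVE_FILE_FROM_DOM` items while every other savable resource is a `SAVE_FILE_FROM_NET` item.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the two SaveFileCreateInfo::SaveFileSource flavours.
enum class FakeSaveFileSource { kFromNet, kFromDom };

// Hypothetical, simplified stand-in for a SaveItem.
struct FakeSaveItem {
  std::string url;
  FakeSaveFileSource source;
};

enum class FakeSaveType { kHtmlOnly, kCompleteHtml };

// Returns the items in the order in which they would be processed.
// frame_urls[0] is assumed to be the main frame.
std::vector<FakeSaveItem> PlanSaveItems(
    FakeSaveType type,
    const std::vector<std::string>& frame_urls,
    const std::vector<std::string>& resource_urls) {
  assert(!frame_urls.empty());
  std::vector<FakeSaveItem> items;
  if (type == FakeSaveType::kHtmlOnly) {
    // "HTML Only": a single item, always fetched from the network.
    items.push_back({frame_urls.front(), FakeSaveFileSource::kFromNet});
    return items;
  }
  // "Complete HTML" (assumption): frame documents are serialized from the
  // DOM, while the other savable resources are fetched from the network.
  for (const std::string& url : frame_urls) {
    items.push_back({url, FakeSaveFileSource::kFromDom});
  }
  for (const std::string& url : resource_urls) {
    items.push_back({url, FakeSaveFileSource::kFromNet});
  }
  // Network items first (Step 2), then DOM items (Step 3); the relative
  // order within each group is preserved.
  std::stable_partition(items.begin(), items.end(),
                        [](const FakeSaveItem& item) {
                          return item.source == FakeSaveFileSource::kFromNet;
                        });
  return items;
}
```

`std::stable_partition` is used here only to make the "network items first, then DOM items" ordering explicit; the sketch makes no claim about how `SavePackage` actually stores or orders its `SaveItem`s.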