Garbage Collection ================== Garbage collection is used to remove data from a repository that is no longer referenced. Generally this involves locking the repository and scanning all its branches then generating a new repository with less data. Least work we can hope to perform --------------------------------- * Read all branches to get initial references - tips + tags. * Read through the revision graph to find unreferenced revisions. A cheap HEADS list might help here by allowing comparison of the initial references to the HEADS - any unreferenced head is garbage. * Walk out via inventory deltas to get the full set of texts and signatures to preserve. * Copy to a new repository * Bait and switch back to the original * Remove the old repository. A possibility to reduce this would be to have a set of grouped 'known garbage free' data - 'ancient history' which can be preserved in total should its HEADS be fully referenced - and where the HEADS list is deliberate cheap (e.g. at the top of some index). possibly - null data in place without saving size.