summaryrefslogtreecommitdiff
path: root/doc/developers/last-modified.txt
blob: 0a92f05406ed0914d919240c453025f2e8e53b5c (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
==============================
Computing last_modified values
==============================

Introduction
------------

Bazaar (through at least 0.19) computes a ``last_modified``
attribute for all inventory entries and stores it at commit time.
This is the ``revision_id`` that last changed or merged the file.  It is
used in knit and weave repositories to look up the file text, and to index
into the file graph.  It's also used to determine which revisions of the
file text to pull during ``fetch``.

This data is not natively stored by most other systems so we need to
synthesize it during conversion.

This is a case of non-core data that we might wish to treat as cached,
rather than always stored.

Definition
----------

Take the set of all "heads": all the versions of these files in parent
trees.

Reduce the heads by eliminating any whose last_modified is an ancestor of
the last_modified of any other head.

If there is still more than one head, a new last_modified is assigned.
This points to the merge point in the file graph.

If the file text and properties are the same as the sole remaining head,
its last_modified is inherited. Property changes include executable bit,
filename, and containing directory.

Otherwise, a new last_modified is used.

(This is meant to be the simplest statement, but it may not be the most
efficient algorithm; anything that gives equivalent results can be used.)


Generation in commit
--------------------

Commit and converters both need this when writing into Bazaar native
formats.

This is an O(tree) operation because it needs to check for files with
multiple heads.  It could be reduced to O(changed_or_merged_files) if that
was faster to determine.  So it needs to be fast.

For the single-parent commit case, we just need to determine which files have
changed compared to the parent.  If the file was changed, it gets the
revision id of the new revision; otherwise it inherits the value from the
parent tree.

In the multi-parent commit case (commit of a merge), it can take the value
from any of the parent trees, or of the new revision.

Commit in a dirstate tree should be able to do this more easily by looking
at a row of the dirstate to get the per-file parents.  It still needs to
look at the revision or file graph information to work out whether heads can be
eliminated as previously merged.  At the moment ``find_previous_heads`` works on
inventories, so needs to spend considerable effort building whole
inventories, including files that are not modified or merged.  (Called
from ``record_entry_contents``.)  It might be better to have the commit
builder pass in the per-entry parents so that dirstate can generate just
those that are necessary.  (See also the spec for
``iter_changes_multiple_parents``.)

If merge used a per-file graph then it would know when one version fully
supersedes another, and it could emit only a single parent.  Merge could
in fact do this even when not using per-file graphs.  In the current
dirstate format we need to store the full data for all trees because they
can be extracted from the dirstate, but it could mark some parents as
already merged.

Alternatively, we could change the dirstate to include
only the base and current trees, and cache the merged-in parents
elsewhere.

(Offtopic other dirstate changes: we could also omit the working-copy
hash, and just have a stat-fingerprint of when it was last known equal to
the basis revision.  That reduces the amount of data stored and possibly
makes it simpler to update, and shouldn't penalize common cases.)


Generation during conversion
----------------------------

Accessing a foreign branch requires synthesizing this information.
If last_modified is removed from a future bzr version, we will also need
to synthesize it to pull back to earlier formats.

Because last_modified is not natively stored in the foreign branches, we
want to take advantage of any conversion we've already done, so that we
don't need to recursively generate them on every access.  We'd
prefer to find a revision that's already converted to a Bazaar inventory
within another related repository, such as the target of a conversion.


Avoiding last_modified
----------------------

last_modified is potentially expensive to determine and we may not want to
store it in inventories in future.  Therefore we should use it only when
necessary:

* When writing out an inventory format that includes it.

* In Bazaar formats that use it as a key for the file text or file
  ancestry.  This should be hidden behind the Repository/RevisionTree
  interface.

* When a user operation specifically requires the last_modified (e.g.
  hypothetical annotate directory).

We already do this in most cases.


Compared to annotate
--------------------


Use cases
---------

Cases to test
-------------

1. Single parent, unmodified file
2. Single parent, modified file
3. Two parents, one descended from the other, modified in one parent only
4. Two parents, one descended from the other, modified in one parent only,
   but also modified locally.
5. Two parents, not descended from each other, modified in one parent only.
6. Two parents, not descended from each other, modified in one parent only,
   but also modified locally.
7. Two parents, modified in both to different values.
8. Two parents, modified in both to the same value.
9. Two parents, modified in both, and reverted in both back to the
   original text.
10. Three parents, modified in only one
11. Three parents, modified in only one, also modified locally.
12. Three parents, modified in 2
13. Three parents, modified in 2, and locally.
14. Three parents, modified in 2, but one is a descendant of the other.



Performance considerations
--------------------------

Often we'll want the last_modified information for multiple files, perhaps
everything in a directory or in a whole tree.  It may be more efficient
for the api to accommodate this.  Often the last_modified will be similar
for multiple files, and if we process them all at once we can avoid some
repeated work in calculating their heads.


Open questions
--------------

* How does caching ``find_heads`` interact with cherry-picks?



Possible structure
==================

For a single file, if I am different from all parents, 'new'. (Do not need
to evaluate last modified).

..
  vim: ft=rst tw=74