====
TODO
====
General
-------
* Classes requiring repo actually only need the git command - this should be
changed to limit their access level and make things a little safer.
* Check for correct usage of id, ref and hexsha and define their meanings.
Currently it is not clear what id may be in each case. As far as I know, it is
usually a sha or ref, unless cat-file is used, where it must be a sha.
* Overhaul command caching - currently it is possible to create many instances of
the std-in command types, as it appears they are not killed when the repo gets
deleted. A clear() method could already help to allow long-running programs
to remove cached commands after an idle time.
* References should be parsed 'manually' to get around command invocation, but
be sure to be able to read packed refs.
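The manual reference parsing mentioned above could look roughly like the following.
This is only a sketch: the function name read_refs is hypothetical and not part of
the existing code base. It assumes the usual on-disk layout, that is, loose refs as
files below refs/ and a packed-refs file with "<hexsha> <refname>" lines.

```python
import os


def read_refs(git_dir):
    """Return a dict mapping full ref names to their content without calling git.

    Hypothetical helper for illustration only.
    """
    refs = {}

    # Packed refs first: each entry is a "<hexsha> <refname>" line.
    packed_path = os.path.join(git_dir, "packed-refs")
    if os.path.isfile(packed_path):
        with open(packed_path, "r") as fp:
            for line in fp:
                line = line.strip()
                # Skip blank lines, the "# pack-refs ..." header and the
                # "^<hexsha>" lines that carry peeled tag targets.
                if not line or line[0] in "#^":
                    continue
                hexsha, name = line.split(" ", 1)
                refs[name] = hexsha

    # Loose refs second, so they override packed entries of the same name.
    # Symbolic refs ("ref: refs/...") are stored verbatim and not resolved.
    refs_root = os.path.join(git_dir, "refs")
    for dirpath, _dirnames, filenames in os.walk(refs_root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            name = os.path.relpath(path, git_dir).replace(os.sep, "/")
            with open(path, "r") as fp:
                refs[name] = fp.read().strip()
    return refs
```

Reading the loose files after packed-refs matters, as a loose ref takes precedence
over a packed entry with the same name.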
Object
------
* DataStream method should read the data itself. This would be easy once you have
the actual loose object, but will be hard if it is in a pack. In a distant future,
we might be able to do that or at least implement direct object reading for loose
objects ( to save a command call ). Currently object information comes from
persistent commands anyway, so the penalty is not that high. The data_stream
though is not based on persistent commands.
It would be good to improve things there as cat-file keeps all the data in a buffer
before it writes it. Hence it does not write to a stream directly, which can be
bad if files are large, say 1GB :).
* Effectively Objects only store hexshas in their id attributes, so in fact
it should be renamed to 'sha'. There was a time when references were allowed as
well, but now objects will be 'baked' to the actual sha to ensure comparisons work.
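Direct reading of a loose object, as discussed above, might be sketched as follows.
The function name open_loose_object and its parameters are hypothetical. It assumes
the loose object format: a zlib-compressed file at objects/<first 2 hex>/<remaining 38 hex>
whose content is "<type> <size>", a NUL byte, then the data. Packed objects are not
handled at all.

```python
import os
import zlib


def open_loose_object(git_dir, hexsha, chunk_size=8192):
    """Return (type, size, chunk_iterator) for a loose object.

    Hypothetical helper for illustration only. The data is decompressed
    chunk by chunk instead of being held in one buffer.
    """
    path = os.path.join(git_dir, "objects", hexsha[:2], hexsha[2:])
    fp = open(path, "rb")
    decomp = zlib.decompressobj()

    # Decompress until the "<type> <size>\0" header is complete.
    buf = b""
    while b"\x00" not in buf:
        chunk = fp.read(chunk_size)
        if not chunk:
            fp.close()
            raise ValueError("truncated object header in %s" % path)
        buf += decomp.decompress(chunk)
    header, _, first = buf.partition(b"\x00")
    type_name, size = header.split(b" ")

    def chunks():
        # The file is closed once the iterator has been exhausted.
        with fp:
            if first:
                yield first
            while True:
                chunk = fp.read(chunk_size)
                if not chunk:
                    break
                data = decomp.decompress(chunk)
                if data:
                    yield data
            tail = decomp.flush()
            if tail:
                yield tail

    return type_name.decode("ascii"), int(size), chunks()
```

A caller would write each chunk to its target stream as it arrives. Note that the
file handle stays open until the iterator has been consumed, which a real
implementation would have to deal with.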
Commit
------
* message is stripped during parsing, which is wrong unless we parse from
rev-list output. In fact we don't know that, and can't really tell either.
Currently we strip away white space that might actually belong to the message.
Config
------
* Expand .get* methods of GitConfigParser to support a default value. If it is not None,
it will be returned instead of raising. This way the class will be much more usable,
and ... I truly hate this config reader as it is so 'old' style. It's not even a new-style
class yet, showing that it must be ten years old.
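The default-value behaviour described above could be sketched like this. The helper
get_value is hypothetical and is written against the standard library's configparser
( Python 3 ) rather than GitConfigParser itself, so it only illustrates the intended
semantics.

```python
import configparser


def get_value(parser, section, option, default=None):
    """Return the option's value, or `default` if it is missing and not None.

    Hypothetical helper for illustration only. If no default is given, the
    original exception is raised, as before.
    """
    try:
        return parser.get(section, option)
    except (configparser.NoSectionError, configparser.NoOptionError):
        if default is not None:
            return default
        raise


parser = configparser.ConfigParser()
parser.read_string("[core]\nbare = false\n")

bare = get_value(parser, "core", "bare")                   # value from the file
editor = get_value(parser, "core", "editor", default="vi")  # falls back to "vi"
```

One consequence of using None as the 'no default' marker is that None itself cannot
be requested as a default value.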
Diff
----
* Check docs on diff-core to be sure the raw-format presented there can be read
properly:
- http://www.kernel.org/pub/software/scm/git-core/docs/gitdiffcore.html
Docs
----
Overhaul docs - check examples, check looks, improve existing docs
Index
-----
* [advanced]
write_tree should write a tree directly, which would require ability to create
objects in the first place. Should be rather simple as it is
"tree" bytes datablock | sha1sum and zipped.
Currently we use some file swapping and the git command to do it which probably
is much slower. The thing is that properly writing a tree from an index involves
creating several tree objects, so in the end it might be slower.
Hmm, probably it's okay to use the command unless we go c(++)
* Implement diff so that temporary indices can be used as well ( using file swapping )
* Proper merge handling with index and working copy
* Checkout individual blobs using the index and git-checkout. Blobs can already
be written using their stream_data method.
* index.add: could be implemented in python together with hash-object, allowing
to keep the internal entry cache and write once everything is done. Problem
would be that all other git commands are unaware of the changes unless the index
gets written. It's worth an evaluation at least.
* index.remove: On Windows, there can be a command line length overflow
as we pass the paths directly as argv. We do so because we use git-rm to be able
to remove whole directories easily. This could be implemented using
git-update-index if this becomes an issue, but then we would have to do all the
globbing and directory removal ourselves.
* commit: advance head = False - tree object should get the base commit wrapping
that index uses after writing itself as tree. Perhaps it would even be better
to have a Commit.create method from a tree or from an index. Allowing the
latter would be quite flexible and would fit into the system as refs have
create methods as well
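Writing a single tree object directly, as outlined in the write_tree item above,
might look like the sketch below. The function name write_tree_object and the entry
format are hypothetical. It assumes the loose object layout, i.e. "tree <size>",
a NUL byte, then one "<mode> <name>\0<20 raw sha bytes>" record per entry, with the
sha1 taken over the whole block and the block stored zlib-compressed. Building the
nested trees from an index is not covered.

```python
import binascii
import hashlib
import os
import zlib


def write_tree_object(git_dir, entries):
    """Write one tree object as a loose object and return its hexsha.

    Hypothetical helper for illustration only. `entries` is an iterable of
    (mode, name, hexsha) tuples, with mode as a string such as "100644" for
    a file or "40000" for a sub-tree.
    """
    def sort_key(entry):
        mode, name, _hexsha = entry
        raw = name.encode("utf-8")
        # Sub-trees are sorted as if their name ended with a slash.
        return raw + b"/" if mode == "40000" else raw

    body = b"".join(
        mode.encode("ascii") + b" " + name.encode("utf-8") + b"\x00"
        + binascii.unhexlify(hexsha)
        for mode, name, hexsha in sorted(entries, key=sort_key)
    )
    store = b"tree " + str(len(body)).encode("ascii") + b"\x00" + body
    hexsha = hashlib.sha1(store).hexdigest()

    # Objects are content-addressed, so an existing file can be left alone.
    path = os.path.join(git_dir, "objects", hexsha[:2], hexsha[2:])
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fp:
            fp.write(zlib.compress(store))
    return hexsha
```

Whether this is actually faster than the git command remains open, as noted above:
a real index usually needs several such objects, one per directory.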
Refs
-----
* When adjusting the reference of a symbolic reference, the ref log might need
adjustments as well. This is not critical, but would make things totally 'right'
* Reference Objects should be able to set the commit they are pointing to, making
the commit property read-write. Tags are a special case of this and would need
to be handled as well !
* Ability to create new heads and tags in the Repository ( but using the respective
Reference Type ), i.e. Head.create(repo, name, commit = 'HEAD') or
TagReference.create(repo, name
* Ability to rename references and tags
* Ability to remove references and tags
* Ability to checkout a reference -
* Check whether we are the active reference HEAD.commit == self.commit
Remote
------
* 'push' method needs a test, a true test repository is required though, a fork
of a fork would do :)!
* Fetch should return heads that were updated, pull as well.
Repo
----
* Blame: Read the blame format making assumptions about its structure,
currently regexes are used a lot although we can deduce what will be next.
- Read data from a stream directly from git command
* Figure out how to implement a proper merge API
Submodules
----------
* add submodule support
Tree
----
* Should return submodules during iteration ( identifies as commit )
* Work through test and check for test-case cleanup and completeness ( what about
testing whether it raises on invalid input ? ). See 6dc7799d44e1e5b9b77fd19b47309df69ec01a99