.. _tutorial_toplevel:

==================
GitPython Tutorial
==================

GitPython provides object model access to your git repository. This tutorial is
composed of multiple sections, each of which explains a real-life use case.

Initialize a Repo object
************************

The first step is to create a ``Repo`` object to represent your repository.

    >>> from git import *
    >>> repo = Repo("/Users/mtrier/Development/git-python")

In the above example, the directory ``/Users/mtrier/Development/git-python``
is my working repository and contains the ``.git`` directory. You can also
initialize GitPython with a bare repository.

    >>> repo = Repo.create("/var/git/git-python.git")
    
A repo object provides high-level access to your data: it allows you to create
and delete heads, tags and remotes, and to access the configuration of the
repository.
    
    >>> repo.config_reader()        # get a config reader for read-only access
    >>> repo.config_writer()        # get a config writer to change configuration 

Query the active branch, query untracked files or whether the repository data 
has been modified.
    
    >>> repo.is_dirty()
    False
    >>> repo.untracked_files()
    ['my_untracked_file']
    
Clone from existing repositories or initialize new empty ones.

    >>> cloned_repo = repo.clone("to/this/path")
    >>> new_repo = repo.init("path/for/new/repo")
    
Archive the repository contents to a tar file.

    >>> repo.archive(open("repo.tar", "wb"))    # open in binary mode, tar data is binary
    
Examining References
********************

References are the tips of your commit graph from which you can easily examine 
the history of your project.

    >>> heads = repo.heads
    >>> master = heads.master       # lists can be accessed by name for convenience
    >>> master.commit               # the commit pointed to by head called master
    >>> master.rename("new_name")   # rename individual heads
    
Tags are (usually immutable) references to a commit and/or a tag object.

    >>> tags = repo.tags
    >>> tagref = tags[0]
    >>> tagref.tag                  # tags may have tag objects carrying additional information
    >>> tagref.commit               # but they always point to commits
    >>> repo.delete_tag(tagref)     # delete or
    >>> repo.create_tag("my_tag")   # create tags using the repo
    
A symbolic reference is a special case of a reference as it points to another
reference instead of a commit.
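
To make the distinction concrete: on disk, git stores ``HEAD`` as a small text
file that contains either ``ref: <reference name>`` or, when HEAD is detached, a
bare commit id. The following is a minimal standard-library sketch of telling the
two apart. The helper name ``parse_head`` is hypothetical and is not part of
GitPython.

```python
def parse_head(content):
    """Classify the text of a HEAD file as a symbolic or a detached reference."""
    text = content.strip()
    if text.startswith("ref:"):
        # Symbolic: the file names another reference instead of a commit.
        return ("symbolic", text[len("ref:"):].strip())
    # Detached: the file holds a commit id directly.
    return ("detached", text)


print(parse_head("ref: refs/heads/master\n"))
print(parse_head("207c0c4418115df0d30820ab1a9acd2ea4bf4431\n"))
```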

Modifying References
********************
You can easily create and delete reference types or modify where they point to.

    >>> repo.delete_head('master')  
    >>> master = repo.create_head('master')
    >>> master.commit = 'HEAD~10'       # set another commit without changing index or working tree 

Create or delete tags the same way, except that you may not change them afterwards.

    >>> new_tag = repo.create_tag('my_tag', 'my message')
    >>> repo.delete_tag(new_tag)
    
Change the symbolic reference to switch branches cheaply (without adjusting the index
or the working copy).

    >>> new_branch = repo.create_head('new_branch')
    >>> repo.head.reference = new_branch

Understanding Objects
*********************
An Object is anything storable in git's object database. Objects contain information
about their type, their uncompressed size as well as their data. Each object is
uniquely identified by a SHA1 hash, which is 40 hexadecimal characters long.
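
As an illustration of where such an id comes from, git hashes a short header
(the object type, a space, the data size and a NUL byte) followed by the raw
data. The sketch below reproduces this with the standard library only. The
function name ``git_object_id`` is hypothetical and is not a GitPython API.

```python
import hashlib


def git_object_id(obj_type, data):
    """Compute the SHA1 id git assigns to an object with this type and raw data."""
    header = ("%s %d" % (obj_type, len(data))).encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


empty_blob = git_object_id("blob", b"")
print(empty_blob)        # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(len(empty_blob))   # 40
```

The id depends only on type and content, so two files with identical content
share one blob object.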

Git knows only four distinct object types: Blobs, Trees, Commits and Tags.

In GitPython, all objects can be accessed through their common base, compared
and hashed, as shown in the following example.

    >>> hc = repo.head.commit
    >>> hct = hc.tree
    >>> hc != hct
    >>> hc != repo.tags[0]
    >>> hc == repo.head.reference.commit
    
Basic fields are

    >>> hct.type
    'tree'
    >>> hct.size
    166
    >>> hct.sha
    'a95eeb2a7082212c197cabbf2539185ec74ed0e8'
    >>> hct.data        # returns string with pure uncompressed data
    '...' 
    >>> len(hct.data) == hct.size
    
Index Objects are objects that can be put into git's index. These objects are trees
and blobs which additionally know about their path in the filesystem as well as their
mode.

    >>> hct.path            # root tree has no path
    ''
    >>> hct.trees[0].path   # the first subdirectory has one though
    'dir'
    >>> hct.mode            # trees have mode 0
    0
    >>> '%o' % hct.blobs[0].mode    # blobs have a specific mode, comparable to a standard Linux filesystem
    '100644'
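
The octal mode uses the same layout as Unix file modes, so the standard ``stat``
module can take it apart. This is a small sketch rather than GitPython
functionality. The value ``0o040000`` is the mode git itself records for a
subdirectory entry inside a tree object and is used here only for comparison.

```python
import stat

blob_mode = 0o100644   # regular, non-executable file
tree_mode = 0o040000   # mode git records for a subdirectory entry

print(stat.S_ISREG(blob_mode))        # True
print(oct(stat.S_IMODE(blob_mode)))   # 0o644
print(stat.S_ISDIR(tree_mode))        # True
```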
    
Access blob data (or any object data) directly or using streams.

    >>> hct.data            # binary tree data
    >>> hct.blobs[0].data_stream                # stream object to read data from
    >>> hct.blobs[0].stream_data(my_stream)     # write data to given stream
    
    
The Commit object
*****************

Commit objects contain information about a specific commit. Obtain commits using
references as done in 'Examining References' or as follows.

Obtain commits at the specified revision:

    >>> repo.commit('master')
    >>> repo.commit('v0.1')
    >>> repo.commit('HEAD~10')

Iterate over 100 commits:

    >>> repo.iter_commits('master', max_count=100)

If you need paging, you can specify a number of commits to skip.

    >>> repo.iter_commits('master', max_count=10, skip=20)

The above will return commits 21-30 from the commit list.
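
The paging arithmetic can be checked without a repository. The sketch below
mimics ``skip`` and ``max_count`` with ``itertools.islice`` on a plain sequence.
The ``page`` helper and the numbered stand-in history are assumptions made for
illustration only.

```python
from itertools import islice


def page(commits, max_count, skip=0):
    """Skip the first `skip` items, then yield at most `max_count` items."""
    return islice(commits, skip, skip + max_count)


history = range(1, 101)   # stand-in for commits numbered 1 (newest) to 100
print(list(page(history, max_count=10, skip=20)))
# [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
```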

    >>> headcommit = repo.head.commit

    >>> headcommit.sha
    '207c0c4418115df0d30820ab1a9acd2ea4bf4431'

    >>> headcommit.parents
    [<git.Commit "a91c45eee0b41bf3cdaad3418ca3850664c4a4b4">]

    >>> headcommit.tree
    <git.Tree "563413aedbeda425d8d9dcbb744247d0c3e8a0ac">

    >>> headcommit.author
    <git.Actor "Michael Trier <mtrier@gmail.com>">

    >>> headcommit.authored_date        # seconds since epoch
    1256291446

    >>> headcommit.committer
    <git.Actor "Michael Trier <mtrier@gmail.com>">

    >>> headcommit.committed_date
    1256291446

    >>> headcommit.message
    'cleaned up a lot of test information. Fixed escaping so it works with
    subprocess.'

Note: dates and times are represented in a ``seconds since epoch`` format. Conversion to
human-readable form can be accomplished with the various ``time`` module functions.

    >>> import time
    >>> time.asctime(time.gmtime(headcommit.committed_date))
    'Fri Oct 23 09:50:46 2009'

    >>> time.strftime("%a, %d %b %Y %H:%M", time.gmtime(headcommit.committed_date))
    'Fri, 23 Oct 2009 09:50'
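
If you prefer ``datetime`` objects, the same conversion might look like the
following sketch. It uses only the standard library and hard-codes the example
``committed_date`` value shown earlier instead of reading it from a commit.

```python
from datetime import datetime, timezone

committed_date = 1256291446   # example value, in seconds since epoch
when = datetime.fromtimestamp(committed_date, tz=timezone.utc)
print(when.isoformat())   # 2009-10-23T09:50:46+00:00
```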

.. _struct_time: http://docs.python.org/library/time.html

You can traverse a commit's ancestry by chaining calls to ``parents``.

    >>> headcommit.parents[0].parents[0].parents[0]

The above corresponds to ``master^^^`` or ``master~3`` in git parlance.
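
Chaining by hand becomes unwieldy for larger distances. A small loop over the
first parent does the same job, as sketched below. ``ToyCommit`` is a
hypothetical stand-in that only models the ``parents`` attribute, and
``nth_ancestor`` is not a GitPython function.

```python
class ToyCommit(object):
    """Hypothetical stand-in for a commit: a name plus a list of parents."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)


def nth_ancestor(commit, n):
    """Follow the first parent n times, like ``commit~n``."""
    for _ in range(n):
        commit = commit.parents[0]   # raises IndexError at a root commit
    return commit


root = ToyCommit("c0")
c1 = ToyCommit("c1", [root])
c2 = ToyCommit("c2", [c1])
head = ToyCommit("c3", [c2])
print(nth_ancestor(head, 3).name)   # c0
```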

The Tree object
***************

A tree records pointers to the contents of a directory. Let's say you want
the root tree of the latest commit on the master branch.

    >>> tree = repo.heads.master.commit.tree
    >>> tree
    <git.Tree "a006b5b1a8115185a228b7514cdcd46fed90dc92">

    >>> tree.sha
    'a006b5b1a8115185a228b7514cdcd46fed90dc92'

Once you have a tree, you can get the contents.

    >>> tree.trees          # trees are subdirectories
    [<git.Tree "f7eb5df2e465ab621b1db3f5714850d6732cfed2">]
    
    >>> tree.blobs          # blobs are files
    [<git.Blob "a871e79d59cf8488cac4af0c8f990b7a989e2b53">,
    <git.Blob "3594e94c04db171e2767224db355f514b13715c5">,
    <git.Blob "e79b05161e4836e5fbf197aeb52515753e8d6ab6">,
    <git.Blob "94954abda49de8615a048f8d2e64b5de848e27a1">]

It's useful to know that a tree behaves like a list with the ability to
query entries by name.

    >>> tree[0] == tree['dir']
    True
    >>> for entry in tree: do_something(entry)

    >>> blob = tree[0][0]
    >>> blob.name
    'file'
    >>> blob.path
    'dir/file'
    >>> blob.abspath
    '/Users/mtrier/Development/git-python/dir/file'

There is a convenience method that allows you to get a named sub-object
from a tree with a syntax similar to how paths are written on a Unix
system.

    >>> tree/"lib"
    <git.Tree "c1c7214dde86f76bc3e18806ac1f47c38b2b7a30">

You can also get a tree directly from the repository if you know its name.

    >>> repo.tree()
    <git.Tree "master">

    >>> repo.tree("c1c7214dde86f76bc3e18806ac1f47c38b2b7a30")
    <git.Tree "c1c7214dde86f76bc3e18806ac1f47c38b2b7a30">
    >>> repo.tree('0.1.6')
    <git.Tree "6825a94104164d9f0f5632607bebd2a32a3579e5">
    
As trees only allow direct access to their direct entries, use the traverse 
method to obtain an iterator to access entries recursively.

    >>> tree.traverse()
    <generator object at 0x7f6598bd65a8>
    >>> for entry in tree.traverse(): do_something(entry)
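
To show what recursive traversal means, here is a minimal sketch on a toy
structure in which nested dictionaries play the role of subtrees. It only
illustrates the idea of a depth-first generator. It is not GitPython's
``traverse`` implementation, and the toy data is invented.

```python
def traverse(tree, prefix=""):
    """Yield (path, entry) pairs depth-first; nested dicts act as subtrees."""
    for name, entry in sorted(tree.items()):
        path = prefix + name
        yield path, entry
        if isinstance(entry, dict):
            # Descend into the subtree, extending the path.
            for item in traverse(entry, path + "/"):
                yield item


toy_tree = {"README": "blob", "dir": {"file": "blob"}}
print([path for path, _ in traverse(toy_tree)])
# ['README', 'dir', 'dir/file']
```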

    
The Index Object
****************
The git index is the stage containing changes to be written to the next commit
or where merges finally have to take place. You may freely access and manipulate 
this information using the Index Object.

    >>> index = repo.index
    
Access objects and add/remove entries. Commit the changes.

    >>> for stage, blob in index.iter_blobs(): do_something(...)       # access blob objects
    >>> for (path, stage), entry in index.entries.iteritems(): pass    # access the entries directly
    >>> index.add(['my_new_file'])      # add a new file to the index
    >>> index.remove(['dir/existing_file'])
    >>> new_commit = index.commit("my commit message")
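
The ``(path, stage)`` keys matter during merges: in git's index, stage 0 marks a
normal entry, while stages 1 to 3 hold the common base, "ours" and "theirs"
versions of a conflicting path. The sketch below finds unmerged paths in a toy
dictionary shaped like ``index.entries``. The data and the ``unmerged_paths``
helper are invented for illustration.

```python
# Toy stand-in for index.entries: keys are (path, stage) tuples.
entries = {
    ("README", 0): "blob-a",
    ("lib/conflict.py", 1): "blob-base",
    ("lib/conflict.py", 2): "blob-ours",
    ("lib/conflict.py", 3): "blob-theirs",
}


def unmerged_paths(entries):
    """Return the sorted paths that have at least one non-zero stage entry."""
    return sorted(set(path for (path, stage) in entries if stage != 0))


print(unmerged_paths(entries))   # ['lib/conflict.py']
```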
    
Create new indices from other trees or as the result of a merge. Write that result to
a new index.

    >>> tmp_index = Index.from_tree(repo, 'HEAD~1') # load a tree into a temporary index
    >>> merge_index = Index.from_tree(repo, 'HEAD', 'some_branch') # merge two trees
    >>> merge_index.write("merged_index")
    
Handling Remotes
****************

Remotes are used as aliases for foreign repositories to ease pushing to and fetching
from them.

    >>> test_remote = repo.create_remote('test', 'git@server:repo.git')
    >>> repo.delete_remote(test_remote) # create and delete remotes
    >>> origin = repo.remotes.origin    # get default remote by name
    >>> origin.refs                     # local remote references
    >>> o = origin.rename('new_origin') # rename remotes
    >>> o.fetch()                       # fetch, pull and push from and to the remote
    >>> o.pull()
    >>> o.push()

You can easily access configuration information for a remote by accessing options
as if they were attributes.

    >>> o.url
    'git@server:dummy_repo.git'
    
Change configuration for a specific remote only.

    >>> o.config_writer.set("url", "other_url")
    
Obtaining Diff Information
**************************

Diffs can generally be obtained from subclasses of ``Diffable`` as they provide
the ``diff`` method. This operation yields a ``DiffIndex`` allowing you to easily access
diff information about paths.

Diffs can be made between the index and trees, the index and the working tree, trees and
trees, as well as trees and the working copy. If commits are involved, their tree
will be used implicitly.

    >>> hcommit = repo.head.commit
    >>> idiff = hcommit.diff()          # diff tree against index
    >>> tdiff = hcommit.diff('HEAD~1')  # diff tree against previous tree
    >>> wdiff = hcommit.diff(None)      # diff tree against working tree
    
    >>> index = repo.index
    >>> index.diff()                    # diff index against itself yielding empty diff
    >>> index.diff(None)                # diff index against working copy
    >>> index.diff('HEAD')              # diff index against current HEAD tree

The item returned is a ``DiffIndex``, which is essentially a list of Diff objects. It
provides additional filtering to find what you might be looking for.

    >>> for diff_added in wdiff.iter_change_type('A'): do_something(diff_added)

Switching Branches
******************
To switch between branches, you effectively need to point your HEAD to the new branch
head and reset your index and working copy to match. A simple manual way to do it 
is the following one.

    >>> repo.head.reference = repo.heads.other_branch
    >>> repo.head.reset(index=True, working_tree=True)
    
Note, however, that this approach brutally overwrites the user's changes in the working copy
and index. It is less sophisticated than ``git checkout``, which generally prevents you
from destroying your work.

Using git directly
******************
In case you are missing functionality as it has not been wrapped, you may conveniently
use the git command directly. It is owned by each repository instance.

    >>> git = repo.git
    >>> git.checkout('HEAD', b="my_new_branch")         # default command
    >>> git.for_each_ref()                              # '-' becomes '_' when calling it
    
The return value will by default be a string of the standard output channel produced
by the command.

Keyword arguments translate to short and long keyword arguments on the command line.
The special notation ``git.command(flag=True)`` will create a flag without a value, like
``command --flag``.

If ``None`` is found in the arguments, it will be dropped silently. Lists and tuples 
passed as arguments will be unpacked to individual arguments. Objects are converted 
to strings using the str(...) function.
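
The rules above can be sketched in a few lines. The function below is an
illustration of the described behaviour and not GitPython's actual
implementation. The name ``transform_kwargs``, the exact output shape, the
handling of ``False`` and the conversion of underscores in option names are all
assumptions.

```python
def transform_kwargs(**kwargs):
    """Turn keyword arguments into a list of command-line options."""
    args = []
    for key, value in sorted(kwargs.items()):
        if value is None or value is False:
            continue                      # assumption: dropped silently
        dashed = key.replace("_", "-")    # assumption: max_count -> max-count
        prefix = "-" if len(key) == 1 else "--"
        if value is True:
            args.append(prefix + dashed)  # flag without a value
        elif len(key) == 1:
            args.extend([prefix + dashed, str(value)])   # short option
        else:
            args.append("%s%s=%s" % (prefix, dashed, value))   # long option
    return args


print(transform_kwargs(b="my_new_branch"))         # ['-b', 'my_new_branch']
print(transform_kwargs(flag=True, max_count=10))   # ['--flag', '--max-count=10']
```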

And even more ...
*****************

There is more functionality in there, like the ability to archive repositories, get stats
and logs, blame, and probably a few other things that were not mentioned here.  

Check the unit tests for an in-depth introduction on how each function is supposed to be used.