| author | Junio C Hamano <gitster@pobox.com> | 2015-08-03 11:01:15 -0700 | 
|---|---|---|
| committer | Junio C Hamano <gitster@pobox.com> | 2015-08-03 11:01:15 -0700 | 
| commit | 81bc521af22a6549e93d33e57de40d335e0ee65b (patch) | |
| tree | e59a3a28d993cfa7843a245a884f2b8162bde88f | |
| parent | a3f4eb1b40457d85ab63168b621e71eaf73bb3c4 (diff) | |
| parent | 3a59e5954ef19ac94522219c2f29d49a187d31d8 (diff) | |
Merge branch 'kb/i18n-doc'
* kb/i18n-doc:
  Documentation/i18n.txt: clarify character encoding support
| -rw-r--r-- | Documentation/i18n.txt | 33 | 
1 files changed, 23 insertions, 10 deletions
diff --git a/Documentation/i18n.txt b/Documentation/i18n.txt
index e9a1d5d25a..2dd79db5cb 100644
--- a/Documentation/i18n.txt
+++ b/Documentation/i18n.txt
@@ -1,18 +1,31 @@
-At the core level, Git is character encoding agnostic.
-
- - The pathnames recorded in the index and in the tree objects
-   are treated as uninterpreted sequences of non-NUL bytes.
-   What readdir(2) returns are what are recorded and compared
-   with the data Git keeps track of, which in turn are expected
-   to be what lstat(2) and creat(2) accepts.  There is no such
-   thing as pathname encoding translation.
+Git is to some extent character encoding agnostic.
 
  - The contents of the blob objects are uninterpreted sequences
    of bytes.  There is no encoding translation at the core
    level.
 
- - The commit log messages are uninterpreted sequences of non-NUL
-   bytes.
+ - Path names are encoded in UTF-8 normalization form C. This
+   applies to tree objects, the index file, ref names, as well as
+   path names in command line arguments, environment variables
+   and config files (`.git/config` (see linkgit:git-config[1]),
+   linkgit:gitignore[5], linkgit:gitattributes[5] and
+   linkgit:gitmodules[5]).
++
+Note that Git at the core level treats path names simply as
+sequences of non-NUL bytes, there are no path name encoding
+conversions (except on Mac and Windows). Therefore, using
+non-ASCII path names will mostly work even on platforms and file
+systems that use legacy extended ASCII encodings. However,
+repositories created on such systems will not work properly on
+UTF-8-based systems (e.g. Linux, Mac, Windows) and vice versa.
+Additionally, many Git-based tools simply assume path names to
+be UTF-8 and will fail to display other encodings correctly.
+
+ - Commit log messages are typically encoded in UTF-8, but other
+   extended ASCII encodings are also supported. This includes
+   ISO-8859-x, CP125x and many others, but _not_ UTF-16/32,
+   EBCDIC and CJK multi-byte encodings (GBK, Shift-JIS, Big5,
+   EUC-x, CP9xx etc.).
 
 Although we encourage that the commit log messages are encoded
 in UTF-8, both the core and Git Porcelain are designed not to
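
The new wording says that path names are UTF-8 in normalization form C, but that the core only sees sequences of non-NUL bytes. A minimal Python sketch of why that distinction matters is below. It is not part of this patch and does not call Git. The file name `café.txt` and the helper `utf8_forms` are made up for illustration. It shows that one visible name can map to different byte sequences depending on normalization form or legacy encoding.

```python
import unicodedata


def utf8_forms(name: str) -> dict:
    """Return the UTF-8 bytes of `name` in normalization forms C and D.

    Hypothetical helper for illustration only; Git exposes no such function.
    """
    return {
        form: unicodedata.normalize(form, name).encode("utf-8")
        for form in ("NFC", "NFD")
    }


# Illustrative file name containing one non-ASCII character (U+00E9).
forms = utf8_forms("caf\u00e9.txt")
print(forms["NFC"])                  # b'caf\xc3\xa9.txt' (precomposed e-acute)
print(forms["NFD"])                  # b'cafe\xcc\x81.txt' (e + combining acute)
print(forms["NFC"] == forms["NFD"])  # False: different bytes, same visible name

# The same name in a legacy extended ASCII encoding is not valid UTF-8,
# so a tool that assumes UTF-8 cannot decode it.
legacy = "caf\u00e9.txt".encode("iso-8859-1")
try:
    legacy.decode("utf-8")
    legacy_is_utf8 = True
except UnicodeDecodeError:
    legacy_is_utf8 = False
print(legacy, legacy_is_utf8)        # b'caf\xe9.txt' False
```

A byte-wise comparison treats all three byte strings as distinct names, which is consistent with the patch's warning that repositories created on systems using other encodings will not work properly on UTF-8-based systems.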
