path: root/Cython/Compiler/StringEncoding.py
Commit message [Author, Age, Files, Lines]
* Really only use PyUnicode_FromUnicode() when needed (GH-3697) [scoder, 2020-06-30, 1 file, -0/+28]
|   - Really only use PyUnicode_FromUnicode() for strings that contain lone surrogates, not for normal non-BMP strings and not for surrogate pairs on 16bit Unicode platforms. See https://github.com/cython/cython/issues/3678
|   - Extend buildenv test to debug a MacOS problem.
|   - Add a test for surrogate pairs in Unicode strings.
|   - Limit PyUnicode_FromUnicode() usage to strings containing lone surrogates.
|   - Accept ambiguity of surrogate pairs in Unicode string literals when generated on 16bit Py2 systems.
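The distinction this commit draws, between lone surrogates and well-formed surrogate pairs, can be sketched in plain Python 3. This is only an illustration under stated assumptions: `has_lone_surrogates` is a hypothetical name, not a function taken from StringEncoding.py, and it treats any high surrogate directly followed by a low surrogate as a valid pair.

```python
def has_lone_surrogates(text):
    """Return True if `text` contains a surrogate code point outside a high/low pair.

    Hypothetical helper for illustration; not Cython's actual implementation.
    """
    i = 0
    n = len(text)
    while i < n:
        cp = ord(text[i])
        if 0xD800 <= cp <= 0xDBFF:
            # High surrogate: only acceptable when a low surrogate follows directly.
            if i + 1 < n and 0xDC00 <= ord(text[i + 1]) <= 0xDFFF:
                i += 2
                continue
            return True
        if 0xDC00 <= cp <= 0xDFFF:
            # Low surrogate that was not consumed as the second half of a pair.
            return True
        i += 1
    return False


plain = has_lone_surrogates("abc")
non_bmp = has_lone_surrogates("\U0001F600")
paired = has_lone_surrogates("\ud83d\ude00")
lone = has_lone_surrogates("\ud83d")
```

Under these assumptions only the last string would need the special code path. Ordinary text, non-BMP characters and surrogate pairs would not.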
* Fix many indentation and whitespace issues throughout the code base (GH-3673) [scoder, 2020-06-10, 1 file, -2/+2]
|   … and enforce them with pycodestyle.
* unicode imports (#3119) [da-woods, 2019-09-30, 1 file, -0/+21]
|   - Handle normalization of unicode identifiers
|   - Support unicode characters in module names (Only valid under Python 3)
* Unicode identifiers (PEP 3131) (GH-3081) [da-woods, 2019-08-24, 1 file, -0/+8]
|   Closes #2601
* Evaluate multiplication of string literals at compile time if the result is short (<= 256 characters). [Stefan Behnel, 2018-01-13, 1 file, -0/+8]
* whitespace [Stefan Behnel, 2017-02-11, 1 file, -0/+1]
|
* Merge branch '0.23.x' [Stefan Behnel, 2015-09-02, 1 file, -4/+9]
|\
| |   Conflicts:
| |       Cython/Compiler/Optimize.py
| |       Cython/Compiler/StringEncoding.py
| * fix bytes literal creation from compile-time DEF expressions (used to become Unicode strings due to missing encoding) [Stefan Behnel, 2015-09-02, 1 file, -3/+11]
* | clean up and fix docstring serialisation (some are const, some are not) [Stefan Behnel, 2015-08-08, 1 file, -0/+7]
|/
* adapt 'unicode' usage to Py2/Py3 [Stefan Behnel, 2015-07-26, 1 file, -8/+8]
|
* adapt usages of map() to Py2/Py3 [Stefan Behnel, 2015-07-25, 1 file, -1/+1]
|
* use explicit relative imports everywhere and enable absolute imports by default [Stefan Behnel, 2014-06-17, 1 file, -0/+3]
|
* support surrogates in unicode string literals in Py3.3 [Stefan Behnel, 2013-03-15, 1 file, -1/+21]
|
* Pass-through single surrogates in Py_UNICODE[] literal encoding routine. [Nikita Nemkin, 2013-03-07, 1 file, -3/+3]
|
* Compatibility fix: no UTF-32 codec in Python 2.4/2.5. [Nikita Nemkin, 2013-03-07, 1 file, -14/+21]
|
* Renamed Py_UNICODE* entities to use "pyunicode_ptr" prefix; fixed small issues in Py_UNICODE* support. [Nikita Nemkin, 2013-03-05, 1 file, -1/+1]
* Full support for Py_UNICODE[] literals with non-BMP characters. [Nikita Nemkin, 2013-03-03, 1 file, -6/+14]
|
* Basic support for Py_UNICODE* strings. [Nikita Nemkin, 2013-03-03, 1 file, -0/+12]
|
* preprocess byte string literal escaping instead of doing repeated replacements at runtime [Stefan Behnel, 2013-01-10, 1 file, -11/+15]
* undo Py3.3 surrogates support fixes - breaks too many special cases with strings [Stefan Behnel, 2013-01-10, 1 file, -26/+12]
|
* fix surrogates in Unicode literals in Python 3.3 (the UTF-8 codec rejects them explicitly) [Stefan Behnel, 2013-01-06, 1 file, -12/+26]
* Fix python 3 deepcopy & sorting compatibility [Mark Florisson, 2011-10-03, 1 file, -0/+6]
|
* Fix trac #640, long string literals with escapes. [Robert Bradshaw, 2011-01-12, 1 file, -3/+17]
|
* support redundant parsing of string literals as unicode *and* bytes string, fix 'str' literal assignments to char* targets when using Future.unicode_literals [Stefan Behnel, 2010-09-04, 1 file, -0/+35]
* prevent control characters in unicode literals (ord<32) from sneaking into the C source [Stefan Behnel, 2010-08-09, 1 file, -2/+3]
* fix order of surrogate pair in wide unicode strings [Stefan Behnel, 2010-07-03, 1 file, -1/+1]
|
* fix parsing of wide unicode escapes on narrow Unicode platforms [Stefan Behnel, 2010-07-03, 1 file, -2/+13]
|
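The two 2010-07-03 commits concern how a wide (non-BMP) escape such as \U0001F600 is represented on a narrow, 16-bit Unicode build: as a high surrogate followed by a low surrogate. The standard UTF-16 arithmetic is sketched below. The function names are hypothetical and the code is not taken from StringEncoding.py.

```python
def to_surrogate_pair(code_point):
    """Split a non-BMP code point into its (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a non-BMP code point: %#x" % code_point)
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # upper 10 bits of the 20-bit offset
    low = 0xDC00 + (offset & 0x3FF)   # lower 10 bits of the 20-bit offset
    return high, low                  # the high surrogate always comes first


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


pair = to_surrogate_pair(0x1F600)
restored = from_surrogate_pair(*pair)
```

Here `pair` is `(0xD83D, 0xDE00)` and `restored` is `0x1F600`. Emitting the two halves in the opposite order would produce two lone surrogates instead of one character.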
* Don't split long literals at backslash. [Robert Bradshaw, 2010-02-12, 1 file, -2/+2]
|
* Split long string literals at 2000 chars. [Robert Bradshaw, 2010-02-05, 1 file, -1/+3]
|   (There may not be enough line breaks...)
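A rough sketch of the idea behind these two commits: cut an already-escaped literal into chunks of at most 2000 characters, but never directly after a backslash that starts an escape. The name `split_escaped_literal` is hypothetical, and the rule is deliberately simplified. It only keeps a backslash together with the character that follows it and does not try to keep multi-character hex or octal escapes intact.

```python
def split_escaped_literal(escaped, limit=2000):
    """Split an escaped literal into chunks of at most `limit` characters.

    Simplified illustration, not Cython's code: a cut is moved back by one
    character when it would separate a backslash from the character it escapes.
    """
    if limit < 2:
        raise ValueError("limit must be at least 2")
    chunks = []
    start = 0
    while len(escaped) - start > limit:
        end = start + limit
        # Count the run of backslashes that ends directly before the cut.
        run = 0
        while end - run > start and escaped[end - run - 1] == "\\":
            run += 1
        if run % 2 == 1:
            # Odd run: the last backslash opens an escape, so cut before it.
            end -= 1
        chunks.append(escaped[start:end])
        start = end
    chunks.append(escaped[start:])
    return chunks


source = "a" * 5 + "\\n" + "b" * 5
parts = split_escaped_literal(source, limit=6)
```

With `limit=6` the first cut would land between the backslash and the `n`, so it moves back one character. Joining `parts` reproduces `source` unchanged.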
* split BytesNode, UnicodeNode and StringNode [Stefan Behnel, 2009-10-10, 1 file, -7/+7]
|
* Py2 bytes handling fix [Stefan Behnel, 2009-08-21, 1 file, -3/+10]
|
* Py2.x fix after Py3 char fix ;) [Stefan Behnel, 2009-08-21, 1 file, -3/+3]
|
* properly handle char values (bytes with length 1) in Py3 [Stefan Behnel, 2009-08-21, 1 file, -2/+3]
|
* fix byte string escaping of '\' in Py2.x (broken by latest Py3 fixes) [Stefan Behnel, 2009-07-08, 1 file, -1/+1]
|
* enable % formatting of byte strings by providing a __str__() special method that encodes to unicode [Stefan Behnel, 2009-07-06, 1 file, -3/+3]
* Py3 fix: make sure byte strings end up in the code as expected (not like >>b'...'<<) [Stefan Behnel, 2009-07-06, 1 file, -4/+8]
* make sure header filenames pass literally into the C code [Stefan Behnel, 2009-07-06, 1 file, -0/+6]
|
* Py3 fixes [Stefan Behnel, 2009-07-05, 1 file, -1/+1]
|
* Py3 fix [Stefan Behnel, 2009-07-05, 1 file, -21/+45]
|
* Optimization for shorter docstrings. [Robert Bradshaw, 2008-08-16, 1 file, -0/+2]
|
* Split docstring around \n for compilers who barf at long string literals (VS 2003). [david@evans-2.local, 2008-08-15, 1 file, -0/+3]
* Rewrite of the string literal handling code [Stefan Behnel, 2008-08-15, 1 file, -0/+144]
    String literals pass through the compiler as follows:

    - unicode string literals are stored as unicode strings and encoded to UTF-8 on the way out
    - byte string literals are stored as correctly encoded byte strings by unescaping the source string literal into the corresponding byte sequence. No further encoding is done later on!
    - char literals are stored as byte strings of length 1. This can be verified by the parser now, e.g. a non-ASCII char literal in UTF-8 source code will result in an error, as it would end up as two or more bytes in the C code, which can no longer be represented as a C char.

    Storing byte strings is necessary as we otherwise lose the ability to encode byte string literals on the way out. They do not necessarily contain only bytes that fit into the source code encoding, as the source can use escape sequences to represent them. Previously, ASCII encoded source code could not contain byte string literals with properly escaped non-ASCII bytes.

    Another bug that was fixed: in Python, escape sequences behave differently in unicode strings (where they represent the character code) and byte strings (where they represent a byte value). Previously, they resulted in the same byte value in Cython code. This is only a problem for non-ASCII escapes, since the character code and the byte value of ASCII characters are identical.
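The difference in escape semantics described in this commit message can be shown with a small Python 3 sketch. It is only an illustration under stated assumptions: the helper names are hypothetical, only `\xNN` escapes are handled, and none of it is taken from StringEncoding.py.

```python
def unescape_as_unicode(hex_escape):
    """Interpret a '\\xNN' escape as a character code (unicode string semantics)."""
    return chr(int(hex_escape[2:], 16))


def unescape_as_bytes(hex_escape):
    """Interpret a '\\xNN' escape as a single byte value (byte string semantics)."""
    return bytes([int(hex_escape[2:], 16)])


def check_char_literal(value):
    """Accept only byte strings of length 1, as a C char can hold one byte."""
    if len(value) != 1:
        raise ValueError("char literal must be exactly one byte, got %d" % len(value))
    return value


escape = r"\xe9"  # the four source characters: backslash, x, e, 9

# Unicode literal: the escape names the character U+00E9, encoded to UTF-8 on the way out.
text_out = unescape_as_unicode(escape).encode("utf-8")

# Byte string literal: the escape names the byte 0xE9, with no further encoding afterwards.
raw_out = unescape_as_bytes(escape)

# For an ASCII escape both interpretations yield the same single byte.
ascii_same = unescape_as_unicode(r"\x41").encode("utf-8") == unescape_as_bytes(r"\x41")
```

In this sketch `text_out` is the two bytes `b'\xc3\xa9'` while `raw_out` is the single byte `b'\xe9'`. `check_char_literal(raw_out)` passes, whereas `check_char_literal(text_out)` raises, mirroring the parser error for a non-ASCII char literal in UTF-8 source code.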