path: root/ext/tokenizer/tokenizer.c
Commit message [Author, Age, Files, Lines]
* Rename PhpToken::getAll() to PhpToken::tokenize() [Nikita Popov, 2020-11-09, 1 file, -1/+1]
|   See https://externals.io/message/112189. Fixes bug #80328.
* Improve type declarations for Zend APIs [George Peter Banyard, 2020-08-28, 1 file, -23/+19]
|   - Voidification of Zend API which always succeeded
|   - Use bool argument types instead of int for boolean arguments
|   - Use bool return type for functions which return true/false (1/0)
|   - Use zend_result return type for functions which return SUCCESS/FAILURE,
|     as they don't follow normal boolean semantics
|
|   Closes GH-6002
* Remove proto comments from C files [Max Semenik, 2020-07-06, 1 file, -10/+5]
|   Closes GH-5758
* Add helper APIs for maybe-interned string creation [twosee, 2020-06-08, 1 file, -1/+1]
|   Add ZVAL_CHAR/RETVAL_CHAR/RETURN_CHAR as a shortcut for using
|   ZVAL_INTERNED_STRING and ZSTR_CHAR.
|
|   Add zend_string_init_fast() as a helper for the empty string / one char
|   interned string / zend_string_init() pattern. Also add corresponding
|   ZVAL_STRINGL_FAST etc macros.
|
|   Closes GH-5684.
* Fix bug #77966: Cannot alias a method named "namespace" [Nikita Popov, 2020-06-08, 1 file, -14/+39]
|   This is a bit tricky: In this case we have "namespace as", which means that
|   we will only recognize "namespace" as an identifier when the lookahead token
|   is already at the "as". This means that zend_lex_tstring picks up the wrong
|   identifier.
|
|   We solve this by actually assigning the identifier as the semantic value on
|   the parser stack -- as in almost all cases we will not actually need the
|   identifier, this is just an (offset, size) reference, not a copy of the
|   string.
|
|   Additionally, we need to teach the lexer feedback mechanism used by tokenizer
|   TOKEN_PARSE mode to apply feedback to something other than the very last
|   token. To that purpose we pass through the token text and check the tokens in
|   reverse order to find the right one.
|
|   Closes GH-5668.
* Improve some TypeError and ValueError messages [Máté Kocsis, 2020-04-14, 1 file, -2/+2]
|   Closes GH-5377
* Generate function entries from stubs for a couple of extensions [Máté Kocsis, 2020-04-14, 1 file, -23/+2]
|   Migrates ext/standard, ext/tidy, ext/tokenizer, ext/xml, ext/xml_reader,
|   and ext/xml_writer. Closes GH-5381.
* Syntax errors caused by unclosed {, [, ( mention specific location [Alex Dowad, 2020-04-14, 1 file, -1/+3]
|   Aside from a few very specific syntax errors for which detailed exceptions
|   are thrown, generally PHP just emits the default error messages generated
|   by bison on syntax error. These messages are very uninformative; they just
|   say "Unexpected ... at line ...".
|
|   This is most problematic with constructs which can span an arbitrary number
|   of lines, such as blocks of code delimited by { }, 'if' conditions delimited
|   by ( ), and so on. If a closing delimiter is missed, the block will run for
|   the entire remainder of the source file (which could be thousands of lines),
|   and then at the end, a parse error will be thrown with the dreaded words:
|   "Unexpected end of file".
|
|   Therefore, track the positions of opening and closing delimiters and ensure
|   that they match up correctly. If any mismatch or missing delimiter is
|   detected, immediately throw a parse error which points the user to the
|   offending line. This is best done in the *lexer* and not in the parser.
|
|   Thanks to Nikita Popov and George Peter Banyard for suggesting improvements.
|
|   Fixes bug #79368.
|   Closes GH-5364.
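The delimiter tracking described in the commit above can be illustrated with a small standalone sketch. This is not the code from the PHP lexer or from tokenizer.c: the names check_delimiters, delim_result, open_delim and MAX_NESTING are hypothetical, and the sketch scans raw characters rather than lexer tokens, so brackets inside strings or comments would confuse it. It only shows the idea of remembering where each opener was seen so that an error can point at a specific line.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NESTING 256 /* hypothetical limit for this sketch */

/* One entry per currently open delimiter. */
typedef struct {
    char opener; /* '{', '[' or '(' */
    int  line;   /* line on which the opener appeared */
} open_delim;

/* Result of a scan: ok == 1 means every delimiter matched. */
typedef struct {
    int  ok;
    char delim; /* offending delimiter when ok == 0 */
    int  line;  /* line of the offending delimiter when ok == 0 */
} delim_result;

static char closer_for(char opener)
{
    switch (opener) {
        case '{': return '}';
        case '[': return ']';
        default:  return ')';
    }
}

/* Scan src and report the first mismatched closer, or, if the input
 * ends with openers still pending, the most recently opened one. */
delim_result check_delimiters(const char *src)
{
    open_delim stack[MAX_NESTING];
    size_t depth = 0;
    int line = 1;
    delim_result res = {1, 0, 0};

    assert(src != NULL);

    for (const char *p = src; *p != '\0'; p++) {
        char c = *p;
        if (c == '\n') {
            line++;
        } else if (c == '{' || c == '[' || c == '(') {
            if (depth == MAX_NESTING) {
                /* Too deeply nested for this sketch: treat as an error. */
                res = (delim_result){0, c, line};
                return res;
            }
            stack[depth].opener = c;
            stack[depth].line = line;
            depth++;
        } else if (c == '}' || c == ']' || c == ')') {
            if (depth == 0 || closer_for(stack[depth - 1].opener) != c) {
                res = (delim_result){0, c, line};
                return res;
            }
            depth--;
        }
    }

    if (depth > 0) {
        /* End of input reached with an opener never closed. */
        res = (delim_result){0, stack[depth - 1].opener, stack[depth - 1].line};
    }
    return res;
}
```

For the input "if (x) {\n  y();\n" the sketch reports the '{' on line 1 instead of a failure at the end of the input, which is the kind of location information the commit message asks for.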
* Add PhpToken class [Nikita Popov, 2020-03-26, 1 file, -48/+330]
|   RFC: https://wiki.php.net/rfc/token_as_object
|
|   Relative to the RFC, this also adds a __toString() method, as discussed
|   on list.
|
|   Closes GH-5176.
* Clarify that token_get_all() never returns false [Nikita Popov, 2020-02-14, 1 file, -1/+3]
|   It can only fail in TOKEN_PARSE mode, in which case it will throw.
* Merge branch 'PHP-7.4' [Nikita Popov, 2019-09-28, 1 file, -1/+5]
|\
| * Reduce memory used by token_get_all() [Tyson Andre, 2019-09-28, 1 file, -1/+5]
| |   Around a quarter of all strings in array tokens are one character long
| |   (e.g. ` `, `\`, `1`).
| |
| |   When parsing a large number of PHP files, the memory increase dropped
| |   from 378374248 to 369535688 (2.5%).
| |
| |   Closes GH-4753.
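The saving described above comes from not allocating a fresh string for every one-character token. A minimal standalone sketch of that pattern follows. It does not use the Zend API: token_text, token_text_init, token_text_free and one_char_table are hypothetical names, the lazily built table is not thread safe, and the real engine relies on its own interned strings rather than a table like this one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Text of a token. When shared == 1 the text lives in static storage
 * and must not be freed. */
typedef struct {
    const char *text;
    size_t      len;
    int         shared;
} token_text;

/* Hypothetical table of preallocated one-character strings,
 * indexed by byte value and filled on first use. */
static char one_char_table[256][2];
static int  one_char_table_ready = 0;

static void one_char_table_init(void)
{
    for (int i = 0; i < 256; i++) {
        one_char_table[i][0] = (char)i;
        one_char_table[i][1] = '\0';
    }
    one_char_table_ready = 1;
}

/* Empty string / one-character shared string / heap copy. */
token_text token_text_init(const char *src, size_t len)
{
    token_text t = { "", 0, 1 };

    assert(src != NULL);

    if (len == 0) {
        return t; /* shared empty string */
    }
    if (len == 1) {
        if (!one_char_table_ready) {
            one_char_table_init();
        }
        t.text = one_char_table[(unsigned char)src[0]];
        t.len = 1;
        return t; /* shared, no allocation */
    }

    char *copy = malloc(len + 1);
    if (copy == NULL) {
        t.text = NULL; /* allocation failure: caller must check */
        t.shared = 0;
        return t;
    }
    memcpy(copy, src, len);
    copy[len] = '\0';
    t.text = copy;
    t.len = len;
    t.shared = 0;
    return t;
}

/* Release a token text; shared strings are left alone. */
void token_text_free(token_text *t)
{
    if (!t->shared) {
        free((void *)t->text);
    }
    t->text = NULL;
    t->len = 0;
}
```

Two calls such as token_text_init(";", 1) return the same pointer, so a token stream with many single-character tokens pays for each distinct character only once.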
* | Remove mention of PHP major version in Copyright headers [Gabriel Caruso, 2019-09-25, 1 file, -2/+0]
| |   Closes GH-4732.
* | Arginfo stubs for tokenizer [Stephen Reay, 2019-08-11, 1 file, -11/+1]
|/
* Revert "Switch to bison location tracking" [Nikita Popov, 2019-03-28, 1 file, -4/+11]
|   This reverts commit e528762c1c59bc0bd0bd6d78246c14269630cf0f.
|
|   Dmitry reports that this has a non-trivial impact on parsing overhead,
|   especially on 32-bit systems. As we don't have a strong need for this
|   change right now, I'm reverting it.
|
|   See also comments on
|   https://github.com/php/php-src/commit/e528762c1c59bc0bd0bd6d78246c14269630cf0f.
* Switch to bison location tracking [Nikita Popov, 2019-03-21, 1 file, -11/+4]
|   Locations for AST nodes are now tracked with the help of bison location
|   tracking. This is more accurate than what we currently do and easier to
|   extend with more information.
|
|   A zend_ast_loc structure is introduced, which is used for the location
|   stack. Currently it only holds the start lineno, but can be extended to
|   also hold end lineno and offset/column information in the future.
|
|   All AST constructors now accept a zend_ast_loc* as first argument, and
|   will use it to determine their lineno. Previously this used either the
|   CG(zend_lineno), or the smallest AST lineno of child nodes.
|
|   On the parser side, the location structure for a whole rule can be
|   obtained using the &@$ character salad.
* Remove local variables [Peter Kokot, 2019-02-03, 1 file, -9/+0]
|   This patch removes the so-called local variables defined on a per-file
|   basis for certain editors to properly show tab width and similar settings.
|   These are mainly used by the Vim and Emacs editors, yet with recent changes
|   the once-working definitions no longer work in Vim without custom plugins
|   or additional configuration. Nor are these settings synced across the PHP
|   code base.
|
|   A simpler and better approach is EditorConfig, together with fixing code
|   using code style fixing tools in the future.
|
|   This patch also removes the so-called modelines for Vim. Modelines allow
|   Vim specifically to set editor configuration such as syntax highlighting,
|   indentation style and tab width in the first line or the last 5 lines of
|   each file. Since the PHP test files already have syntax highlighting set
|   properly in most editors and EditorConfig takes care of the indentation
|   settings, this patch removes these as well for Vim 6.0 and newer versions.
|
|   With the removal of local variables for certain editors such as Emacs and
|   Vim, the footer is probably also no longer needed when creating extensions
|   using the ext_skel.php script.
|
|   Additionally, Vim modelines for setting PHP syntax and some editor settings
|   have been removed from some *.phpt files. These are mostly not relevant for
|   phpt files, nor do they work properly in the middle of the file.
* Remove yearly range from copyright notice [Zeev Suraski, 2019-01-30, 1 file, -1/+1]
|
* Remove unused Git attributes ident [Peter Kokot, 2018-07-25, 1 file, -2/+0]
|   The $Id$ keywords were used in Subversion, where they can be substituted
|   with the filename, last revision number change, last changed date, and
|   last user who changed it.
|
|   In Git this functionality is different and can be done with the Git
|   attribute ident. These need to be defined manually for each file in the
|   .gitattributes file and are afterwards replaced with the 40-character
|   hexadecimal blob object name, which is based only on the particular file
|   contents.
|
|   This patch simplifies handling of $Id$ keywords by removing them, since
|   they are not used anymore.
* Replace legacy zval_dtor() by zval_ptr_dtor_nogc() or even more specialized destructors. [Dmitry Stogov, 2018-07-04, 1 file, -3/+3]
|   zval_dtor() doesn't make a lot of sense in PHP-7.* and it's used
|   incorrectly in some places. Its occurrences should be replaced by
|   zval_ptr_dtor() or zval_ptr_dtor_nogc(), or even more specialized
|   destructors.
* Fix typos... [Nikita Popov, 2018-06-27, 1 file, -1/+1]
|
* Fixed bug #76538 [Nikita Popov, 2018-06-27, 1 file, -1/+1]
|
* Fixed typo [Xinchen Hui, 2018-06-18, 1 file, -1/+1]
|
* Fixed bug #76437 (token_get_all with TOKEN_PARSE flag fails to recognise close tag) [Xinchen Hui, 2018-06-18, 1 file, -2/+10]
|
* PHP scanner optimization [Dmitry Stogov, 2018-03-14, 1 file, -2/+1]
|
* year++ [Xinchen Hui, 2018-01-02, 1 file, -1/+1]
|
* Move constants into read-only data segment [Dmitry Stogov, 2017-12-14, 1 file, -1/+1]
|
* Use ZSTR_CHAR in token_get_all() [Nikita Popov, 2017-03-22, 1 file, -31/+24]
|
* Simplify increment_lineno handling [Nikita Popov, 2017-03-22, 1 file, -10/+5]
|
* Update copyright headers to 2017 [Sammy Kaye Powers, 2017-01-02, 1 file, -1/+1]
|
* Use new param API in tokenizer [Sara Golemon, 2016-12-31, 1 file, -6/+8]
|
* Make sure TOKEN_PARSE mode is thread safe [Nikita Popov, 2016-07-23, 1 file, -8/+10]
|   Introduce an on_event_context passed to the on_event hook. Use this
|   context to pass along the token array. Previously this was stored in a
|   non-TLS global :/
* Merge branch 'PHP-5.6' into PHP-7.0 [Lior Kaplan, 2016-01-01, 1 file, -1/+1]
|\
| |   * PHP-5.6: Happy new year (Update copyright to 2016)
| * Happy new year (Update copyright to 2016) [Lior Kaplan, 2016-01-01, 1 file, -1/+1]
| |
| * bump year [Xinchen Hui, 2015-01-15, 1 file, -1/+1]
| |
* | Don't return T_ERROR from token_get_all() [Nikita Popov, 2015-07-09, 1 file, -3/+3]
| |   This turned out to be rather inconvenient after all. Instead just return
| |   the same output we did on PHP 5. If people want to have an error, use
| |   TOKEN_PARSE.
* | Fix bug #69430 [Nikita Popov, 2015-07-09, 1 file, -20/+9]
| |   Don't throw from token_get_all() unless TOKEN_PARSE is used. Errors are
| |   reported as T_ERROR tokens.
* | Update token_get_all() arginfo [Nikita Popov, 2015-07-09, 1 file, -1/+2]
| |
* | Avoid zval duplication in ZVAL_ZVAL() macro (it was necessary only in few places). [Dmitry Stogov, 2015-06-12, 1 file, -2/+3]
| |   Switch from ZVAL_ZVAL() to simpler macros where possible (it makes sense
| |   to review remaining places)
* | ext tokenizer port + cleanup unused lexer states [Márcio Almada, 2015-04-30, 1 file, -17/+115]
| |   We basically added a mechanism to store the token stream during parsing
| |   and exposed the entire parser stack on the tokenizer extension through
| |   an opt-in flag: token_get_all($src, TOKEN_PARSE). This change allows easy
| |   future language enhancements regarding context-aware parsing & scanning
| |   without further maintenance on the tokenizer extension, while also
| |   resolving known inconsistencies the "parseless" tokenizer extension has
| |   when handling the presence of `__halt_compiler()`.
* | fix indentation + remove c++ comments [Márcio Almada, 2015-04-30, 1 file, -5/+5]
| |
* | Throw ParseException from lexer [Nikita Popov, 2015-04-02, 1 file, -0/+2]
| |   Primarily to avoid getting fatal errors from token_get_all().
| |
| |   Implemented using a magic E_ERROR token, which the lexer emits to force
| |   a parser failure.
* | cleanup mod version macros and mod defs, round x [Anatol Belski, 2015-03-23, 1 file, -5/+1]
| |
* | bump year [Xinchen Hui, 2015-01-15, 1 file, -1/+1]
| |
* | trailing whitespace removal [Stanislav Malyshev, 2015-01-10, 1 file, -1/+1]
| |
* | first shot remove TSRMLS_* things [Anatol Belski, 2014-12-13, 1 file, -9/+9]
| |
* | s/PHP 5/PHP 7/ [Johannes Schlüter, 2014-09-19, 1 file, -1/+1]
| |
* | Avoid double IS_INTERNED() check [Dmitry Stogov, 2014-09-19, 1 file, -1/+1]
| |
* | master renames phase 1 [Anatol Belski, 2014-08-25, 1 file, -7/+7]
| |
* | fixes to tokenizer [Anatol Belski, 2014-08-19, 1 file, -2/+2]
| |