path: root/UPGRADING.INTERNALS
author: Christoph M. Becker <cmbecker69@gmx.de> 2018-12-02 16:28:18 +0100
committer: Christoph M. Becker <cmbecker69@gmx.de> 2018-12-15 14:38:15 +0100
commit: 3b0f05119383fe21ee75adaed3d0239ba8976aef
tree: 5bbad1d4dbf64e4f8b00eaadf05eca03066917b0 /UPGRADING.INTERNALS
parent: d206630f13812ba63cf62765d4731149700125e3
Allow empty $escape to eschew escaping CSV
Although CSV is still a widespread data exchange format, it has never been officially standardized. There is, however, the “informational” RFC 4180[1], which has no notion of escape characters, but rather defines `escaped` as strings enclosed in double-quotes where contained double-quotes have to be doubled. While this concept is supported by PHP's implementation (`$enclosure`), the `$escape` parameter sometimes interferes, so that `fgetcsv()` is unable to correctly parse externally generated CSV, and `fputcsv()` sometimes generates non-compliant CSV.

Since PHP's `$escape` concept has been available for many years, we cannot drop it for BC reasons (even though many consider it a bug). Instead, we allow passing an empty string as the `$escape` parameter to the respective functions, which results in ignoring/omitting any escaping, and as such is more in line with RFC 4180. It is noteworthy that this is almost no userland BC break, since formerly most functions did not accept an empty string and failed in this case. The only exception was `str_getcsv()`, which did accept an empty string and then used a backslash as escape character (which appears to be unintended behavior anyway).

The changed functions are `fputcsv()`, `fgetcsv()` and `str_getcsv()`, and also the `::setCsvControl()`, `::getCsvControl()`, `::fputcsv()`, and `::fgetcsv()` methods of `SplFileObject`.

The implementation also changes the type of the escape parameter of the PHP_APIs `php_fgetcsv()` and `php_fputcsv()` from `char` to `int`, where `PHP_CSV_NO_ESCAPE` means to ignore/omit escaping. The parameter accepts the same values as `isalpha()` and friends, i.e. “the value of which shall be representable as an `unsigned char` or shall equal the value of the macro `EOF`. If the argument has any other value, the behavior is undefined.” This is a subtle BC break, since the character `chr(255)` has the value `-1` if `char` is signed, and so would likely be confused with `EOF` when converted to `int`.

We consider this BC break to be acceptable, since it is rather unlikely that anybody uses `chr(255)` as escape character, and it can easily be fixed by casting all `escape` arguments to `unsigned char`.

This patch implements the feature requests 38301[2] and 51496[3].

[1] <https://tools.ietf.org/html/rfc4180>
[2] <https://bugs.php.net/bug.php?id=38301>
[3] <https://bugs.php.net/bug.php?id=51496>
Diffstat (limited to 'UPGRADING.INTERNALS')
-rw-r--r-- UPGRADING.INTERNALS 6
1 files changed, 6 insertions, 0 deletions
diff --git a/UPGRADING.INTERNALS b/UPGRADING.INTERNALS
index e2e56c74a3..8488bc8062 100644
--- a/UPGRADING.INTERNALS
+++ b/UPGRADING.INTERNALS
@@ -9,6 +9,7 @@ PHP 7.4 INTERNALS UPGRADE NOTES
f. get_properties_for() handler / Z_OBJDEBUG_P
g. Required object handlers
h. Immutable classes and op_arrays
+ i. php_fgetcsv() and php_fputcsv()
2. Build system changes
a. Abstract
@@ -136,6 +137,11 @@ PHP 7.4 INTERNALS UPGRADE NOTES
zend_get_op_array_extension_handle() during MINIT and access its value
using ZEND_OP_ARRAY_EXTENSION(op_array, handle).
+ i. The type of the escape parameter of php_fgetcsv() and php_fputcsv() has
+ been changed from char to int. This allows to pass the new constant macro
+ PHP_CSV_NO_ESCAPE to this parameter, to disable PHP's proprietary escape
+ mechanism.
+
========================
2. Build system changes
========================