diff options
Diffstat (limited to 'ext/mbstring/README_PHP3-i18n-ja')
-rw-r--r-- | ext/mbstring/README_PHP3-i18n-ja | 774 |
1 files changed, 0 insertions, 774 deletions
diff --git a/ext/mbstring/README_PHP3-i18n-ja b/ext/mbstring/README_PHP3-i18n-ja deleted file mode 100644 index cac00b82a1..0000000000 --- a/ext/mbstring/README_PHP3-i18n-ja +++ /dev/null @@ -1,774 +0,0 @@ -========================================== - README for I18N Package -========================================== - -o Name and location of package - -Name: php-3.0.18-i18n-ja-2 -Location: http://www.happysize.co.jp/techie/php-ja-jp/ - ftp://ftp.happysize.co.jp/php-ja-jp/ - http://php.vdomains.org/ - ftp://ftp.vdomains.org/pub/php-ja-jp/ - http://php.jpnnet.com/ - -Currently, this I18N version of PHP only adds Japanese support to base -PHP. It allows you to use Japanese in scripts, as well as conversion -between various Japanese encodings. It will work perfectly fine with -ASCII with i18n option enabled. (note: executable is bit larger due -to UNICODE table). The basic design aproach is to allow for other -languages to be added in the future. Developers are encourage to join -us! - -For more information on Japanese encodings, please refer to the -section "Additional Notes." - - -o What is this package? - -This package allows you to handle multiple Japanese encodings (SJIS, EUC, -UTF-8, JIS) in PHP. If you find any bugs in this package, please report -them to the appropriate mailing list. For now, the PHP-jp mailing list -is the best place for this. - -PHP-jp ML mailto:PHP-jp@sidecar.ics.es.osaka-u.ac.jp - http://sidecar.ics.es.osaka-u.ac.jp/php-jp/ - (discussions are in Japanese) - - -o Who should use this - -Due to lack of documentation, it's not intended for beginners. If -something goes wrong, be prepared to fix it on your own. - - -o Warranty and Copyright - -There is no warranty with this package. Use it at your own risk. - -Please refer to the source code for the copyrights. In general, each -program's copyright is owned by the programmer. Unless you obey the -copyright holders restrictions, you are not allowed to use it in any -form. - - -o Redistribution - -As described in the source code, this package and the components are -allowed to be redistributed with certain restrictions. - -Due to this package being still in beta, please try to redistribute -it as an entire package. Please try not to distribute it as a form -of patch. Because we would prefer to have this package distributed -as one single package (not patch of patch of patch), avoid releasing -any patch to this package. - - -o Who made this - -A team of volunteers, PHP3 Internationalization, has been contributing -their free time producing it. Although we are not related to the core -PHP programmers, we are hoping to have our modifications merged into the -core distribution in the near future. Thus, we did not call this a -"Japanese Patch" (or distribution). Our final goal is to have true -i18nized PHP! - -For anyone interested in this project, please drop us a line. - -Contact Address: - phpj-dev@kage.net - (Discussions are in Japanese, but feel free to write us in English) - -Webpage (English and Japanese): - http://php.jpnnet.com/ - -Project Outline (Japanese): - http://www.happysize.co.jp/techie/php-ja-jp/spec.htm - -Developers: - Hironori Sato <satoh@jpnnet.com> - Shigeru Kanemoto <sgk@happysize.co.jp> - Tsukada Takuya <tsukada@fminn.nagano.nagano.jp> - U. Kenkichi <kenkichi@axes.co.jp> - Tateyama <tateyan@amy.hi-ho.ne.jp> - Other gracious contributors - - -o Future plans - -- fulfilling what's written in outline -- support for other languages other than Japanese -- make the character conversion as a library (?) -- more testing - - -o Special Thanks to - -PHP Japanese webpage maintainer, Hirokawa-san - http://www.cityfujisawa.ne.jp/%7Elouis/apps/phpfi/ -PHP-JP ML's Yamamoto-san - http://sidecar.ics.es.osaka-u.ac.jp/php-jp/ -Previous jp-patch developers - - - -========================================== - Advantages of using I18N package -========================================== - -- allows you to use various character encodings for script files and - http output -- distinguish character encoding in POST/GET/COOKIE -- proper mail output using JIS as body and MIME/Base64/JIS subject -- if http output's Content-Type is text/html, it will set proper charset -- stable character encoding conversion -- multibyte regex - - - -========================================== - Installation -========================================== - -o Summary - -Add --enable-i18n option when running configure. For your own setup, -add any other appropriate options as well. - -Don't forget to copy php3.ini-dist to desired location. -(ex. /usr/local/lib/php3.ini) - -If you have already installed PHP3, copy all the entries in php3.ini-dist -which start with "i18n.xxxx" to php3.ini. - - -o configure option - --enable-i18n - include i18n features - - --enable-mbregex - include multibyte regex library - (without i18n enabled, mbregex functions will not function) - - -o creating cgi version - - % tar xvzf php-3.0.18-i18n-ja-2.tar.gz - % cd php-3.0.18-i18n-ja-2 - % ./configure --enable-i18n --enable-mbregex - % make - - -o creating Apache version (regular module) - - % tar xvzf php-3.0.18-i18n-ja-2.tar.gz - % tar xvzf apache_1.3.x.tar.gz - % cd apache_1.3.x - % ./configure - % cd ../php-3.0.18-i18n-ja-2 - % ./configure --with-apache=../apache_1.3.x --enable-i18n --enable-mbregex - % make - % make install - % cd ../apache_1.3.x - % ./configure --activate-module=src/modules/php3/libphp3.a - % make - % make install - - -o creating Apache DSO version - - create DSO capable Apache first - % tar xvzf apache_1.3.x.tar.gz - % cd apache-1.3.x - % ./configure --enable-shared=max - % make - % make install - - now create php3 - % cd php-3.0.18-i18n-ja-2 - % ./configure --with-apxs=/usr/local/apache/bin/apxs --enable-i18n \ - --enable-mbregex - % make - % make install - - -========================================== - Additional Notes -========================================== - -o Multibyte regex library - -From beta4, we have included the multibyte (mb) regex library which comes with -Ruby. With this addition, you can now use regex in EUC, SJIS and UTF-8 -encoding. To avoid any conflicts with HSREGEX included with Apache, -each function name has been changed. Therefore, mb regex functions are -named differently from the original ereg functions in PHP. The character -encoding used in mb regex is configured in i18n.internal_encoding. - - -o Binary Output - -If http output encoding is set to other than 'pass', conversion of encoding -from internal encoding to http output is done automatically. Thus, -if you prefer to spit out anything in raw binary format, your data -may be corrupted. In such event, set http_output to 'pass'. - -ex. - <? - i18n_http_output("pass"); - ... - echo $the_binary_data_string; - ?> - - -o Content-Type - -Depending on the setting of http_output, PHP will output the proper charset. -ex. Content-Type: text/html; charset="..." - -Be aware of following: - -- If you set Content-Type header using header() function, that will - override the automatic addition of charset. -- Be cautious when you set i18n_http_output, since if any output is - made prior to this, proper header may have been sent out to the - client already. - - -o In the event of trouble - -If you find any bugs or trouble, please contact us at the above address. -It may help us to track the problem if you send us the script as well. - -If you encounter any memory related error such as segmentation violation, -add --enable-debug when you run configure. This will give you more -detail information on where error has occurred. The error is stored -in the server log or regular http output in CGI mode. - - -o About Japanese encodings - -Due to historical reason, there are multiple character encodings used -for Japanese. The most common encodings are: SJIS, EUC, JIS, and UTF-8. -Here are (very) brief description of them: - -EUC - commonly used in UNIX environment - 8bit-8bit combo - always >=0x80 - -SJIS - commonly used in Mac or PCs - similar to EUC - mostly 8bit-8bit (some 8bit-7bit) - mostly >=0x80 - there are some halfwidth (size of ASCII) multibytes - -JIS - commonly used in 7bit environment (nntp and smtp) - starts with escaping char, \033 and a few more characters - -UTF-8 - 16bit+ encoding - defines many languages existing in this world - see http://www.unicode.org/ for more detail - -Because of having all these character encodings, PHP needs to translate -between these encodings on the fly. Also, the addition of the mb regex -library allows you to handle mb strings without fear of getting mb char -chopped in half. - -Since Japanese is not the only language with multiple encodings, we -encourage other developers to modify our code to suit your needs. We -definitely need people to work with Korean, Chinese (both traditional -and simplified), and Russian. Let us know if you are interested in -this project! - - - -========================================== - php3.ini setting -========================================== - -The following init options will allow you to change the default settings. -Define these settings in the global section of php3.ini. - -All keywords are case-insensitive. - -o Encoding naming - - For each encoding, there are three names: standarized, alias, MIME - - - UTF-8 - standard: UTF-8 - alias: N/A - mime: UTF-8 - - - ASCII - standard: ASCII - alias: N/A - mime: US-ASCII - - - Japanese EUC - standard: EUC-JP - alias: EUC, EUC_JP, eucJP, x-euc-jp - mime: EUC-JP - - - Shift JIS - standard: SJIS - alias: x-sjis, MS_Kanji - mime: Shift_JIS - - - JIS - standard: JIS - alias: N/A - mime: ISO-2022-JP - - - Quoted-Printable - standard: Quoted-Printable - alias: qprint - mime: N/A - - - BASE64 - standard: BASE64 - alias: N/A - mime: N/A - - - no conversion - standard: pass - alias: none - mime: N/A - - - auto encoding detection - standard: auto - alias: unknown - mime: N/A - - * N/A - Not Applicapable - -o i18n.http_output - default http output encoding - - i18n.http_output = EUC-JP|SJIS|JIS|UTF-8|pass - EUC-JP : EUC - SJIS: SJIS - JIS : JIS - UTF-8: UTF-8 - pass: no conversion - - The default is pass (internal encoding is used) - It can be re-configured on the fly using i18n_http_output(). - - -o i18n.internal_encoding - internal encoding - - i18n.internal_encoding = EUC-JP|SJIS|UTF-8 - EUC-JP : EUC - SJIS: SJIS - UTF-8: UTF-8 - - The default is EUC-JP. - - PHP parser is designed based on using ISO-8859-1. For other - encodings, following conditions have to be satisfied in order - to use them: - - per byte encoding - - single byte charactor in range of 00h-7fh which is compatible - with ASCII - - multibyte without 00h-7fh - In case of Japanese, EUC-JP and UTF-8 are the only encoding that - meets this criteria. - - If i18n.internal_encoding and i18n.http_output differs, conversion - takes place at the time of output. If you convert any data within - PHP scripts to URL encoding, BASE64 or Quoted-Printable, encoding - stays as defined in i18n.internal_encoding. Thus, if you would - prefer to encode in compliance with i18n.http_output, you need - to manually convert encoding. - - ex. $str = urlencode( i18n_convert($str, i18n_http_output()) ); - - Encoding such as ISO-2022-** and HZ encoding which uses escape - sequences can not be used as internal encoding. If used, they - result in following errors: - - parser pukes funky error - - magic_quotes_*** breaks encoding (SJIS may have similar problem) - - string manipulation and regex will malfunction - - -o i18n.script_encoding - script encoding - - i18n.script_encoding = auto|EUC-JP|SJIS|JIS|UTF-8 - auto: automatic - EUC-JP : EUC - SJIS: SJIS - JIS : JIS - UTF-8: UTF-8 - - The default is auto. - The script's encoding is converted to i18n.internal_encoding before - entering the script parser. - - Be aware that auto detection may fail under some conditions. - For best auto detection, add multibyte charactor at begining of - script. - - -o i18n.http_input - handling of http input (GET/POST/COOKIE) - - i18n.http_input = pass|auto - auto: auto conversion - pass: no conversion - - The default is auto. - If set to pass, no conversion will take place. - If set to auto, it will automatically detect the encoding. If - detection is successful, it will convert to the proper internal - encoding. If not, it will assume the input as defined in - i18n.http_input_default. - -o i18n.http_input_default - default http input encoding - - i18n.http_input_default = pass|EUC-JP|SJIS|JIS|UTF-8 - pass: no conversion - EUC-JP : EUC - SJIS: SJIS - JIS : JIS - UTF-8: UTF-8 - - The default is pass. - This option is only effective as long as i18n.http_input is set to - auto. If the auto detection fails, this encoding is used as an - assumption to convert the http input to the internal encoding. - If set to pass, no conversion will take place. - -o sample settings - - 1) For most flexibility, we recommend using following example. - i18n.http_output = SJIS - i18n.internal_encoding = EUC-JP - i18n.script_encoding = auto - i18n.http_input = auto - i18n.http_input_default = SJIS - - 2) To avoid unexpected encoding problems, try these: - - i18n.http_output = pass - i18n.internal_encoding = EUC-JP - i18n.script_encoding = pass - i18n.http_input = pass - i18n.http_input_default = pass - - - -========================================== - PHP functions -========================================== - -The following describes the additional PHP functions. - -All keywords are case-insensitive. - -o i18n_http_output(encoding) -o encoding = i18n_http_output() - - This will set the http output encoding. Any output following this - function will be controlled by this function. If no argument is given, - the current http output encode setting is returned. - - encodings - EUC-JP : EUC - SJIS: SJIS - JIS : JIS - UTF-8: UTF-8 - pass: no conversion - - NONE is not allowed - - -o encoding = i18n_internal_encoding() - - Returns the current internal encoding as a string. - - internal encoding - EUC-JP : EUC - SJIS: SJIS - UTF-8: UTF-8 - - -o encoding = i18n_http_input() - - Returns http input encoding. - - encodings - EUC-JP : EUC - SJIS: SJIS - JIS : JIS - UTF-8: UTF-8 - pass: no conversion (only if i18n.http_input is set to pass) - - -o string = i18n_convert(string, encoding) - string = i18n_convert(string, encoding, pre-conversion-encoding) - - Returns converted string in desired encoding. If - pre-conversion-encoding is not defined, the given - string is assumed to be in internal encoding. - - encoding - EUC-JP : EUC - SJIS: SJIS - JIS : JIS - UTF-8: UTF-8 - pass: no conversion - - pre-conversion-encoding - EUC-JP : EUC - SJIS: SJIS - JIS : JIS - UTF-8: UTF-8 - pass: no conversion - auto: auto detection - - -o encoding = i18n_discover_encoding(string) - - Encoding of the given string is returned (as a string). - - encoding - EUC-JP : EUC - SJIS: SJIS - JIS : JIS - UTF-8: UTF-8 - ASCII: ASCII (only 09h, 0Ah, 0Dh, 20h-7Eh) - pass: unable to determine (text is too short to determine) - unknown: unknown or possible error - - -o int = mbstrlen(string) -o int = mbstrlen(string, encoding) - - Returns character length of a given string. If no encoding is defined, - the encoding of string is assumed to be the internal encoding. - - encoding - EUC-JP : EUC - SJIS: SJIS - JIS : JIS - UTF-8: UTF-8 - auto: automatic - - -o int = mbstrpos(string1, string2) -o int = mbstrpos(string1, string2, start) -o int = mbstrpos(string1, string2, start, encoding) - - Same as strpos. If no encoding is defined, the encoding of string - is assumed to be the internal encoding. - - encoding - EUC-JP : EUC - SJIS: SJIS - JIS : JIS - UTF-8: UTF-8 - - -o int = mbstrrpos(string1, string2) -o int = mbstrrpos(string1, string2, encoding) - - Same as strrpos. If no encoding is defined, the encoding of string - is assumed to be the internal encoding. - - encoding - EUC-JP : EUC - SJIS: SJIS - JIS : JIS - UTF-8: UTF-8 - - -o string = mbsubstr(string, position) -o string = mbsubstr(string, position, length) -o string = mbsubstr(string, position, length, encoding) - - Same as substr. If no encoding is defined, the encoding of string - is assumed to be the internal encoding. - - encoding - EUC-JP : EUC - SJIS: SJIS - JIS : JIS - UTF-8: UTF-8 - - -o string = mbstrcut(string, position) -o string = mbstrcut(string, position, length) -o string = mbstrcut(string, position, length, encoding) - - Same as subcut. If position is the 2nd byte of a mb character, it will cut - from the first byte of that character. It will cut the string without - chopping a single byte from a mb character. In another words, if you - set length to 5, you will only get two mb characters. If no encoding - is defined, the encoding of string is assumed to be the internal encoding. - - encoding - EUC-JP : EUC - SJIS: SJIS - JIS : JIS - UTF-8: UTF-8 - - -o string = i18n_mime_header_encode(string) - MIME encode the string in the format of =?ISO-2022-JP?B?[string]?=. - - -o string = i18n_mime_header_decode(string) - MIME decodes the string. - - -o string = i18n_ja_jp_hantozen(string) -o string = i18n_ja_jp_hantozen(string, option) -o string = i18n_ja_jp_hantozen(string, option, encoding) - - Conversion between full width character and halfwidth character. - - option - The following options are allowed. The default is "KV". - Acronym: FW = fullwidth, HW = halfwidth - - "r" : FW alphabet -> HW alphabet - - "R" : HW alphabet -> FW alphabet - - "n" : FW number -> HW number - - "N" : HW number -> FW number - - "a" : FW alpha numeric (21h-7Eh) -> HW alpha numeric - - "A" : HW alpha numeric (21h-7Eh) -> FW alpha numeric - - "k" : FW katakana -> HW katakana - - "K" : HW katakana -> FW katakana - - "h" : FW hiragana -> HW hiragana - - "H" : HW hiragana -> FW katakana - - "c" : FW katakana -> FW hiragana - - "C" : FW hiragana -> FW katakana - - "V" : merge dakuon character. only works with "K" and "H" option - - encoding - If no encoding is defined, the encoding of string is assumed to be - the internal encoding. - EUC-JP : EUC - SJIS: SJIS - JIS : JIS - UTF-8: UTF-8 - - -int = mbereg(regex_pattern, string, string) -int = mberegi(regex_pattern, string, string) - mb version of ereg() and eregi() - - -string = mbereg_replace(regex_pattern, string, string) -string = mberegi_replace(regex_pattern, string, string) - mb version of ereg_replace() and eregi_replace() - - -string_array = mbsplit(regex, string, limit) - mb version of split() - - - -========================================== - FAQ -========================================== - -Here, we have gathered some commonly asked questions on PHP-jp mailing -list. - -o To use Japanese in GET method - -If you need to assign Japanese text in GET method with argument, such as; -xxxx.php?data=<Japanese text>, use urlencode function in PHP. If not, -text may not be passed onto action php properly. - -ex: <a href="hoge.php?data=<? echo urlencode($data) ?>">Link</a> - - -o When passing data via GET/POST/COOKIE, \ character sneaks in - -When using SJIS as internal encoding, or passed-on data includes '"\, -PHP automatically inserts escaping character, \. Set magic_quotes_gpc -in php3.ini from On to Off. An alternative work around to this problem -is to use StripSlashes(). - -If $quote_str is in SJIS and you would like to extract Japanese text, -use ereg_replace as follows: - -ereg_replace(sprintf("([%c-%c%c-%c]\\\\)\\\\",0x81,0x9f,0xe0,0xfc), - "\\1",$quote_str); - -This will effectively extract Japanese text out of $quote_str. - - -o Sometimes, encoding detection fails - -If i18n_http_input() returns 'pass', it's likely that PHP failed to -detect whether it's SJIS or EUC. In such case, use <input type=hidden -value="some Japanese text"> to properly detect the incoming text's -encoding. - - - -========================================== - Japanese Manual -========================================== -Translated manual done by "PHP Japanese Manual Project" : - -http://www.php.net/manual/ja/manual.php - -Starting 3.0.18-i18n-ja, we have removed doc-jp from tarball package. - - -========================================== - Change Logs -========================================== - -o 2000-10-28, Rui Hirokawa <hirokawa@php.net> - -This patch is derived from php-3.0.15-i18n-ja as well as php-3.0.16 by -Kuwamura applied to original php-3.0.18. It also includes following fixes: - -1) allows you to set charset in mail(). -2) fixed mbregex definitions to avoid conflicts with system regex -3) php3.ini-dist now uses PASS for http_output instead of SJIS - -o 2000-11-24, Hironori Sato <satoh@yyplanet.com> - -Applied above patched and added detection for gdImageStringTTF in configure. -Following setups are known to work: - -gd-1.3-6, gd-devel-1.3-6, freetype-1.3.1-5, freetype-devel-1.3.1-5 - ImageTTFText($im,$size,$angle,$x1,$y1,$color,"/path/to/font.ttf", - i18n_convert("日本語", "UTF-8")); - ImageGif($im); - -gd-1.7.3-1k1, gd-devel-1.7.3-1k1, freetype-1.3.1-5, freetype-devel-1.3.1-5 - ImageTTFText($im,$size,$angle,$x1,$y1,$color,"/path/to/font.ttf","日本語"); - ImagePng($im); - * i18n_internal_encoding = EUC 又は SJIS - -For any gd libraries before 1.6.2, you need to use i18n_convert. For -gd-1.5.2/3, upgrade to anything above 1.7 to use ImageTTFText without -using i18n_convert. As long as you have internal_encoding set to EUC or -SJIS, ImageTTFText should work without mojibake. Again, make sure you -have i18n_http_output("pass") before calling ImageGif, ImagePng, ImageJpeg! - -o 2000-12-09, Rui Hirokawa <hirokawa@php.net> - -Fixed mail() which was causing segmentation fault when header was null. - |