sources/3rdparty/Patchwork/README.md 5.31 KB
03e52840d   Kload   Init
  Patchwork UTF-8
  ===============
  
  Patchwork UTF-8 provides both:

  - a portability layer for Unicode handling in PHP, and
  - a class that mirrors nearly the complete set of native string functions,
    made aware of UTF-8 [grapheme clusters](http://unicode.org/reports/tr29/).
  
  It can also serve as a documentation source referencing the practical problems
  that arise when handling UTF-8 in PHP: Unicode concepts, related algorithms,
  bugs in PHP core, workarounds, etc.
  
  Portability
  -----------
  
  Unicode handling in PHP is best performed using a combination of `mbstring`,
  `iconv`, `intl` and `pcre` with the `u` flag enabled. But when an application is
  expected to run on many servers, be aware that these 4 extensions are not
  always enabled.
  
  Patchwork UTF-8 provides pure PHP implementations for 3 of those 4 extensions.
  Here is the set of portability fallbacks that are currently implemented:
  
  - *utf8_encode, utf8_decode*,
  - `mbstring`: *mb_convert_encoding, mb_decode_mimeheader, mb_encode_mimeheader,
    mb_convert_case, mb_internal_encoding, mb_list_encodings, mb_strlen,
    mb_strpos, mb_strrpos, mb_strtolower, mb_strtoupper, mb_substitute_character,
    mb_substr, mb_stripos, mb_stristr, mb_strrchr, mb_strrichr, mb_strripos,
    mb_strstr*,
  - `iconv`: *iconv, iconv_mime_decode, iconv_mime_decode_headers,
    iconv_get_encoding, iconv_set_encoding, iconv_mime_encode, ob_iconv_handler,
    iconv_strlen, iconv_strpos, iconv_strrpos, iconv_substr*,
  - `intl`: *Normalizer, grapheme_extract, grapheme_stripos, grapheme_stristr,
    grapheme_strlen, grapheme_strpos, grapheme_strripos, grapheme_strrpos,
    grapheme_strstr, grapheme_substr*.
  
  Only `pcre` has no fallback: it is required, and must be compiled with Unicode
  support.
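To see which of these extensions a given server actually provides, a check along the following lines can help. This is a minimal sketch that uses only the native `extension_loaded()` and `preg_match()` functions. The helper name `unicodeExtensionReport()` is hypothetical and is not part of Patchwork UTF-8.

```php
// Hypothetical helper (not part of the library): reports which of the four
// extensions named above are loaded, and whether PCRE accepts the `u` flag.
function unicodeExtensionReport()
{
    $report = array();
    foreach (array('mbstring', 'iconv', 'intl', 'pcre') as $ext) {
        $report[$ext] = extension_loaded($ext);
    }

    // "\xC3\xA9" is "é" encoded in UTF-8. With Unicode support, `.` matches it
    // as one character; without it, compiling the pattern fails and
    // preg_match() returns false (the warning is silenced with @).
    $report['pcre_unicode'] = $report['pcre']
        && @preg_match('/^.$/u', "\xC3\xA9") === 1;

    return $report;
}

foreach (unicodeExtensionReport() as $name => $available) {
    echo $name, ': ', $available ? 'yes' : 'no', "\n";
}
```

Any of `mbstring`, `iconv` or `intl` reported as missing is a case where the fallbacks listed above are meant to step in.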
  
  Patchwork\Utf8
  --------------
  
  [Grapheme clusters](http://unicode.org/reports/tr29/) should always be
  considered when working with generic Unicode strings. The `Patchwork\Utf8`
  class implements nearly the complete set of native string functions that need
  to be aware of UTF-8 grapheme clusters. Function names, arguments and behavior
  carefully replicate the native PHP string functions, so usage is very easy.
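The following sketch illustrates why grapheme clusters matter. It uses only the native `strlen()` and `preg_match_all()` functions (with the `u` flag, so it assumes `pcre` with Unicode support) and does not call the library. The same visible character is counted three different ways.

```php
// "e" followed by U+0301 COMBINING ACUTE ACCENT: displayed as a single "é".
$s = "e\xCC\x81";

$bytes      = strlen($s);                       // counts bytes
$codePoints = preg_match_all('/./su', $s, $m);  // counts Unicode code points
$graphemes  = preg_match_all('/\X/u', $s, $m);  // counts grapheme clusters

echo "$bytes bytes, $codePoints code points, $graphemes grapheme cluster\n";
// prints "3 bytes, 2 code points, 1 grapheme cluster"
```

A function that works on bytes or code points can cut such a string between the letter and its accent. Working on grapheme clusters avoids that.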
  
  Some additional functions are provided to help handle UTF-8 strings:

  - *isUtf8()*: checks if a string contains well-formed UTF-8 data,
  - *toAscii()*: generic UTF-8 to ASCII transliteration,
  - *strtocasefold()*: Unicode transformation for caseless matching,
  - *strtonatfold()*: generic case-sensitive transformation for collation matching.
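To give an idea of what a well-formedness check such as *isUtf8()* involves, here is a minimal sketch built on the `u` flag of `pcre`. The function name `looksLikeUtf8()` is hypothetical, and this is not presented as the library's actual implementation.

```php
// Hypothetical stand-in for a well-formedness check, not the library's code.
function looksLikeUtf8($s)
{
    // With the `u` flag, preg_match() rejects a subject that is not valid
    // UTF-8 and returns false; an empty pattern matches anything valid.
    return preg_match('//u', $s) === 1;
}

var_dump(looksLikeUtf8("caf\xC3\xA9")); // "café" in UTF-8: bool(true)
var_dump(looksLikeUtf8("caf\xE9"));     // "café" in ISO-8859-1: bool(false)
```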
  
  Mirrored string functions are:
  *strlen, substr, strpos, stripos, strrpos, strripos, strstr, stristr, strrchr,
  strrichr, strtolower, strtoupper, wordwrap, chr, count_chars, ltrim, ord, rtrim,
  trim, str_ireplace, str_pad, str_shuffle, str_split, str_word_count, strcmp,
  strnatcmp, strcasecmp, strnatcasecmp, strncasecmp, strncmp, strcspn, strpbrk,
  strrev, strspn, strtr, substr_compare, substr_count, substr_replace, ucfirst,
  lcfirst, ucwords, number_format, utf8_encode, utf8_decode*.
  
  The *printf* family of functions is not mirrored.
  
  Usage
  -----
  
  The recommended way to install Patchwork UTF-8 is [through
  composer](http://getcomposer.org). Just create a `composer.json` file and run
  the `php composer.phar install` command to install it:
  
      {
          "require": {
              "patchwork/utf8": "1.1.*"
          }
      }
  
  Then, early in your bootstrap sequence, you have to configure your environment:
  
  ```php
  \Patchwork\Utf8\Bootup::initAll(); // Enables the portability layer and configures PHP for UTF-8
  \Patchwork\Utf8\Bootup::filterRequestUri(); // Redirects to a UTF-8 encoded URL if it's not already the case
  \Patchwork\Utf8\Bootup::filterRequestInputs(); // Sanitizes HTTP inputs to UTF-8 NFC
  ```
  
  Run `phpunit` in the `tests/` directory to see the code in action.
  
  Make sure that you are confident about using UTF-8 by reading
  [Character Sets / Character Encoding Issues](http://www.phpwact.org/php/i18n/charsets)
  and [Handling UTF-8 with PHP](http://www.phpwact.org/php/i18n/utf-8),
  or [PHP et UTF-8](http://julp.lescigales.org/articles/3-php-et-utf-8.html) for French readers.
  
  You should also get familiar with the concepts of
  [Unicode Normalization](http://en.wikipedia.org/wiki/Unicode_equivalence) and
  [Grapheme Clusters](http://unicode.org/reports/tr29/).
  
  Do not blindly replace all uses of PHP's string functions. Most of the time you
  will not need to, and doing so would add a significant performance overhead
  to your application.
  
  Screen your input on the *outer perimeter* so that only well-formed UTF-8 passes
  through. When dealing with badly formed UTF-8, you should not try to fix it.
  Instead, consider it as ISO-8859-1 and use `utf8_encode()` to get a UTF-8
  string. Also remember to choose one Unicode normalization form and stick to
  it. NFC is the most widely used today.
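The screening strategy described above can be sketched as follows. The function name `screenInput()` is hypothetical. The ISO-8859-1 conversion is written out by hand as an equivalent of `utf8_encode()`, because that function is deprecated as of PHP 8.2. Normalization to NFC is applied only when a `Normalizer` class is available, whether from `intl` or from a fallback.

```php
// Hypothetical helper: well-formed UTF-8 passes through, anything else is
// reinterpreted as ISO-8859-1 instead of being "repaired".
function screenInput($s)
{
    if (preg_match('//u', $s) !== 1) {
        // Each ISO-8859-1 byte >= 0x80 maps to a two-byte UTF-8 sequence.
        $s = preg_replace_callback('/[\x80-\xFF]/', function ($m) {
            $byte = ord($m[0]);
            return chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }, $s);
    }

    // Stick to a single normalization form (NFC here) when possible.
    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($s, Normalizer::FORM_C);
        if ($normalized !== false) {
            $s = $normalized;
        }
    }

    return $s;
}

var_dump(screenInput("caf\xE9") === "caf\xC3\xA9");     // bool(true)
var_dump(screenInput("caf\xC3\xA9") === "caf\xC3\xA9"); // bool(true)
```

Applying such a function once, where data enters the application, means the rest of the code can assume well-formed UTF-8.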
  
  This library is incompatible with `mbstring.func_overload` and will not work if
  that php.ini setting is enabled.
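A bootstrap script could guard against this setting with a check like the one below. This is a minimal sketch using the native `ini_get()` function, and the helper name `funcOverloadEnabled()` is hypothetical.

```php
// Hypothetical helper: true when mbstring.func_overload is active.
function funcOverloadEnabled()
{
    // ini_get() returns false when the option does not exist (the setting was
    // removed in PHP 8.0, or mbstring is not loaded); the cast turns that into 0.
    return (int) ini_get('mbstring.func_overload') !== 0;
}

if (funcOverloadEnabled()) {
    throw new RuntimeException('mbstring.func_overload must be disabled in php.ini');
}
```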
  
  Licensing
  ---------
  
  Patchwork\Utf8 is free software; you can redistribute it and/or modify it under
  the terms of either of the following licenses, at your option:
  - [Apache License v2.0](http://apache.org/licenses/LICENSE-2.0.txt), or
  - [GNU General Public License v2.0](http://gnu.org/licenses/gpl-2.0.txt).
  
  Unicode handling requires tedious work to implement and maintain in the
  long run. As such, contributions such as unit tests, bug reports, comments or
  patches licensed under both licenses are very welcome.
  
  I hope many projects can adopt this code and together help solve Unicode
  handling for PHP.