Revision as of 14:17, 14 February 2012
Upgrade to PCRE (Perl-Compatible Regular Expressions) library 8.30 or newer. This library version brings UTF-16 support and changes the API, which affects many packages.
- Name: Petr Pisar
- Email: <email@example.com>
- Targeted release: Fedora 18
- Last updated: 2012-02-14
- Percentage of completion: 87.16 %
- PCRE 8.30 built
95 of 109 reverse dependencies have been rebuilt. Remaining:
- php - will be fixed in 5.4.0 final, planned for 2012-02-16
To be done
- Remove libpcre.so.0 from pcre at the end
Each PCRE release brings new fixes and features (such as updated Unicode tables), so it is necessary to stay synchronized with upstream releases. Version 8.30 changes the API. Because PCRE is on the critical path and in the minimal build root, the upgrade must be done carefully, and a feature page is needed to track its progress. Version 8.30 also brings support for the UTF-16 encoding, which helps applications that use this encoding internally because it avoids expensive recoding between UTF-16 and UTF-8. Qt upstream has already expressed its intention to move from its own regular expression implementation to PCRE.
Benefit to Fedora
Fedora will keep providing the latest upstream PCRE version with the latest Unicode tables. Fedora will also provide the UTF-16 mode of PCRE.
Version 8.30 changes the API, as described by upstream:
- The pcre_info() function, which has been obsolete for over 10 years, has been removed.
- When a compiled pattern was saved to a file and later reloaded on a host with different endianness, PCRE used automatically to swap the bytes in some of the data fields. With the advent of the 16-bit library, where more of this swapping is needed, it is no longer done automatically. Instead, the bad endianness is detected and a specific error is given. The user can then call a new function called pcre_pattern_to_host_byte_order() (or an equivalent 16-bit function) to do the swap.
- In UTF-8 mode, the values 0xd800 to 0xdfff are not legal Unicode code points and are now faulted. (They are the so-called "surrogates" that are reserved for coding high values in UTF-16.)
This is reflected in the libpcre SONAME, which changes from libpcre.so.0 to libpcre.so.1. The change affects 109 packages. All of them need to be rebuilt, and some may need to be ported to the new API.
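The surrogate rule in the last item of the list above can be illustrated with a small sketch. The helper below is hypothetical and is not part of the PCRE API; it checks only this one condition. In UTF-8, a code point in the range 0xd800 to 0xdfff would be encoded as the byte 0xED followed by a byte in the range 0xA0 to 0xBF, so finding that prefix is enough to detect an encoded surrogate. The sketch assumes the input is otherwise structurally valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper, not a PCRE function: returns the byte offset of the
   first UTF-8-encoded surrogate (U+D800..U+DFFF) in buf, or -1 if there is
   none. Assumes buf is otherwise structurally valid UTF-8. */
long find_utf8_surrogate(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++) {
        /* 0xED followed by 0xA0..0xBF starts a code point in D800..DFFF;
           0xED followed by 0x80..0x9F is a legal code point (D000..D7FF). */
        if (buf[i] == 0xED && buf[i + 1] >= 0xA0 && buf[i + 1] <= 0xBF)
            return (long)i;
    }
    return -1;
}
```

An application that feeds such input to PCRE in UTF-8 mode should expect an error from 8.30 onwards, so a pre-check of this kind may help locate offending data during porting.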
How To Test
- Check that the distribution contains pcre >= 8.30.
- Check that no package depends on the old PCRE soname libpcre.so.0 (e.g. repoquery --whatrequires 'libpcre.so.0()(64bit)').
- Check that PCRE is compiled with UTF-16 support (install pcre-tools, check that pcretest -C pcre16 returns 1).
- Check that the PCRE tools work properly with the UTF-16 variant of the PCRE library (install pcre-tools, read the pcretest(1) manual, try pcretest -16 …).
- Check that applications can be compiled against the pcre16 library. Install pcre-devel, check for the presence of the pcre16_*(3) manual pages, and check the output of pkg-config --libs libpcre16. Try to compile and link a short program that uses the pcre16 library.
There is no visible change for end users. Developers will see that pcre_info(3) has been removed. Users of pcre_info(3) need to migrate to pcre_fullinfo(3), as the pcre_info(3) manual page has documented for the last 10 years.
109 packages need rebuilding:
adanaxisgpl blender bti cclive ccze cduce cegui cegui06 cfengine classads coccinelle collada-dom condor dansguardian EMBOSS eterm ettercap exim fsniper gambas2 gambas3 ganglia ghc-hakyll ghc-pcre-light ghc-regex-pcre git gnaughty gnome-mud gnote gource grep gsmartcontrol gxneur haproxy highlighting-kate httpd imapfilter Io-language kannel kaya kdelibs kdelibs3 kismet leafnode ledger less libast libguestfs lighttpd logstalgia lua-rex maildrop matahari mboxgrep mcstrans medusa mod_security mongodb monotone mysql-workbench nekovm nginx ngrep nmap ocaml-ocamlnet octave openCOLLADA openscada openscap opensips ovaldi pads pandoc perl-HTML-Template-Pro php picviz pidgin-musictracker poco postfix prelude-lml privoxy proftpd R regexxer rekall root scilab slang spring-installer sssd suricata swig syncevolution syslog-ng tabled Thunar tin tintin tinyfugue varnish wmweather+ xastir xfce4-verve-plugin xgrep xmlcopyeditor xneur znc-infobot zoneminder 389-ds-base
There is no contingency plan. All reverse dependencies will be rebuilt, adapted to the new API where necessary, or removed from the distribution.
- pcre16(3) manual page for UTF-16 feature
- pcre_fullinfo(3) manual page as replacement for pcre_info(3)
- tinyfugue conversion from pcre_info() to pcre_fullinfo()
- Private _pcre_valid_utf8() function has been renamed to _pcre_valid_utf()
- UTF-16 support through pcre16 library added
- API change of the pcre library is documented in NEWS and ChangeLog