From Fedora Project Wiki


Glibc collation update and sync with cldr

Summary

Update the collation data in glibc to the ISO 14651 file from 2016 (in sync with Unicode 9.0.0) and sync the collation rules of the locales with CLDR.

Owner

  • Name: Mike Fabian
  • Email: <mfabian@redhat.com>
  • Release notes ticket: #79

Current status

  • Targeted release: Fedora 28
  • Last updated: 2018-03-08
  • Tracker bug: #1537247
  • Change is pushed to glibc master branch upstream.
  • I have now backported the change to the glibc 2.27 release branch to make patches for the Fedora 28 glibc rpm packages.

Detailed Description

The collation data in glibc is extremely out of date. Most locales base their collation rules on an iso14651_t1_common file that has probably not been updated for more than 15 years. As a result, all characters added in later Unicode versions are missing and are not sorted at all, which causes bugs like [Bug 1336308 - Infinite (∞) and empty set (∅) are treated as if they were the same character by sort and uniq]. This change updates the iso14651_t1_common file to the latest version available from ISO, which is from 2016 and is up to date with Unicode 9.0.0. Because of additions and syntax changes in the new iso14651_t1_common file, updating it requires changing the collation rules of almost all locales. Since all of these collation rules have to be touched anyway, this is a good opportunity to fix bugs in them and sync them with the collation rules in CLDR.
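The symptom described in Bug 1336308 can be probed from a script. The following is a minimal sketch, assuming Python 3 on a glibc system; the helper name collate_equal is hypothetical and not part of glibc or of this change. It asks the C library, through Python's locale.strcoll, whether two strings compare as equal under a given locale's collation rules.

```python
import locale


def collate_equal(a: str, b: str, locale_name: str = "") -> bool:
    """Return True if the C library's collation treats a and b as equal.

    locale_name="" selects the locale from the environment.
    Raises locale.Error if the requested locale cannot be selected.
    (Hypothetical helper, for illustration only.)
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return locale.strcoll(a, b) == 0
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # always restore


if __name__ == "__main__":
    # U+221E INFINITY and U+2205 EMPTY SET, the pair named in Bug 1336308.
    try:
        print(collate_equal("\u221e", "\u2205"))
    except locale.Error as exc:
        print("could not select the environment locale:", exc)
```

With collation data that has no entries for these characters, a UTF-8 locale may report the pair as equal. With the updated data, two distinct characters should no longer compare as equal.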

Benefit to Fedora

This will fix many bugs in the collation and make glibc sort more correctly according to current standards.

Scope

  • Proposal owners: Work with upstream, file bugs and provide patches where required.
  • Other developers: This change affects glibc and everything that sorts strings using the glibc collation functions. Other developers do not need to make any changes on their end, but they should watch how their applications behave with the improved locale data. Proper testing is needed to make sure the change does not break any application.
  • Policies and guidelines: No, this change does not require any updates to policies or packaging guidelines.
  • Trademark approval: N/A (not needed for this Change)

Upgrade/compatibility impact

The sort order of strings in many locales will change somewhat.

How To Test

Test whether locale-specific sorting works correctly according to the sorting rules for each locale. Also test whether characters added up to Unicode 9.0.0 sort correctly.
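One way to run such a check is to sort a small word list under several locales and compare the result with the order expected for each language. The sketch below assumes Python 3; the helper sort_in_locale, the sample words, and the locale names are illustrative assumptions, and the locales must be installed on the test system for a result to be returned.

```python
import locale
from typing import List, Optional


def sort_in_locale(words: List[str], locale_name: str) -> Optional[List[str]]:
    """Sort words with the collation rules of locale_name.

    Returns None if the locale cannot be selected (for example, because
    it is not installed). Hypothetical helper, for illustration only.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        try:
            locale.setlocale(locale.LC_COLLATE, locale_name)
        except locale.Error:
            return None
        # strxfrm maps each string to a key whose plain comparison
        # matches strcoll in the active locale.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # always restore


if __name__ == "__main__":
    # Sample data and locale names are assumptions chosen for illustration.
    sample = ["zebra", "öl", "apple", "Ärger", "\u221e", "\u2205"]
    for name in ("C", "sv_SE.UTF-8", "de_DE.UTF-8"):
        print(name, sort_in_locale(sample, name))
```

Under Swedish rules, for example, one would expect "öl" to sort after "zebra", while under German rules it should sort near "o". The exact output depends on the installed glibc locale data, so it is not shown here.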

User Experience

Better sorting of strings by glibc, more up-to-date with current standards.

Dependencies

  • Upstream release schedule.
  • If our patches are not accepted upstream, we will not try to patch it in Fedora. This change will therefore make it into Fedora 28 only if glibc 2.27 is released in time for Fedora 28.

Contingency Plan

  • Contingency mechanism: Move the change to the Fedora 29 release.
  • Contingency deadline: Fedora 29 Beta release.
  • Blocks release? No.
  • Blocks product? No.

Documentation

[Bug 14095 - Review / update collation data from Unicode / ISO 14651]

Release Notes