From Fedora Project Wiki
(staging works :))
 
(17 intermediate revisions by 2 users not shown)
Line 1: Line 1:
= Distribution Localization statistics
= Localization measurement and tooling =


== Summary ==
== Summary ==
<!-- A sentence or two summarizing what this change is and what it will do. This information is used for the overall changeset summary page for each release.
 
Note that motivation for the change should be in the Benefit to Fedora section below, and this part should answer the question "What?" rather than "Why?". -->
Provide a public website for end users and contributors, containing Fedora Workstation translation progress and useful files for translators (as an example: translation memories).
Generate per language statistics about the current localization support for the whole Fedora Operating System.
Provide a static website with results and useful files for translators (as an example: translation memories).


== Owner ==
== Owner ==
* Name: [[User:jibecfed|Jean-Baptiste Holcroft]]
* Name: [[User:jibecfed|Jean-Baptiste Holcroft]], [[User:darknao|Francois Andrieu]]
<!-- Include you email address that you can be reached should people want to contact you about helping with your change, status is requested, or technical issues need to be resolved. If the change proposal is owned by a SIG, please also add a primary contact person. -->
<!-- Include you email address that you can be reached should people want to contact you about helping with your change, status is requested, or technical issues need to be resolved. If the change proposal is owned by a SIG, please also add a primary contact person. -->
* Email: <jean-baptiste@holcroft.fr>
* Email: <jean-baptiste@holcroft.fr>
Line 20: Line 18:


== Current status ==
== Current status ==
[[Category:ChangePageIncomplete]]
[[Category:ChangeAcceptedF34]]
<!-- When your change proposal page is completed and ready for review and announcement -->
<!-- When your change proposal page is completed and ready for review and announcement -->
<!-- remove Category:ChangePageIncomplete and change it to Category:ChangeReadyForWrangler -->
<!-- remove Category:ChangePageIncomplete and change it to Category:ChangeReadyForWrangler -->
Line 30: Line 28:
<!-- [[Category:SystemWideChange]] -->
<!-- [[Category:SystemWideChange]] -->


* Targeted release: [[Releases/34 | Fedora 34]]  
* Targeted release: [[Releases/34| Fedora 34]]  
* Last updated: <!-- this is an automatic macro — you don't need to change this line -->  {{REVISIONYEAR}}-{{REVISIONMONTH}}-{{REVISIONDAY2}}  
* Last updated: <!-- this is an automatic macro — you don't need to change this line -->  {{REVISIONYEAR}}-{{REVISIONMONTH}}-{{REVISIONDAY2}}  
<!-- After the change proposal is accepted by FESCo, tracking bug is created in Bugzilla and linked to this page  
<!-- After the change proposal is accepted by FESCo, tracking bug is created in Bugzilla and linked to this page  
Line 39: Line 37:
CLOSED as NEXTRELEASE -> change is completed and verified and will be delivered in next release under development
CLOSED as NEXTRELEASE -> change is completed and verified and will be delivered in next release under development
-->
-->
* FESCo issue: <will be assigned by the Wrangler>
* FESCo issue: [https://pagure.io/fesco/issue/2545 #2545]
* Tracker bug: <will be assigned by the Wrangler>
* Tracker bug: [https://bugzilla.redhat.com/show_bug.cgi?id=1921178 #1921178]
* Release notes tracker: <will be assigned by the Wrangler>
* Release notes tracker: [https://pagure.io/fedora-docs/release-notes/issue/645 #645]
* Staging URL: https://languages.stg.fedoraproject.org/


== Detailed Description ==
== Detailed Description ==
Line 51: Line 50:
The ability to share efforts is limited (with data, tools, etc.):
The ability to share efforts is limited (with data, tools, etc.):


* because of the complexity to get an overview of the current localization status of the Linux community,
* because translators often have a low level of technical knowledge,
* because translators often have a low level of technical knowledge,
* because development experts are more keen to use English by default, and don't know much about languages support requirements,
* because development experts are more keen to use English by default, and don't know much about languages support requirements.
* because of the complexity to get an overview of the current localization status of the Linux community.


Debian did something similar (20 years ago) https://www.debian.org/international/l10n/ (code: https://salsa.debian.org/webmaster-team/webwml/-/commits/master/english/international/l10n/scripts/transmonitor-check)
Debian did something similar (20 years ago) https://www.debian.org/international/l10n/ . But this work:
 
* is limited in terms of features (no translation memories there)
* is too deeply integrated with Debian infrastructure (data extraction, computation and website generation are 100% debian specific)
* is using a programming language that doesn't allow to share easily with existing i18n/l10n libraries (it did not exist 20 years ago)


== Feedback ==
== Feedback ==


* Why not reuse the code from Debian? This source code is deeply integrated with Debian infrastructure, and uses a language that doesn't allow to share easily with existing i18n/l10n tooling/libraries. We do use translation-finder used by Weblate, language data from Weblate, polib from David Jean Louis, Translate Toolkit from translatehouse.org.
<!-- Summarize the feedback from the community and address why you chose not to accept proposed alternatives. This section is optional for all change proposals but is strongly suggested. Incorporating feedback here as it is raised gives FESCo a clearer view of your proposal and leaves a good record for the future. If you get no feedback, that is useful to note in this section as well. For innovative or possibly controversial ideas, consider collecting feedback before you file the change proposal. -->
 
'''Wouldn't it be better to e.g. enhance Weblate to report stats for projects which are externally translated through some different project?'''
 
https://translate.fedoraproject.org/ contains what is specific to Fedora project (documentation, websites, FAS, etc.)
but most of what is contained in the operating system we build is not specific to the Fedora project.
 
Each upstream project decides their translation process.
Gnome: https://l10n.gnome.org/
KDE: https://l10n.kde.org
Mozilla: https://pontoon.mozilla.org/
Libreoffice: https://translations.documentfoundation.org/
etc.
 
What we can measure in https://translate.fedoraproject.org is the health of the Fedora community.
 
What we will measure with this change is what the Linux ecosystem is delivering to end users, which should help make the Linux community more effective.
 
Weblate is a translation platform.
Using it to display translations of projects who did not choose to be part of our translation would be equivalent to fork what upstream do.
 
Ubuntu does it with launchpad, but the limit between upstream work and distribution work isn't clear enough.
Translators do some work in launchpad but the translation don't go upstream automatically (which means most of the time it never goes upstream).


<!-- Summarize the feedback from the community and address why you chose not to accept proposed alternatives. This section is optional for all change proposals but is strongly suggested. Incorporating feedback here as it is raised gives FESCo a clearer view of your proposal and leaves a good record for the future. If you get no feedback, that is useful to note in this section as well. For innovative or possibly controversial ideas, consider collecting feedback before you file the change proposal. -->
This would probably be really confusing for end-users and Fedora community would find it incompatible with Fedora values.
 
In addition, Weblate is a great tool, but really complex and moving quite fast.
We do share technical components (translate toolkit and language lists), but more won't make sense for our usecase.
 
'''translations should be controlled by upstream projects, not distributions. Fedora should stay closer to the upstream projects, not drift away from them.'''
 
This change is aligned with this, one idea would be to help contributors to understand where to go translate each project
upstream.
 
The Language-Team attribute probably is good enough to lead a contributor at the right
place, here are a few examples for French language:
 
0ad: "Language-Team: French (http://www.transifex.com/wildfire-games/0ad/language/fr/)\n"
ABRT: "Language-Team: French <https://translate.fedoraproject.org/projects/abrt/"
AppStream: "Language-Team: French <https://hosted.weblate.org/projects/appstream/"
Audacious: "Language-Team: French (http://www.transifex.com/audacious/audacious/language/fr/)\n"
Gnome shell: "Language-Team: GNOME French Team <gnomefr(a)traduc.org"
Krita: "Language-Team: French <kde-francophone(a)kde.org"
 
'''Why not using transtats? What's the future of transtats?'''
 
Transtats covers 100 manually configured packages, while the change does the following (stats are for f33):
 
* use dnf to download all SRPMs for a Fedora release (21330 packages)
* detect po files (2230 packages have at least one po file; more file formats exist, but they are left for the future ;))
* extract all po files (200 337 po files)
* deduce the language list (344 languages)
* produce stats and consolidated files (16GB of files before compression)
* publish a website (2 GB once files are compressed)
 
The Transtats UI is good, but it really is focused on translation propagation across systems, bringing a huge complexity.
 
We could probably try to merge both tools together by writing down the goals each tool want to achieve.
Measuring the usage of transtats would help to identify if some features are to be preserved.
 
Workshop is to be organized to build a plan. Proposed date is before flock (if this is a physical Flock).


== Benefit to Fedora ==
== Benefit to Fedora ==
Line 67: Line 129:
<!-- What is the benefit to the distribution?  Will the software we generate be improved? How will the process of creating Fedora releases be improved? -->
<!-- What is the benefit to the distribution?  Will the software we generate be improved? How will the process of creating Fedora releases be improved? -->


Help the Linux community to face understand the language support challenges by providing measurement.
It is a progress for the project: provide a new tool to translator community.
Increase the contributor effectiveness by providing translation memories and other tools.
 
Opens the possibility to change the translation file release process.
It helps the Linux community to better understand the language support challenges.
 
It increases contributors effectiveness by providing translation memories and other tools.
 
These translation memories open new possibilities:
 
* to train machines to suggest new translations?
* to detect quality issues (spellcheck, linters, etc)?
* to change the way we ship translations to users? (Ubuntu does it, but never brings translations back to the main project)
* to advertise user that Linux is available in many languages?


== Scope ==
== Scope ==
Line 76: Line 147:


* Proposal owners:
* Proposal owners:
* [[User:Darknao|Francois Andrieu]] integrate the existing scripts into containers to allow execution into openshift
** [[User:Darknao|Francois Andrieu]] integrate the existing scripts into containers to allow execution into openshift
* Infra team:
** Infra team:
** provide some space for script execution (50 GB per release)
*** provide some space for script execution (50 GB per release)
** provide a location for static website (about 2 GB per release, may increase over time)
*** provide the languages.fedoraproject.org domain name
*** provide a location for static website (about 2 GB per release, may increase over time)


<!-- What work do the feature owners have to accomplish to complete the feature in time for release?  Is it a large change affecting many parts of the distribution or is it a very isolated change? What are those changes?-->
<!-- What work do the feature owners have to accomplish to complete the feature in time for release?  Is it a large change affecting many parts of the distribution or is it a very isolated change? What are those changes?-->
* Other developers: N/A (not a System Wide Change) <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
* Other developers: N/A (not a System Wide Change) <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
<!-- What work do other developers have to accomplish to complete the feature in time for release?  Is it a large change affecting many parts of the distribution or is it a very isolated change? What are those changes?-->
<!-- What work do other developers have to accomplish to complete the feature in time for release?  Is it a large change affecting many parts of the distribution or is it a very isolated change? What are those changes?-->
* Release engineering: [https://pagure.io/releng/issues #Releng issue number] (a check of an impact with Release Engineering is needed) <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
* Release engineering: [https://pagure.io/releng/issues #Releng issue number] (a check of an impact with Release Engineering is needed) <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
<!-- Does this feature require coordination with release engineering (e.g. changes to installer image generation or update package delivery)?  Is a mass rebuild required?  include a link to the releng issue.  
<!-- Does this feature require coordination with release engineering (e.g. changes to installer image generation or update package delivery)?  Is a mass rebuild required?  include a link to the releng issue.  
The issue is required to be filed prior to feature submission, to ensure that someone is on board to do any process development work and testing and that all changes make it into the pipeline; a bullet point in a change is not sufficient communication -->
The issue is required to be filed prior to feature submission, to ensure that someone is on board to do any process development work and testing and that all changes make it into the pipeline; a bullet point in a change is not sufficient communication -->
* Policies and guidelines: N/A (not a System Wide Change) <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
* Policies and guidelines: N/A (not a System Wide Change) <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
<!-- Do the packaging guidelines or other documents need to be updated for this feature?  If so, does it need to happen before or after the implementation is done?  If a FPC ticket exists, add a link here. -->
<!-- Do the packaging guidelines or other documents need to be updated for this feature?  If so, does it need to happen before or after the implementation is done?  If a FPC ticket exists, add a link here. -->
* Trademark approval: N/A (not needed for this Change)
* Trademark approval: N/A (not needed for this Change)
<!-- If your Change may require trademark approval (for example, if it is a new Spin), file a ticket ( https://fedorahosted.org/council/ ) requesting trademark approval from the Fedora Council. This approval will be done via the Council's consensus-based process. -->
<!-- If your Change may require trademark approval (for example, if it is a new Spin), file a ticket ( https://fedorahosted.org/council/ ) requesting trademark approval from the Fedora Council. This approval will be done via the Council's consensus-based process. -->
 
* Alignment with mission: ''In our community, contributors of all kinds come together to advance the ecosystem for the benefit of everyone.''
* Alignment with Objectives:  
.
<!-- Does your proposal align with the current Fedora Objectives: https://docs.fedoraproject.org/en-US/project/objectives/ ? It's okay if it doesn't, but it's something to consider -->
<!-- Does your proposal align with the current Fedora Objectives: https://docs.fedoraproject.org/en-US/project/objectives/ ? It's okay if it doesn't, but it's something to consider -->


Line 155: Line 223:


A draft with simplistic template is there: https://jibecfed.fedorapeople.org/partage/fedora-localization-statistics/f32/language/fr/
A draft with simplistic template is there: https://jibecfed.fedorapeople.org/partage/fedora-localization-statistics/f32/language/fr/
Code and "documentation" are there: https://pagure.io/fedora-localization-statistics
Code and "documentation" are there: https://pagure.io/fedora-localization-statistics


<!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
About other project:
N/A (not a System Wide Change)
 
* Debian's code to build website with language progress: https://salsa.debian.org/webmaster-team/webwml/-/commits/master/english/international/l10n/scripts/transmonitor-check
* Ubuntu's code to build langpacks: https://bazaar.launchpad.net/~ubuntu-langpack/langpack-o-matic/main/files
** Note: ubuntu does provide language progress in launchpad: https://translations.launchpad.net/ubuntu and some useful documentation is there: https://dev.launchpad.net/Translations


== Release Notes ==
== Release Notes ==

Latest revision as of 04:51, 27 March 2021

Localization measurement and tooling

Summary

Provide a public website for end users and contributors, containing Fedora Workstation translation progress and useful files for translators (as an example: translation memories).

Owner

Current status

Detailed Description

Language support is a cross-cutting activity, and there is currently no way to know the actual language support provided by Fedora as an operating system.

Because language support and translations are part of each upstream project, the Linux language community is as spread out as the Free, Libre and Open Source community itself.

The ability to share efforts is limited (with data, tools, etc.):

  • because it is complex to get an overview of the current localization status of the Linux community,
  • because translators often have a low level of technical knowledge,
  • because developers tend to use English by default and often know little about language support requirements.

Debian did something similar 20 years ago (https://www.debian.org/international/l10n/), but that work:

  • is limited in terms of features (no translation memories there)
  • is too deeply integrated with Debian infrastructure (data extraction, computation and website generation are 100% Debian-specific)
  • is written in a programming language that makes it hard to reuse existing i18n/l10n libraries (these did not exist 20 years ago)

Feedback

Wouldn't it be better to e.g. enhance Weblate to report stats for projects which are externally translated through some different project?

https://translate.fedoraproject.org/ contains what is specific to the Fedora project (documentation, websites, FAS, etc.), but most of what is contained in the operating system we build is not specific to the Fedora project.

Each upstream project decides on its own translation process:

  • GNOME: https://l10n.gnome.org/
  • KDE: https://l10n.kde.org
  • Mozilla: https://pontoon.mozilla.org/
  • LibreOffice: https://translations.documentfoundation.org/
  • etc.

What we can measure in https://translate.fedoraproject.org is the health of the Fedora community.

What we will measure with this change is what the Linux ecosystem is delivering to end users, which should help make the Linux community more effective.

Weblate is a translation platform. Using it to display translations of projects that did not choose to be part of our translation platform would be equivalent to forking what upstream does.

Ubuntu does it with Launchpad, but the boundary between upstream work and distribution work isn't clear enough. Translators do some work in Launchpad, but the translations don't go upstream automatically (which means that most of the time they never go upstream).

This would probably be really confusing for end users, and the Fedora community would find it incompatible with Fedora values.

In addition, Weblate is a great tool, but it is really complex and moves quite fast. We do share technical components (Translate Toolkit and language lists), but sharing more would not make sense for our use case.

Translations should be controlled by upstream projects, not distributions. Fedora should stay closer to the upstream projects, not drift away from them.

This change is aligned with this. One idea is to help contributors understand where to go to translate each project upstream.

The Language-Team attribute is probably good enough to lead a contributor to the right place. Here are a few examples for the French language:

  • 0ad: "Language-Team: French (http://www.transifex.com/wildfire-games/0ad/language/fr/)\n"
  • ABRT: "Language-Team: French <https://translate.fedoraproject.org/projects/abrt/"
  • AppStream: "Language-Team: French <https://hosted.weblate.org/projects/appstream/"
  • Audacious: "Language-Team: French (http://www.transifex.com/audacious/audacious/language/fr/)\n"
  • GNOME Shell: "Language-Team: GNOME French Team <gnomefr(a)traduc.org"
  • Krita: "Language-Team: French <kde-francophone(a)kde.org"
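As a rough illustration of how the Language-Team attribute could be read from a PO file header, here is a minimal Python sketch. It uses only the standard library. The function name language_team and the sample header are hypothetical, and the project's own scripts may well do this differently (for instance with a PO parsing library).

```python
import re

# Matches one header line such as: "Language-Team: French <team@example.org>\n"
# The trailing literal \n and the closing bracket are optional, because real
# headers are not always complete.
LANGUAGE_TEAM_RE = re.compile(
    r'^"Language-Team:\s*(?P<value>.*?)(?:\\n)?"\s*$', re.MULTILINE
)

# The contact (URL or address) follows either "<" or "(".
CONTACT_RE = re.compile(r"<\s*([^<>\s]+)|\(\s*([^()\s]+)")


def language_team(po_header):
    """Return (team_name, contact) from PO header text, or None if absent."""
    match = LANGUAGE_TEAM_RE.search(po_header)
    if match is None:
        return None
    value = match.group("value").strip()
    contact_match = CONTACT_RE.search(value)
    if contact_match is None:
        return value, None
    name = value[: contact_match.start()].strip()
    contact = contact_match.group(1) or contact_match.group(2)
    return name, contact


# Hypothetical header, for illustration only.
sample_header = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Project-Id-Version: demo\\n"\n'
    '"Language-Team: French <https://translate.example.org/projects/demo/fr/>\\n"\n'
    '"Language: fr\\n"\n'
)

if __name__ == "__main__":
    print(language_team(sample_header))
```

The contact part is whatever upstream chose to publish: sometimes a translation platform URL, sometimes a mailing list address, so a website built on it would have to display both kinds.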

Why not use Transtats? What's the future of Transtats?

Transtats covers 100 manually configured packages, while the change does the following (stats are for f33):

  • use dnf to download all SRPMs for a Fedora release (21330 packages)
  • detect po files (2230 packages have at least one po file; more file formats exist, but they are left for the future ;))
  • extract all po files (200 337 po files)
  • deduce the language list (344 languages)
  • produce stats and consolidated files (16 GB of files before compression)
  • publish a website (2 GB once files are compressed)
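The detection and language-list steps above can be sketched roughly as follows. This is a simplified, standard-library-only illustration and not the project's actual code: it assumes the source packages are already unpacked under one directory, and it assumes (hypothetically) that the language code is the PO file name, as in fr.po or pt_BR.po. All function names are made up for the example.

```python
from collections import Counter
from pathlib import Path
import re

# Assumption: a PO file named after its language, e.g. "fr", "pt_BR", "sr@latin".
LANG_CODE_RE = re.compile(r"^[a-z]{2,3}(?:[_-][A-Za-z]{2,4})?(?:@[a-z]+)?$")


def find_po_files(root):
    """Return every *.po file below root, in a stable order."""
    return sorted(Path(root).rglob("*.po"))


def deduce_language(po_path):
    """Guess a language code from the file name; None if it does not look like one."""
    stem = Path(po_path).stem
    return stem if LANG_CODE_RE.match(stem) else None


def files_per_language(root):
    """Count PO files per deduced language code."""
    counts = Counter()
    for po_path in find_po_files(root):
        lang = deduce_language(po_path)
        if lang is not None:
            counts[lang] += 1
    return counts
```

Calling files_per_language on the directory holding the unpacked sources gives a per-language file count. Message-level statistics (translated, fuzzy, untranslated) would need a real PO parser on top of this.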

The Transtats UI is good, but it is really focused on translation propagation across systems, which brings a huge complexity.

We could probably try to merge both tools together by writing down the goals each tool wants to achieve. Measuring the usage of Transtats would help to identify whether some features are to be preserved.

A workshop is to be organized to build a plan. The proposed date is before Flock (if this is a physical Flock).

Benefit to Fedora

It is progress for the project: it provides a new tool to the translator community.

It helps the Linux community to better understand the language support challenges.

It increases contributor effectiveness by providing translation memories and other tools.

These translation memories open new possibilities:

  • to train machines to suggest new translations?
  • to detect quality issues (spellcheck, linters, etc)?
  • to change the way we ship translations to users? (Ubuntu does it, but never brings translations back to the main project)
  • to advertise to users that Linux is available in many languages?
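To give an idea of what a translation memory makes possible, here is a minimal Python sketch of a fuzzy lookup. It is not machine learning and not part of the proposed tooling: it only assumes that a memory can be reduced to a mapping from source strings to existing translations, and it uses difflib from the standard library. The function name suggest and the sample entries are hypothetical.

```python
import difflib


def suggest(memory, source, cutoff=0.6):
    """Return (known_source, translation) pairs whose source text resembles `source`."""
    close = difflib.get_close_matches(source, list(memory), n=3, cutoff=cutoff)
    return [(text, memory[text]) for text in close]


# Hypothetical English-to-French memory, for illustration only.
memory = {
    "Open file": "Ouvrir le fichier",
    "Close file": "Fermer le fichier",
    "Quit": "Quitter",
}

if __name__ == "__main__":
    for known, translation in suggest(memory, "Open files"):
        print(known, "->", translation)
```

A translator working on a new string would see the closest existing translations first, which is the kind of reuse across projects that consolidated files are meant to enable.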

Scope

All of the work is isolated: as long as dnf works, the automation works. The closer the job runs to a mirror, the lower the network cost (all of Fedora is downloaded at each execution).

  • Proposal owners:
    • Francois Andrieu integrates the existing scripts into containers to allow execution in OpenShift
    • Infra team:
      • provide some space for script execution (50 GB per release)
      • provide the languages.fedoraproject.org domain name
      • provide a location for static website (about 2 GB per release, may increase over time)
  • Other developers: N/A (not a System Wide Change)
  • Release engineering: #Releng issue number (a check of an impact with Release Engineering is needed)
  • Policies and guidelines: N/A (not a System Wide Change)
  • Trademark approval: N/A (not needed for this Change)
  • Alignment with mission: In our community, contributors of all kinds come together to advance the ecosystem for the benefit of everyone.

Upgrade/compatibility impact

N/A (not a System Wide Change)

How To Test

N/A (not a System Wide Change)

User Experience

Dependencies

N/A (not a System Wide Change)

Contingency Plan

  • Contingency mechanism: (What to do? Who will do it?) N/A (not a System Wide Change)
  • Contingency deadline: N/A (not a System Wide Change)
  • Blocks release? N/A (not a System Wide Change), Yes/No
  • Blocks product? product

Documentation

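A draft with a simplistic template is available at: https://jibecfed.fedorapeople.org/partage/fedora-localization-statistics/f32/language/fr/

Code and "documentation" are available at: https://pagure.io/fedora-localization-statistics

About other projects:

  • Debian's code to build a website with language progress: https://salsa.debian.org/webmaster-team/webwml/-/commits/master/english/international/l10n/scripts/transmonitor-check
  • Ubuntu's code to build langpacks: https://bazaar.launchpad.net/~ubuntu-langpack/langpack-o-matic/main/files
    • Note: Ubuntu does provide language progress in Launchpad: https://translations.launchpad.net/ubuntu and some useful documentation is available at: https://dev.launchpad.net/Translations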

Release Notes