PDEP-14: Publish translations of pandas.pydata.org #57204
@@ -5,137 +5,65 @@
 - Discussion: [#56301](https://github.com/pandas-dev/pandas/issues/56301)
 [#57204](https://github.com/pandas-dev/pandas/pull/57204)
 - Author: [Albert Steppi](https://github.com/steppi),
-- Revision: 1
+- Revision: 2

 ## Abstract

 The suggestion is to have official translations made for content of the core
-project website [pandas.pydata.org](https://pandas.pydata.org) and provide a
-language drop-down selector on [pandas.pydata.org](https://pandas.pydata.org)
-similar to what currently exists at [numpy.org](https://numpy.org).
+project website [pandas.pydata.org](https://pandas.pydata.org) and offer
+a low-friction way for users to access these translations on the core
+project website.

Comment on lines +13 to +15:

I infer from @datapythonista comments that ideally the proposal should be low friction for contributors as well as the users, i.e. pandas users and volunteers can contribute alongside grant-funded Quansight staff.

Yes, that's right. Grant-funded Quansight staff will work mostly on setting up infrastructure, and helping to coordinate and facilitate. The hope is that most of the translators will be volunteers, or will be supported by small grants we could potentially help find for them.

+## Motivation, Scope, Usage, and Impact
+
-## Motivation and Scope
+There are many potential users with no or a low level of English proficiency
+who could benefit from quality official translations of the Pandas website
+content. Though translations for all documentation would be valuable,
+producing and maintaining translations for such a large and oft-changing
+collection of text would take an immense and sustained effort which may
+be infeasible. The suggestion is instead to have translations made for only
+a key set of pages from the core project website.

Comment on lines +21 to +25:

Am I correct in thinking that anyone would be able to open a PR on the mirror site (if this would be the solution)? Documentation changes are often good first issues. I assume it's the approval process that "would take an immense and sustained effort which may be infeasible." Otherwise, do we have an off-ramp for when the translator funding ends?

Yes, anyone could open a PR, and anyone could contribute translations on Crowdin after asking for an invite. Our hope is to set up a compounding snowball type effect, where we can help build a community of volunteer translators who can help keep translations up to date.

Great. pandas is foremost a community-driven volunteer project, so this aligns well with the project values.

-Pandas is a foundational package in the Scientific Python ecosystem and there
-are many potential users with no or low English proficiency who would benefit
-from having high-quality information about Pandas available in their native
-language.
-
-Translation of all content presents a considerable challenge due to its sheer
-volume and due to the tendency for technical documentation to exist in a state
-of flux. The suggestion is to have translations for a targeted subset, selected:
-
-- from things which are relatively stable to reduce the ongoing burden of
-  keeping translations up to date.
-- to maximize the benefit to users and potential users who currently have no or
-  a low level of English proficiency, given the person-hours and resources that
-  are likely to be available now and into the future.
-
-Consideration of what subset of content would be most useful for users with
-no or a low level of English proficiency could be a guiding principle to help
-select what information should be available on the core project website, outside
-of the technical documentation.
-
-## Detailed Description
-
-The following is a list of all pages on the core project website which are sourced
-from markdown files at https://github.com/pandas-dev/pandas/tree/main/web/pandas.
-
-- Landing page: https://pandas.pydata.org
-- About pandas: https://pandas.pydata.org/about
-- Project roadmap: https://pandas.pydata.org/about/roadmap.html
-- Governance: https://pandas.pydata.org/about/governance.html
-- Team: https://pandas.pydata.org/about/team.html
-- Sponsors: https://pandas.pydata.org/about/sponsors.html
-- Citing and logo: https://pandas.pydata.org/about/citing.html
-- Getting started: https://pandas.pydata.org/getting_started.html
-- Code of conduct: https://pandas.pydata.org/community/coc.html
-- Ecosystem: https://pandas.pydata.org/community/ecosystem.html
-- Contribute: https://pandas.pydata.org/contribute.html
-
-Provisionally, the suggestion is for all of this content to be translated with
-the possible exception of the "Project roadmap", which may be of limited
-interest to new users. Currently the "Getting started" section may be of
-limited utility to users unable to engage with the externally linked content. In
-the "Project roadmap" within the subsection labeled "Documentation improvements"
-there is a stated goal to:
-
-*Improve the "Getting Started" documentation, designing and writing learning
-paths for users different backgrounds (e.g. brand new to programming, familiar
-with other languages like R, already familiar with Python).*
-
-It is recommended that this goal be accomplished alongside translation work in
-order to make this page more useful to those with no or low English proficiency.
-This would also avoid the need for retranslation if this goal were to be
-accomplished after the original translation work is completed.
-
-A language selection drop-down should be added to the navigation bar similar to
-what exists at https://numpy.org.
-
-
-## Usage and Impact
-
-The primary impact would be lowering the barrier to entry for non-English
-speakers to get started using Pandas and moving along the path towards learning
-to use it skillfully.
-
-In 2022 it was estimated that there were approximately 400 million native
-speakers of English and between 1.5 and 2 billion people who speak English as a
-second language worldwide
-([Wikipedia](https://web.archive.org/web/20240129080609/https://en.wikipedia.org/wiki/English-speaking_world)).
-With an estimated world population of over 8 billion people, this leaves many
-for whom the Pandas core website is not directly accessible. Pandas is an
-important piece of software infrastructure for data manipulation and analysis
-with utility beyond the English-speaking world. There is a vast population of
-users and potential users who could benefit from having official information
-about Pandas published in their native language.
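
A quick calculation with the figures quoted above shows the scale involved. The calculation adds the native-speaker and second-language estimates to get the number $E$ of English speakers and uses 8 billion for the world population $P$. Because the population is over 8 billion, both results are lower bounds. The number $N$ of people who speak no English is then roughly:

```latex
\begin{aligned}
E &\approx 0.4 + (1.5 \text{ to } 2) = 1.9 \text{ to } 2.4 \ \text{billion},\\
N &= P - E > 8 - 2.4 = 5.6 \ \text{billion},\\
\frac{N}{P} &= 1 - \frac{E}{P} > 1 - \frac{2.4}{8} = 0.70 .
\end{aligned}
```

On these estimates, at least about 5.6 billion people, or roughly 70% of the world's population, are outside the English-speaking world.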
-
-Although automated translation tools can help those with no or low English
-proficiency access the content of the Pandas website, these tools often still
-struggle with the technical and jargon-laden language of scientific
-software. This was evident during the translation of https://numpy.org.
-Automatic translation tools are invaluable as a starting point for human
-translators, but human translators remain important to ensure accuracy.
-
-## Implementation
+## Detailed Description and Implementation

 The bulk of the work for setting up translation infrastructure, finding and
 vetting translators, and working out how to publish translations, will fall
 upon a cross-functional team funded by the [Scientific Python Community & Communications
 Infrastructure grant](https://scientific-python.org/doc/scientific-python-community-and-communications-infrastructure-2022.pdf)
 to work on adding translations for the main websites of all
 [Scientific Python core projects](https://scientific-python.org/specs/core-projects/).
 The goal is to minimize the burden on the core Pandas maintainers.

-A GitHub repository should be set up to mirror content from the core webpage
-which is selected for translation. A GitHub action should be set up to keep
-the mirrored repository up-to-date. This could be either an action within the main
-Pandas repo which pushes updates to the mirror, or a cron job in the mirror which
-polls for relevant updates in the Pandas repo and pulls them when necessary.
-The hope is to minimize the burden on the core Pandas maintainers.
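
The mirroring step described in these lines could be sketched in Python roughly as follows. This is an illustration under assumptions, not part of the proposal. Both repositories are taken to be local checkouts, `SELECTED` is a hypothetical list of page sources relative to `web/pandas`, and `sync_selected` is a made-up helper. A GitHub Action or cron job would run something like it and then commit and push whatever it reports as changed.

```python
import shutil
from pathlib import Path

# Hypothetical selection of page sources (relative to web/pandas); the real
# list would follow the pages chosen for translation.
SELECTED = ["about/index.md", "about/governance.md", "community/coc.md"]


def sync_selected(pandas_root: Path, mirror_root: Path, selected=SELECTED) -> list[str]:
    """Copy each selected page into mirror_root/en and return the paths that changed."""
    source_dir = pandas_root / "web" / "pandas"
    changed = []
    for rel in selected:
        src = source_dir / rel
        dst = mirror_root / "en" / rel
        # Skip files whose contents are already identical in the mirror.
        if dst.exists() and dst.read_bytes() == src.read_bytes():
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        changed.append(rel)
    return changed
```

Comparing file contents before copying keeps the job idempotent, so a scheduled run that finds nothing new produces no commit.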
-
-The mirrored repository would then be synced to the Crowdin localization
-management platform as described in
+No translated content would be hosted within the Pandas repository itself.
+Instead, a separate GitHub repository could be set up containing the content
+selected for translation. This repository could then be synced to the Crowdin
+localization management platform as described in
 [Crowdin's documentation](https://support.crowdin.com/github-integration/).
-There would be separate folders within the mirror repository, one for each target
-language, with the content initially untranslated.
-Crowdin would then provide a user interface for translators, and updates
-to translations would be pushed to the branch `l10n_main` on the mirrored
-repository. Periodically, manual pull requests would be made to the main Pandas
-repo, adding translated content within folders alongside the English content.
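
The following sketch makes the per-language folder layout concrete by seeding one folder per target language with untranslated copies of the English pages. The `en/` source folder, the language codes in `TARGET_LANGUAGES`, and `seed_language_folders` are hypothetical. Under the proposal, Crowdin itself pushes translation updates, so this only illustrates the intended directory structure.

```python
import shutil
from pathlib import Path

# Hypothetical target languages; the proposal does not fix a list.
TARGET_LANGUAGES = ["es", "pt", "zh"]


def seed_language_folders(mirror_root: Path, languages=TARGET_LANGUAGES) -> None:
    """Give each language folder an untranslated copy of every English page.

    Files that already exist are left alone, so translations are never overwritten.
    """
    english = mirror_root / "en"
    for lang in languages:
        for src in english.rglob("*.md"):
            dst = mirror_root / lang / src.relative_to(english)
            if dst.exists():
                continue  # keep work already done by translators
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(src, dst)
```

Skipping existing files matters: re-running the seeding step must never overwrite work that translators have already done.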
-
-Translations will be managed within an enterprise Crowdin organization created for
-Scientific Python localization projects. Access to this organization is
-invite-only, and translators will be vetted to help safeguard against the
-spamming of low-quality or inflammatory translations. Approval from a trusted
-admin would be required before translations are merged into the main Pandas
-repo.
-
-A language drop-down selector will need to be added to the navigation bar of
-the Pandas website. The plan is to develop a generic solution that
-can be reused for all Scientific Python website translations.
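
A reusable selector could take the form of a small function that renders the drop-down from a table of languages, as in the hypothetical Python sketch below. The `/<lang>/<page>` URL scheme, the `LANGUAGE_NAMES` mapping, and `language_selector` are assumptions for illustration only. The proposal leaves open the actual generic solution and how it plugs into the site's templates.

```python
from html import escape

# Hypothetical mapping of language codes to the names shown in the menu.
LANGUAGE_NAMES = {"en": "English", "es": "Español", "pt": "Português"}


def language_selector(current: str, page: str, languages=LANGUAGE_NAMES) -> str:
    """Render a <select> whose options point at the same page in each language.

    Assumes a hypothetical /<lang>/<page> URL layout.
    """
    options = []
    for code, name in languages.items():
        url = f"/{code}/{page}"
        selected = " selected" if code == current else ""
        options.append(f'<option value="{escape(url)}"{selected}>{escape(name)}</option>')
    # Navigate to the chosen language's version of the page on change.
    return (
        '<select aria-label="Language" onchange="window.location.href = this.value">'
        + "".join(options)
        + "</select>"
    )
```

Keeping the language table as data would let each Scientific Python site reuse the same renderer with its own list of languages.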
+Crowdin would then provide a user interface for translators, and updates to
+translations would be pushed to a feature branch, with completed translations
+periodically merged into `main` after approval by trusted
+language-specific admins working across the Scientific Python core projects
+participating in the translation program. There will be no need for Pandas
+maintainers to verify the quality of translations.
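
Gating merges on completed, approved translations could be expressed as a simple filter over per-language progress. In this sketch, `LanguageProgress` and `ready_to_merge` are hypothetical. The counts are assumed to come from whatever progress reporting the localization platform provides, and the final decision would still rest with the language-specific admins.

```python
from dataclasses import dataclass


@dataclass
class LanguageProgress:
    """Hypothetical per-language status, e.g. exported from the localization platform."""

    code: str
    total_strings: int
    approved_strings: int

    @property
    def approved_fraction(self) -> float:
        # An empty project counts as not ready rather than dividing by zero.
        return self.approved_strings / self.total_strings if self.total_strings else 0.0


def ready_to_merge(progress: list[LanguageProgress], threshold: float = 1.0) -> list[str]:
    """Return languages whose approved share reaches threshold (1.0 = fully approved)."""
    return sorted(p.code for p in progress if p.approved_fraction >= threshold)
```

For example, with Spanish fully approved and Portuguese at 75%, only `"es"` passes the default threshold. Lowering it to `0.75` admits both.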
+
+The result would be a repository containing parallel versions of content from
+pandas.pydata.org, translated into various languages. Translated content could
+then be pulled from this repository during generation of the Pandas website. A
+low-friction means of choosing between languages could then be added, for example a
+drop-down language selector similar to what now exists for https://numpy.org, or
+simple links similar to what now exists for https://www.sympy.org/en/index.html.
+A developer supported by the "Scientific Python Community & Communications
+Infrastructure grant" could assist with making the changes necessary for the
+Pandas website to support publication of translations.
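
During site generation, pulling translated content could amount to choosing, for each page and language, the translated source when one exists and the English source otherwise. The sketch below assumes a hypothetical layout where the translations repository mirrors the English tree under `<lang>/`. `resolve_page_source` is a made-up name, and the real site generator may organize this differently.

```python
from pathlib import Path


def resolve_page_source(translations_root: Path, english_root: Path, lang: str, page: str) -> Path:
    """Pick the Markdown source for page in lang, falling back to English.

    Assumes translations live at <translations_root>/<lang>/<page>.
    """
    candidate = translations_root / lang / page
    if lang != "en" and candidate.is_file():
        return candidate
    # Untranslated (or English) pages are rendered from the English source.
    return english_root / page
```

Falling back to English lets a partially translated language be published without broken links, at the cost of mixing languages on some pages.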
+
+If desired, a cron job could be set up on the repository containing translated
+content to check for relevant changes or updates to the Pandas website's content
+and pull them if necessary. Translators could then receive a notification from
+Crowdin that there are new strings to translate. This could help with the
+process of keeping translations up to date.
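
The scheduled check for updated content could be implemented by recording a digest of each source page at every sync and comparing against it on the next run. In this hypothetical sketch, `detect_changes` and the JSON manifest are assumptions. A cron job would pull new content only when the returned lists are non-empty, which could then lead Crowdin to notify translators.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes, used to detect content changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def detect_changes(source_dir: Path, manifest_path: Path) -> dict[str, list[str]]:
    """Compare source pages with the digests recorded at the last sync.

    Returns added/modified/removed relative paths and rewrites the manifest
    so the next run compares against the current state.
    """
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new = {
        p.relative_to(source_dir).as_posix(): file_digest(p)
        for p in sorted(source_dir.rglob("*.md"))
    }
    changes = {
        "added": sorted(set(new) - set(old)),
        "modified": sorted(k for k in set(new) & set(old) if new[k] != old[k]),
        "removed": sorted(set(old) - set(new)),
    }
    manifest_path.write_text(json.dumps(new, indent=2, sort_keys=True))
    return changes
```

Hashing contents rather than comparing timestamps avoids false positives when files are touched without being edited.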


 ### PDEP History

 - 01 February 2024: Initial draft
+- 02 February 2024: First revision
simonjayhawkins marked this conversation as resolved.