PDEP-14: Publish translations of pandas.pydata.org #57204

Closed
wants to merge 4 commits into from
154 changes: 41 additions & 113 deletions web/pandas/pdeps/0014-translate-website-content.md
@@ -5,137 +5,65 @@
- Discussion: [#56301](https://github.com/pandas-dev/pandas/issues/56301)
[#57204](https://github.com/pandas-dev/pandas/pull/57204)
- Author: [Albert Steppi](https://github.com/steppi),
- Revision: 1
- Revision: 2

## Abstract

The suggestion is to have official translations made for content of the core
project website [pandas.pydata.org](https://pandas.pydata.org) and provide a
language drop-down selector on [pandas.pydata.org](https://pandas.pydata.org)
similar to what currently exists at [numpy.org](https://numpy.org).
project website [pandas.pydata.org](https://pandas.pydata.org) and offer
a low-friction way for users to access these translations on the core
project website.
Comment on lines +13 to +15
Member

I infer from @datapythonista's comments that ideally the proposal should be low-friction for contributors as well as users, i.e. pandas users and volunteers can contribute alongside grant-funded Quansight staff.

Author

Yes, that's right. Grant-funded Quansight staff will work mostly on setting up infrastructure and on helping to coordinate and facilitate. The hope is that most of the translators will be volunteers, or will be supported by small grants we could potentially help find for them.


## Motivation, Scope, Usage, and Impact

## Motivation and Scope
There are many potential users with no or a low level of English proficiency
who could benefit from quality official translations of the Pandas website
content. Though translations for all documentation would be valuable,
producing and maintaining translations for such a large and oft-changing
collection of text would take an immense and sustained effort which may
be infeasible. The suggestion is instead to have translations made for only
a key set of pages from the core project website.
Comment on lines +21 to +25
Member

Am I correct in thinking that anyone would be able to open a PR on the mirror site (if this would be the solution)?

Documentation changes are often good first issues.

I assume it's the approval process that "would take an immense and sustained effort which may be infeasible."

Otherwise, do we have an off-ramp for when the translator funding ends?

Author

Yes, anyone could open a PR, and anyone could contribute translations on Crowdin after asking for an invite. Our hope is to set up a compounding, snowball-type effect, where we can help build a community of volunteer translators who can help keep translations up to date.

Member

Great. pandas is foremost a community driven volunteer project so this aligns well with the project values.


Pandas is a foundational package in the Scientific Python ecosystem and there
are many potential users with no or low English proficiency who would benefit
from having high quality information about Pandas available in their native
language.

Translation of all content presents a considerable challenge due to its sheer
volume and due to the tendency for technical documentation to exist in a state
of flux. The suggestion is to have translations for a targeted subset, selected:

- from content which is relatively stable, to reduce the ongoing burden of
  keeping translations up to date.
- to maximize the benefit to users and potential users who currently have no or
a low level of English proficiency, given the person-hours and resources that
are likely to be available now and into the future.

Consideration of what subset of content would be most useful for users with
no or a low level of English proficiency could be a guiding principle to help
select what information should be available on the core project website, outside
of the technical documentation.

## Detailed Description

The following is a list of all pages on the core project website which are sourced
from markdown files at https://github.com/pandas-dev/pandas/tree/main/web/pandas.

- Landing page: https://pandas.pydata.org
- About pandas: https://pandas.pydata.org/about
- Project roadmap: https://pandas.pydata.org/about/roadmap.html
- Governance: https://pandas.pydata.org/about/governance.html
- Team: https://pandas.pydata.org/about/team.html
- Sponsors: https://pandas.pydata.org/about/sponsors.html
- Citing and logo: https://pandas.pydata.org/about/citing.html
- Getting started: https://pandas.pydata.org/getting_started.html
- Code of conduct: https://pandas.pydata.org/community/coc.html
- Ecosystem: https://pandas.pydata.org/community/ecosystem.html
- Contribute: https://pandas.pydata.org/contribute.html

Provisionally, the suggestion is for all of this content to be translated with
the possible exception of the "Project roadmap", which may be of limited
interest to new users. Currently the "Getting started" section may be of
limited utility to users unable to engage with the externally linked content. In
the "Project roadmap" within the subsection labeled "Documentation improvements"
there is a stated goal to:

*Improve the "Getting Started" documentation, designing and writing learning
paths for users different backgrounds (e.g. brand new to programming, familiar
with other languages like R, already familiar with Python).*

It is recommended that this goal be accomplished alongside translation work in
order to make this page more useful to those with no or low English proficiency.
This would also prevent the need for retranslation if this goal were to be
accomplished after the original translation work is completed.

A language selection drop-down should be added to the navigation bar, similar to
what exists at https://numpy.org.
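As an illustration only, such a drop-down might be rendered as in the sketch below. It assumes a hypothetical URL layout in which English pages stay at the site root and translations live under a language-code prefix (e.g. `/es/about/team.html`); the language list and the names `localized_path` and `render_language_selector` are made up for this example and do not describe how numpy.org or the pandas website generator actually work.

```python
from html import escape

# Hypothetical set of published languages: code -> native display name.
LANGUAGES = {
    "en": "English",
    "es": "Español",
    "pt": "Português",
}


def localized_path(path: str, lang: str, default: str = "en") -> str:
    """Map an English page path to its assumed translated location."""
    path = "/" + path.lstrip("/")
    # Assumption: English lives at the site root, other languages under /<lang>/.
    return path if lang == default else f"/{lang}{path}"


def render_language_selector(path: str, current: str) -> str:
    """Return an HTML <select> whose option values are the translated URLs."""
    options = []
    for code, name in LANGUAGES.items():
        selected = " selected" if code == current else ""
        url = escape(localized_path(path, code), quote=True)
        options.append(f'<option value="{url}"{selected}>{escape(name)}</option>')
    # Navigate to the chosen translation when the selection changes.
    return (
        '<select aria-label="Language" onchange="window.location = this.value">'
        + "".join(options)
        + "</select>"
    )


print(render_language_selector("/about/team.html", current="es"))
```

Using full URLs as option values keeps the client-side behaviour to a one-line `onchange` handler; a plain list of links, as on sympy.org, would avoid JavaScript entirely.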


## Usage and Impact

The primary impact would be lowering the barrier to entry for non-English
speakers to get started using Pandas and moving along the path towards learning
to use it skillfully.

In 2022 it was estimated that there were approximately 400 million native
speakers of English and between 1.5 and 2 billion people who speak English as a
second language worldwide
[Wikipedia](https://web.archive.org/web/20240129080609/https://en.wikipedia.org/wiki/English-speaking_world).
With an estimated world population of over 8 billion people, this leaves many
for whom the Pandas core website is not directly accessible. Pandas is an
important piece of software infrastructure for data manipulation and analysis
with utility beyond the English-speaking world. There is a vast population of
users and potential users who could benefit from having official information
about Pandas published in their native language.
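A rough worked version of this estimate, taking the cited figures at face value (about 0.4 billion native speakers, 1.5 to 2 billion second-language speakers, and a world population of about 8 billion) and treating native and second-language speakers as non-overlapping groups:

```math
8 - (0.4 + 2) = 5.6 \ \text{to} \ 8 - (0.4 + 1.5) = 6.1 \ \text{billion people},
\qquad \frac{5.6}{8} = 70\% \ \text{to} \ \frac{6.1}{8} \approx 76\%
```

Because the population figure is a lower bound ("over 8 billion"), this back-of-the-envelope count of people who do not speak English at all is, if anything, an underestimate, and it does not include second-language speakers with low proficiency.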

Although automated translation tools can help those with no or low English
proficiency access the content of the Pandas website, these tools often still
struggle with the technical and jargon-laden language of scientific
software. This was evinced during the translation of https://numpy.org.
Automatic translation tools are invaluable as a starting point for human
translators, but human translators remain important to ensure accuracy.

## Implementation
## Detailed Description and Implementation

The bulk of the work for setting up translation infrastructure, finding and
vetting translators, and working out how to publish translations will fall
upon a cross-functional team funded by the [Scientific Python Community & Communications
Infrastructure grant](https://scientific-python.org/doc/scientific-python-community-and-communications-infrastructure-2022.pdf)
to work on adding translations for the main websites of all
[Scientific Python core projects](https://scientific-python.org/specs/core-projects/).
The goal is to minimize the burden on the core Pandas maintainers.

A GitHub repository should be set up to mirror the content from the core website
that is selected for translation, and a GitHub Action should keep the mirrored
repository up to date. This could be either an action within the main Pandas
repo that pushes updates to the mirror, or a cron job in the mirror that polls
the Pandas repo for relevant updates and pulls them when necessary.
The hope is to minimize the burden on the core Pandas maintainers.
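A minimal sketch of the change-detection step that the cron-based option would need is shown below. It assumes the mirror keeps a small manifest recording the SHA-256 hash of each file it last copied, and that the selected files have already been read from the Pandas repo into a dictionary; the name `files_to_sync` and the manifest format are hypothetical, and fetching, committing, and pushing are left out.

```python
import hashlib


def digest(text: str) -> str:
    """SHA-256 of a file's text, used to detect upstream changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def files_to_sync(upstream: dict[str, str], manifest: dict[str, str],
                  selected: set[str]) -> dict[str, list[str]]:
    """Work out which selected files the mirror must copy or delete.

    upstream: relative path -> current text in the Pandas repo.
    manifest: relative path -> digest recorded when the mirror last copied it.
    selected: relative paths chosen for translation.
    """
    present = selected & set(upstream)
    # New files, or files whose upstream text no longer matches the manifest.
    copy = sorted(p for p in present if manifest.get(p) != digest(upstream[p]))
    # Files the mirror still tracks but which were removed upstream.
    delete = sorted(p for p in selected & set(manifest) if p not in upstream)
    return {"copy": copy, "delete": delete}
```

Either trigger style (a push from the Pandas repo or a poll from the mirror) could share this logic; only where it runs would differ.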

The mirrored repository would then be synced to the Crowdin localization
management platform as described in
No translated content would be hosted within the Pandas repository itself.
Instead, a separate GitHub repository could be set up containing the content
selected for translation. This repository could then be synced to the Crowdin
localization management platform as described in
[Crowdin's documentation](https://support.crowdin.com/github-integration/).
There would be separate folders within the mirror repository, one for each target
language, with the content initially untranslated.
Crowdin would then provide a user interface for translators, and updates
to translations would be pushed to the branch `l10n_main` on the mirrored
repository. Periodically, manual pull requests would be made to the main Pandas
repo, adding translated content within folders alongside the English content.
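The initial folder layout could be produced with a short script like the sketch below, which copies the selected English Markdown files, still untranslated, into one folder per target language. The function name, folder names, and language codes are assumptions for illustration; the real mirror repository might organize its files differently, or let Crowdin create the language folders itself.

```python
import shutil
from pathlib import Path


def seed_language_folders(source: Path, target_root: Path,
                          languages: list[str]) -> list[Path]:
    """Copy each Markdown file under `source` into `target_root/<lang>/`.

    Existing files are left alone so that work already done by
    translators is never overwritten. Returns the files created.
    """
    created: list[Path] = []
    for lang in languages:
        for md_file in sorted(source.rglob("*.md")):
            dest = target_root / lang / md_file.relative_to(source)
            if dest.exists():
                continue
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(md_file, dest)
            created.append(dest)
    return created
```

Skipping existing files makes the script safe to rerun when new pages are selected for translation.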

Translations will be managed within an enterprise Crowdin organization created for
Scientific Python localization projects. Access to this organization is
invite-only, and translators will be vetted to help safeguard against the
spamming of low quality or inflammatory translations. Approval from a trusted
admin would be required before translations are merged into the main Pandas
repo.

A language drop-down selector will need to be added to the navigation bar of
the Pandas website. The plan is for development of a generic solution that
can be reused for all Scientific Python website translations.
Crowdin would then provide a user interface for translators, and updates to
translations would be pushed to a feature branch, with completed translations
periodically merged into `main` after approval by trusted
language-specific admins working across the Scientific Python core projects
participating in the translation program. There will be no need for Pandas
maintainers to verify the quality of translations.
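The approval rule described here can be stated as a small check. The sketch assumes a hypothetical registry mapping each language code to the handles of its trusted admins, plus the set of reviewers who approved a given pull request; the proposal does not specify how approvals would actually be recorded (for example as GitHub reviews or Crowdin proofreading), and, matching the text, no Pandas maintainer appears in the check.

```python
# Hypothetical registry: language code -> trusted language-specific admins.
LANGUAGE_ADMINS: dict[str, set[str]] = {
    "es": {"admin-es"},
    "pt": {"admin-pt-1", "admin-pt-2"},
}


def can_merge(changed_languages: set[str], approvers: set[str]) -> bool:
    """Return True only if every language touched by a PR has an admin's approval."""
    if not changed_languages:
        return False
    # A language with no registered admins can never be approved.
    return all(LANGUAGE_ADMINS.get(lang, set()) & approvers
               for lang in changed_languages)
```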

The result would be a repository containing parallel versions of content from
pandas.pydata.org, translated into various languages. Translated content could
then be pulled from this repository during generation of the Pandas website. A
low-friction means of choosing between languages could then be added. Possibly a
drop-down language selector similar to what now exists for https://numpy.org, or
simple links similar to what now exists for https://www.sympy.org/en/index.html.
A developer supported by the "Scientific Python Community & Communications
Infrastructure grant" could assist with making the changes necessary for the
Pandas website to support publication of translations.
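During site generation, each page could be resolved to its translated source when one exists and fall back to the English original otherwise, so that a partially translated language still yields a complete site. This is a sketch under two assumptions: a hypothetical checkout of the translations repository with one folder per language code, and a convention that only pages with an actual translation are present in a language's folder. It does not describe the actual pandas website build.

```python
from pathlib import Path


def resolve_source(page: str, lang: str, english_root: Path,
                   translations_root: Path) -> tuple[Path, bool]:
    """Return the file to render for `page` in `lang`, and whether it is translated."""
    candidate = translations_root / lang / page
    if lang != "en" and candidate.is_file():
        return candidate, True
    # No translation available (or English requested): use the English source.
    return english_root / page, False
```

Returning the boolean alongside the path would let a page template flag content that is still shown in English.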

If desired, a cron job could be set up on the repository containing translated
content to check for relevant changes or updates to the Pandas website's content
and pull them if necessary. Translators could then receive a notification from
Crowdin that there are new strings to translate. This could help with the
process of keeping translations up to date.
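One way to make "keeping translations up to date" concrete is to record, for every translated page, a hash of the English text it was translated from, and periodically list the pages whose English source has changed since. This is a sketch under that assumption, with hypothetical names; Crowdin already notifies translators about new or changed source strings, so such bookkeeping might only be useful as a page-level summary.

```python
import hashlib


def source_hash(text: str) -> str:
    """Hash of an English source page, stored alongside each translation."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_translations(
    english: dict[str, str],
    translated_from: dict[str, dict[str, str]],
) -> dict[str, list[str]]:
    """List, per language, pages needing (re)translation.

    english: page path -> current English text.
    translated_from: language -> {page path -> hash of the English text
    the existing translation was based on}. Pages with no recorded hash
    (never translated) are reported as well.
    """
    stale: dict[str, list[str]] = {}
    for lang, recorded in translated_from.items():
        pages = sorted(page for page, text in english.items()
                       if recorded.get(page) != source_hash(text))
        if pages:
            stale[lang] = pages
    return stale
```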


### PDEP History

- 01 February 2024: Initial draft
- 02 February 2024: First revision