This repository was archived by the owner on Sep 20, 2023. It is now read-only.

[Discussion] Localization for Schemas #256

Closed
jordajm opened this issue Jun 5, 2018 · 21 comments

@jordajm
Contributor

jordajm commented Jun 5, 2018

The problem

We currently have a category object in each schema with an enum array similar to this:

 "category": {
      "type": "string",
      "title": "Category",
      "enum": [
        "Airplane Charter",
        "Bike Rentals",
        "Boat Rentals",
        "Car Rentals",
        "Taxi service",
        "Yacht charters"
      ],
      "default":"Car Rentals"
    },

The problem is that those enums are in English, and since they are in the schema instead of the React code, they don't fit easily into the react-intl patterns we're using throughout the rest of the app.

This is a common problem and it has been a regular topic of discussion among json-schema contributors.

The general consensus is that UI-specific elements like translated strings shouldn't be mixed into json-schema JSON documents, so we should probably come up with a solution for this in the React layer.

Proposed Solution

I propose building a util file in src/utils/ that will provide a translation of each enum that is available in a schema. This util can be used by any component to render a translated string representing any enum.

It would work similarly to other places in the app where we use translated strings in plain JavaScript (as opposed to JSX) - For example, https://github.com/OriginProtocol/demo-dapp/blob/develop/src/components/listing-create.js#L34-L59

Those strings would then be translated along with all of the other strings and made available for consumption by components.

@jordajm jordajm self-assigned this Jun 5, 2018
@wanderingstan
Contributor

wanderingstan commented Jun 5, 2018

Good foresight. Immediate thoughts:

IIRC, those strings are rendered by react-jsonschema-form. That would mean us getting a PR accepted to them, or forking. This is okay, we also will need to do a PR for #24 and maybe #44

I would propose that we change the text in the enums from English to id tags, the same id tags used for strings throughout the dapp (e.g. the ids in this Chinese json doc). All languages (including English) are then handled/rendered the same, presumably using something like formatMessage within react-jsonschema-form.

So the schema would then look something more like:

 "category": {
      "type": "string",
      "title": "schema.category",
      "enum": [
        "schema.rental.airplane",
        "schema.rental.bicycle",
        "schema.rental.boat",
        "schema.rental.car",
        "schema.rental.taxi",
        "schema.rental.yacht"
      ],
      "default":"schema.rental.car"
    },

@jordajm
Contributor Author

jordajm commented Jun 5, 2018

I like that idea @wanderingstan - thanks...
I think we'd need to make some kind of util that looks for enums in schema files and pairs them with defaultMessage strings that can be passed into react-intl and used within the app (and in react-jsonschema-form). You are correct that the form uses the enums, and they are also used in the listing card and maybe other places.

The util might work something like this:

import { defineMessages } from 'react-intl'

// Placeholders: get the schema files, parse them, and collect all of their enums
const schemaFiles = []
const allEnums = []

// This object would have to be manually updated each time the enums change in the schema files
const enumMap = {
  'schema.rental.airplane': 'Airplane',
  'schema.rental.bicycle': 'Bicycle',
  'schema.rental.boat': 'Boat'
}

// Loop over the enums and create react-intl message descriptors with id and defaultMessage
const mappedEnums = {}
allEnums.forEach(id => {
  mappedEnums[id] = { id, defaultMessage: enumMap[id] }
})

defineMessages(mappedEnums)

So, aside from maintaining the enumMap, I think this would allow the enums to fit right into our normal i18n workflow.

If you agree, I'll close this issue and create another one describing the creation of the util and the conversion of enum English strings to IDs

@wanderingstan
Contributor

wanderingstan commented Jun 6, 2018

Two follow up questions:

Do we need a separate enumMap var? I was hoping we could use the ids in translated-messages.json, and then all our translations live in one big happy json file.

And are you proposing that we would loop over the schema json and do our own id-to-translation substitution before passing it into react-jsonschema-form? If so, then I fear that would mean that the json data returned from the user's form would have localized text for the enums, which is not what we'd want to put into IPFS.

@jordajm
Contributor Author

jordajm commented Jun 6, 2018

Do we need a separate enumMap var?

Well, we need a defaultMessage paired with each id for it to work with our react-intl setup, and the enum in your example will only provide the ID.

However, your comment about react-jsonschema-form got me doing some more reading and I found this.

Using the enumNames property, we could do something like this:

const schema = {
  type: "string",
  enum: ["schema.rental.airplane", "schema.rental.bicycle", "schema.rental.boat"],
  enumNames: ["Airplane", "Bicycle", "Boat"]
};

Which would render a <select> element in the form like this:

<select>
  <option value="schema.rental.airplane">Airplane</option>
  <option value="schema.rental.bicycle">Bicycle</option>
  <option value="schema.rental.boat">Boat</option>
</select>

We could then write a util similar to the one I mentioned above that pairs enum with enumName to create an id and defaultMessage to send to react-intl. This is nice b/c we don't have to maintain key/value pairs to map IDs to defaultMessages in the helper itself... all the data stays in the schema.
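A minimal sketch of that pairing step, assuming each enum schema carries `enum` and `enumNames` arrays of equal length. The helper name `extractEnumMessages` is hypothetical and not part of react-intl or react-jsonschema-form:

```javascript
// Hypothetical helper: pair each `enum` id with the `enumNames` label at the
// same position and return message descriptors keyed by id.
function extractEnumMessages(schema) {
  const messages = {}
  const ids = schema.enum || []
  const names = schema.enumNames || []
  ids.forEach((id, index) => {
    // Fall back to the id itself when no label exists at that position
    const label = names[index] !== undefined ? names[index] : id
    messages[id] = { id, defaultMessage: label }
  })
  return messages
}

const categorySchema = {
  type: 'string',
  enum: ['schema.rental.airplane', 'schema.rental.bicycle', 'schema.rental.boat'],
  enumNames: ['Airplane', 'Bicycle', 'Boat']
}

const enumMessages = extractEnumMessages(categorySchema)
// enumMessages['schema.rental.boat'] is { id: 'schema.rental.boat', defaultMessage: 'Boat' }
```

The returned object has the id/defaultMessage shape that defineMessages accepts. Whether descriptors built at runtime like this get picked up for translation depends on the extraction tooling in use.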

Before passing the schema into react-jsonschema-form, we could run some translations on the enumNames to translate them, but the enum itself would still be the id so the select would look like:

<select>
  <option value="schema.rental.airplane">地址</option>
  <option value="schema.rental.bicycle">将更改发布</option>
  <option value="schema.rental.boat">没有说明的原</option>
</select>

This would mean that we'd store the ID (not localized text) in IPFS, which we could then convert to a localized string when we render the listing card in the DApp.

Whatchathinkaboutthat?

@wanderingstan
Contributor

wanderingstan commented Jun 6, 2018

Great find on the "Custom labels for enum fields"! Am I understanding correctly that with this, we definitely don't need any changes in react-jsonschema-form?

This would mean that we'd store the ID (not localized text) in IPFS...

This is definitely what we want to be storing in the IPFS json, not a human-language string.

Well, we need a defaultMessage paired with each id for it to work with our react-intl setup, and the enum in your example will only provide the ID.

Ah, is the defaultMessage a requirement for react-intl to find the English text to be translated? That would make sense. Do you know how the text extraction process takes place? Is there some other way we can mark strings as "needs a translation, with this id" other than via the <FormattedMessage> component? Or perhaps we could have a dummy component file with <FormattedMessage> tags for all of our schema enum strings. It would never be rendered, but the text would get extracted for translation.

We could then write a util similar to the one I mentioned above that pairs enum with enumName to create an id and defaultMessage to send to react-intl.

I'm a little confused about "sending" to react-intl. When the util is processing the schema json in preparation for react-jsonschema-form, couldn't it use a function like formatMessage to directly get the enumName for the enum (aka id)?

Sorry this is a bit scattered. Correct me if I'm wrong. I'm imagining the (possible) steps as:

  1. Extraction: We have a dummy component file with <FormattedMessage> components for all schema enum strings. E.g.
            <FormattedMessage
              id={ 'schema.rental.airplane' }
              defaultMessage={ 'Airplane Rental' }
            />

This will cause our English text to get extracted. Thus we will end up with translations in the translated-messages.json file, accessible via react-intl.

  2. Pre-processing: Before passing the schema into react-jsonschema-form, your util will augment the json with the custom labels, e.g. for a Chinese user:
  enum: ["schema.rental.airplane", "schema.rental.bicycle", "schema.rental.boat"],
  enumNames: ["飞机租赁", "自行车出租", "你可以读中文"]

The enumNames in this case are fetched from react-intl using the formatMessage function or similar.

  3. Rendering: We pass our augmented schema json into react-jsonschema-form, which --with no modifications needed!-- will render the form with translated text. The resulting json with user-submitted data will contain only the ids (e.g. schema.rental.airplane), not the human-language text.
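The pre-processing step could be sketched roughly as follows. This assumes enums live either at the top level of a schema or under `properties`; `translateEnumNames` is a hypothetical helper, and `fakeFormatMessage` is a stand-in for the formatMessage function that react-intl would supply in the app:

```javascript
// Hypothetical helper: return a copy of the schema whose `enumNames` hold
// translated labels, leaving the `enum` ids untouched.
function translateEnumNames(schema, formatMessage) {
  const copy = Object.assign({}, schema)
  if (Array.isArray(schema.enum)) {
    copy.enumNames = schema.enum.map(id => formatMessage({ id }))
  }
  if (schema.properties) {
    copy.properties = {}
    Object.keys(schema.properties).forEach(key => {
      copy.properties[key] = translateEnumNames(schema.properties[key], formatMessage)
    })
  }
  return copy
}

// Stand-in for react-intl's formatMessage, backed by a plain lookup table.
// Ids without a translation are returned unchanged.
const zhMessages = {
  'schema.rental.airplane': '飞机租赁',
  'schema.rental.bicycle': '自行车出租'
}
const fakeFormatMessage = ({ id }) => zhMessages[id] || id

const listingSchema = {
  type: 'object',
  properties: {
    category: {
      type: 'string',
      enum: ['schema.rental.airplane', 'schema.rental.bicycle']
    }
  }
}

const localizedSchema = translateEnumNames(listingSchema, fakeFormatMessage)
// localizedSchema.properties.category.enumNames is ['飞机租赁', '自行车出租']
```

Because only `enumNames` changes, the values submitted by the form stay as ids, which is what would end up in IPFS.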

@jordajm
Contributor Author

jordajm commented Jun 6, 2018

Yes, you've got it exactly, with one small caveat...

Instead of building a component that never renders, we can use the defineMessages function in react-intl as I did in my example above to extract the messages and end up with translations in translated-messages.json. Here's an example, updated with the new knowledge about enumNames:

import { defineMessages } from 'react-intl'

// Placeholder: get the schema files and parse them
const schemaFiles = []
// Placeholder: build objects that can be consumed by `react-intl`,
// with enum as `id` and enumName as `defaultMessage`
const enumsWithNames = {}

defineMessages(enumsWithNames)

So, there will actually need to be three utils, which I wasn't clear about before:

  1. Loop over the schemas, find enums, pair them with enumNames, and extract them for translation
  2. Before rendering the react-jsonschema-form, translate the enumNames array values using the formatMessage function in react-intl, as I did in many other components (see the link in the description of this issue for an example). And yes, no PR needed to react-jsonschema-form :-)
  3. Before rendering a component that relies on a schema enum, translate the enumNames using formatMessage function in react-intl
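For the third util, the display side, a rough sketch might look like this. It assumes a stored listing keeps the enum id in a `category` field; `withCategoryLabel` and the listing shape are hypothetical, and `stubFormatMessage` stands in for react-intl's formatMessage:

```javascript
// Hypothetical helper: add a display-only label to a listing whose
// `category` field holds an enum id such as 'schema.rental.boat'.
function withCategoryLabel(listing, formatMessage) {
  return Object.assign({}, listing, {
    categoryLabel: formatMessage({ id: listing.category })
  })
}

// Stand-in for formatMessage using an English lookup table
const enMessages = { 'schema.rental.boat': 'Boat' }
const stubFormatMessage = ({ id }) => enMessages[id] || id

const storedListing = { name: 'Weekend sailing', category: 'schema.rental.boat' }
const displayListing = withCategoryLabel(storedListing, stubFormatMessage)
// displayListing.categoryLabel is 'Boat'; displayListing.category is still the id
```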

@wanderingstan
Contributor

wanderingstan commented Jun 6, 2018

Ah ha! It's all clicking now! Didn't realize that defineMessages() was the magic function to tell react-intl that you're providing more strings for translation. Mistakenly thought enumMap was something we'd have to translate on our own.

Sorry for the derail, and thanks for the lesson in react-intl!

One thought: Seems this utility would be of use to lots of other people; anyone using react-jsonschema-form with react-intl. A stretch goal would be to make your util into a standalone npm module that we could share with the world. (Or incorporate it into a PR to react-jsonschema-form?)

EDIT: Relevant thread - rjsf-team/react-jsonschema-form#739

The FieldTemplate approach could also be interesting to us, as it would also be needed for #44 (aka, the issue that always is wanted, but never quite gets done. :) )

@jordajm
Contributor Author

jordajm commented Jun 6, 2018

That's a great idea 💯
There are a ton of GitHub issues and stackoverflow posts about localization of json-schema and react-jsonschema-form.

I think our NPM package would be named something like json-schema-react-intl because it would provide the (often requested) connector between the two.

It could have two methods to start with:

  1. extractMessagesFromSchema or just extractMessages
  2. translateEnumNames

Also, we could let all of the relevant json-schema-related and react-intl-related library maintainers know about it and see if they'll accept a PR linking to it in their docs or something. Might be good exposure for Origin 👍

@wanderingstan
Contributor

Nice!

Rather than functions, could we just wrap the Form component with our own, eg FormIntl? Eg:

  <FormIntl schema={schema}
        formData={formData} locale={locale}/>

Then all the magic of translating the enumNames happens behind the scenes.

They would just have to do their own defineMessages() with IDs and default English.

@jordajm
Contributor Author

jordajm commented Jun 7, 2018

That's a cool idea. I think the two functions I mentioned above would still be valuable though, because the work of extracting the messages from the schema is a bit complex, and so is translating the enum names "in-place" in the enumNames array.

But yes, it would be nice to offer a way to do it via react component that wraps the react-jsonschema-form - maybe in the first version or maybe as a follow-up task...

@jordajm
Contributor Author

jordajm commented Jun 7, 2018

oof... I was so focused on enums that I forgot about the need to translate the other form field types...

For normal (non-enum) schema objects, like the ones used to generate form fields, I think the best way to localize them might be to replace the title with an ID like we're going to do with enums, then we could add a defaultMessage property to the object that would contain the English string for the label.

So instead of a text field for a location looking like this:

"location": {
     "type": "string",
     "title": "Location"
}

It would look like this:

"location": {
     "type": "string",
     "title": "schema.rental.location",
     "defaultMessage": "Location"
}

We could then extract those messages with the utility functions described above so they can be translated and included in translated-messages.json, and we could also translate the labels in-place (in the schema itself, like we're going to do for enums) before we pass the schema into react-jsonschema-form.
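As a sketch of the extraction half, assuming form fields sit under `properties` and carry the `title` id plus a `defaultMessage` as shown above (the helper name `extractTitleMessages` is hypothetical):

```javascript
// Hypothetical helper: walk a schema and collect a message descriptor for
// every node that has both a `title` id and a `defaultMessage`.
function extractTitleMessages(schema, messages = {}) {
  if (schema.title && schema.defaultMessage) {
    messages[schema.title] = { id: schema.title, defaultMessage: schema.defaultMessage }
  }
  Object.values(schema.properties || {}).forEach(child => {
    extractTitleMessages(child, messages)
  })
  return messages
}

const rentalSchema = {
  type: 'object',
  properties: {
    location: {
      type: 'string',
      title: 'schema.rental.location',
      defaultMessage: 'Location'
    }
  }
}

const titleMessages = extractTitleMessages(rentalSchema)
// titleMessages['schema.rental.location'].defaultMessage is 'Location'
```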

Thoughts on this @wanderingstan ?

@jordajm
Contributor Author

jordajm commented Jun 7, 2018

Created #258 to describe the work needed for localizing schemas. Will close this issue once #258 is finalized and ready for development to begin.

@wanderingstan
Contributor

It seems a little odd to me that the defaultMessage (ie the English) would be stored directly in the Schema json. As with the enums, wouldn't we want to define those externally with a call to defineMessages()?

You've read more on this than me: has really no one encountered this before? It seems a major oversight of the jsonschema people that they would just have people put raw English (or whatever) into the schema, not thinking of translations.

I wonder if there is (or should be?) some convention to indicate "hey, this is an id of a string, not an actual human language string!". E.g.

"location": {
     "type": "string",
     "title": "@schema.rental.location"
}

to use the over-used @ sign!
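If such a marker were adopted, resolving it could be as small as the sketch below. The `@` prefix is only the convention floated above, not an existing json-schema or react-intl feature, and `resolveTitle` and `lookupMessage` are hypothetical names:

```javascript
const ID_PREFIX = '@'

// Hypothetical helper: treat strings starting with '@' as message ids and
// everything else as literal, already human-readable text.
function resolveTitle(title, formatMessage) {
  if (typeof title === 'string' && title.startsWith(ID_PREFIX)) {
    return formatMessage({ id: title.slice(ID_PREFIX.length) })
  }
  return title
}

// Stand-in for react-intl's formatMessage
const titleTable = { 'schema.rental.location': 'Location' }
const lookupMessage = ({ id }) => titleTable[id] || id

const resolvedId = resolveTitle('@schema.rental.location', lookupMessage)
const resolvedLiteral = resolveTitle('Location', lookupMessage)
// Both resolve to 'Location'
```

A literal title that genuinely starts with `@` would need some escape rule, which this sketch does not attempt.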

@jordajm
Contributor Author

jordajm commented Jun 8, 2018

@wanderingstan Unfortunately, there doesn't seem to be a consensus among json-schema folks about how to localize strings in schemas. There's a whole other concept in react-jsonschema-form known as "UI Schema", and some people use that to localize forms. Here's an example. To me, though, it seems clunky and overkill for simple localization. In theory we could include something similar in each schema file to map IDs to defaultMessages for each string in a schema (both enums and form field labels).

As with the enums, wouldn't we want to define those externally with a call to defineMessages()?

What I'm suggesting is that we store English strings in json-schema to serve as the defaultMessage for both enums and form field labels. I know that seems janky, but those defaultMessage (English) strings have to be stored somewhere, and if we don't store them in the schema, all developers have to know to edit two different files when they add something to a schema (the schema plus some kind of key/val map to link IDs with their defaultMessage).

So I think it makes sense to put both ID and defaultMessage in the schema so all the data is in one place and obvious to all devs how to add/edit a schema. But open to suggestions.

@wanderingstan
Contributor

wanderingstan commented Jun 8, 2018

Hmm..we're wading into heavy territory here!

If these schemas are to represent a standard, the purist in me says English really doesn't belong there—this is supposed to be just the schema for data. If someone in France decides to extend the schema, would they also be required to put the defaultMessage in English? (Good luck with that 😉)

I don't care super strongly as I'm sure we'll iterate on this a lot more times, but my gut is to just put the "pure" IDs in the schema, and have the translation logic and mappings live outside.

(Another thought: defaultMessage is a convention of react-intl, but there is no saying what package or languages will be using these Listing schemas and doing the job of rendering them with translations. Could be a different react package, could be Vue, could be Python, could be Perl...)

@jordajm
Contributor Author

jordajm commented Jun 8, 2018

Ok, fair enough. I'll split out the mapping of id with defaultMessage in separate files... seems to me that a sub-directory inside the schemas directory would make sense, with a mapping file that corresponds to each schema.

So it would be structured something like:

schemas
    |_ announcements.json
    |_ for-sale.json
    |_ housing.json
    |_ schemaMessages
        |_ announcements.js
        |_ for-sale.js
        |_ housing.js

Sound good?
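To illustrate, one of those mapping files and a guard against the "edit two files" problem might look roughly like this. The ids, English strings, and the `findMissingMessages` helper are all hypothetical; in the app the mapping object would presumably be wrapped in defineMessages:

```javascript
// Hypothetical contents of schemas/schemaMessages/housing.js.
// The ids and English strings are illustrative only.
const housingMessages = {
  'schema.housing.title': {
    id: 'schema.housing.title',
    defaultMessage: 'Housing'
  },
  'schema.housing.location': {
    id: 'schema.housing.location',
    defaultMessage: 'Location'
  }
}

// Hypothetical check: list every title or enum id used in a schema that has
// no entry in the mapping, so a forgotten mapping shows up early.
function findMissingMessages(schema, messages, missing = []) {
  const ids = [schema.title].concat(schema.enum || []).filter(Boolean)
  ids.forEach(id => {
    if (!messages[id]) missing.push(id)
  })
  Object.values(schema.properties || {}).forEach(child => {
    findMissingMessages(child, messages, missing)
  })
  return missing
}

const housingSchema = {
  title: 'schema.housing.title',
  properties: {
    location: { type: 'string', title: 'schema.housing.location' },
    price: { type: 'number', title: 'schema.housing.price' }
  }
}

const missingIds = findMissingMessages(housingSchema, housingMessages)
// missingIds is ['schema.housing.price']
```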

@wanderingstan
Contributor

Yes, LGTM!

@jordajm jordajm changed the title from "Localization for Schema Enums" to "[Discussion] Localization for Schemas" Jun 10, 2018
@jordajm
Contributor Author

jordajm commented Jun 10, 2018

Created #258 to track the work that was discussed here

@jordajm jordajm closed this as completed Jun 10, 2018
@wanderingstan
Contributor

wanderingstan commented Jun 10, 2018

@joshfraser : would love a lookover of our localization strategy here as our listing schemas will be one of the biggest "interfaces" to Origin for the developer community.

Essentially, I steered @jordajm toward completely exorcising English from our schemas, leaving only IDs whose translations are defined elsewhere. E.g. instead of having "Airplane Rentals" in the json, we'd have "schema.rental.airplane", and the translation of this abstract id into human languages is defined (in the React world) via defineMessages(). In the future, other languages/frameworks (Python, Vue) can use other localization packages to effect a similar transformation of ids into human language.

@joshfraser
Contributor

@wanderingstan you have more experience dealing with localization issues than I do. This seems reasonable and I trust you if you think this is the way to go.

One question - Is there a reason we can't use "Airplane Rentals" as the key? A developer shouldn't have to think about localization stuff at all to get their custom "hello world" marketplace up and running. From my limited experience with localization, it seems pretty common to just use the English string as the lookup key instead of introducing a separate lookup key.
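A minimal sketch of that English-as-key pattern, assuming translations are a flat object keyed by the English string (`translateByEnglishKey` is a hypothetical name, not a react-intl API):

```javascript
// Hypothetical lookup: the English string is both the key and the fallback,
// so an untranslated marketplace still renders readable text.
function translateByEnglishKey(englishText, translations) {
  const hasTranslation = Object.prototype.hasOwnProperty.call(translations, englishText)
  return hasTranslation ? translations[englishText] : englishText
}

const zhTranslations = { 'Airplane Rentals': '飞机租赁' }

const translated = translateByEnglishKey('Airplane Rentals', zhTranslations)
const untranslated = translateByEnglishKey('Boat Rentals', zhTranslations)
// translated is '飞机租赁'; untranslated falls back to 'Boat Rentals'
```

The trade-off is that rewording the English text changes the key, so existing translations and any stored values keyed on the old wording no longer match.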

@wanderingstan
Contributor

@joshfraser - More research shows that even the JSON schema community is unsure on this! The last three comments on the localization thread (from Dec 2016!) are from JSON schema contributors who conclude that the standard is not clear:
json-schema-org/json-schema-spec#53 (comment)

@jordajm's implementation of English-less IDs is working right now, so I say we go with it for now. But I agree it might make sense to use English text as the keys in this case to make the "hello world" marketplace as easy as possible.
