translit-scala is a transliteration library for Scala and Scala.js. It implements transliteration rules for Slavic languages. It supports converting texts from the Latin to the Cyrillic alphabet.
5
+
translit-scala is a transliteration library for Scala and Scala.js. It implements transliteration rules for Slavic languages. It supports converting texts from the Latin to the Cyrillic alphabet and vice-versa.
6
6
7
7
## Compatibility
8
8
| Back end | Scala versions |
@@ -52,7 +52,7 @@ We decompose letters in their Latin transliteration more consistently than Natio
52
52
* Volodymyr (Володимир)
53
53
* blyz'ko (близько)
54
54
55
-
The Latin letter *y* is also the phonetic basis of four letters in the Slavic alphabet: я, є, ї, ю. They get transliterated accordingly:
55
+
The Latin letter *y* is also the phonetic basis of four letters (iotated vowels) in the Ukrainian alphabet: я, є, ї, ю. They get transliterated accordingly:
56
56
57
57
* ya → я
58
58
* ye → є
@@ -63,20 +63,23 @@ Unlike National 2010, we always use the same transliteration regardless of the p
63
63
64
64
The accented counterpart of и is й and is represented by a separate letter, *j*.
65
65
66
-
*Example:* Zhurs'kyj (Згурський)
66
+
*Example:* Zgurs'kyj (Згурський)
67
67
68
68
#### Soft Signs and Apostrophes
69
69
The second change to National 2010 is that we try to restore soft signs and apostrophes:
In National 2010, *g* is mapped to *ґ* which is phonetically accurate, though the letter is fairly uncommon in Ukrainian. Therefore, *ґ* is represented by *g'*.
75
+
74
76
This feature is experimental and can be disabled by setting `apostrophes` to `false`.
75
77
76
78
#### Convenience mappings
77
79
Another modification was to provide the following mappings:
78
80
79
81
* c → ц
82
+
* h → х
80
83
* q → щ
81
84
* w → ш
82
85
* x → ж
@@ -91,9 +94,7 @@ Note that these mappings are phonetically inaccurate. However, using them still
91
94
* Another advantage is the proximity on the English keyboard layout:
92
95
* *q* and *w* are located next to each other; *ш* and *щ* characters are phonetically close
93
96
* *z* and *x* are located next to each other; *з* and *ж* characters are phonetically close
94
-
95
-
#### Precedence
96
-
The replacement patterns are applied sequentially by traversing the input character-by-character. In some cases, a rule spanning multiple characters should not be applied. An example is the word: схильність. The transliteration of *сх* corresponds to two separate letters *s* and *h*, which would map to *ш*. To prevent this, one can place a vertical bar between the two characters. The full transliteration then looks as follows: *s|hyl'nist*
97
+
* *h* is mapped to *х* since it is a common letter, *kh* is only needed in case *h* is ambiguous
97
98
98
99
## Russian
99
100
The Russian rules are similar to the Ukrainian ones.
@@ -105,12 +106,6 @@ Some differences are:
105
106
* Soft sign: *'* for ь
106
107
* Hard sign: *`* for ъ
107
108
108
-
### Precedence
109
-
As with the Ukrainian rules, a vertical bar can be placed to avoid certain rules from being applied.
110
-
111
-
* красивые: krasivy|e
112
-
* сходить: s|hodit
113
-
114
109
### Mapping
115
110
| Latin | Cyrillic |
116
111
|-------|----------|
@@ -141,7 +136,7 @@ As with the Ukrainian rules, a vertical bar can be placed to avoid certain rules
141
136
| y | ы |
142
137
| z | з |
143
138
| ' | ь |
144
-
|" | ъ |
139
+
|\`| ъ |
145
140
| ch | ч |
146
141
| sh | ш |
147
142
| ya | я |
@@ -151,15 +146,20 @@ As with the Ukrainian rules, a vertical bar can be placed to avoid certain rules
151
146
| yu | ю |
152
147
| shch | щ |
153
148
154
-
#### Examples
155
-
| Russian | Transliterated |
156
-
|---------|----------------|
157
-
| Привет | Privet |
158
-
| Съел | S"el |
159
-
| Щётка | Shchyotka |
160
-
| Льдина | L'dina |
149
+
### Examples
150
+
| Russian | Transliterated |
151
+
|----------|----------------|
152
+
| Привет | Privet |
153
+
| Съел | S\`el |
154
+
| Щётка | Shchyotka |
155
+
| Льдина | L'dina |
156
+
| красивые | krasivye |
157
+
| сходить | skhodit' |
158
+
159
+
## Internals
160
+
The replacement patterns are applied sequentially by traversing the input character-by-character. The functions `latinToCyrillicIncremental` and `cyrillicToLatinIncremental` take the left context which is needed for some rules. The result indicates the number of characters to remove and a replacement string.
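
The incremental idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the library's code: the object `IncrementalSketch`, the functions `step` and `transliterate`, and the small rule subset are hypothetical, and the real `latinToCyrillicIncremental` may have a different signature. Here `step` receives the Latin text consumed so far (the left context) plus the next letter, and returns how many characters to remove from the end of the output together with a replacement string.

```scala
object IncrementalSketch {
  // Illustrative single-letter rules (an assumed subset, not the library's tables).
  private val single: Map[Char, String] =
    Map('s' -> "с", 'h' -> "х", 'c' -> "ц", 'y' -> "ы", 'a' -> "а")

  // Multi-letter rules: (Latin suffix, output characters to remove, replacement).
  // Longer patterns are listed first so that "shch" wins over "ch".
  private val multi: List[(String, Int, String)] = List(
    ("shch", 2, "щ"),
    ("sh", 1, "ш"),
    ("ch", 1, "ч"),
    ("ya", 1, "я")
  )

  // Hypothetical incremental step: `left` is the Latin input consumed so far.
  def step(left: String, next: Char): (Int, String) = {
    val text = left + next
    multi
      .collectFirst {
        case (pattern, remove, replacement) if text.endsWith(pattern) =>
          (remove, replacement)
      }
      .getOrElse((0, single.getOrElse(next, next.toString)))
  }

  // Drives `step` over the whole input, one character at a time.
  def transliterate(latin: String): String = {
    val out = new StringBuilder
    for (i <- latin.indices) {
      val (remove, replacement) = step(latin.take(i), latin(i))
      out.setLength(out.length - remove) // undo output that the new rule supersedes
      out.append(replacement)
    }
    out.toString
  }
}
```

In this sketch, typing *s* and then *h* first emits *с* and then replaces it with *ш*, which is why a rule needs both the left context and the ability to remove already emitted characters.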
161
161
162
-
###Credits
162
+
## Credits
163
163
The rules and examples were adapted from the following libraries: