translit-scala is a transliteration library for Scala and Scala.js. It implements transliteration rules for Slavic languages. It supports converting texts from the Latin to the Cyrillic alphabet.
5
+
translit-scala is a transliteration library for Scala and Scala.js. It implements transliteration rules for Slavic languages. It supports converting texts from the Latin to the Cyrillic alphabet and vice-versa.
6
6
7
7
## Compatibility
8
8
| Back end | Scala versions |
@@ -52,7 +52,7 @@ We decompose letters in their Latin transliteration more consistently than Natio
52
52
* Volodymyr (Володимир)
53
53
* blyz'ko (близько)
54
54
55
-
The Latin letter *y* is also the phonetic basis of four letters in the Slavic alphabet: я, є, ї, ю. They get transliterated accordingly:
55
+
The Latin letter *y* forms the phonetic basis of four letters (iotated vowels) in the Ukrainian alphabet: я, є, ї, ю. They get transliterated accordingly:
56
56
57
57
* ya → я
58
58
* ye → є
@@ -63,20 +63,23 @@ Unlike National 2010, we always use the same transliteration regardless of the p
63
63
64
64
The accented counterpart of и is й and is represented by a separate letter, *j*.
65
65
66
-
*Example:* Zhurs'kyj (Згурський)
66
+
*Example:* Zgurs'kyj (Згурський)
67
67
68
68
#### Soft Signs and Apostrophes
69
69
The second change to National 2010 is that we try to restore soft signs and apostrophes:
In National 2010, *g* gets mapped to *ґ* which is phonetically accurate, though the letter *ґ* is fairly uncommon in Ukrainian. Therefore, we represent *ґ* by the bi-gram *g'*.
75
+
74
76
This feature is experimental and can be disabled by setting `apostrophes` to `false`.
75
77
76
78
#### Convenience mappings
77
79
Another modification was to provide the following mappings:
78
80
79
81
* c → ц
82
+
* h → х
80
83
* q → щ
81
84
* w → ш
82
85
* x → ж
@@ -91,9 +94,8 @@ Note that these mappings are phonetically inaccurate. However, using them still
91
94
* Another advantage is the proximity on the English keyboard layout:
92
95
  * *q* and *w* are located next to each other; the *ш* and *щ* characters are phonetically close
93
96
  * *z* and *x* are located next to each other; the *з* and *ж* characters are phonetically close
94
-
95
-
#### Precedence
96
-
The replacement patterns are applied sequentially by traversing the input character-by-character. In some cases, a rule spanning multiple characters should not be applied. An example is the word: схильність. The transliteration of *сх* corresponds to two separate letters *s* and *h*, which would map to *ш*. To prevent this, one can place a vertical bar between the two characters. The full transliteration then looks as follows: *s|hyl'nist*
97
+
* *h* is mapped to *х* since it is a common letter; *kh* is only needed when *h* would be ambiguous
98
+
* An example is the word схильність. The letters *сх* correspond to the two separate letters *s* and *h*, which, written together as *sh*, would map to *ш*. To prevent this, one can use the bi-gram *kh* instead to represent *х*. The full transliteration then looks as follows: *skhyl'nist'*
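
The ambiguity between *sh* and *s* followed by *h* can be illustrated with a small, self-contained sketch. It is not the library's implementation: it uses a greedy longest-match scan instead of the incremental traversal, and the object `LongestMatchSketch`, the method `toCyrillic` and the tiny rule set are assumptions made up for this one word (written here with a trailing apostrophe for the final soft sign).

```scala
object LongestMatchSketch {
  // Toy rule set (assumption): only the lowercase letters needed for one word.
  private val rules: Map[String, String] = Map(
    "s" -> "с", "h" -> "х", "kh" -> "х", "sh" -> "ш",
    "y" -> "и", "l" -> "л", "n" -> "н", "i" -> "і", "t" -> "т", "'" -> "ь"
  )

  private val maxLen: Int = rules.keys.map(_.length).max

  // Greedy scan: at each position, apply the longest rule that matches.
  def toCyrillic(latin: String): String = {
    val out = new StringBuilder
    var pos = 0
    while (pos < latin.length) {
      val hit = (maxLen to 1 by -1).iterator
        .filter(len => pos + len <= latin.length)
        .map(len => latin.substring(pos, pos + len))
        .find(rules.contains)
      hit match {
        case Some(key) =>
          out ++= rules(key)
          pos += key.length
        case None =>
          // Unknown characters are copied through unchanged.
          out += latin.charAt(pos)
          pos += 1
      }
    }
    out.toString
  }
}
```

With these rules, `toCyrillic("skhyl'nist'")` yields the intended схильність, whereas `toCyrillic("shyl'nist'")` starts with *ш* because *sh* is consumed as one unit.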
97
99
98
100
## Russian
99
101
The Russian rules are similar to the Ukrainian ones.
@@ -103,13 +105,7 @@ Some differences are:
103
105
* *i* corresponds to *и*, whereas *y* corresponds to *ы*
104
106
* Russian distinguishes between soft and hard signs. It does not have apostrophes. The following mappings are used:
105
107
* Soft sign: *'* for ь
106
-
* Hard sign: *"* for ъ
107
-
108
-
### Precedence
109
-
As with the Ukrainian rules, a vertical bar can be placed to avoid certain rules from being applied.
110
-
111
-
* красивые: krasivy|e
112
-
* сходить: s|hodit
108
+
* Hard sign: *`* for ъ
113
109
114
110
### Mapping
115
111
| Latin | Cyrillic |
@@ -121,7 +117,7 @@ As with the Ukrainian rules, a vertical bar can be placed to avoid certain rules
121
117
| e | е |
122
118
| f | ф |
123
119
| g | г |
124
-
| h| х |
120
+
| h, kh | х |
125
121
| i | и |
126
122
| j | й |
127
123
| k | к |
@@ -141,7 +137,7 @@ As with the Ukrainian rules, a vertical bar can be placed to avoid certain rules
141
137
| y | ы |
142
138
| z | з |
143
139
| ' | ь |
144
-
|" | ъ |
140
+
|\`| ъ |
145
141
| ch | ч |
146
142
| sh | ш |
147
143
| ya | я |
@@ -151,15 +147,30 @@ As with the Ukrainian rules, a vertical bar can be placed to avoid certain rules
151
147
| yu | ю |
152
148
| shch | щ |
153
149
154
-
#### Examples
155
-
| Russian | Transliterated |
156
-
|---------|----------------|
157
-
| Привет | Privet |
158
-
| Съел | S"el |
159
-
| Щётка | Shchyotka |
160
-
| Льдина | L'dina |
150
+
### Examples
151
+
| Russian | Transliterated |
152
+
|----------|----------------|
153
+
| Привет | Privet |
154
+
| Съел | S\`el |
155
+
| Щётка | Shchyotka |
156
+
| Льдина | L'dina |
157
+
| красивые | krasivye |
158
+
| сходить | skhodit' |
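
The examples can be reproduced in the Cyrillic-to-Latin direction with a short sketch. It is only an illustration under stated assumptions, not the library's code: the object `CyrillicToLatinSketch`, the method `toLatin` and the reduced letter table are hypothetical, and the rule for choosing *kh* over *h* (whenever the preceding Latin letter would otherwise form *ch* or *sh*) is inferred from the mapping table rather than taken from the implementation.

```scala
object CyrillicToLatinSketch {
  // Reduced table (assumption): only the letters used by the examples.
  private val letters: Map[Char, String] = Map(
    'а' -> "a", 'в' -> "v", 'д' -> "d", 'е' -> "e", 'ё' -> "yo",
    'и' -> "i", 'к' -> "k", 'л' -> "l", 'н' -> "n", 'о' -> "o",
    'п' -> "p", 'р' -> "r", 'с' -> "s", 'т' -> "t", 'ы' -> "y",
    'щ' -> "shch", 'ь' -> "'", 'ъ' -> "`"
  )

  // Latin digraphs ending in "h" that a plain "h" could accidentally complete.
  private val digraphsWithH: Set[String] = Set("ch", "sh")

  def toLatin(text: String): String =
    text.foldLeft("") { (out, ch) =>
      val lower = ch.toLower
      val latin =
        if (lower == 'х') {
          // Use "kh" only when "h" would be ambiguous after the previous letter.
          val ambiguous =
            out.nonEmpty && digraphsWithH(out.last.toLower.toString + "h")
          if (ambiguous) "kh" else "h"
        } else letters.getOrElse(lower, ch.toString)
      // Carry the case of the Cyrillic letter over to the first Latin letter.
      out + (if (ch.isUpper) latin.capitalize else latin)
    }
}
```

Here `toLatin("сходить")` gives `skhodit'`, while a word in which *х* does not follow *s* or *c* would receive a plain *h*.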
159
+
160
+
## Internals
161
+
The replacement patterns are applied sequentially by traversing the input character by character. The functions `latinToCyrillicIncremental` and `cyrillicToLatinIncremental` take the left context, which some rules need, for example to determine the correct case of soft/hard signs. Each function returns the number of characters to remove on the right together with the string that replaces them.
162
+
163
+
```scala
164
+
def latinToCyrillicIncremental(
165
+
  latin: String, cyrillic: String, append: Char
166
+
): (Int, String)
167
+
168
+
def cyrillicToLatinIncremental(
169
+
  cyrillic: String, letter: Char
170
+
): (Int, String)
171
+
```
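
How a caller might consume such a result can be sketched as follows. The function `step` below is a hypothetical stand-in with the same shape as `latinToCyrillicIncremental`, not the library's function, and it knows only three letters and the digraph *sh*. The sketch also assumes that the returned count is non-negative and refers to characters at the end of the Cyrillic output built so far.

```scala
object IncrementalSketch {
  // Toy single-letter rules (assumption, far smaller than the real rule set).
  private val single: Map[Char, String] = Map('s' -> "с", 'h' -> "х", 'a' -> "а")

  // Hypothetical stand-in: inspects the left context and returns
  // (characters to remove on the right of the output, replacement string).
  def step(latin: String, cyrillic: String, append: Char): (Int, String) =
    if (append == 'h' && latin.endsWith("s") && cyrillic.endsWith("с"))
      (1, "ш") // retract the "с" written for "s" and emit "ш" for "sh"
    else
      (0, single.getOrElse(append, append.toString))

  // Feed the input one character at a time and apply each result.
  def transliterate(input: String): String = {
    val (_, out) = input.foldLeft(("", "")) { case ((latin, cyrillic), ch) =>
      val (remove, replacement) = step(latin, cyrillic, ch)
      (latin + ch, cyrillic.dropRight(remove) + replacement)
    }
    out
  }
}
```

Returning a removal count lets a rule that spans several characters revise output that has already been written, which suits input that arrives one keystroke at a time.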
161
172
162
-
###Credits
173
+
## Credits
163
174
The rules and examples were adapted from the following libraries: