While implementing the reverse direction (Cyrillic to Latin), several limitations in the rules were addressed. The adapted rules were tested on a small Wikipedia text corpus, and the error rate decreased.
Notable changes include the removal of the vertical bar (`|`) for precedence; instead, the bi-gram "kh" was introduced. "ъ" is now mapped to a backtick (`` ` ``) instead of a double quote (`"`), since double quotes are commonly used in Slavic texts. The rules now also handle different capitalization cases.
Closes #2.
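To illustrate how a bi-gram such as "kh" can replace the vertical bar, here is a minimal sketch of an incremental step that looks at the left context and returns the number of output characters to remove together with a replacement string. The names `IncrementalSketch`, `step` and `transliterate`, the tiny rule subset, and the choice of the already-consumed Latin input as left context are all assumptions made for illustration; this is not the library's actual API or rule set.

```scala
object IncrementalSketch {
  // Hypothetical single-letter rules (a small subset of the Russian table).
  private val single: Map[Char, String] = Map(
    'd' -> "д", 'h' -> "х", 'i' -> "и", 'k' -> "к",
    'o' -> "о", 's' -> "с", 't' -> "т", '\'' -> "ь", '`' -> "ъ"
  )

  // Hypothetical bi-gram rules, keyed by (previous Latin letter, current letter).
  private val bigram: Map[(Char, Char), String] = Map(
    ('k', 'h') -> "х",
    ('s', 'h') -> "ш"
  )

  /** Returns (number of output characters to remove, replacement string).
    * `left` is the Latin input consumed so far (an assumption of this sketch).
    */
  def step(left: String, c: Char): (Int, String) =
    left.lastOption.flatMap(p => bigram.get((p, c))) match {
      case Some(r) => (1, r)                              // replace the previous letter's output
      case None    => (0, single.getOrElse(c, c.toString)) // plain single-letter rule or pass-through
    }

  /** Applies `step` character by character, editing the output in place. */
  def transliterate(input: String): String = {
    val out = new StringBuilder
    for (i <- input.indices) {
      val (remove, replacement) = step(input.take(i), input(i))
      out.setLength(out.length - remove)
      out.append(replacement)
    }
    out.toString
  }
}
```

With these assumed rules, `IncrementalSketch.transliterate("skhodit'")` yields "сходить", whereas "sh" on its own still collapses to "ш"; no separator character is needed in the input.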
-translit-scala is a transliteration library for Scala and Scala.js. It implements transliteration rules for Slavic languages. It supports converting texts from the Latin to the Cyrillic alphabet.
+translit-scala is a transliteration library for Scala and Scala.js. It implements transliteration rules for Slavic languages. It supports converting texts from the Latin to the Cyrillic alphabet and vice-versa.

 ## Compatibility
 | Back end | Scala versions |
@@ -52,7 +52,7 @@ We decompose letters in their Latin transliteration more consistently than Natio
 * Volodymyr (Володимир)
 * blyz'ko (близько)

-The Latin letter *y* is also the phonetic basis of four letters in the Slavic alphabet: я, є, ї, ю. They get transliterated accordingly:
+The Latin letter *y* is also the phonetic basis of four letters (iotated vowels) in the Ukrainian alphabet: я, є, ї, ю. They get transliterated accordingly:

 * ya → я
 * ye → є
@@ -63,20 +63,23 @@ Unlike National 2010, we always use the same transliteration regardless of the p

 The accented counterpart of и is й and is represented by a separate letter, *j*.

-*Example:* Zhurs'kyj (Згурський)
+*Example:* Zgurs'kyj (Згурський)

 #### Soft Signs and Apostrophes
 The second change to National 2010 is that we try to restore soft signs and apostrophes:
In National 2010, *g* is mapped to *ґ* which is phonetically accurate, though the letter is fairly uncommon in Ukrainian. Therefore, *ґ* is represented by *g'*.
+
 This feature is experimental and can be disabled by setting `apostrophes` to `false`.

 #### Convenience mappings
 Another modification was to provide the following mappings:

 * c → ц
+* h → х
 * q → щ
 * w → ш
 * x → ж
@@ -91,9 +94,7 @@ Note that these mappings are phonetically inaccurate. However, using them still
 * Another advantage is the proximity on the English keyboard layout:
   * *q* and *w* are located next to each other; *ш* and *щ* characters are phonetically close
   * *z* and *x* are located next to each other; *з* and *ж* characters are phonetically close
-
-#### Precedence
-The replacement patterns are applied sequentially by traversing the input character-by-character. In some cases, a rule spanning multiple characters should not be applied. An example is the word: схильність. The transliteration of *сх* corresponds to two separate letters *s* and *h*, which would map to *ш*. To prevent this, one can place a vertical bar between the two characters. The full transliteration then looks as follows: *s|hyl'nist*
+* *h* is mapped to *х* since it is a common letter, *kh* is only needed in case *h* is ambiguous

 ## Russian
 The Russian rules are similar to the Ukrainian ones.
@@ -103,13 +104,7 @@ Some differences are:
 * *i* corresponds to *и*, whereas *y* to *ы*
 * Russian distinguishes between soft and hard signs. It does not have apostrophes. The following mappings are used:
   * Soft sign: *'* for ь
-  * Hard sign: *"* for ъ
-
-### Precedence
-As with the Ukrainian rules, a vertical bar can be placed to avoid certain rules from being applied.
-
-* красивые: krasivy|e
-* сходить: s|hodit
+  * Hard sign: *`* for ъ

 ### Mapping
 | Latin | Cyrillic |
@@ -121,7 +116,7 @@ As with the Ukrainian rules, a vertical bar can be placed to avoid certain rules
 | e | е |
 | f | ф |
 | g | г |
-| h | х |
+| h, kh | х |
 | i | и |
 | j | й |
 | k | к |
@@ -141,7 +136,7 @@ As with the Ukrainian rules, a vertical bar can be placed to avoid certain rules
 | y | ы |
 | z | з |
 | ' | ь |
-| " | ъ |
+| \` | ъ |
 | ch | ч |
 | sh | ш |
 | ya | я |
@@ -151,15 +146,20 @@ As with the Ukrainian rules, a vertical bar can be placed to avoid certain rules
 | yu | ю |
 | shch | щ |

-#### Examples
-| Russian | Transliterated |
-|---------|----------------|
-| Привет | Privet |
-| Съел | S"el |
-| Щётка | Shchyotka |
-| Льдина | L'dina |
+### Examples
+| Russian | Transliterated |
+|----------|----------------|
+| Привет | Privet |
+| Съел | S\`el |
+| Щётка | Shchyotka |
+| Льдина | L'dina |
+| красивые | krasivye |
+| сходить | skhodit' |
+
+## Internals
+The replacement patterns are applied sequentially by traversing the input character-by-character. The functions `latinToCyrillicIncremental` and `cyrillicToLatinIncremental` take the left context which is needed for some rules. The result indicates the number of characters to remove and a replacement string.

-###Credits
+## Credits
 The rules and examples were adapted from the following libraries: