Data Files
- Special Data Files
- Data Files for Version 4.00
- Data Files for Version 3.04
- Cube Data Files for Version 3.04
- Fraktur Data Files
- Data Files for Version 3.02
- Data Files for Version 2.0x
Special Data Files
| Lang Code | Description | 4.0/3.0x traineddata |
|---|---|---|
| osd | Orientation and script detection | osd.traineddata |
| equ | Math / equation detection | equ.traineddata |
Note: These two data files are compatible with older versions of Tesseract. osd is compatible with version 3.01 and up, and equ is compatible with version 3.02 and up.
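The compatibility note above can be written down as a small lookup. This is only an illustrative sketch: the names `MIN_VERSION`, `parse_version`, and `is_compatible` are hypothetical helpers, not part of Tesseract, and version strings are assumed to be dotted numbers such as `3.04`.

```python
# Minimum Tesseract version for each special data file, taken from the note above.
MIN_VERSION = {"osd": (3, 1), "equ": (3, 2)}


def parse_version(text):
    """Turn a dotted version string such as '3.04' into a comparable tuple (3, 4)."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(lang_code, tesseract_version):
    """Return True if the special data file can be used with the given version."""
    if lang_code not in MIN_VERSION:
        raise KeyError(f"no compatibility rule recorded for {lang_code!r}")
    return parse_version(tesseract_version) >= MIN_VERSION[lang_code]


print(is_compatible("osd", "3.01"))  # True
print(is_compatible("equ", "3.01"))  # False
```

Comparing tuples of integers rather than raw strings keeps the check correct for versions with extra components, such as `3.04.01`.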
Data Files for Version 4.00
Note: The kur data file was not updated from 3.04. For Fraktur see the section Fraktur Data Files.
| Lang Code | Language | 4.0 traineddata |
|---|---|---|
| afr | Afrikaans | afr.traineddata |
| amh | Amharic | amh.traineddata |
| ara | Arabic | ara.traineddata |
| asm | Assamese | asm.traineddata |
| aze | Azerbaijani | aze.traineddata |
| aze_cyrl | Azerbaijani - Cyrillic | aze_cyrl.traineddata |
| bel | Belarusian | bel.traineddata |
| ben | Bengali | ben.traineddata |
| bod | Tibetan | bod.traineddata |
| bos | Bosnian | bos.traineddata |
| bul | Bulgarian | bul.traineddata |
| cat | Catalan; Valencian | cat.traineddata |
| ceb | Cebuano | ceb.traineddata |
| ces | Czech | ces.traineddata |
| chi_sim | Chinese - Simplified | chi_sim.traineddata |
| chi_tra | Chinese - Traditional | chi_tra.traineddata |
| chr | Cherokee | chr.traineddata |
| cym | Welsh | cym.traineddata |
| dan | Danish | dan.traineddata |
| deu | German | deu.traineddata |
| dzo | Dzongkha | dzo.traineddata |
| ell | Greek, Modern (1453-) | ell.traineddata |
| eng | English | eng.traineddata |
| enm | English, Middle (1100-1500) | enm.traineddata |
| epo | Esperanto | epo.traineddata |
| est | Estonian | est.traineddata |
| eus | Basque | eus.traineddata |
| fas | Persian | fas.traineddata |
| fin | Finnish | fin.traineddata |
| fra | French | fra.traineddata |
| frk | Frankish | frk.traineddata |
| frm | French, Middle (ca. 1400-1600) | frm.traineddata |
| gle | Irish | gle.traineddata |
| glg | Galician | glg.traineddata |
| grc | Greek, Ancient (-1453) | grc.traineddata |
| guj | Gujarati | guj.traineddata |
| hat | Haitian; Haitian Creole | hat.traineddata |
| heb | Hebrew | heb.traineddata |
| hin | Hindi | hin.traineddata |
| hrv | Croatian | hrv.traineddata |
| hun | Hungarian | hun.traineddata |
| iku | Inuktitut | iku.traineddata |
| ind | Indonesian | ind.traineddata |
| isl | Icelandic | isl.traineddata |
| ita | Italian | ita.traineddata |
| ita_old | Italian - Old | ita_old.traineddata |
| jav | Javanese | jav.traineddata |
| jpn | Japanese | jpn.traineddata |
| kan | Kannada | kan.traineddata |
| kat | Georgian | kat.traineddata |
| kat_old | Georgian - Old | kat_old.traineddata |
| kaz | Kazakh | kaz.traineddata |
| khm | Central Khmer | khm.traineddata |
| kir | Kirghiz; Kyrgyz | kir.traineddata |
| kor | Korean | kor.traineddata |
| kur | Kurdish | kur.traineddata |
| lao | Lao | lao.traineddata |
| lat | Latin | lat.traineddata |
| lav | Latvian | lav.traineddata |
| lit | Lithuanian | lit.traineddata |
| mal | Malayalam | mal.traineddata |
| mar | Marathi | mar.traineddata |
| mkd | Macedonian | mkd.traineddata |
| mlt | Maltese | mlt.traineddata |
| msa | Malay | msa.traineddata |
| mya | Burmese | mya.traineddata |
| nep | Nepali | nep.traineddata |
| nld | Dutch; Flemish | nld.traineddata |
| nor | Norwegian | nor.traineddata |
| ori | Oriya | ori.traineddata |
| pan | Panjabi; Punjabi | pan.traineddata |
| pol | Polish | pol.traineddata |
| por | Portuguese | por.traineddata |
| pus | Pushto; Pashto | pus.traineddata |
| ron | Romanian; Moldavian; Moldovan | ron.traineddata |
| rus | Russian | rus.traineddata |
| san | Sanskrit | san.traineddata |
| sin | Sinhala; Sinhalese | sin.traineddata |
| slk | Slovak | slk.traineddata |
| slv | Slovenian | slv.traineddata |
| spa | Spanish; Castilian | spa.traineddata |
| spa_old | Spanish; Castilian - Old | spa_old.traineddata |
| sqi | Albanian | sqi.traineddata |
| srp | Serbian | srp.traineddata |
| srp_latn | Serbian - Latin | srp_latn.traineddata |
| swa | Swahili | swa.traineddata |
| swe | Swedish | swe.traineddata |
| syr | Syriac | syr.traineddata |
| tam | Tamil | tam.traineddata |
| tel | Telugu | tel.traineddata |
| tgk | Tajik | tgk.traineddata |
| tgl | Tagalog | tgl.traineddata |
| tha | Thai | tha.traineddata |
| tir | Tigrinya | tir.traineddata |
| tur | Turkish | tur.traineddata |
| uig | Uighur; Uyghur | uig.traineddata |
| ukr | Ukrainian | ukr.traineddata |
| urd | Urdu | urd.traineddata |
| uzb | Uzbek | uzb.traineddata |
| uzb_cyrl | Uzbek - Cyrillic | uzb_cyrl.traineddata |
| vie | Vietnamese | vie.traineddata |
| yid | Yiddish | yid.traineddata |
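Every row in the table above pairs a language code with a file named `<code>.traineddata`. As a minimal sketch, assuming the files have already been downloaded into a local directory, the following checks which of the wanted languages are still missing. The helper names and the directory name `tessdata` in the usage note are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def traineddata_name(lang_code):
    """Build the file name used in the tables: '<lang code>.traineddata'."""
    return f"{lang_code}.traineddata"


def missing_data_files(lang_codes, tessdata_dir):
    """Return the expected file names that are not present in tessdata_dir."""
    directory = Path(tessdata_dir)
    return [
        traineddata_name(code)
        for code in lang_codes
        if not (directory / traineddata_name(code)).is_file()
    ]
```

For example, `missing_data_files(["eng", "deu", "chi_sim"], "tessdata")` would list the files from that set that are not yet in the assumed `tessdata` directory.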
Data Files for Version 3.04
| Lang Code | Language | 3.04 traineddata |
|---|---|---|
| afr | Afrikaans | afr.traineddata |
| amh | Amharic | amh.traineddata |
| ara | Arabic | ara.traineddata |
| asm | Assamese | asm.traineddata |
| aze | Azerbaijani | aze.traineddata |
| aze_cyrl | Azerbaijani - Cyrillic | aze_cyrl.traineddata |
| bel | Belarusian | bel.traineddata |
| ben | Bengali | ben.traineddata |
| bod | Tibetan | bod.traineddata |
| bos | Bosnian | bos.traineddata |
| bul | Bulgarian | bul.traineddata |
| cat | Catalan; Valencian | cat.traineddata |
| ceb | Cebuano | ceb.traineddata |
| ces | Czech | ces.traineddata |
| chi_sim | Chinese - Simplified | chi_sim.traineddata |
| chi_tra | Chinese - Traditional | chi_tra.traineddata |
| chr | Cherokee | chr.traineddata |
| cym | Welsh | cym.traineddata |
| dan | Danish | dan.traineddata |
| deu | German | deu.traineddata |
| dzo | Dzongkha | dzo.traineddata |
| ell | Greek, Modern (1453-) | ell.traineddata |
| eng | English | eng.traineddata |
| enm | English, Middle (1100-1500) | enm.traineddata |
| epo | Esperanto | epo.traineddata |
| est | Estonian | est.traineddata |
| eus | Basque | eus.traineddata |
| fas | Persian | fas.traineddata |
| fin | Finnish | fin.traineddata |
| fra | French | fra.traineddata |
| frk | Frankish | frk.traineddata |
| frm | French, Middle (ca. 1400-1600) | frm.traineddata |
| gle | Irish | gle.traineddata |
| glg | Galician | glg.traineddata |
| grc | Greek, Ancient (-1453) | grc.traineddata |
| guj | Gujarati | guj.traineddata |
| hat | Haitian; Haitian Creole | hat.traineddata |
| heb | Hebrew | heb.traineddata |
| hin | Hindi | hin.traineddata |
| hrv | Croatian | hrv.traineddata |
| hun | Hungarian | hun.traineddata |
| iku | Inuktitut | iku.traineddata |
| ind | Indonesian | ind.traineddata |
| isl | Icelandic | isl.traineddata |
| ita | Italian | ita.traineddata |
| ita_old | Italian - Old | ita_old.traineddata |
| jav | Javanese | jav.traineddata |
| jpn | Japanese | jpn.traineddata |
| kan | Kannada | kan.traineddata |
| kat | Georgian | kat.traineddata |
| kat_old | Georgian - Old | kat_old.traineddata |
| kaz | Kazakh | kaz.traineddata |
| khm | Central Khmer | khm.traineddata |
| kir | Kirghiz; Kyrgyz | kir.traineddata |
| kor | Korean | kor.traineddata |
| kur | Kurdish | kur.traineddata |
| lao | Lao | lao.traineddata |
| lat | Latin | lat.traineddata |
| lav | Latvian | lav.traineddata |
| lit | Lithuanian | lit.traineddata |
| mal | Malayalam | mal.traineddata |
| mar | Marathi | mar.traineddata |
| mkd | Macedonian | mkd.traineddata |
| mlt | Maltese | mlt.traineddata |
| msa | Malay | msa.traineddata |
| mya | Burmese | mya.traineddata |
| nep | Nepali | nep.traineddata |
| nld | Dutch; Flemish | nld.traineddata |
| nor | Norwegian | nor.traineddata |
| ori | Oriya | ori.traineddata |
| pan | Panjabi; Punjabi | pan.traineddata |
| pol | Polish | pol.traineddata |
| por | Portuguese | por.traineddata |
| pus | Pushto; Pashto | pus.traineddata |
| ron | Romanian; Moldavian; Moldovan | ron.traineddata |
| rus | Russian | rus.traineddata |
| san | Sanskrit | san.traineddata |
| sin | Sinhala; Sinhalese | sin.traineddata |
| slk | Slovak | slk.traineddata |
| slv | Slovenian | slv.traineddata |
| spa | Spanish; Castilian | spa.traineddata |
| spa_old | Spanish; Castilian - Old | spa_old.traineddata |
| sqi | Albanian | sqi.traineddata |
| srp | Serbian | srp.traineddata |
| srp_latn | Serbian - Latin | srp_latn.traineddata |
| swa | Swahili | swa.traineddata |
| swe | Swedish | swe.traineddata |
| syr | Syriac | syr.traineddata |
| tam | Tamil | tam.traineddata |
| tel | Telugu | tel.traineddata |
| tgk | Tajik | tgk.traineddata |
| tgl | Tagalog | tgl.traineddata |
| tha | Thai | tha.traineddata |
| tir | Tigrinya | tir.traineddata |
| tur | Turkish | tur.traineddata |
| uig | Uighur; Uyghur | uig.traineddata |
| ukr | Ukrainian | ukr.traineddata |
| urd | Urdu | urd.traineddata |
| uzb | Uzbek | uzb.traineddata |
| uzb_cyrl | Uzbek - Cyrillic | uzb_cyrl.traineddata |
| vie | Vietnamese | vie.traineddata |
| yid | Yiddish | yid.traineddata |
Cube Data Files for Version 3.04
Hindi: hin.cube.bigrams, hin.cube.fold, hin.cube.lm, hin.cube.nn, hin.cube.params, hin.cube.word-freq, hin.tesseract_cube.nn
Arabic: ara.cube.bigrams, ara.cube.fold, ara.cube.lm, ara.cube.nn, ara.cube.params, ara.cube.word-freq, ara.cube.size, ara.tesseract_cube.nn
Fraktur Data Files
These data files were prepared by @paalberti for older versions of Tesseract: dan_frak and deu_frak for version 3.00, and slk_frak for version 3.01. Updates to these files are available at paalberti/tesseract-dan-fraktur.
| Lang Code | Language | 3.0x traineddata |
|---|---|---|
| dan_frak | Danish - Fraktur | dan_frak.traineddata |
| deu_frak | German - Fraktur | deu_frak.traineddata |
| slk_frak | Slovak - Fraktur | slk_frak.traineddata |
Data Files for Version 2.0x
| Lang Code | Language | 2.0x traineddata |
|---|---|---|
| deu | German | tesseract-2.00.deu.tar.gz |
| deu-f | German - Fraktur | tesseract-2.01.deu-f.tar.gz |
| eng | English | tesseract-2.00.eng.tar.gz |
| eus | Basque | tesseract-2.04-eus.tar.gz |
| fra | French | tesseract-2.00.fra.tar.gz |
| ita | Italian | tesseract-2.00.ita.tar.gz |
| nld | Dutch; Flemish | tesseract-2.00.nld.tar.gz |
| por | Portuguese | tesseract-2.01.por.tar.gz |
| spa | Spanish; Castilian | tesseract-2.00.spa.tar.gz |
| vie | Vietnamese | tesseract-2.01.vie.tar.gz |
These wiki pages are no longer maintained.
All pages were moved to tesseract-ocr/tessdoc.
The latest documentation is available at https://tesseract-ocr.github.io/.