Wordlist and Spell Checking for Amharic and Tigrigna
Biniam Gebremichael
Corpus building
To help Geez Natural Language Processing (NLP) developers, I have created a web crawler that collects Amharic and Tigrigna texts from the Internet. A wordlist is generated for each language, sorted by number of occurrences, as shown below. This Geez Crawler software is similar to Kevin Scannell's Crubadan Corpus builder, except that the former is specific to Geez languages. If you want to know more about web crawling, read Kevin's site.
The wordlists are updated periodically and are free to download and use for research purposes. You will need software to unzip the files and a Unicode font to display them properly.
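To illustrate what a wordlist sorted by number of occurrences looks like in practice, here is a minimal Python sketch. It is not the Geez Crawler's actual code. The function name `build_wordlist`, the sample texts, and the rule that a word is any run of Ethiopic syllables in the range U+1200 to U+135A are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of Ethiopic syllables (U+1200-U+135A).
# Ethiopic punctuation (e.g. the full stop U+1362), Ethiopic digits,
# and non-Ethiopic text are treated as separators and dropped.
ETHIOPIC_WORD = re.compile(r"[\u1200-\u135A]+")


def build_wordlist(texts):
    """Return (word, count) pairs, most frequent first."""
    counts = Counter()
    for text in texts:
        counts.update(ETHIOPIC_WORD.findall(text))
    # Sort by descending count; ties are broken by code point order
    # so that the output is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample input standing in for crawled pages.
sample_pages = ["ሰላም ዓለም። ሰላም!", "ዓለም ሰላም hello"]
wordlist = build_wordlist(sample_pages)

for word, count in wordlist:
    print(f"{word}\t{count}")
```

A real crawler would also need language identification to separate Amharic from Tigrigna pages, since both use the same script. That step is left out here.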