Swedish language resources for OKboard (dictionary & prediction data).
Created from balanced data resources like newspapers, novels, blogs, discussion forums and comments.
Attachment | Size | Date |
---|---|---|
okb-lang-sv-0.1-1.noarch.rpm | 2.37 MB | 02/01/2016 - 19:21 |
okb-lang-sv-0.2-1.noarch.rpm | 4.58 MB | 29/01/2016 - 17:47 |
okb-lang-sv-0.2-2.noarch.rpm | 4.58 MB | 04/02/2016 - 16:26 |
okb-lang-sv-0.2-3.noarch.rpm | 4.53 MB | 11/01/2017 - 19:52 |
- Updated db.version to 16 to match the one engine v0.6-1 uses.
Comments
cizi
Wed, 2018/01/03 - 22:21
Hello there, I don't want to bother you. I'm trying to build language files for my mother tongue (Czech). I followed the instructions, but I'm not sure what the dictionary files (corpus) should look like. Can you please give me a hint or an example of your language files? It would be really appreciated :-).
Regards
Jan
ellefj
Tue, 2018/03/13 - 17:17
@cizi, like I wrote in the description, I used "newspaper, novels, blogs, discussion forum and comments data resources". They were all converted to text files (if they contained markup) and stripped of some content that the ngram-db-creation-tool complained about.
We don't have many Czech resources here at the Swedish language bank, but you can find some useful corpora at your national library and perhaps at LINDAT. There are some hints on other materials in OKB-Engine-README:
"Corpus files should include different chat style. E.g. recommendation is to use formal speech (newsletters, wikipedia ...) and informal style (e-mails, IRC and chat logs, movies subtitles). As they are plain text file you can just concatenate them before bzip2 compression."
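As a rough illustration of the "concatenate, then bzip2" step from the README quote, here is a minimal Python sketch. It assumes the sources are already plain UTF-8 text files. The function name `build_corpus` and every file name are hypothetical, made up for the example and not taken from the OKboard tools, so check the OKB-Engine-README for the names the build scripts actually expect.

```python
import bz2
from pathlib import Path


def build_corpus(text_files, output_path):
    """Concatenate plain-text files into one bzip2-compressed corpus file.

    `text_files` is an iterable of paths to UTF-8 text files (hypothetical
    inputs, e.g. one file of formal text and one of chat logs).
    """
    with bz2.open(output_path, "wt", encoding="utf-8") as out:
        for path in text_files:
            text = Path(path).read_text(encoding="utf-8")
            out.write(text)
            # Keep the last line of one file from running into the
            # first line of the next file.
            if not text.endswith("\n"):
                out.write("\n")
```

A call could look like `build_corpus(["news.txt", "chat.txt"], "corpus.txt.bz2")`, where all three names are placeholders. Mixing formal and informal sources in the input list follows the recommendation quoted above.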
smatkovi
Tue, 2018/03/13 - 15:24
I didn't even find the instructions
ellefj
Tue, 2018/03/13 - 17:08
@smatkovi OK. The instructions are here: OKB-Engine-README
smatkovi
Tue, 2018/03/13 - 15:30
Now I found them in the engine part