Swedish language resources for OKboard

Rating: 5 (average of 1 vote)

Swedish language resources for OKboard (dictionary and prediction data).
Created from a balanced mix of data resources such as newspapers, novels, blogs, discussion forums and comments.


Application versions:

Attachment                     Size     Date
okb-lang-sv-0.1-1.noarch.rpm   2.37 MB  02/01/2016 - 19:21
okb-lang-sv-0.2-1.noarch.rpm   4.58 MB  29/01/2016 - 17:47
okb-lang-sv-0.2-2.noarch.rpm   4.58 MB  04/02/2016 - 16:26
okb-lang-sv-0.2-3.noarch.rpm   4.53 MB  11/01/2017 - 19:52
Changelog: 

- Updated db.version to 16 to match the version used by engine v0.6-1.

Comments

cizi:

Hello there, sorry to bother you. I'm trying to build language files for my native language (Czech). I followed the instructions, but I'm not sure what the dictionary files (corpus) should look like. Could you please give me a hint or an example of your language files? It would be really appreciated :-)

Regards 
Jan

ellefj:

@cizi, as I wrote in the description, I used "newspaper, novels, blogs, discussion forum and comments data resources". Files that contained markup were converted to plain text, and I stripped out some content that the ngram-db-creation-tool complained about.
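The markup-to-text step mentioned above could look roughly like this minimal Python sketch. It assumes HTML or XML input and uses only the standard library's `html.parser`; the helper name `markup_to_text` and the set of block tags are illustrative choices, not part of the OKboard tooling. Which content the ngram-db-creation-tool actually complained about is not described, so that cleaning step is not attempted here.

```python
from html.parser import HTMLParser

# Hypothetical choice: tags whose boundaries should become line breaks.
_BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
_SKIP_TAGS = {"script", "style"}


class _TextExtractor(HTMLParser):
    """Collects the text content of a document, dropping tags, scripts and styles."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self._chunks: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in _SKIP_TAGS:
            self._skip_depth += 1
        elif tag in _BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in _SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in _BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        return "".join(self._chunks)


def markup_to_text(markup: str) -> str:
    """Strip markup and return plain text, one non-empty line per text line."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in parser.text().splitlines())
    return "\n".join(line for line in lines if line)
```

For example, `markup_to_text("<p>Hej <b>världen</b></p><p>Andra &amp; rad</p>")` yields two plain lines, `Hej världen` and `Andra & rad`.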

We don't have many Czech resources here at the Swedish Language Bank, but you may find useful corpora at your national library and perhaps at LINDAT. There are some hints on other materials in the OKB-Engine-README:

"Corpus files should include different chat style. E.g. recommendation is to use formal speech (newletters, wikipedia ...) and informal style (e-mails, IRC and chat logs, movies subtitles). As they are plain text file you can just concatenate them before bzip2 compression."
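Following the README's advice, a corpus mixing formal and informal text can be assembled by concatenating plain-text files and compressing the result with bzip2. This is a minimal Python sketch using the standard `bz2` module, assuming UTF-8 input; the function name `build_corpus` and any file names are hypothetical, and the output file name the OKboard build expects is not stated here.

```python
import bz2
from pathlib import Path


def build_corpus(sources: list[Path], output: Path) -> None:
    """Concatenate plain-text corpus files into one bzip2-compressed file.

    Hypothetical helper: roughly `cat file1 file2 ... | bzip2 > output`,
    but it also makes sure every source file ends with a line break.
    """
    with bz2.open(output, "wt", encoding="utf-8") as out:
        for src in sources:
            last = ""
            with open(src, encoding="utf-8") as f:
                for line in f:  # stream line by line to keep memory use low
                    out.write(line)
                    last = line
            # Avoid gluing the last line of one file onto the next file.
            if last and not last.endswith("\n"):
                out.write("\n")
```

Streaming line by line keeps memory use flat even for large newspaper or subtitle dumps, and the newline guard prevents two unrelated sentences from being merged into one line at a file boundary.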

smatkovi:

I didn't even find the instructions.

ellefj:

@smatkovi OK, the instructions are here: OKB-Engine-README

smatkovi:

Now I found them, in the engine part.