Converting Freedict.org dictionaries for OmegaT
Thread poster: Samuel Murray
Jun 5, 2019

Hello everyone

OmegaT can read two dictionary formats: StarDict (only version 2.4.2, with sametypesequence "m" or "g") and DSL. DSL dictionaries can be complex or simple. The dictionaries on Freedict.org are available in three formats: the .slob format (for mobile apps), the DICT.org format (an .index file), and a format named .dict.dz. I managed to convert some .index files to .dsl files that OmegaT can read. I use Windows 7.

You need:
* Python 3 installed
* the PyGlossary converter
* a text editor that can open UTF8N (UTF-8 without BOM) files and save them as UTF16LE
* a text editor capable of find/replace-ing tabs and line breaks

Not all files will convert. This may be due to problems with the PyGlossary converter or with the free dictionary files on Freedict.org. Also, only some of the conversions that PyGlossary appears to offer will actually work, but converting .index to .txt (tab-delimited) works fine in a number of cases.

For the conversion, I used the program PyGlossary. It requires Python 3 and must be started from a command window. I recommend installing Python 3 in an easy-to-type location, e.g. C:\Python3\, instead of the usual "Program Files\Python37-32\" location. To download PyGlossary, visit its web site, click the green "Clone/download" button, and select "Download ZIP". Unzip it into a separate folder anywhere (using e.g. 7-zip). Then open a command window in that folder (on my computer, I navigate one folder upwards and Shift + right-click the folder to reveal an option called "Open command window here"). In the command window, type C:\Python3\python.exe pyglossary.pyw --ui=tk. This should open the PyGlossary converter.

To download a dictionary from FreeDict.org, visit this URL, scroll down to "Dictionary downloads", and select one of your languages. It will show a list of available dictionaries. For demonstration purposes, let’s choose French-Portuguese. Unzip the file twice (using e.g. 7-zip) until you get an .index file (in this case, named fra-por.index).
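If you would rather script the unpacking than click through 7-zip, here is a minimal Python sketch. It assumes the download is a compressed tar archive (which would explain having to unzip twice); I have not checked this for every dictionary. The function name extract_dict_files is made up for this example. It copies only the .index and .dict.dz files into one folder and ignores everything else in the archive.

```python
import tarfile
from pathlib import Path


def extract_dict_files(archive: Path, dest: Path) -> list:
    """Copy the .index and .dict.dz members of a tar archive into dest.

    Assumption: the download is a (compressed) tar archive. "r:*" lets
    tarfile detect gzip, bzip2 or xz compression by itself.
    """
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(archive, mode="r:*") as tar:
        for member in tar:
            # Keep only the bare file name, dropping any folder prefix.
            name = Path(member.name).name
            if member.isfile() and name.endswith((".index", ".dict.dz")):
                source = tar.extractfile(member)
                target = dest / name
                target.write_bytes(source.read())
                written.append(target)
    return written


# Hypothetical usage (the archive name is only an example):
# extract_dict_files(Path("fra-por.tar.xz"), Path("fra-por"))
```

PyGlossary presumably needs the .dict.dz file next to the .index file, which is why the sketch keeps both together.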

In PyGlossary, click the "Read from format" button and select "DICT.org file format (.index)", then click the Browse button and navigate to the downloaded .index file (and select it, obviously). Click the "Write to format" button and select "Tabfile (txt, dic)". PyGlossary should automatically fill in the path. Click the Convert button to start the conversion.

[Screenshot 1: read format]

[Screenshot 2: converted]

We now have a file called fra-por.txt, which is a tab-delimited file in UTF8N format. You have to open this file in a text editor that can read it correctly and save it as a TXT file in UTF16LE format. I use AkelPad, for example.
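The re-encoding step does not strictly need a text editor. A minimal Python sketch, assuming the input is UTF-8 (with or without a BOM) and that a byte-order mark is wanted at the start of the UTF-16LE file, as most editors write one. The function name is made up for this example.

```python
from pathlib import Path


def reencode_utf8_to_utf16le(src: Path, dst: Path) -> None:
    """Read a UTF-8 text file and write it back as UTF-16LE with a BOM."""
    # "utf-8-sig" accepts UTF-8 both with and without a BOM.
    text = src.read_text(encoding="utf-8-sig")
    # The "utf-16-le" codec writes no BOM by itself, so add one
    # explicitly (assumption: the reader expects it).
    dst.write_bytes(b"\xff\xfe" + text.encode("utf-16-le"))


# Hypothetical usage:
# reencode_utf8_to_utf16le(Path("fra-por.txt"), Path("fra-por-utf16.txt"))
```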

We then need to do some find/replace-ing to convert this into a DSL file that OmegaT will accept. Replace each tab with a line break, a space, a pipe (or any character you prefer), and a space. Replace each literal "\n" with the same sequence: a line break, a space, a pipe (or any character you prefer), and a space. Replace "<" and "[" with something else (because OmegaT ignores text between "<>" and "[]").

In AkelPad, these replacements work:

[Screenshot 3: replacements]

Finally, rename the file so that it has the file extension ".dsl".
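The find/replace and rename steps can also be scripted. Below is a minimal Python sketch of the replacements described above; it is not what AkelPad does internally. The function names are made up, the stand-in characters "{" and "(" are arbitrary choices, and the script assumes the TXT file has already been saved as UTF-16LE.

```python
from pathlib import Path

# Line break, space, pipe, space: the sequence described above.
BODY_PREFIX = "\n | "


def tabfile_line_to_dsl(line: str) -> str:
    """Turn one tab-delimited entry into a headword plus indented body lines."""
    line = line.rstrip("\r\n")
    # "<" and "[" are swapped for arbitrary stand-ins; pick any you like.
    line = line.replace("<", "{").replace("[", "(")
    # The tab separates the headword from its definition.
    line = line.replace("\t", BODY_PREFIX)
    # A literal backslash + n marks a line break inside the definition.
    return line.replace("\\n", BODY_PREFIX)


def tabfile_to_dsl(src: Path, dst: Path) -> None:
    """Convert a UTF-16 tab file to a .dsl file written as UTF-16LE with BOM."""
    # "utf-16" honours a byte-order mark if the file has one.
    text = src.read_text(encoding="utf-16")
    entries = [tabfile_line_to_dsl(l) for l in text.split("\n") if l.strip()]
    body = "\n".join(entries) + "\n"
    dst.write_bytes(b"\xff\xfe" + body.encode("utf-16-le"))


# Hypothetical usage; naming the output *.dsl covers the rename step:
# tabfile_to_dsl(Path("fra-por-utf16.txt"), Path("fra-por.dsl"))
```

Working on one entry at a time keeps the replacement order explicit: the brackets are handled first, then the tab, then the literal "\n".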

This is what it looks like in OmegaT:

[Screenshot 4: in OmegaT]

[You'll notice that the headwords are repeated in lowercase. If you want to avoid this, copy/paste the text from the TXT file into e.g. Excel, copy/paste the second column back into the TXT file, and then add line breaks between the headwords and the part-of-speech indicators. However, it will then match fewer words, for a reason that I can't determine.]

Your comments? Do let us know which files did convert, and which didn't.


[Edited at 2019-06-05 09:26 GMT]