Chinese Dictionaries for OmegaT

by Weedy Tan on January 25, 2014

When I first started using OmegaT, I couldn’t figure out how to find and install a suitable Chinese dictionary. It didn’t bother me much, since I could use online Chinese <> English dictionaries while I was still learning OmegaT. However, after reading some old posts and discussions in the OmegaT Yahoo support group, I decided to research this and find a way to install Chinese dictionaries.

There are, in fact, many Chinese <> English dictionaries available, in both Traditional and Simplified Chinese. However, as a novice OmegaT user, I couldn’t tell the differences among them. According to the OmegaT manual, I needed to find a packed file with the *.tar.bz2 file extension. When unpacked, it should contain 3 files with the following extensions:

    1. *.dict.dz
    2. *.idx
    3. *.ifo

All three files should share the same base file name.
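If you want to double-check a folder before pointing OmegaT at it, the rule above can be sketched in a few lines of Python. This is only an illustration: the function name `complete_dictionaries` is made up for this example, and it assumes the three files sit directly in the folder you pass in.

```python
from pathlib import Path

# The three extensions listed above for one dictionary.
REQUIRED_SUFFIXES = (".dict.dz", ".idx", ".ifo")


def complete_dictionaries(folder):
    """Return the base names that have all three required files in `folder`."""
    names = {p.name for p in Path(folder).iterdir() if p.is_file()}
    # Use the *.ifo files as the starting point for candidate base names.
    bases = {n[: -len(".ifo")] for n in names if n.endswith(".ifo")}
    return sorted(
        base
        for base in bases
        if all(base + suffix in names for suffix in REQUIRED_SUFFIXES)
    )
```

A dictionary that is missing one of the three files, or whose files have different base names, will simply not appear in the returned list.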

One of the best websites I found for good dictionaries was http://abloz.com/huzheng/stardict-dic/. Two of its sections in particular had what I needed.

http://abloz.com/huzheng/stardict-dic/zh_CN/
Here you can find various Simplified Chinese dictionaries.
The Simplified Chinese (ZH-CN) dictionaries I have tested and found working are:

    • langdao-ce-gb dictionary (zh_CN – en) 朗道汉英字典 – (stardict-langdao-ce-gb-2.4.2.tar.bz2)
    • the MDBG CC-CEDICT Chinese-English dictionary – (stardict-mdbg-cc-cedict-2.4.2.tar.bz2)

http://abloz.com/huzheng/stardict-dic/zh_TW/
Here you can find Traditional Chinese dictionaries.
The Traditional Chinese (ZH-TW) dictionaries I have tested and found working are:

    • langdao-ce-big5 dictionary (zh_TW – en) 朗道漢英字典 – (stardict-langdao-ce-big5-2.4.2.tar.bz2)
    • xdict-ce-big5 dictionary (zh_TW – en) – (stardict-xdict-ce-big5-2.4.2.tar.bz2)

After downloading the above “*.tar.bz2” files and unpacking each one into its 3 files, put those files in the “dictionary” subdirectory of your project (e.g. c:/project name/dictionary, where “project name” is the name you gave the project when you created it in OmegaT).
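If you would rather script this step than unpack by hand, here is a minimal Python sketch using the standard `tarfile` module. The function name `install_dictionary` and the paths in the usage note are made up for illustration. The sketch copies only the three dictionary files, without any folder the archive may wrap them in, straight into the “dictionary” subdirectory.

```python
import tarfile
from pathlib import Path

# Only these three file types are copied; anything else in the archive is skipped.
SUFFIXES = (".dict.dz", ".idx", ".ifo")


def install_dictionary(archive_path, dictionary_dir):
    """Copy the dictionary files from a .tar.bz2 archive into dictionary_dir."""
    target = Path(dictionary_dir)
    target.mkdir(parents=True, exist_ok=True)
    installed = []
    with tarfile.open(archive_path, "r:bz2") as archive:
        for member in archive.getmembers():
            # Drop any folder prefix stored in the archive and keep the file name.
            name = Path(member.name).name
            if member.isfile() and name.endswith(SUFFIXES):
                source = archive.extractfile(member)
                (target / name).write_bytes(source.read())
                installed.append(name)
    return sorted(installed)
```

For example, `install_dictionary("stardict-langdao-ce-gb-2.4.2.tar.bz2", "c:/project name/dictionary")` would return the names of the files it copied. Both paths here are placeholders, so adjust them to wherever you saved the download and created your project.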

Now you are all set and ready to go!

You open your “project”, load your “source file”, and expect to see some Chinese-to-English dictionary entries pop up in the “Dictionary” pane of OmegaT as you move from one segment to the next.

Surprise! Surprise! No dictionary words come up! Why?

If you are like me, then the “Source Language Tokenizer” in your project “Properties” is set to the default, “LuceneSmartChineseTokenizer”.

This particular tokenizer seems unable to find any dictionary words, though that is not entirely true. I will explain in another post what I observed about how the different tokenizers behave.

In the meantime, to see the dictionary words, I suggest you change the tokenizer to “LuceneCJKTokenizer”. You do this by going to “Project”, “Properties”, and then choosing “LuceneCJKTokenizer” under “Source Language Tokenizer”. This tokenizer can find two-character Chinese words that are in your dictionary. Examples: 書籍, 預約.

[Screenshot: LuceneCJKTokenizer]
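To see why two-character words are the ones that turn up, it may help to picture the text being cut into overlapping pairs of characters, which is, as far as I understand it, how Lucene’s CJK analysis treats Chinese text. The Python sketch below only illustrates that idea. It is not OmegaT’s or Lucene’s actual code, and the sample segment and the function names are made up for this example.

```python
def cjk_bigrams(text):
    """Return the overlapping two-character pairs in a run of Chinese characters."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def dictionary_hits(text, dictionary_words):
    """Return the pairs from `text` that are entries in `dictionary_words`."""
    return [pair for pair in cjk_bigrams(text) if pair in dictionary_words]


sample = "預約書籍"  # made-up segment built from the two example words
print(cjk_bigrams(sample))                       # ['預約', '約書', '書籍']
print(dictionary_hits(sample, {"書籍", "預約"}))  # ['預約', '書籍']
```

Under this picture, a dictionary entry longer than two characters would never equal any single pair, which is consistent with only two-character words showing up.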

Unless you have done something terribly wrong <g>, you should see some dictionary words as you go from one segment to another.

If for some reason you are still having trouble seeing the Chinese dictionary words, feel free to let me know and I’ll try my best to help you.

Enjoy!


4 Comments

  1. Thank you very much for this how-to explanation! Really made my day! Best from Germany!
    Do you have any idea why the entries from the MDBG dictionary don’t come up in the order they appear in the text, nor alphabetically, nor by stroke number? Is there an option to change that? I’m still a beginner with OmegaT.
