Why the glossary cannot be used properly
Thread poster: frankleng
frankleng
China
Local time: 03:51
English to Chinese
+ ...
Mar 24, 2010

I have learnt how to create a glossary for OmegaT.
The English-to-Chinese glossary is recognized by OmegaT, but the Chinese-to-English glossary is not. Why? Are there any settings that need to be changed in OmegaT?
Can anyone tell me about this, please? Thank you very much.


 
Didier Briel
France
Local time: 21:51
English to French
+ ...
Chinese requires the use of tokenizers Mar 25, 2010

frankleng wrote:
The English-to-Chinese glossary is recognized by OmegaT, but the Chinese-to-English glossary is not. Why? Are there any settings that need to be changed in OmegaT?
Can anyone tell me about this, please? Thank you very much.


By default, OmegaT uses Sun's tokenizer to identify words. While it works for a lot of languages, it doesn't work for Chinese, which means only isolated words (not in a sentence) can be found.
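
To illustrate the point, here is a simplified sketch. It is not OmegaT's actual tokenizer: it only mimics a splitter that relies on spaces, and the function name term_found is made up for this example. A glossary term is matched only when it equals a whole token, so a Chinese term is found when it stands alone but not inside a sentence written without spaces.

```shell
#!/bin/sh
# Illustration only: NOT OmegaT's tokenizer. It splits a segment on
# whitespace and reports whether the glossary term equals a whole token.
# The body runs in a subshell, so "set -f" does not leak to the caller.
term_found() (
    set -f                      # no filename globbing while splitting
    term=$1
    result="not found"
    for token in $2; do         # unquoted on purpose: split on whitespace
        [ "$token" = "$term" ] && result="found"
    done
    echo "$result"
)

term_found glossary "open the glossary file"   # English: spaces separate words
term_found 词汇表 "词汇表"                        # Chinese term on its own
term_found 词汇表 "请打开词汇表文件"                # same term inside a sentence
```

A language-aware tokenizer is what cuts the Chinese sentence into words so that the third case can match as well.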

To improve the result, it is possible to install a specific tokenizer.
I recommend reading Marc Prior's Howto on OmegaT tokenizers.

Once you have the plugin installed, you can use the
org.omegat.plugins.tokenizer.LuceneChineseTokenizer
tokenizer.

This should improve word recognition.
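
As a rough sketch of how the tokenizer class is passed at launch: the OmegaT 2.x tokenizer plugins were, as far as I know, selected with an --ITokenizer launch argument. Treat the flag, the jar name OmegaT.jar and the helper name omegat_tokenizer_cmd below as assumptions, and check them against the Howto or the launch script shipped with the plugin.

```shell
#!/bin/sh
# Sketch under assumptions: the --ITokenizer flag and the OmegaT.jar name
# should be verified against the tokenizer Howto. The helper only prints
# the command line; it does not start OmegaT.
omegat_tokenizer_cmd() {
    tokenizer_class=$1
    printf 'java -jar OmegaT.jar --ITokenizer=%s\n' "$tokenizer_class"
}

omegat_tokenizer_cmd org.omegat.plugins.tokenizer.LuceneChineseTokenizer
```

Printing the command first makes it easy to compare with what the bundled launch script actually runs.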

If you need further help, I recommend subscribing to the OmegaT Yahoo support group.

Didier


 
frankleng
China
Local time: 03:51
English to Chinese
+ ...
TOPIC STARTER
thanks Mar 25, 2010

Thank you very much.

 
frankleng
China
Local time: 03:51
English to Chinese
+ ...
TOPIC STARTER
You are right, but a new problem occurs Mar 25, 2010

Didier Briel wrote:
--------------------
By default, OmegaT uses Sun's tokenizer to identify words. While it works for a lot of languages, it doesn't work for Chinese, which means only isolated words (not in a sentence) can be found.

To improve the result, it is possible to install a specific tokenizer.
I recommend reading Marc Prior's Howto on OmegaT tokenizers.

Once you have the plugin installed, you can use the
org.omegat.plugins.tokenizer.LuceneChineseTokenizer
tokenizer.

This should improve word recognition.

If you need further help, I recommend subscribing to the OmegaT Yahoo support group.

Didier


Your method works. However, the menu and context are not displayed properly: Chinese text is shown as nonsense characters.
Do you know how OmegaT uses fonts and how to change them so that the text is displayed properly, please? Or is there something else I can do?


 
frankleng
China
Local time: 03:51
English to Chinese
+ ...
TOPIC STARTER
Thanks, solved Mar 25, 2010

Thanks, the problem is solved now.
I'd like to record the solution here so that the next beginner knows how to deal with it.

I ran the program OmegaT-tokenizers.sh in a terminal (just change its properties to make it executable, then double-click and run it).
Then I saw this in the terminal:
------------------------------
18152: Info: OmegaT-2.0.5_2 (Fri Mar 26 07:11:56 CST 2010) Locale zh_CN
18152: Info: Java: Sun Microsystems Inc. ver. 1.6.0_10, executed from '/usr/lib/jvm/java-6-sun-1.6.0.10/jre' (LOG_STARTUP_INFO)
18152: Info: Docking Framework version: 2.1.4
18152: Info: Hunspell loaded successfully from /home/frank/OmegaT/./native/libhunspell-i386.so
18152: Info: Event: application startup (LOG_INFO_EVENT_APPLICATION_STARTUP)
--------------------------------------------

I found that the Java runtime used by OmegaT-tokenizers.sh is the one in the system directory /usr/lib/jvm/java-6-sun-1.6.0.10/jre, not the one in OmegaT//lib/jre.
So I created a folder called "fallback" under /usr/lib/jvm/java-6-sun-1.6.0.10/jre/fonts/ and copied a font file called uming.ttc into it.
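
For anyone who prefers the command line, the two steps above can be sketched roughly as follows. The helper names jre_from_log and install_fallback_font are made up for this example. The fonts directory is passed in as an argument because its location can differ between installs (on many Sun JREs it is jre/lib/fonts rather than jre/fonts), so check which one exists on your system. Writing into a system JRE normally requires root.

```shell
#!/bin/sh
# Sketch only; both function names are hypothetical helpers, not part of OmegaT.

# Read OmegaT startup log lines on stdin and print the JRE path found
# between the single quotes after "executed from".
jre_from_log() {
    sed -n "s/.*executed from '\([^']*\)'.*/\1/p"
}

# Copy a font file into the "fallback" subfolder of a JRE fonts directory.
#   $1: existing fonts directory of the JRE in use
#   $2: font file that covers Chinese (uming.ttc in the post above)
install_fallback_font() {
    fonts_dir=$1
    font_file=$2
    if [ ! -d "$fonts_dir" ]; then
        echo "fonts directory not found: $fonts_dir" >&2
        return 1
    fi
    if [ ! -f "$font_file" ]; then
        echo "font file not found: $font_file" >&2
        return 1
    fi
    mkdir -p "$fonts_dir/fallback" && cp "$font_file" "$fonts_dir/fallback/"
}

# Example calls (the font path is a placeholder; run the second as root):
#   ./OmegaT-tokenizers.sh 2>&1 | jre_from_log
#   install_fallback_font /usr/lib/jvm/java-6-sun-1.6.0.10/jre/fonts /path/to/uming.ttc
```

Checking the log first matters because the font has to go into the JRE that OmegaT actually runs on, which may not be the one bundled with OmegaT.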

After restarting OmegaT-tokenizers.sh, it works: Chinese is recognized, and so is the glossary.

Thank you, Didier. Your idea was very helpful.


[Edited at 2010-03-25 23:51 GMT]


 
Didier Briel
France
Local time: 21:51
English to French
+ ...
Maybe a font issue Mar 26, 2010

frankleng wrote:
Your method works. However, the menu and context are not displayed properly: Chinese text is shown as nonsense characters.

What do you call "context"? Is it the Fuzzy Matches pane?

Is this situation a result of using the tokenizer, or was it the case before?

frankleng wrote:
Do you know how OmegaT uses fonts and how to change them so that the text is displayed properly, please? Or is there something else I can do?

If you are speaking of the font for the "content" (i.e., Editor, Fuzzy Matches, etc.), it can be selected in Options/Fonts...

You must select a font compatible both with your source and your target language.

For the menu, the font is selected automatically according to the user interface language used, and should always be able to display the required characters.

I have no problem here, either with zh_CN or zh_TW, with or without the tokenizer.

Again, for more detailed answers and explanations, you should go to the Yahoo support group, where you are more likely to find other users with similar configurations.

Didier


 

