How to segment 15th to 17th century Arabic manuscripts for CAT use
Thread poster: Haytham Abulela
Haytham Abulela  Identity Verified
Canada
Member (2008)
Arabic to English
Mar 11, 2019

I have been translating Arabic alchemical manuscripts into English for a few years and want to use CAT tools to build a translation memory and glossary, so that I can reuse my previous translations when similar text occurs. Many manuscripts contain quotations or relatively long allegories that may be repeated in other manuscripts, though not verbatim. The challenge is that such manuscripts have little punctuation, which makes segmentation hard. If I impose my own interpretation of segment length, I end up with either big chunks of text or small segments, and the translation memory loses its benefit whenever similar texts were segmented differently. In addition, subject-specific glossaries are rare and I have to consult several sources and thesauri to find equivalents, so I wish to compile my own glossary to ensure quality and consistency.

You may check this printed book from the Asiatic Society of Bengal, which compiles three of Ibn Umail's treatises and shows what an Arabic manuscript of the same period looks like: pahar.in/mountains/Journals/Asiatic%20Society%20of%20Bengal%201788-1921/Memoirs%20of%20Asiatic%20Society%20of%20Bengal/1933%20Memoirs%20of%20Asiatic%20Society%20of%20Bengal%20Vol%2012%20s.pdf It is a 228-page PDF file, but a quick look at pages 7-17 should be enough.


 
Suzanne Chabot
Arabic to French
Split segments Sep 16, 2019

Hello

I am translating from Arabic into French and I have the same problems.

The solution I found is to work out the i'rab (grammatical analysis) of the sentence in my head, and when I see that the 'wa' merely connects two sentences, I split the paragraph at that point using the 'Split segment' function in SDL Trados Studio. I know many CAT tools allow this.

This cannot be done automatically, since 'wa' has many meanings in Arabic and is also often attached to the first word of the sentence.
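Although the split itself has to stay manual, the places worth checking can at least be listed. The following is only a rough sketch of that idea, not a feature of any CAT tool: it flags every word that begins with the letter waw so a translator can decide which ones are real sentence connectors. The function name `candidate_splits` and the sample sentence are made up for illustration, and the sample deliberately contains a false positive (a verb whose first root letter is waw).

```python
import re

# Arabic letters plus the common harakat, so vocalised text is handled too.
WORD_CHARS = r"[\w\u064B-\u0652\u0670]"

# A waw that is not preceded by a letter or a vowel mark, i.e. word-initial,
# followed by the rest of the word it is attached to.
WAW_INITIAL = re.compile(r"(?<!" + WORD_CHARS + r")و" + WORD_CHARS + r"*")


def candidate_splits(text):
    """Return (offset, word) pairs for every word that starts with waw.

    These are candidates only: the waw may be a conjunction, an oath
    particle, or simply the first letter of the word.
    """
    return [(m.start(), m.group()) for m in WAW_INITIAL.finditer(text)]


# Hypothetical sample: the first hit is 'wa' + verb, the second is a verb
# that merely begins with waw, so it must be rejected by the translator.
sample = "قال الحكيم وقال تلميذه وجدنا الماء"
for offset, word in candidate_splits(sample):
    print(offset, word)
```

The output is just a review list; deciding which offsets are true sentence boundaries still requires the i'rab step described above.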

Concerning term recognition, I found no other solution than to add each term with all the different 'damîr' (pronoun suffixes) attached to it, its broken plural forms, and so on. These CAT tools are very poor at recognizing Arabic terms; I hope this will be fixed one day. The problem becomes worse when there are 'harakat' (vowel marks) attached to a word, which is very frequent when you translate classical Arabic documents.
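For the harakat part of the problem, one possible workaround is to strip the vowel marks from a copy of the source text before it goes into the CAT tool, so that a vocalised word and its bare glossary entry compare equal. The sketch below is a minimal illustration using only the Python standard library; the function name `strip_harakat` is invented here, and it assumes the text is stored in the usual composed form (NFC), where hamza-carrying letters are single characters and therefore survive.

```python
import unicodedata

TATWEEL = "\u0640"  # kashida (elongation stroke), carries no meaning


def strip_harakat(text):
    """Remove combining marks (fatha, damma, kasra, shadda, sukun, tanwin,
    dagger alif) and tatweel, leaving the base letters untouched."""
    return "".join(
        ch
        for ch in text
        # Unicode category "Mn" covers the Arabic vowel marks.
        if unicodedata.category(ch) != "Mn" and ch != TATWEEL
    )


vocalised = "الْحَجَرُ"
print(strip_harakat(vocalised))
```

This does nothing for attached pronouns or broken plurals, which still need their own glossary entries as described above.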

Wish you all the best. Baraka Allahu fik.


 
Haytham Abulela  Identity Verified
TOPIC STARTER
Solution found Oct 22, 2020

I have been trying to find a way out of this dilemma, and fortunately I found one.

Since I type the manuscripts into an MS Word file to avoid relying on scans that are at times blurry or blotted with stains, the digital text is available for use. I thought of highlighting words in MS Word so that I have a visual indication that a word is already in my glossary, which is an MS Excel spreadsheet. Designing a macro to highlight words in MS Word is easy, but macro code does not recognize non-Latin characters. This means that adding Arabic words to a macro is impractical: editing or updating them inside the macro becomes impossible, or requires additional steps that complicate the process beyond practical use. So I concluded that the only viable option was a macro that highlights words in MS Word files while reading the words from an MS Excel spreadsheet.

I posted this task on Freelancer.com and hoped that such code was possible. Fortunately I found someone who wrote an MS Excel macro that reads the words from within the glossary, which is turned into a macro-enabled spreadsheet with an action button. Clicking this button opens an "Open file" dialogue box where the user chooses the MS Word file to process; the macro then opens the MS Word file and highlights words until the list in the MS Excel spreadsheet is finished. After this you have the highlights you want and can save the MS Word file.

This process requires a manual review to remove false positives (wrong highlights), since it is better to ignore diacritics, hamzas, final yaa, and matching whole words.
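For readers who want to experiment with the matching step outside Office, here is a rough Python sketch of the comparison logic only. It is not the Excel macro described above, and it does not open Word or Excel files (that would need third-party libraries); the glossary and text are plain strings, and the names `normalize` and `find_glossary_hits` are invented for this illustration. It ignores diacritics, folds hamza-carrying alifs to bare alif and alif maqsura to yaa, and, as a choice made for this sketch, compares whole words after normalization.

```python
import re
import unicodedata

# Fold alif-with-hamza/madda to bare alif, alif maqsura to yaa, drop tatweel.
FOLD = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا", "ى": "ي", "\u0640": None})

# A word is a run of letters together with any vowel marks written on them.
WORD = re.compile(r"[\w\u064B-\u0652\u0670]+")


def normalize(word):
    """Strip combining marks, then fold the hamza and final-yaa variants."""
    bare = "".join(ch for ch in word if unicodedata.category(ch) != "Mn")
    return bare.translate(FOLD)


def find_glossary_hits(text, glossary_terms):
    """Return (start, end, word) for each word whose normalized form
    equals a normalized glossary term. Offsets refer to the original text,
    so they can be used to apply highlighting there."""
    wanted = {normalize(term) for term in glossary_terms}
    return [
        (m.start(), m.end(), m.group())
        for m in WORD.finditer(text)
        if normalize(m.group()) in wanted
    ]


# Hypothetical glossary and sentence for demonstration.
glossary = ["الإكسير", "الحجر"]
sample_text = "خذ الحَجَر ثم ألق عليه الاكسير"
hits = find_glossary_hits(sample_text, glossary)
for start, end, word in hits:
    print(start, end, word)
```

Whole-word comparison misses terms with attached prefixes or pronouns; loosening it to substring matching would catch those at the cost of more false positives, which is the trade-off that makes the manual review necessary.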

I hope you find this workaround useful.


 

