Trouble with charts in OmegaT
Thread poster: Nina Halperin

Nina Halperin  Identity Verified
Peru
Local time: 14:27
Spanish to English
Jul 21

Hello,

I'm completing a practice project in OmegaT 4.3.2, in which I am translating a Word document that I had converted from a PDF using Wordfast Anywhere. Although almost all the formatting was conserved in its entirety, some of the charts came out as images as opposed to editable charts. Consequently, they do not show up at all in OmegaT. In cases like this, do I need to recreate the charts manually in the resulting Word document before uploading it to OmegaT?

Additionally, in the charts that did come out correctly, OmegaT has segmented by sentence but also by line within each box, so several sentences came out as two or more segments. For example, say a sentence within a box of the chart wraps across three lines of the box: OmegaT has divided that sentence into three segments when it should be just one. Is there a way to fix that without adding a "quick-fix rule" for every single sentence under the segmentation section, which would be extremely tedious and time-consuming? I did notice that in these charts, some of the boxes were subdivided into a few different boxes, so I merged the sub-boxes together in the source document within OmegaT. I think that may have helped a little, but for the most part it did not fix the problem. Thanks in advance!


 

esperantisto  Identity Verified
Local time: 22:27
Member (2006)
English to Russian
SITE LOCALIZER
Charts Jul 22

Charts, as any other embedded objects that have no translatable content, will be reproduced in target documents as they appear in respective source documents. Obviously, you will have to edit / replace them if anything requires translation.

[Edited at 2020-07-22 08:07 GMT]


 

Susan Welsh  Identity Verified
United States
Local time: 15:27
Member (2008)
Russian to English
OCR quality? Jul 22

I suspect that your problems here and with headers/footers may have to do with the quality of your OCR conversion from PDF to Word. For what it's worth to you, here is my checklist for OCR from PDF via ABBYY Finereader (I made this after struggling with endless problems):
1. Mark footnotes and callouts on hard copy.
2. In Finereader, remove footers.
3. Spellcheck in source language.
4. Scroll through and fix OCR errors.
5. Save as "formatted" or "editable" text (or both, if there are tables).
6. Proof PDF against printout of source text for missing copy, format problems.


 

Samuel Murray  Identity Verified
Netherlands
Local time: 21:27
Member (2006)
English to Afrikaans
@Nina Jul 22

Nina Halperin wrote:
Do I need to recreate the charts manually in the resulting Word document before uploading it to OmegaT?


Yes, OmegaT can only translate editable text. So if your PDF converter fails to convert some parts of the file, you have to edit the converted file in e.g. Microsoft Word and fix (or type in, or recreate) the parts that were not converted properly, before loading it into OmegaT. The same thing is true of other CAT tools.

Additionally, in the charts that did come out correctly, OmegaT has segmented by sentence but also by line of each box, so there are several sentences that came out as two or more segments.


Yes, OmegaT splits text into segments when there is a line break. You have to fix your charts or tables so that there are no line breaks in the middle of sentences or phrases, before loading the file into OmegaT. There may be CAT tools that are smart enough to know that all text inside a table cell should be treated as a single sentence, but I don't know of any.
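
The effect described above can be illustrated with a toy model. This is only a sketch, not OmegaT's actual segmentation engine: the `segment` function and the sample cell text are hypothetical, and the only assumption is the behavior stated in this post, namely that every line break ends a segment before sentence rules are applied.

```python
import re


def segment(text: str) -> list[str]:
    """Toy segmenter: break at every line break, then at sentence-final punctuation."""
    segments = []
    for line in text.splitlines():
        # Within a line, split after ., ! or ? when followed by whitespace.
        for part in re.split(r"(?<=[.!?])\s+", line.strip()):
            if part:
                segments.append(part)
    return segments


# Hypothetical table cell as produced by a PDF converter: hard breaks mid-sentence.
cell_as_converted = (
    "The student will read\n"
    "grade-level texts with\n"
    "minimal support. Progress is reviewed monthly."
)

# The same cell after the hard breaks have been replaced by spaces.
cell_fixed = " ".join(cell_as_converted.splitlines())

print(segment(cell_as_converted))  # the first sentence is scattered over three segments
print(segment(cell_fixed))         # one segment per sentence
```

With the breaks in place the first sentence ends up in three separate segments; once they are removed in the source file, each sentence becomes a single segment.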

esperantisto wrote:
Charts, as any other embedded objects that have no translatable content, will be reproduced in target documents as they appear in respective source documents.


While I agree that embedded objects will not be translated (but OmegaT does handle text boxes correctly), I suspect Nina's converted charts are not embedded objects, since Nina's file was converted from PDF using Wordfast Anywhere.


[Edited at 2020-07-22 18:31 GMT]


 

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
mQ etc. Jul 22

Samuel Murray wrote:
Yes, OmegaT splits text into segments when there is a line break. You have to fix your charts or tables so that there are no line breaks in the middle of sentences or phrases, before loading the file into OmegaT. There may be CAT tools that are smart enough to know that all text inside a table cell should be treated as a single sentence, but I don't know of any.


https://helpcenter.memoq.com/hc/en-us/articles/360010377519-Importing-the-content-of-an-Excel-spreadsheet-on-a-cell-by-cell-basis

Probably Transit too.


 

Nina Halperin  Identity Verified
Peru
Local time: 14:27
Spanish to English
TOPIC STARTER
A few questions about eliminating line breaks Jul 22

Thank you so much to everyone for your replies. Ok, I see that I would have to recreate any parts of the resulting Word document, including charts, that did not come out as editable text.

In the source document within OmegaT, I clicked on the paragraph sign and then saw that there was a paragraph sign at the end of every line in the chart, so I deleted those and then reloaded the source document. That seems to have fixed the problem! Now the segments are as they should be, with two tags in the space where the line break had been.

I also reopened the original document that I had gotten from Wordfast Anywhere and investigated the charts with the paragraph sign enabled, but in that case there were no paragraph signs at the end of every line, but rather little blue circles with four lines sticking out of them. I could not erase them. I think those little circles indicate the end of a cell, because, like I mentioned, in the original Wordfast document every cell got separated into sub-cells. In fact, it appears that every line within each of the original cells got separated into its own sub-cell. Correct me if I'm wrong, but it seems that it was necessary to first merge all the sub-cells into one larger cell like I did originally and then eliminate the paragraph sign at the end of every line. In the case of a text with a lot of charts, this whole process seems like it could be time-consuming. Is there no work-around?

Hans, I took a look at the link you sent. Is there a comparable option for OmegaT?

Susan, are those things one is able to do without ABBYY Finereader? For example, how do you save a Word document as "formatted" or "editable" text? Are you saying that, if I do that in Word, it will preserve the formatting in OmegaT?

Does anyone have suggestions for a free PDF-to-Word converter that might be better quality than Wordfast Anywhere? I used the latter because I had seen it recommended in a ProZ forum. Like I said, overall it maintained the formatting really well, at least for this document. On the other hand, I tried converting an educational document called an IEP, which has a ton of charts and check boxes, with Wordfast Anywhere and the resulting document was of pretty poor quality. Thanks so much again!


 

Samuel Murray  Identity Verified
Netherlands
Local time: 21:27
Member (2006)
English to Afrikaans
@Nina Jul 23

Nina Halperin wrote:
It seems that it was necessary to first merge all the sub-cells into one larger cell like I did originally and then eliminate the paragraph sign at the end of every line. In the case of a text with a lot of charts, this whole process seems like it could be time-consuming. Is there no work-around?


No, you have to either fix the tables that were created by the OCR program, or recreate the tables from scratch and then copy/type the text into the newly created tables.

Hans, I took a look at the link you sent. Is there a comparable option for OmegaT?


AFAICT, Hans' link relates to Excel only (not Word). No, there is no such option in OmegaT.

How do you save a Word document as "formatted" or "editable" text? Are you saying that, if I do that in Word, it will preserve the formatting in OmegaT?


Susan was referring to tweaking settings in a proper OCR program, such as FineReader. In the OCR program, you can view every page after it was scanned and before it is converted, to tell the program how it should handle each table. You can also choose to convert with *more* formatting retained (which is less good for CAT tools) or with *less* formatting retained (which is better for CAT tools but requires more work to re-format the final file).

Does anyone have suggestions for a free PDF-to-Word converter that might be better quality than Wordfast Anywhere?


All OCR programs struggle with some types of files. There are online PDF-to-DOC converters that offer OCR. Sometimes, your printer comes bundled with a free or demo version of an OCR program.


 

Nina Halperin  Identity Verified
Peru
Local time: 14:27
Spanish to English
TOPIC STARTER
Is it necessary to merge all the sub-cells into one larger cell? Jul 23

Hi Samuel, thank you so much for your response. I was just wondering if you could confirm this question I posed in my last post: Correct me if I'm wrong, but it seems that it was necessary to first merge all the sub-cells into one larger cell like I did originally and then eliminate the paragraph sign at the end of every line.

 

Samuel Murray  Identity Verified
Netherlands
Local time: 21:27
Member (2006)
English to Afrikaans
@Nina Jul 23

Nina Halperin wrote:
It seems that it was necessary to first merge all the sub-cells into one larger cell like I did originally and then eliminate the paragraph sign at the end of every line.


You can do whatever you want, but the fact is that the OCR system you're using splits sentences in two, and OmegaT can't unsplit them. So, either translate them while they're split (not ideal) or fix them so that they are no longer split. And one way to fix sentences split over multiple cells is to merge the cells.
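
For documents with many tables, the merging could in principle be scripted rather than done by hand. The following is a rough sketch of the idea only: `merge_fragments` and the sample data are hypothetical, it works on plain strings rather than on a real Word file, and it assumes that a fragment not ending in sentence-final punctuation continues in the next sub-cell. That heuristic will wrongly join headings or labels that have no final punctuation, so the result still needs to be checked.

```python
# Punctuation that is taken to close a sentence (an assumption of this sketch).
SENTENCE_END = (".", "!", "?", ":")


def merge_fragments(fragments: list[str]) -> list[str]:
    """Join consecutive sub-cell texts until one ends with sentence-final punctuation."""
    merged = []
    current = ""
    for fragment in fragments:
        fragment = fragment.strip()
        if not fragment:
            continue  # skip empty sub-cells
        current = f"{current} {fragment}" if current else fragment
        if current.endswith(SENTENCE_END):
            merged.append(current)
            current = ""
    if current:
        # Keep trailing text that never reached a sentence end.
        merged.append(current)
    return merged


# Hypothetical sub-cells, one per visual line, as left by the converter.
sub_cells = [
    "Reads aloud",
    "with fluency",
    "and expression.",
    "Needs support",
    "in writing.",
]

for sentence in merge_fragments(sub_cells):
    print(sentence)
```

The merged strings correspond to what each cell should contain after the sub-cells are merged and the stray paragraph marks removed.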


 

Nina Halperin  Identity Verified
Peru
Local time: 14:27
Spanish to English
TOPIC STARTER
Thank you Jul 23

Ok perfect, thanks Samuel!

 

Stanislav Okhvat
Local time: 22:27
English to Russian
Preparing Word documents converted from PDF Jul 24

Hello Nina,

Last year I presented for UTIC Webinars on how to convert PDF to Word with Finereader and prepare the converted document for CAT tools. Specifically, the presentation talks about removing those incorrect paragraph and line breaks in a faster way using TransTools that I developed. Here is the link to the webinar recording and downloadable materials: How to convert PDF to Word format and prepare it for translation properly.

Hope it helps.

Best regards,
Stanislav Okhvat
TransTools – Useful tools for every translator


 

