Wordcount in tagged SGML files
Thread poster: OTMed (X)
OTMed (X)
Poland
English to Polish
Jan 10, 2004

Hello everyone,
Having finished translating an SGML-tagged text of 60,000+ characters, I need to calculate the wordcount for our linguistic editor.
She has only worked with bilingual, Trados-segmented RTF documents, and now I need to count 'characters + spaces', the standard count we use for linguistic editors.
In theory, extracting the green-styled text produced by Trados into a new document and counting characters + spaces should do the trick. The problem is that it does not. The green text is marked with various styles (e.g. 'green', 'normal') and, worse still, the styles overlap with internal and external tags.
Have you come across a similar problem? Does anybody know a tool that would reliably extract translated text or tags only?
Any input will be sincerely and wholeheartedly appreciated.


 
Hynek Palatin
Czech Republic
English to Czech
Wordcount Jan 10, 2004

How about just removing all the tags (search for a certain style, replace with an empty string) and counting the rest, which should be only the translated text? Keep the source text hidden, of course, and work on a temporary copy of the document.
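A minimal Python sketch of this suggestion, assuming the file has already been reduced to target text plus tags (source hidden or deleted) and that every tag has the simple form `<...>` with no `>` inside attribute values. The function names are illustrative and not part of any tool mentioned in this thread.

```python
import re

# Assumption: every tag looks like <...> and contains no literal '>'.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(sgml_text: str) -> str:
    """Remove every tag by replacing it with an empty string."""
    return TAG_RE.sub("", sgml_text)


def char_counts(text: str) -> tuple[int, int]:
    """Return (characters without spaces, characters including spaces).

    Runs of whitespace (spaces, tabs, line breaks) are collapsed to a single
    space first, so layout whitespace from the SGML file is not overcounted.
    """
    normalized = " ".join(text.split())
    return len(normalized.replace(" ", "")), len(normalized)


sample = "<p>Ala ma <b>kota</b>.</p>"
text = strip_tags(sample)
print(text)               # Ala ma kota.
print(char_counts(text))  # (10, 12)
```

Replacing tags with an empty string, as suggested, can glue words together where a closing tag is immediately followed by an opening one (`</p><p>`); replacing them with a space avoids that, at the cost of stray spaces before punctuation. Entity references such as `&amp;` are left as-is and counted as several characters.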

 
OTMed (X)
Poland
English to Polish
Topic starter
Problem solved Jan 10, 2004

Hynek Palatin wrote:

How about just removing all the tags (search for a certain style, replace with an empty string) and counting the rest


This is exactly what I tried originally, and I concluded that the styles overlap, which makes this method unusable.

What did not overlap, though, was the font colour.
So I used good ol' Search and Replace to leave only the green, Trados-made text. Perhaps not the most elegant option, but it worked for me (and the editor).
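The colour-based workaround can be sketched the same way. Reading colours out of a real RTF file means resolving `\cf` indices against the document's colour table, which is beyond a short example, so this sketch assumes the document has already been split into hypothetical `(text, colour)` runs; the `Run` class and the colour names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical run of text that shares one font colour."""
    text: str
    colour: str  # illustrative colour names, e.g. "green" for Trados target text


def keep_colour(runs: list[Run], colour: str = "green") -> str:
    """Keep only the runs in `colour`, like the Search and Replace pass."""
    return "".join(run.text for run in runs if run.colour == colour)


def chars_with_spaces(text: str) -> int:
    """Count characters including spaces, collapsing runs of whitespace."""
    return len(" ".join(text.split()))


runs = [
    Run("<s>", "grey"),
    Run("Ala ma ", "green"),
    Run("</s>", "grey"),
    Run("kota.", "green"),
]
target = keep_colour(runs)
print(target)                     # Ala ma kota.
print(chars_with_spaces(target))  # 12
```

Filtering on colour works here precisely because, as noted above, colours do not overlap with tags the way styles do.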
Cheers, Greg


 
Jerzy Czopik
Germany
Member (2003)
Polish to German
Why not use the Trados analysis? Jan 11, 2004

You can analyse the file before translation and get the word count for the source language.
After the translation, you can clean up a copy of your file and analyse it against an empty Trados TM with reversed languages (e.g. if the translation was EN-PL, use a PL-EN TM). Then you'll get the number of words in your translated file without too much work.
This should do the trick for any format compatible with Trados.

Kind regards
Jerzy


 
Oksana Kornitskaja
German to Ukrainian
Try PractiCount Jan 11, 2004

You can try PractiCount - http://www.practiline.com. The new beta version (2.4) supports XML and SGML files.

Best regards,
Oksana Kornitskaya


 
OTMed (X)
Poland
English to Polish
Topic starter
This option wouldn't work Jan 12, 2004

Jerzy Czopik wrote:
...you'll get the number of words in your translated file without too much work...


As I said, the problem was that I needed a characters + spaces count for a linguistic editor.


 
Jerzy Czopik
Germany
Member (2003)
Polish to German
But Trados delivers that data too... Jan 12, 2004

After a Trados analysis you get the number of words, the number of characters (without spaces, AFAIK) and the average number of characters per word.
This is easy to prove: multiply the number of words by the average number of characters per word. The result is nearly the same as the total number of characters given, which means the characters are counted without spaces.
If you now add the number of characters and the number of words, you will get the number of characters INCLUDING spaces, as usually every word is followed by a space.
That's all, so why bother with other methods if you already have all the results?
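The arithmetic above, as a small Python sketch (function names hypothetical). The second function is the check described in the post: if words × average characters per word is close to the reported character count, that count excludes spaces. The 2% tolerance is an assumption, since the analysis reports a rounded average.

```python
def estimate_chars_with_spaces(words: int, chars_without_spaces: int) -> int:
    """Characters including spaces, assuming one space follows every word."""
    return chars_without_spaces + words


def counts_exclude_spaces(
    words: int, chars: int, avg_chars_per_word: float, tolerance: float = 0.02
) -> bool:
    """True if words * average is within `tolerance` (a fraction) of `chars`.

    The tolerance is an assumption that allows for the rounded average
    reported by the analysis.
    """
    return abs(words * avg_chars_per_word - chars) <= tolerance * chars


# Hypothetical example: the segment "Ala ma kota."
words, chars = 3, 10
print(counts_exclude_spaces(words, chars, 3.33))  # True
print(estimate_chars_with_spaces(words, chars))   # 13
```

The example segment actually has 12 characters including spaces, while the estimate gives 13: the last word of a segment is not followed by a space, so the rule slightly overcounts, by about one character per segment.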

Kind regards
Jerzy


 

