Offline tool to compare two word lists
Thread poster: Hans Lenting
I'm looking for an offline tool (script, macro, ...) to compare two word lists, either case-sensitive or case-insensitive, and create a third list containing all words that are present in both compared lists. Both lists contain exactly one word per line. The higher ASCII range (ä, ß, etc.) should be supported.

Tony M | France | Local time: 19:50 | Member | French to English + ... | Site localizer
Could you do it via Excel? Something like: IF (value in Column A) = (value in Column B), THEN Column C = (value in Column A), ELSE [0]. Then, when you copy back to Word, it would be easy enough to sort the table on Column C and manually remove all the lines where C is empty, finally re-sorting alphabetically on (say) C if that's important.
I'd look for a diff tool for text files/directories (Meld, Diffuse, Beyond Compare, etc.) or one that is specifically for Excel (ExcelMerge) if you prefer that route.

esperantisto | Local time: 20:50 | Member (2006) | English to Russian + ... | Site localizer

Samuel Murray | Netherlands | Local time: 19:50 | Member (2006) | English to Afrikaans + ...
Try my little glossary comparison scripts (AutoIt) | Oct 26, 2019
Hans Lenting wrote:
I'm looking for an offline tool (script, macro, ...) to compare two word lists, either case-sensitive or case-insensitive, and create a third list, containing all words that are present in both compared lists. ... Both lists contain exactly one word per line. The higher ASCII range (ä, ß etc.) should be supported.

Oh, dear. Well, I may have something that you can use while you search for the perfect solution:
http://www.leuce.com/autoit/WFC%20Glossary%20Comparer.zip

Each of these two scripts attempts to compare two Wordfast Classic glossaries (which are tab-delimited files). I tried to quickly adapt one of them for comparing word lists that contain only one column (i.e. your scenario), but I'm afraid I'm too stoned right now.

So, what you need to do is temporarily replace any existing tabs in your files with a marker, e.g. "|||", then add a tab to the end of each line (i.e. replace \n with \t\n, or replace CRLF with TAB & CRLF, or whatever), and then use the "compare column 1" script. Type "NONE" when prompted. The readme file is your friend.

The script outputs two additional files, named after the two original files. If an entry occurs in both files, it gets the word [BOTH] added in front of it. If an entry occurs in one file only, it simply remains in that file unmarked.

Look, I used these scripts during a large translation project but did not develop them beyond the point where they were useful to me at the time. Be warned that they are SLOW with large files.
[Edited at 2019-10-26 13:00 GMT]

Jean Lachaud | United States | Local time: 13:50 | English to French + ...
Off the top of my head:
1. Add the content of one list to the other.
2. Import/copy into an Excel column.
3. Sort the column, if required ([Data Tab | Sort]).
4. Remove duplicates ([Data Tab | Remove Duplicates]).

Samuel Murray | Netherlands | Local time: 19:50 | Member (2006) | English to Afrikaans + ...
Jean Lachaud wrote:
1. Add the content of one list to the other
2. Import/copy into an Excel column
3. Remove Duplicates ([Data | Data Tools | Remove Duplicates])

If you do this, then you end up with a column that contains all terms. The way I understand it, Hans wants only terms that occur in both files. If a term occurs in only one file, then he doesn't want that term.

In other words, assuming that duplicates were already removed from each list individually (so that each list contains every term only once), step #3 should be something like "remove non-duplicates", i.e. remove all terms that appear only once in the merged list.

Jean Lachaud | United States | Local time: 13:50 | English to French + ...
You are right. Still, I'm pretty sure there is a quick way to do that in Excel, but I don't have time today to research it.

Samuel Murray wrote:
If you do this, then you end up with a column that contains all terms. The way I understand it, Hans wants only terms that occur in both files. If a term occurs in only one file, then he doesn't want that term. In other words (if we assume that duplicates (except one instance, of course) were already removed from each list individually), then step #3 should be something like "remove non-duplicates" (i.e. remove all terms that appear only once in the list).
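
The "remove non-duplicates" idea discussed above can also be tried outside Excel. The following is only a minimal Python sketch of that idea, not something posted in this thread; the function name `terms_in_both` and the sample words are made up for illustration. It de-duplicates each list first, merges them, and keeps the terms that are then counted twice.

```python
from collections import Counter


def terms_in_both(list_a, list_b):
    """Return terms that occur in both lists, via 'remove non-duplicates'.

    Hypothetical helper for illustration; the comparison is case-sensitive.
    """
    # Step 1: remove duplicates inside each list, so a term repeated in
    # one list cannot be mistaken for a term shared by both lists.
    unique_a = set(list_a)
    unique_b = set(list_b)

    # Step 2: merge the two de-duplicated lists and count each term.
    counts = Counter(unique_a)
    counts.update(unique_b)

    # Step 3: keep only the terms counted twice (once from each list).
    return sorted(term for term, n in counts.items() if n == 2)


if __name__ == "__main__":
    a = ["Haus", "Baum", "Baum", "Straße"]
    b = ["Straße", "Haus", "Auto"]
    print(terms_in_both(a, b))  # ['Haus', 'Straße']
```

Skipping step 1 would make "Baum" look like a shared term, because it occurs twice in the first list. That is the pitfall the de-duplication assumption guards against.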

Samuel Murray | Netherlands | Local time: 19:50 | Member (2006) | English to Afrikaans + ...
@Hans, here's a superfast one | Oct 26, 2019
Hans Lenting wrote:
I'm looking for an offline tool [etc.]

I found an AutoIt script that can do this, cannibalized it a bit, and here you go:
http://www.leuce.com/autoit/compare_two_lists.zip

It's super, super fast. It doesn't sort the files. It creates three files: one with terms that occur only in file 1, one with terms that occur only in file 2, and one with terms that occur in both files. Note that the script counts all instances of a term within a file as a single term. Put differently: the script removes all duplicates from each file's content before comparing the two files. It leaves the original files intact.
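
For readers without AutoIt, the behaviour described above (three result lists, duplicates collapsed, original order kept) can be approximated in a few lines of Python. This is a rough sketch written for illustration and is not the AutoIt script itself; the function name `split_lists` is hypothetical, and writing the three lists to files is left out.

```python
def split_lists(list_a, list_b):
    """Return (only_in_a, only_in_b, in_both), each without duplicates.

    Hypothetical helper for illustration; the comparison is case-sensitive
    and the original order of first occurrence is preserved (no sorting).
    """
    set_a, set_b = set(list_a), set(list_b)

    def unique_in_order(words, keep):
        # dict.fromkeys drops repeats while preserving first-seen order.
        return [w for w in dict.fromkeys(words) if keep(w)]

    only_a = unique_in_order(list_a, lambda w: w not in set_b)
    only_b = unique_in_order(list_b, lambda w: w not in set_a)
    both = unique_in_order(list_a, lambda w: w in set_b)
    return only_a, only_b, both


if __name__ == "__main__":
    a = ["Haus", "Baum", "Baum", "Straße"]
    b = ["Straße", "Haus", "Auto"]
    only_a, only_b, both = split_lists(a, b)
    print(only_a)  # ['Baum']
    print(only_b)  # ['Auto']
    print(both)    # ['Haus', 'Straße']
```

Set membership tests keep the comparison fast even for long lists, since each word is looked up in roughly constant time.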
[Edited at 2019-10-26 15:13 GMT]

Luca Tutino | Italy | Member (2002) | English to Italian + ...
Just adding a couple of variations to Jean's solution (case-sensitive) | Oct 26, 2019
Before merging the lists, you should eliminate any repetition from each list separately by using the Excel Remove Duplicates command. Then merge them and sort the merged list as suggested by Jean.

Now you can add a formula like this in cell B2: =EXACT(A2;A1) (the function is called IDENTICO in Italian Excel; depending on your regional settings, the argument separator may be a comma instead of a semicolon). Copy cell B2 into the remaining rows of column B, and you automatically get =EXACT(A3;A2) in cell B3, and so on.

The formula will show TRUE for the terms appearing twice, which means they originally appeared in both lists, and FALSE for all the other terms, as well as for the first appearance of the doubled terms. Use the AutoFilter on column B to select the TRUE rows. Copy the filtered column A into a new worksheet, and you have your desired list.
[Edited at 2019-10-26 16:22 GMT]
[Edited at 2019-10-26 16:24 GMT]

Luca Tutino | Italy | Member (2002) | English to Italian + ...
Additional step for case-insensitive comparison | Oct 26, 2019
Just add the function =UPPER(A1) in B1 and copy cell B1 into the remaining rows of column B. Then proceed as above, pointing the comparison formula (EXACT in English-language Excel) at column B rather than column A and placing it in column C rather than column B.
[Edited at 2019-10-26 16:24 GMT]

Samuel Murray | Netherlands | Local time: 19:50 | Member (2006) | English to Afrikaans + ...
The script assumes Windows line breaks (CRLF), so if your files have Unix line breaks, try changing CRLF to LF in the script.
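
The line-break issue can be sidestepped by reading the word lists in a way that accepts either convention. Below is a small Python sketch of that idea, offered as an illustration only and unrelated to the AutoIt script; the function name `read_word_list` and the default encoding are assumptions. If your lists were saved as Windows-1252 rather than UTF-8, pass `encoding="cp1252"` instead.

```python
from pathlib import Path


def read_word_list(path, encoding="utf-8-sig"):
    """Read one word per line, accepting CRLF, LF or CR line breaks.

    Hypothetical helper for illustration. "utf-8-sig" is an assumed
    default: it decodes UTF-8 and silently drops a leading BOM if present.
    """
    # Read raw bytes and decode explicitly, so the result does not depend
    # on the platform's newline translation.
    text = Path(path).read_bytes().decode(encoding)

    # str.splitlines() splits on \r\n, \n and \r alike (among other
    # line separators); blank lines and surrounding whitespace are dropped.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

A call such as `read_word_list("list1.txt")` (the file name is a placeholder) returns the same list whether the file was saved on Windows or on a Unix-like system.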

Hans Lenting | Netherlands | Member (2006) | German to Dutch | TOPIC STARTER
Samuel Murray wrote:
The script assumes Windows line breaks (CRLF), so if your files have Unix line breaks, try changing CRLF to LF in the script.

Thank you all! I've used the second script that Samuel provided.

@Samuel, if you can find a case-insensitive solution, I'd be much obliged.

@Jean: I'll test the Mac version of Beyond Compare.
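
A case-insensitive comparison of the kind requested here could look roughly like the Python sketch below. It is an illustration under stated assumptions, not a solution posted in this thread: the function name `common_words` and the sample words are invented, and both lists are assumed to use the same Unicode normalization form (for example, "ä" stored as a single precomposed character in both files).

```python
def common_words(list_a, list_b, case_sensitive=True):
    """Return words present in both lists, in list_a order, without repeats.

    Hypothetical helper for illustration only.
    """
    # str.casefold() is a more aggressive lower(): it maps "ß" to "ss",
    # so "Straße" and "STRASSE" compare equal when case is ignored.
    key = (lambda w: w) if case_sensitive else str.casefold

    keys_b = {key(w) for w in list_b}
    seen = set()
    result = []
    for word in list_a:
        k = key(word)
        if k in keys_b and k not in seen:
            seen.add(k)
            result.append(word)  # keep the spelling used in list_a
    return result


if __name__ == "__main__":
    a = ["Haus", "ÄPFEL", "Straße", "Baum"]
    b = ["haus", "Äpfel", "STRASSE"]
    print(common_words(a, b, case_sensitive=False))  # ['Haus', 'ÄPFEL', 'Straße']
    print(common_words(a, b, case_sensitive=True))   # []
```

Case folding is used instead of upper- or lower-casing because plain `lower()` leaves "ß" unchanged, so "Straße" and "STRASSE" would not match. Whether those two should count as the same word is a choice; swapping `str.casefold` for `str.lower` gives the stricter behaviour.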