Extracting glossary/TM from existing files • Mac
Thread poster: BabelOn-line
BabelOn-line
BabelOn-line
United Kingdom
Local time: 02:47
English to French
+ ...
Apr 4, 2012

Hello All

I am a fairly infrequent and not-very-advanced user of OmegaT, as most of my work is for magazines with few repeats (even though I am very, very happy to have OmegaT at hand for some jobs).

I have quite a number of Word files, both in source and target versions, with source and target structures that are almost identical.

For one given client, I'd like to reuse this resource in the future for consistency.

I'd like to be able to "align" (not sure this is the right term here) my existing source and target so that OmegaT spots the expressions that I have already translated in the past and suggests the corresponding translation.

Most of the time, the "reusable wordings" won't be a whole segment, but rather short wordings such as "Fully Integrated management module", "central dashboard" or possibly acronyms like "MSP".

I am not even sure that a CAT or OmegaT is the right tool for this.

1/ Do you think this is achievable with OmegaT?

2/ What kind of tool could I use to create a translation memory based on my source/target Word files? I work on a Mac, but I can use a PC emulator (VMware) if the application you know only works on PC.

Thanks a lot for your help.


 
Joakim Braun
Joakim Braun  Identity Verified
Sweden
Local time: 03:47
German to Swedish
+ ...
I'm working on it Apr 4, 2012

I'm working on a TMX editor for MacOSX with a built-in aligner. It's not even beta-stage, but you can certainly create multi-language TMX memories from text that you paste. (No formatting is preserved.)
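For readers who have not looked inside one: a TMX memory is an XML file holding one translation unit per aligned pair. The following is a minimal Python sketch of writing such a file from pairs you already have. It is not the editor described above; the function name `build_tmx`, the tool name in the header and the French rendering are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_tmx(pairs, src_lang="en", tgt_lang="fr"):
    """Return a TMX 1.4 document (as a string) for (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "paragraph",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    # Illustrative pair only; the translation is not taken from any real job.
    print(build_tmx([("central dashboard", "tableau de bord central")]))
```

No formatting is carried over here either: each segment is stored as plain text.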

E-mail me if you're interested in a pre-alpha version...


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 03:47
English to Hungarian
+ ...
Aligner Apr 4, 2012

Alignment is indeed the name of the procedure, so you need an aligner.

There are various options; the best of the free ones that work on Mac are probably bitext2tmx and LF Aligner, and possibly PlusTools (if it works in Word for Mac). I'm the author of LF Aligner.
If you're willing to pay, perhaps have a look at ABBYY Aligner Online, which by nature should be platform independent.

LF Aligner currently has no GUI - you've been warned.


 
lidija68
lidija68  Identity Verified
Italy
Local time: 03:47
Italian to Serbian
+ ...
lf aligner Apr 4, 2012

Perhaps you should look here:
http://www.proz.com/forum/cat_tools_technical_help/184708-new_free_open_source_aligner_for_windows_os_x_and_linux.html

It has a command-line interface, but if you read the instructions carefully it works great (I've tried it on Windows XP and Windows 7).


 
Milan Condak
Milan Condak  Identity Verified
Local time: 03:47
English to Czech
Free tools Apr 4, 2012

BabelOn-line wrote:

I am not even sure that a CAT or OmegaT is the right tool for this.

1/ Do you think this is achievable with OmegaT?

2/ What kind of tool could I use to create a translation memory based on my source/target Word files? I work on a Mac, but I can use a PC emulator (VMware) if the application you know only works on PC.

Thanks a lot for your help.


1/ OmegaT can look up terms in a glossary or in a dictionary. The user can create both files.

2/ Two free tools for Windows (or others):


a) Tool for alignment - LF Aligner:

http://www.condak.net/tools/align-sentence/lf-aligner/cs/00.html

http://www.condak.net/cat_other/omegat/2011-08-10/cs/09.html

On my site, look for the camel logo (the LF Aligner logo).

b) Lexical extractor = Lexterm

http://www.condak.net/cat_other/omegat/2011-08-10/cs/10.html

http://www.condak.net/tools/align-word/lexterm/cs/00.html
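
On point 1/ above: as far as I know, OmegaT reads plain-text glossaries with one entry per line and tab-separated columns (source term, target term, optional comment), saved as UTF-8 in the project's glossary folder. Here is a minimal Python sketch of producing such a file from a list of terms. The function names and the French renderings are made up for illustration, and the OmegaT manual should be checked for the accepted file extensions.

```python
from pathlib import Path


def format_glossary(entries):
    """Format (source, target, comment) tuples as tab-separated lines."""
    lines = []
    for source, target, comment in entries:
        fields = [source, target, comment]
        # A tab or newline inside a field would break the column layout.
        if any("\t" in f or "\n" in f for f in fields):
            raise ValueError("fields must not contain tabs or newlines")
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


def write_glossary(path, entries):
    """Write the glossary as UTF-8 text to the given path."""
    Path(path).write_text(format_glossary(entries), encoding="utf-8")


if __name__ == "__main__":
    # Illustrative entries; the target terms are placeholders.
    entries = [
        ("central dashboard", "tableau de bord central", "client X"),
        ("MSP", "MSP", "acronym, keep as is"),
    ]
    print(format_glossary(entries), end="")
```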

Milan Condak


 
Rodolfo Raya
Rodolfo Raya  Identity Verified
Local time: 22:47
English to Spanish
Stingray Document Aligner Apr 4, 2012

Take a look at Stingray (http://www.maxprograms.com/products/stingray.html). You can use it to align different types of documents.

Regards,
Rodolfo


 
Didier Briel
Didier Briel  Identity Verified
France
Local time: 03:47
English to French
+ ...
Look also at term extraction Apr 4, 2012

BabelOn-line wrote:
I'd like to be able to "align" (not sure this is the right term here) my existing source and target so that OmegaT spots the expressions that I have already translated in the past and suggests the corresponding translation.

Most of the time, the "reusable wordings" won't be a whole segment, but rather short wordings such as "Fully Integrated management module", "central dashboard" or possibly acronyms like "MSP".

To extract short terms (rather than full segments), what you need is "term extraction", which is different from alignment (but is often based on already aligned translation memories).

Okapi Rainbow provides monolingual term extraction:
http://www.opentag.com/okapi/wiki/index.php?title=Term_Extraction_Step

One then has to provide the translation. If you have done a previous "classical" alignment, the translation is easier to find (for instance, using OmegaT's search function).

I'm not aware of any free tool offering bilingual term extraction.
There are commercial tools offering that feature.
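
To make the two steps concrete, here is a rough Python sketch of the idea, not of what Okapi Rainbow does internally: count repeated multi-word sequences in the source text as term candidates, then list the aligned segments that contain a given candidate so the translation can be read off by eye. The stopword list, function names, thresholds and sample sentences are all assumptions for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def candidate_terms(text, max_len=4, min_count=2):
    """Return (term, count) for repeated 2..max_len word sequences."""
    # ASCII-only tokenisation; sentence boundaries are ignored for brevity.
    words = re.findall(r"[a-z][a-z'-]*", text.lower())
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Skip sequences that start or end with a function word.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    frequent = [(term, c) for term, c in counts.items() if c >= min_count]
    return sorted(frequent, key=lambda item: (-item[1], item[0]))


def segments_containing(term, pairs):
    """Return the aligned (source, target) pairs whose source has the term."""
    needle = term.lower()
    return [(s, t) for s, t in pairs if needle in s.lower()]
```

The second function is the scripted equivalent of searching an aligned memory: it does not pick the target term for you, it only narrows down where to look.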

Didier


 
BabelOn-line
BabelOn-line
United Kingdom
Local time: 02:47
English to French
+ ...
TOPIC STARTER
A big thank you. Apr 5, 2012

Thanks all for your input, I have quite a lot of apps that I can try now.

I have tried Stingray (thanks Rodolfo), but the main pitfall quickly became apparent: as I translate "editorial" pieces, I quite often split a long sentence into two or change the structure of the sentence around to make it sound more French.

As a result, my source and target, while similar in terms of paragraph and overall length, do not match sentence to sentence.

Thanks to Didier for pointing out that what I am after is actually "term extraction" and for offering a more targeted app for this.

Anyway, big thanks to all for your help. Have a great Easter!


 
Rodolfo Raya
Rodolfo Raya  Identity Verified
Local time: 22:47
English to Spanish
Align paragraphs Apr 5, 2012

BabelOn-line wrote:

I have tried Stingray (thanks Rodolfo), but the main pitfall quickly became apparent: as I translate "editorial" pieces, I quite often split a long sentence into two or change the structure of the sentence around to make it sound more French.


With Stingray you can align paragraphs. Simply check the "Paragraph Segmentation" box when creating a project. You can also join, split and reorder sentences if you work at sentence level.
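
To illustrate why paragraph-level alignment copes with split or restructured sentences, here is a small Python sketch. It is not how Stingray works internally: it simply pairs paragraphs by position and flags pairs whose lengths differ suspiciously. The function names, the blank-line paragraph convention and the `max_ratio` threshold are assumptions.

```python
import re


def split_paragraphs(text):
    """Split on blank lines and normalise whitespace inside paragraphs."""
    return [" ".join(p.split()) for p in re.split(r"\n\s*\n", text) if p.strip()]


def align_paragraphs(source_text, target_text, max_ratio=2.0):
    """Pair paragraphs by position; return (source, target, looks_ok)."""
    src = split_paragraphs(source_text)
    tgt = split_paragraphs(target_text)
    if len(src) != len(tgt):
        raise ValueError(
            f"paragraph counts differ: {len(src)} source vs {len(tgt)} target"
        )
    aligned = []
    for s, t in zip(src, tgt):
        # Character-length ratio as a crude sanity check on each pair.
        ratio = max(len(s), len(t)) / max(1, min(len(s), len(t)))
        aligned.append((s, t, ratio <= max_ratio))
    return aligned
```

A source sentence rendered as two target sentences stays inside the same paragraph pair, so the pairing is unaffected; pairs flagged as suspicious are the ones worth checking by hand.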

You can use Anchovy (http://www.maxprograms.com/products/anchovy.html) for term extraction. It is free and lets you generate monolingual glossaries from existing documents and bilingual glossaries from TMX files.

Regards,
Rodolfo


 
Milan Condak
Milan Condak  Identity Verified
Local time: 03:47
English to Czech
I made a short presentation of Lexterm Apr 7, 2012

Milan Condak wrote:

b) Lexical extractor = Lexterm



I made a short presentation of Lexterm and of creating glossaries and dictionaries for CAT tools.

http://www.condak.net/cat_other/omegat/2012-04-07/cs/00.html

The example shows a glossary in Wordfast Classic and a dictionary in OmegaT.

HTH

Milan


[Edited at 2012-04-08 09:07 GMT]


 
BabelOn-line
BabelOn-line
United Kingdom
Local time: 02:47
English to French
+ ...
TOPIC STARTER
Thanks all for your help Apr 10, 2012

I have a lot of apps to try, but at least I know that I am after "term extraction" rather than alignment.

Thanks again


 
Milan Condak
Milan Condak  Identity Verified
Local time: 03:47
English to Czech
Word alignment is OK, too Apr 11, 2012

BabelOn-line wrote:

I have a lot of apps to try, but at least I know that I am after "term extraction" rather than alignment.

Thanks again


The first step is paragraph or sentence alignment.

The second step is term extraction, or "word alignment". See:

http://en.wikipedia.org/wiki/Bitext_word_alignment

That page links to some toolkits.

Or you can google for "word alignment toolkit".
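
As a toy illustration of the second step, the Python sketch below scores source/target word pairs by how often they occur in the same aligned segment (the Dice coefficient). Real word-alignment toolkits use far more elaborate statistical models; the function names and sample pairs here are assumptions for illustration only.

```python
from collections import Counter
from itertools import product


def dice_scores(pairs):
    """Score each (source word, target word) by co-occurrence in segments."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for source, target in pairs:
        s_words = set(source.lower().split())
        t_words = set(target.lower().split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))
    # Dice coefficient: 2 * joint / (source frequency + target frequency)
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint.items()
    }


def best_translation(word, scores):
    """Return the highest-scoring target word for a source word, or None."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None
```

With only a handful of segments the scores are noisy, which is why the sentence or paragraph alignment of the first step needs to be as large and clean as possible.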

Milan


 

