Extracting glossary/TM from existing files • Mac (OmegaT support)


Thread poster: BabelOn-line

BabelOn-line
United Kingdom
Local time: 02:47
English to French
+ ...

Apr 4, 2012

Hello All

I am a fairly infrequent and not very advanced user of OmegaT, as most of my work is for magazines with few repeats (even though I am very, very happy to have OmegaT at hand for some jobs).

I have quite a number of Word files, both in source and target versions, with source and target structures that are almost identical.

For one given client, I'd like to reuse this resource in the future for consistency.

I'd like to be able to "align" (not sure this is the right term here) my existing source and target files, so that OmegaT can spot the expressions that I have already translated in the past and suggest the corresponding translation.

Most of the time, the "reusable wordings" won't be a whole segment, but rather short wordings such as "Fully Integrated management module", "central dashboard" or possibly acronyms like "MSP".

I am not even sure that a CAT or OmegaT is the right tool for this.

1/ Do you think this is achievable with OmegaT?

2/ What kind of tool could I use to create a translation memory based on my source/target Word files? I work on a Mac, but I can use a PC emulator (VMware) if the application you know only works on PC.

Thanks a lot for your help.

Joakim Braun

Sweden
Local time: 03:47
German to Swedish
+ ...

I'm working on it

Apr 4, 2012

I'm working on a TMX editor for Mac OS X with a built-in aligner. It's not even at beta stage, but you can certainly create multi-language TMX memories from text that you paste. (No formatting is preserved.)

E-mail me if you're interested in a pre-alpha version...

FarkasAndras

Local time: 03:47
English to Hungarian
+ ...

Aligner

Apr 4, 2012

Alignment is indeed the name of the procedure, so you need an aligner.

There are various options. The best of the free ones that work on Mac are probably bitext2tmx and LF Aligner, and possibly PlusTools (if it works in Word for Mac). I'm the author of LF Aligner.
If you're willing to pay, perhaps have a look at ABBYY Aligner Online, which by nature should be platform independent.

LF Aligner currently has no GUI - you've been warned.

lidija68

Italy
Local time: 03:47
Italian to Serbian
+ ...

LF Aligner

Apr 4, 2012

Perhaps you should look here:
http://www.proz.com/forum/cat_tools_technical_help/184708-new_free_open_source_aligner_for_windows_os_x_and_linux.html

It has a command-line interface, but if you read the instructions carefully it works great (I've tried it on Windows XP and Windows 7).

Milan Condak

Local time: 03:47
English to Czech

Free tools

Apr 4, 2012

BabelOn-line wrote:

I am not even sure that a CAT or OmegaT is the right tool for this.

1/ Do you think this is achievable with OmegaT?

2/ What kind of tool could I use to create a translation memory based on my source/target Word files? I work on a Mac, but I can use a PC emulator (VMware) if the application you know only works on PC.

Thanks a lot for your help.

1/ OmegaT can search in a glossary or in a dictionary. The user can create both files.

2/ Two free tools for Windows (or others):

a) Tool for alignment - LF Aligner:

http://www.condak.net/tools/align-sentence/lf-aligner/cs/00.html

http://www.condak.net/cat_other/omegat/2011-08-10/cs/09.html

On my site, look for the camel logo (the LF Aligner logo).

b) Lexical extractor = Lexterm

http://www.condak.net/cat_other/omegat/2011-08-10/cs/10.html

http://www.condak.net/tools/align-word/lexterm/cs/00.html
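
For the glossary mentioned in point 1/: as far as I know, OmegaT reads plain tab-separated text files placed in the project's glossary folder, with a source term, a target term and an optional comment on each line. Here is a minimal Python sketch, assuming that three-column layout. The function names, the file name and the sample translations are invented for illustration only.

```python
from pathlib import Path


def glossary_lines(entries):
    """Yield one tab-separated line per (source, target, comment) entry."""
    for source, target, comment in entries:
        fields = (source, target, comment)
        # A tab or line break inside a field would corrupt the column layout.
        if any("\t" in field or "\n" in field for field in fields):
            raise ValueError(f"tabs and line breaks are not allowed: {fields!r}")
        yield "\t".join(fields)


def write_glossary(entries, path):
    """Write the entries to a UTF-8 text file and return its path."""
    path = Path(path)
    path.write_text("\n".join(glossary_lines(entries)) + "\n", encoding="utf-8")
    return path


# Placeholder entries: the translations are examples, not vetted terminology.
ENTRIES = [
    ("central dashboard", "tableau de bord central", ""),
    ("MSP", "MSP", "acronym, keep as is"),
]

if __name__ == "__main__":
    print(write_glossary(ENTRIES, "client_glossary.txt"))
```

Copying the resulting file into the project's glossary folder should be enough for the terms to be suggested while translating, but check the OmegaT manual for the file extensions and encodings your version accepts.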

Milan Condak

Rodolfo Raya

Local time: 22:47
English to Spanish

Stingray Document Aligner

Apr 4, 2012

Take a look at Stingray (http://www.maxprograms.com/products/stingray.html). You can use it to align different types of documents.

Regards,
Rodolfo

Didier Briel

France
Local time: 03:47
English to French
+ ...

Look also at term extraction

Apr 4, 2012

BabelOn-line wrote:
I'd like to be able to "align" (not sure this is the right term here) my existing source and target files, so that OmegaT can spot the expressions that I have already translated in the past and suggest the corresponding translation.

Most of the time, the "reusable wordings" won't be a whole segment, but rather short wordings such as "Fully Integrated management module", "central dashboard" or possibly acronyms like "MSP".

To extract short terms (rather than full segments), what you need is "term extraction", which is different from alignment (but is often based on already aligned translation memories).

Okapi Rainbow provides monolingual term extraction:
http://www.opentag.com/okapi/wiki/index.php?title=Term_Extraction_Step

One then has to provide the translation. If you have already done a "classical" alignment, the translation is easier to find (for instance, using OmegaT's search function).

I'm not aware of any free tool offering bilingual term extraction.
There are commercial tools offering that feature.
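
To give an idea of what monolingual term extraction does, here is a rough Python sketch that lists word sequences occurring more than once in a text. It is only a frequency count under simple assumptions (a tiny hand-picked stopword list, no linguistic analysis) and is not how Okapi Rainbow or any commercial tool works. The function name and parameters are made up for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "with", "on"}


def candidate_terms(text, max_words=4, min_count=2):
    """Return (term, count) pairs for repeated sequences of 1..max_words words."""
    counts = Counter()
    # Split on punctuation first so that candidates never span two clauses.
    for chunk in re.split(r"[.!?;:,()\n]+", text.lower()):
        words = re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", chunk)
        for size in range(1, max_words + 1):
            for start in range(len(words) - size + 1):
                gram = words[start:start + size]
                # Skip candidates that begin or end with a function word.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    sample = (
        "The central dashboard shows alerts. "
        "Open the central dashboard to see the MSP report. "
        "The MSP uses the central dashboard."
    )
    for term, n in candidate_terms(sample):
        print(n, term)
```

The output is only a list of source-language candidates; the translations still have to be looked up, for example in a previously aligned translation memory.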

Didier

BabelOn-line
United Kingdom
Local time: 02:47
English to French
+ ...

TOPIC STARTER

A big thank you.

Apr 5, 2012

Thanks all for your input, I have quite a lot of apps that I can try now.

I have tried Stingray (thanks Rodolfo), but the main pitfall quickly became apparent: as I translate "editorial" pieces, I quite often split a long sentence into two or change the structure of the sentence around to make it sound more French.

As a result, my source and target, while similar in terms of paragraphs and overall length, do not match sentence to sentence.

Thanks to Didier f...

Rodolfo Raya

Local time: 22:47
English to Spanish

Align paragraphs

Apr 5, 2012

BabelOn-line wrote:

I have tried Stingray (thanks Rodolfo), but the main pitfall quickly became apparent: as I translate "editorial" pieces, I quite often split a long sentence into two or change the structure of the sentence around to make it sound more French.

With Stingray you can align paragraphs. Simply check the "Paragraph Segmentation" box when creating a project. You can also join, split and reorder sentences if you work at sentence level.

You can use Anchovy (http://www.maxprograms.com/products/anchovy.html) for term extraction. It is free and lets you generate monolingual glossaries from existing documents and bilingual glossaries from TMX files.
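
To show what paragraph-level alignment produces, here is a naive Python sketch that pairs paragraphs by position and writes them as TMX. It is not how Stingray works internally. It assumes both documents were first saved as plain text, that paragraphs are separated by blank lines, and that source and target have the same number of paragraphs. The function names, language codes and header values are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def split_paragraphs(text):
    """Split plain text into non-empty paragraphs separated by blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def build_tmx(source_text, target_text, src_lang="en", tgt_lang="fr"):
    """Pair paragraphs by position and return a TMX document as a string."""
    source = split_paragraphs(source_text)
    target = split_paragraphs(target_text)
    if len(source) != len(target):
        raise ValueError(
            f"paragraph counts differ: {len(source)} source, {len(target)} target"
        )
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "paragraph-pairing-sketch",  # placeholder name
        "creationtoolversion": "0.1",
        "segtype": "paragraph",
        "o-tmf": "plaintext",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in zip(source, target):
        unit = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            variant = ET.SubElement(unit, "tuv", {XML_LANG: lang})
            ET.SubElement(variant, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    print(build_tmx("First paragraph.\n\nSecond one.",
                    "Premier paragraphe.\n\nLe second."))
```

Pairing by position breaks as soon as one side has an extra or missing paragraph, which is why the sketch refuses mismatched counts instead of guessing. A real aligner lets you repair such cases by hand.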

Regards,
Rodolfo

Milan Condak

Local time: 03:47
English to Czech

I made a short presentation of Lexterm

Apr 7, 2012

Milan Condak wrote:

b) Lexical extractor = Lexterm

I made a short presentation of Lexterm and of creating glossaries and dictionaries for CAT tools.

http://www.condak.net/cat_other/omegat/2012-04-07/cs/00.html

The example shows a glossary in Wordfast Classic and a dictionary in OmegaT. Here is one screenshot of OmegaT.

HTH

Milan

[Edited: 2012-04-08 09:07 GMT]

BabelOn-line
United Kingdom
Local time: 02:47
English to French
+ ...

TOPIC STARTER

Thanks all for your help

Apr 10, 2012

I have a lot of apps to try, but at least I know that I am after "term extraction" rather than alignment.

Thanks again

Milan Condak

Local time: 03:47
English to Czech

Word alignment is OK, too

Apr 11, 2012

BabelOn-line wrote:

I have a lot of apps to try, but at least I know that I am after "term extraction" rather than alignment.

Thanks again

The first step is paragraph or sentence alignment.

The second step is term extraction, or "word alignment". See:

http://en.wikipedia.org/wiki/Bitext_word_alignment

That page has links to some toolkits.

Or you can search for "word alignment toolkit".
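
To illustrate the idea of the second step, here is a small Python sketch that scores word pairs by how often they occur together in aligned segments (the Dice coefficient). It is a simple co-occurrence heuristic, not the statistical models that dedicated word alignment toolkits implement, and it needs far more data than the three invented segment pairs shown here. The function names are made up for the example.

```python
from collections import Counter
from itertools import product


def dice_scores(pairs):
    """Score each (source word, target word) pair over aligned segments."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        # Count each word once per segment, whatever its frequency inside it.
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }


def best_match(scores, source_word):
    """Return the highest-scoring target word for source_word, or None."""
    candidates = {t: v for (s, t), v in scores.items() if s == source_word}
    return max(candidates, key=candidates.get) if candidates else None


if __name__ == "__main__":
    # Invented segment pairs, for illustration only.
    aligned = [
        ("central dashboard", "tableau de bord central"),
        ("central module", "module central"),
        ("new dashboard", "nouveau tableau de bord"),
    ]
    scores = dice_scores(aligned)
    print(best_match(scores, "new"))
```

With so little data, a multi-word target such as "tableau de bord" cannot be told apart word by word, which is one reason the real toolkits are worth a look.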

Milan
