PDFs and new CAT tools
Thread poster: Miroslav Jeftic
Miroslav Jeftic Local time: 07:41 Member (2009) English to Serbian + ...
Recently I've noticed that the latest versions of several CAT tools (from SDL, Alchemy, etc.) promise PDF support. I haven't tried any of them, but it doesn't sound very convincing to me. Has anyone tried them out? Is there any truth to it, or is OCR still the way to go?
I don't doubt that a text-only PDF will probably go well, but I would really like to hear whether anyone has tried to load a complex PDF (text with a lot of pictures, tables, pictures in tables, etc.) and what kind of result it produced in the end.
[Edited at 2010-05-10 09:53 GMT]

Stanislav Pokorny Czech Republic Local time: 07:41 English to Czech + ... Very limited | May 10, 2010
Hi Miroslav,
In my experience, the PDF filter (in fact an OCR add-in) in SDL Studio works quite well in the following scenario:
- PDF with a text layer
- text only or a picture now and then
- no complex tables
- no tight layout
- small size
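
One way to read the checklist above is as a quick triage rule. The sketch below is only an illustration and not anything SDL Studio exposes: `PdfProfile`, `filter_blockers`, `suggested_route` and the 3 MB cut-off are hypothetical names and values, and the properties have to be filled in by hand after looking at the file.

```python
from dataclasses import dataclass

# Hypothetical cut-off: the post only says "several MB", so 3 MB is an assumption.
LARGE_FILE_MB = 3.0


@dataclass
class PdfProfile:
    """Manually assessed properties of a source PDF."""
    has_text_layer: bool   # False for scanned, image-only PDFs
    size_mb: float
    complex_tables: bool
    tight_layout: bool
    picture_heavy: bool    # more than "a picture now and then"


def filter_blockers(pdf: PdfProfile) -> list[str]:
    """Return the reasons a built-in PDF filter is likely to struggle."""
    reasons = []
    if not pdf.has_text_layer:
        reasons.append("no text layer (scanned PDF)")
    if pdf.picture_heavy:
        reasons.append("many pictures")
    if pdf.complex_tables:
        reasons.append("complex tables")
    if pdf.tight_layout:
        reasons.append("tight layout")
    if pdf.size_mb >= LARGE_FILE_MB:
        reasons.append("large file")
    return reasons


def suggested_route(pdf: PdfProfile) -> str:
    """Suggest 'cat-filter' when nothing blocks it, otherwise 'ocr-and-clean'."""
    return "cat-filter" if not filter_blockers(pdf) else "ocr-and-clean"


# A small, text-only PDF with a text layer passes every check.
print(suggested_route(PdfProfile(True, 0.4, False, False, False)))  # cat-filter
```

Returning the list of blockers rather than a bare yes/no makes it easy to tell a client why the editable source files are being requested.
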
It won't work for scanned PDFs, PDFs with tight layout or large (several MB) PDFs. Moreover, the converted PDF is usually full of tags, most of them unnecessary of course. So, I still prefer the traditional method:
1. Getting the editable source files, if possible.
2. If the client fails to provide them, I run OCR, "clean" the converted text by removing any redundant formatting and, finally, translate.

legalads India Local time: 12:11 English to Hindi + ... In Studio2009, it works as follows | May 10, 2010
Hi Miroslav,
It is very simple to open a .pdf in Studio 2009:
[img]http://www.public.fotki.com/legalads/pdf-to-studio/1.html[/img]
[img]http://www.public.fotki.com/legalads/pdf-to-studio/2.html[/img]
[img]http://www.public.fotki.com/legalads/pdf-to-studio/3.html[/img]
[img]http://www.public.fotki.com/legalads/pdf-to-studio/5.html[/img]
[img]http://www.public.fotki.com/legalads/pdf-to-studio/6.html[/img]
The links to the snapshots of the process are above, but I don't know why the snaps I took and uploaded specially for you are not being displayed.
Anyway, it's a public album!
Regards,
Sushan
[Edited at 2010-05-10 11:20 GMT]
[Edited at 2010-05-10 11:27 GMT]

Miroslav Jeftic Local time: 07:41 Member (2009) English to Serbian + ... TOPIC STARTER
Thanks, Stanislav! I guess it is as I thought: unfortunately, we are still a long way from good support for 10 MB+ worth of scanned pages.

Try the latest version of WORDFAST ANYWHERE with support for scanned PDFs | Apr 12, 2011
Hi Miroslav,
Last week, we released a new version of Wordfast Anywhere which features support for scanned PDFs. Using server-side OCR technology, translators have the ability to upload and convert scanned PDFs to RTF for translation.
Wordfast Anywhere is the world's leading web-based translation memory tool. It is offered free to all translators. As always, all content that you upload remains completely confidential inside of your private, password-protected workspace. We invite you to try Wordfast Anywhere today at www.FreeTM.com.
Hope this helps,
Kristyna Marrero
Director, Sales & Marketing

Miroslav Jeftic Local time: 07:41 Member (2009) English to Serbian + ... TOPIC STARTER
Hi Kristyna,
Actually, I tried Wordfast Anywhere a few days ago, I think, and while it was OK with the simpler PDFs I uploaded, as soon as I tried one of my "difficult" ones it returned a conversion error.

Michal Glowacki Poland Local time: 07:41 Member (2010) English to Polish + ... CATs don't like PDFs | Apr 13, 2011
As far as I know, even if a current CAT tool "handles" PDFs, the best you can get is the same result as when using your own OCR or a TXT copy of the text. I wouldn't expect this to change any time soon. And no wonder: we need to remember that PDF was actually designed to be uneditable. I think most boasting about PDF handling is just marketing and sales talk, which crumbles easily when put to real use.

Miroslav Jeftic Local time: 07:41 Member (2009) English to Serbian + ... TOPIC STARTER
Michal Glowacki wrote:
As far as I know, even if a current CAT tool "handles" PDFs, the best you can get is the same result as when using your own OCR or a TXT copy of the text. I wouldn't expect this to change any time soon. And no wonder: we need to remember that PDF was actually designed to be uneditable. I think most boasting about PDF handling is just marketing and sales talk, which crumbles easily when put to real use.
Fully agree.