Counting words in a txt file within quotation marks Thread poster: Afew
| Afew Kazakhstan Local time: 04:48 English to Kazakh
Hello fellow translators, I have a txt file with software strings in it to be localized. Each line looks like this: #command some text "text to be localized" // comment I want to count only the words within quotation marks. Is there any way to do it other than counting manually? I tried importing the txt file into MS Excel, but the file is not consistently delimited, so the words I need end up in different columns. Any help will be much appreciated. | | | A job for Perl | Feb 9, 2012 |
Give the file to somebody you know who uses the Perl programming language, and ask them to run this: perl -i.bak -pe "s/^.+?\"//; s/\".+$//" yourfilename That will remove everything from your file except the parts in quotes (the original file will be renamed as yourfilename.bak). You can then count the words in the new file. This assumes that all the lines have the same format. | | | Afew Kazakhstan Local time: 04:48 English to Kazakh TOPIC STARTER Some strings are different | Feb 9, 2012 |
Thanks Philip. Unfortunately, some lines contain only comments, and there are lines that contain #command... and "text" but no comments. I was able to count the number of quotation marks in Excel using the COUNTIF function, but that was useless, since many of the quoted strings are whole sentences rather than single words. | | | Amit Evron Vietnam Local time: 05:48 Spanish to English + ...
If it's not confidential and if the file isn't too big, feel free to send it over and I'll write a quick perl script. Shouldn't take more than 5 minutes. Just send me a message through Proz and I'll reply with my e-mail address. | |
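For readers without Perl to hand, the idea in the posts above (keep only the quoted part of each line, skip lines that have none, then count words) can be sketched in Python. This is only a sketch under the same simple assumptions as the one-liner: at most one quoted string per line and no escaped quotes inside it. The sample text and the name quoted_word_count are made up for illustration, and "word" here just means a whitespace-separated token, which may differ from what a CAT tool reports.

```python
import re

# Hypothetical sample in the format described in the thread.
SAMPLE = '''#command some text "text to be localized" // comment
// a line that contains only a comment
#command other text "Save file"
'''

# Simple case only: at most one quoted string per line, no escaped quotes.
QUOTED = re.compile(r'"([^"]*)"')


def quoted_word_count(text):
    """Return the number of whitespace-separated words found inside quotes."""
    total = 0
    for line in text.splitlines():
        match = QUOTED.search(line)
        if match is None:
            continue  # comment-only line, nothing to translate
        total += len(match.group(1).split())
    return total


print(quoted_word_count(SAMPLE))  # prints 6
```

Because the file is read locally, nothing confidential has to leave the translator's machine.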
Tony M France Local time: 00:48 Member French to English + ... SITE LOCALIZER Paste into Word | Feb 9, 2012 |
Haven't tested it, but why not try this: Select all your text and paste it into Word (etc.). Do a 'replace all' on the " (careful to get the right character!), replacing it with (say) Tab. Select all and convert text to table, using the character you substituted above (e.g. Tab) as the delimiter. This should enable you to get a column that just has your text to be translated in, and you can take it from there. If you have any lines with no " " at all, they should just appear all in the first column. Theoretically at least, you ought to be able to reverse the process at the end... One proviso: one has to assume that each line does end with a Return character or similar; if necessary, you might need to go through and replace whatever the end-of-line delimiter is with something that will work in Word for the conversion to table. | | | Should still work | Feb 9, 2012 |
Nurzhan Nagashbekov wrote: Unfortunately, some lines contain only comments and there are lines that contain #command... and "text" but no comments. I think my script should still work with a small modification (for the comment only lines), but as Amit has kindly offered to take it on I'm happy to hand over to him. | | | Afew Kazakhstan Local time: 04:48 English to Kazakh TOPIC STARTER Thanks for suggestions! | Feb 9, 2012 |
Amit Evron wrote: If it's not confidential and if the file isn't too big, feel free to send it over and I'll write a quick perl script. Shouldn't take more than 5 minutes. Just send me a message through Proz and I'll reply with my e-mail address. It is confidential | | | Afew Kazakhstan Local time: 04:48 English to Kazakh TOPIC STARTER This may work... | Feb 9, 2012 |
Tony M wrote: Haven't tested it, but why not try this: Select all your text and paste it into Word (etc.) Do a 'replace all' on the " .... Thanks Tony, I will try your method. | |
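Tony M's Word suggestion amounts to splitting each line on the quote character and keeping the column that sat between the quotes. The same idea can be sketched in Python without going through Word. The function name quoted_columns is hypothetical, and like the Word method this sketch assumes there are no escaped quotes inside the strings.

```python
def quoted_columns(line):
    """Split on the quote character, as the Word table conversion would.

    Fields at odd positions (1, 3, ...) sat between a pair of quotes.
    """
    fields = line.split('"')
    # An even number of fields means an odd number of quote marks, so the
    # last field follows an unpaired quote; drop it rather than count it.
    if len(fields) % 2 == 0:
        fields = fields[:-1]
    return fields[1::2]


example = '#command some text "text to be localized" // comment'
print(quoted_columns(example))
```

Lines with no quotation marks at all yield an empty list, which corresponds to everything landing in the first table column.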
Jaroslaw Michalak Poland Local time: 00:48 Member (2004) English to Polish SITE LOCALIZER Okapi Rainbow | Feb 9, 2012 |
I think the best option would be to use Okapi Rainbow, especially if you expect more such work from the client. Basically, it would allow you to extract the text you require (using regular expressions) and then calculate the word count. Trados 2007 also has an option to import text based on regular expressions. You have to use a separate application, Filter Settings, for this. After the import you just analyze the resulting ttx file as usual. I realize that having to learn regular expressions might seem daunting, but if you plan to translate such texts it will be a sensible investment of your time... | | |
I fervently hope that you'll be using a CAT for this job. The localization of SW strings requires strict formatting consistency and there are a lot of repetitions etc., so it's really not the job you'd want to do by typing over the original. Now, if you do use a CAT, just do the word count there. Studio has the required capabilities (i.e. you can specify regex rules that separate the translatable text from the rest), and the Studio package also comes with a specialized SW localization tool (Passolo). Of course there are lots of other tools that'll work, too. The more interesting question is: who is in charge of this project? Isn't there a PM/client who sorts these things out before you get involved? | | |
I had a few minutes to spare, so I set up this: http://quote.writewords.eu/ If you paste your text in the box and click Submit, it should return you only the stuff that's between quotes. | | |
Philip Lees wrote: Give the file to somebody you know who uses the Perl programming language, and ask them to run this: perl -i.bak -pe "s/^.+?\"//; s/\".+$//" yourfilename That will remove everything from your file except the parts in quotes (the original file will be renamed as yourfilename.bak). You can then count the words in the new file. This assumes that all the lines have the same format. It also assumes that there is only one pair of quotes in one line and that there are no escaped quotes inside quoted strings. It'll fail with lines like this: StringID:4567267; text:"Press the \"Browse\" button to pick a file"; Button:"Browse" And it doesn't skip lines that have no translatable content at all. Also, .+? is better written as .* and the " may very well be the last character on the line so .+$// should be .*$//. So, I'd rewrite your one-liner as: perl -i.bak -pe "s/^.*\"(.*)\".*$/$1/" yourfilename ...but this still doesn't handle the problem cases I mentioned above. You could do this (untested) to delete lines that don't contain any quoted string; it uses -n with an explicit print, because under -p a line skipped with next would still be printed: perl -i.bak -ne "next unless /\".*\"/; s/^.*\"(.*)\".*$/$1/; print" yourfilename ... but the bottom line is, it's still only usable if the input file is "simple". You could add negative lookahead/lookbehind to cater for escaped quotes inside the quoted strings etc. to make it work and then somehow adapt it for multiple strings per line, but it starts to get tricky there, and you need to see the input file (or know its spec) to take a reasonable stab at solving the problem.
[Edited at 2012-02-09 10:54 GMT] | |
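The two problem cases raised in this post (several quoted strings on one line, and backslash-escaped quotes inside a string) can be handled with a pattern that consumes escaped characters as units instead of using lookarounds. The following is a sketch in Python rather than Perl, assuming the escape character is a single backslash; the name extract_strings is hypothetical, and the test line is the one quoted in the post. A real file may follow different rules, so the spec still needs checking.

```python
import re

# Body of a quoted string: any run of escaped characters (backslash plus
# one character) or characters that are neither a quote nor a backslash.
QUOTED = re.compile(r'"((?:\\.|[^"\\])*)"')


def extract_strings(line):
    """Return every quoted string on the line, turning escaped quotes back into plain ones."""
    return [body.replace('\\"', '"') for body in QUOTED.findall(line)]


line = r'StringID:4567267; text:"Press the \"Browse\" button to pick a file"; Button:"Browse"'
strings = extract_strings(line)
word_count = sum(len(s.split()) for s in strings)
print(strings)
print(word_count)
```

On the sample line this yields two strings, the full sentence and the button label, rather than stopping at the first escaped quote.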
Afew Kazakhstan Local time: 04:48 English to Kazakh TOPIC STARTER Initial stage of the project | Feb 9, 2012 |
I am at the very beginning of the project and just wanted to know what the word count is for now. I will definitely try regex. Thanks! | | | Nobody's perfect | Feb 9, 2012 |
FarkasAndras wrote: It also assumes that there is only one pair of quotes in one line and that there are no escaped quotes inside quoted strings. It'll fail with lines like this: StringID:4567267; text:"Press the \"Browse\" button to pick a file"; Button:"Browse" Oh, sure, it breaks in lots of cases, as does the simpler match I used on the web version: /"(.+?)"/ I am well aware of the pitfalls of text parsing, which is why I added the caveat about all lines having the same format as the example provided. As this is not a Perl or a regex forum, I'll leave it at that. | | | Simpler Perl code | Feb 9, 2012 |
I think this single line of Perl should suffice: perl -nle 'print $1 if /#command\s+"([^"]*)"/' This assumes that a double quotation mark can’t occur inside the pair of double quotation marks that delimits the string to be translated. Usually this is not the case, and (assuming that " is escaped with a single backslash) the Perl needed will more likely be: perl -nle 'print $1 if /#command\s+"((?:\\"|[^"])*)"/' Of course, if escaping of quotation marks occurs but is not signalled by backslashes then the Perl code needed will be different. ETA: The above assumes that continuations don’t occur. If continuations do occur the above won’t work and one-liner solutions might not be sufficient…
[Edited at 2012-02-09 19:03 GMT]
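To see the difference between the two patterns in this post, here is a sketch that ports them unchanged to Python's re module and runs both over a few made-up lines. The helper name extract and the sample lines are hypothetical. Like the Perl, the sketch assumes the opening quote follows #command after whitespace only, that escapes use a single backslash, and that there are no continuation lines.

```python
import re

# Direct ports of the two Perl patterns to Python's re syntax.
NO_ESCAPES = re.compile(r'#command\s+"([^"]*)"')
WITH_ESCAPES = re.compile(r'#command\s+"((?:\\"|[^"])*)"')


def extract(pattern, lines):
    """Mimic perl -nle 'print $1 if /.../': one capture per matching line."""
    found = []
    for line in lines:
        match = pattern.search(line)
        if match:
            found.append(match.group(1))
    return found


# Hypothetical input lines.
lines = [
    r'#command "Open file" // comment',
    r'#command "Press \"OK\" to continue"',
    '// comment only',
]

print(extract(NO_ESCAPES, lines))    # second string is cut off at the escaped quote
print(extract(WITH_ESCAPES, lines))  # second string is captured whole
```

The first pattern stops at the first quote it meets, escaped or not, so it truncates the second string; the second pattern tries the escaped-quote alternative before the single-character one and keeps the string intact.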