Do we need to look for new software?
In an unguarded moment of misguided enthusiasm (there is no other way to put it), I volunteered to translate a couple of my favorite TED talks. The idea was simple: challenging myself to learn the literary side of translating whole pieces of text would let me get to the innards of the language that is my mother tongue and that I use for conversation. It turns out there was an area I had never factored in.
Talks have transcripts, and these are whole blocks of dialogue that feel different in translation from the User Interface artifacts that make up the components of the software I translate. Somewhat confused, I turned to the person who does this so often that she’s really good at poking holes in any theory I propound. In reality, it was my turn to be shocked. When she translates documents, Runa faces problems far deeper than the ones I faced while translating transcripts. And her current toolset is woefully inadequate because it is tuned to the software translation way of doing things rather than to translating documents, transcripts and other pieces of text.
In a nutshell, the problem relates to the breaking of text into chunks that are malleable for translation. More often than not, if the complete text is a paragraph, or at least a couple of sentences, the underlying grammar and construction are built to project a particular line of thought: a single idea. Chunking breaks that seamless thread. Additionally, when using our standard tools, viz. Lokalize/KBabel, Virtaal, Lotte and Pootle, such chunks of text make coherent translation more difficult because of the need to fit things within tags.
Here’s an example from the TED talk by Alan Kay. It is not representative, but it should suffice to give an idea. If you consider it as a complete paragraph expressing a single idea, you could look at something like:
“So let's take a look now at how we might use the computer for some of this. And, so the first idea here is just to show you the kind of things that children can do. I am using the software that we're putting on the 100 dollar laptop. So, I'd like to draw a little car here. I'll just do this very quickly. And put a big tire on him. And I get a little object here, and I can look inside this object. I'll call it a car. And here's a little behavior car forward. Each time I click it, car turn. If I want to make a little script to do this over and over again, I just drag these guys out and set them going.”
Do you see what is happening? If you read the entire text as a block and grasp the idea, a context-based translation that presents the same thing lucidly in your target language starts taking shape.
Now, check what happens if we chunk it the way TED does for translation.
So let's take a look now at how we might use the computer for some of this.
And, so the first idea here is
just to show you the kind of things that children can do.
I am using the software that we're putting on the 100 dollar laptop.
So, I'd like to draw a little car here.
I'll just do this very quickly. And put a big tire on him.
And I get a little object here, and I can look inside this object.
I'll call it a car. And here's a little behavior car forward.
Each time I click it, car turn.
If I want to make a little script to do this over and over again,
I just drag these guys out and set them going.
Taken out of context, the chunks make threading the idea together somewhat difficult. At least, it seems difficult to me. So, what’s the deal here? How do other languages deal with similar issues? I am assuming you would not just consider the entire paragraph, translate accordingly and then slice and dice the result according to the chunks. That is difficult, isn’t it?
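For what it is worth, the translate-the-paragraph-then-slice workflow can be sketched in a few lines of Python. This is only a rough illustration under loud assumptions: the function names `join_chunks` and `redistribute` are made up for this post, nothing here reflects how the TED interface or any translation tool actually works, and the split is a crude heuristic that sizes each translated piece in proportion to the length of its English chunk.

```python
def join_chunks(chunks):
    """Join subtitle chunks into one paragraph so it can be read in context."""
    return " ".join(chunk.strip() for chunk in chunks)


def redistribute(source_chunks, translated_paragraph):
    """Cut a translated paragraph into one piece per source chunk.

    Hypothetical heuristic: each piece gets a share of the translated words
    proportional to the character length of the matching source chunk.
    Cuts fall only on word boundaries; they know nothing about grammar.
    """
    words = translated_paragraph.split()
    total_source = sum(len(chunk) for chunk in source_chunks)
    if not source_chunks or total_source == 0:
        return []

    pieces = []
    start = 0
    consumed = 0
    last = len(source_chunks) - 1
    for index, chunk in enumerate(source_chunks):
        consumed += len(chunk)
        # Cumulative cut point, so rounding errors do not pile up.
        end = round(len(words) * consumed / total_source)
        if index == last:
            end = len(words)  # the last piece takes whatever is left
        pieces.append(" ".join(words[start:end]))
        start = end
    return pieces
```

Calling `join_chunks` on the chunks gives back the paragraph for reading as a whole; `redistribute` then yields one translated piece per original chunk. The pieces would still have to be nudged by hand so that clause boundaries line up with what is on screen.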
On a side note, the TED folks could start looking at an easier interface for translation. I could not figure out how to translate, save as draft and return later to pick up where I left off. It looks like it mandates a single-session, sit-down-and-deliver mode of work. That isn’t how I am used to doing translations in the FOSS world, so it feels awkward. Integrating translation memories, which would be helpful for languages with substantial existing work, and auto-translation tools would be sweet too. Plus, they need to create a forum to ask questions; the email address seems to be unresponsive at best.

The way TED is handling translations is obviously completely broken; I wonder whether whoever came up with that system ever did any form of translation. The best way to translate that kind of text is to fire up a text editor and progressively (in linear sequence) replace the original with the translated text. At least that’s how I translate that kind of text (I am a complete amateur at translation, though I occasionally have to translate something between 2 of the 4 languages I speak). Tools just make things harder for continuous texts, and chunking is artificial, pointless and counterproductive.
Kevin Kofler
6 Dec 09 at 8:08 am
The task you are describing is called ‘sentence segmentation’ (that should make it easier to google). OmegaT (http://www.omegat.org) is probably the best all-round solution to your general problem: it’s similar to tools like Virtaal, but aimed at translators (rather than localisers): it performs sentence segmentation itself, and has translation memory, etc.
For sentence segmentation alone, there are quite a number of options. Maligna (http://sf.net/projects/align/) will segment text files, but it will also perform bilingual segmentation of translation, aligning both files for use as a translation memory. Hunalign will also do that, but I don’t have a link at hand.
sankarshan Reply:
December 7th, 2009 at 2:49 pm
Exactly. I did not want to put out sentence segmentation and lack thereof as the problem. In effect, the TED folks require the subtitles to be translated, which is different from translating pure application software, a website or even a document.
I am a dilettante when it comes to translations and am game for acquiring new skills. I’d love to learn from those who translate subtitles how it is done. That said, I am fairly certain that the software backend being used by TED isn’t the best fit for the job.
Jim Reply:
December 7th, 2009 at 5:30 pm
First of all, I think my link may have muddied the discussion below a little. Full disclosure: yes, I work on open source machine translation; no, that’s not what I intended to recommend.
Yes, you’re quite right, subtitle translation is quite a different task to document translation. Even ‘document translation’ can have vastly different requirements, as Kenneth and Leonardo mention: Kenneth, it seems to me, does not work with ‘functional’ documents – there should be no need to ‘get inside the mind of the author’ in a well written work that is intended as a reference, such as a manual.
OmegaT, which I mentioned, is a Translation Memory tool – *not* MT – and provides as much context as will fit on the screen, both original and translated. It has some support for subtitles, though I don’t think it has any special facilities for the requirements of subtitle translation, per se – subtitles have a temporal aspect that documents don’t: the translation has to be such that it may be read within the time available to display it.
Jubler (http://www.jubler.org/), a subtitle editor, has a ‘translation mode’, which provides a pair of editors. Perhaps that might be a little better?
Jim
6 Dec 09 at 8:09 am
There’s no way to perform a decent translation sentence-by-sentence on such text. We translate documentation paragraph-by-paragraph and I don’t see any reason not to do so with the transcripts.
sankarshan Reply:
December 7th, 2009 at 2:51 pm
Precisely. Which means that one has to junk the TED-provided interface and use a standard text editor to translate. This of course brings up a different question: how do I then shove the sentences in to match the English ones on the TED UI?
Leonardo Fontenelle
6 Dec 09 at 6:27 pm
Translation cannot be done para-by-para either. You have to read the whole thing, get inside the mind of the author and progressively churn out a translation. This requires several passes through the *whole* document. That is the only way a consistent translation can be done.
sankarshan Reply:
December 7th, 2009 at 2:47 pm
I agree with you. It is impossible to just read parts of text and then translate. The idea of a piece of text is that the author is trying to project a central idea – constant and repeated reading of the text allows the idea to be formed in the mind of the translator. This is opposed to the translator interpreting the idea and trying to translate.
My concern is the way the TED folks are asking the translation to be done. This is more like subtitle translation, which, I think, is a new skill one would need to acquire. At least I’d need to.
Kenneth Gonsalves
7 Dec 09 at 7:38 am
what is akisment? is it some new version of akismet?
sankarshan Reply:
December 7th, 2009 at 2:44 pm
Thank you for spotting it. Spelling mistake on my part. Now corrected.
Kenneth Gonsalves
7 Dec 09 at 7:39 am
[...] your search Sankarshan Mukhopadhyay: Do we need to look for new software ? is now available in this link…: News [...]
Sankarshan Mukhopadhyay: Do we need to look for new software ? | Full-Linux.com
7 Dec 09 at 12:38 pm
Trust techies to try to bung IT at every issue
Seriously though, something as precious as a TED Talk needs to be done by mind/hand/body rather than machine intelligence.
Ashwin Baindur
14 Dec 09 at 8:21 pm