Translation – Random thoughts and serendipity

Do we need to look for new software ?

sankarshan — Sun, 06 Dec 2009 01:46:22 +0000

In an unguarded moment of misguided enthusiasm (and there is no other way to put it), I volunteered to translate a couple of my favorite TED talks. The idea was simple: challenging myself to learn the literary side of translating whole pieces of text would let me get to the innards of the language that is my mother tongue and that I use for conversation. It turns out that there was an area I had never factored in.

Talks have transcripts, and these are whole blocks of dialogue which feel different in translation from the User Interface artifacts that make up the components of the software I translate. In some confusion I turned to the person who does this so often that she’s real good at poking holes in any theory I propound. In reality, it was my turn to be shocked. When she translates documents, Runa faces problems far deeper than the ones I faced while translating transcripts. And her current toolset is woefully inadequate, because the tools are tuned to the software translation way of doing things rather than to translating documents, transcripts and other pieces of text.

In a nutshell, the problem relates to breaking text into chunks that are malleable for translation. More often than not, if the complete text is a paragraph, or at least a couple of sentences, the underlying grammar and construction are built to project a particular line of thought – a single idea. Chunking breaks that seamless thread. Additionally, with our standard tools (Lokalize/KBabel, Virtaal, Lotte, Pootle), such chunks of text make coherent translation more difficult because of the need to fit things within tags.

Here’s an example from the TED talk by Alan Kay. It is not representative, but would suffice to provide an idea. If you consider it as a complete paragraph expressing a single idea, you could look at something like:

“So let's take a look now at how we might use the computer for some of this. And, so the first idea here is just to show you the kind of things that children can do. I am using the software that we're putting on the 100 dollar laptop. So, I'd like to draw a little car here. I'll just do this very quickly. And put a big tire on him. And I get a little object here, and I can look inside this object. I'll call it a car. And here's a little behavior car forward. Each time I click it, car turn. If I want to make a little script to do this over and over again, I just drag these guys out and set them going.”

Do you see what is happening ? If you read the entire text as a block, and, if you are grasping the idea, the context based translation that can present the same thing lucidly in your target language starts taking shape.

Now, check what happens if we chunk it in the way TED does it for translation.

So let's take a look now at how we might use the computer for some of this.
And, so the first idea here is
just to show you the kind of things that children can do.
I am using the software that we're putting on the 100 dollar laptop.
So, I'd like to draw a little car here.
I'll just do this very quickly. And put a big tire on him.
And I get a little object here, and I can look inside this object.
I'll call it a car. And here's a little behavior car forward.
Each time I click it, car turn.
If I want to make a little script to do this over and over again,
I just drag these guys out and set them going.

Taken out of context like this, threading the idea together becomes somewhat difficult. At least, it seems difficult to me. So, what’s the deal here ? How do other languages deal with similar issues ? I am assuming you would not simply consider the entire paragraph, translate accordingly and then slice and dice the result to fit the chunks. That is difficult, isn’t it ?
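
One way a tool could soften the chunking problem is to show each chunk together with its neighbours, so the translator still sees the thread of the idea. The following is only a minimal sketch in Python under my own assumptions: the sentence splitting is a naive regular expression, the function names are made up for illustration, and none of this reflects how TED or any of the tools named above actually work.

```python
import re


def split_into_chunks(paragraph):
    """Split a paragraph into sentence-like chunks.

    Naive rule (an assumption for this sketch): a chunk ends at '.', '?'
    or '!' followed by whitespace.
    """
    parts = re.split(r"(?<=[.?!])\s+", paragraph.strip())
    return [part for part in parts if part]


def chunks_with_context(chunks, window=1):
    """Pair every chunk with the chunks immediately before and after it."""
    units = []
    for i, chunk in enumerate(chunks):
        units.append({
            "before": chunks[max(0, i - window):i],
            "chunk": chunk,
            "after": chunks[i + 1:i + 1 + window],
        })
    return units


paragraph = (
    "So, I'd like to draw a little car here. "
    "I'll just do this very quickly. "
    "And put a big tire on him. "
    "I'll call it a car."
)

units = chunks_with_context(split_into_chunks(paragraph))
for unit in units:
    # The translator works on "chunk" but can read the surrounding text.
    print("context before:", " ".join(unit["before"]) or "(none)")
    print("translate     :", unit["chunk"])
    print("context after :", " ".join(unit["after"]) or "(none)")
    print()
```

The chunk boundaries stay the same, which matters when the translation has to line up with subtitles, but nothing is translated blind.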

On a side note, the TED folks could start looking at an easier interface to allow translation. I could not figure out how one could translate, save as draft and return later to pick up from where one left off. It looks like it mandates a single-session, sit-down-and-deliver mode of work. That isn’t how I am used to doing translations in the FOSS world, and it makes the work awkward. Integrating translation memories, which would be helpful for languages with a substantial body of work, and auto translation tools would be sweet too. Plus, they need to create a forum to ask questions – the email address seems to be unresponsive at best.

Context, subtext and inter-text

sankarshan — Sun, 22 Nov 2009 05:50:24 +0000

There are two points with which I’d like to begin:

One, in their Credits to Contributors section, Mozilla (for both Firefox and Thunderbird) state that “We would like to thank our contributors, whose efforts make this software what it is. These people have helped by writing code and documentation, and by testing. They have created and maintained this product, its associated development kits, our build tools and our web sites.” (Open Firefox, go to Help -> About Mozilla Firefox -> Credits, and click on the Contributors hyperlink)
Two, whether by design or by inadvertent serendipity, projects using Transifex tend to end up defining their portals as “translate..domain_name”. Translation, as an aesthetic requirement, is squarely in the forefront. And, in addition to the meaning it shares with localization, the mere usage of the word translation lends an elevated meaning to the action and to the end result.

A quick use of the Dictionary applet in GNOME provides the following definition of the word ‘translation’:

The act of rendering into another language; interpretation; as, the translation of idioms is difficult. [1913 Webster]

With each passing day innovative software is released under the umbrella of various Free and Open Source Software (FOSS) projects. For software that is to be consumed as a desktop application, the ability to be localized into various languages makes the difference in wide adoption and usage. Localization (or, translation) projects form important and integral sub-projects of various upstream software development projects.

In somewhat trivial off-the-cuff remarks which make translation appear easier than it actually is, it is often said that translation is the act of rendering into a target language the content available in the source language. However, localization and translation are not merely a matter of replacing words or phrases in one language (mostly English) with those of another. They require an understanding of the context, the form, the function and, most importantly, the idiom of the target language, i.e. the local language. And in addition to this, there is the fine requirement that the localized interface be usable while appropriately communicating the message to users of the software – technical and non-technical alike.

There are multiple areas that were briefly touched upon in the above paragraph, the most important of them being the interplay of context, subtext and inter-text. Translation, by all accounts, provides a referential equivalence. This is because languages and word forms evolve separately. And, in spite of the adoption and assimilation of words from other languages, the core framework of a language remains remarkably unique. Add to this mix the extent to which various themes (technology, knowledge, education, social studies, religion) organically evolve, and there is a distinct chance that idioms and the meta-data of words and phrases which are commonplace in a source language may not be relevant, or present at all, in the target language.

This brings about two different problems. The first is whether to stay true to the source language or to adapt the form to the target language. The second is how far losses in translation are acceptable. The second is somewhat unique – translations, by their very nature, have the capacity to add to the content, to take away from the content and thereby create a ‘loss’, or to adjust and hence provide an arbitrary measure of compensation. The amount of improvement or comprehension a translated term can bring forward is completely dependent on the strength of the local language and on the grasp of its idiomatic usage that the translator brings to the task at hand. More importantly, it becomes a paramount necessity that the translator be very well versed in the idioms of the source language in addition to being colloquially fluent in the target language.

The first problem is somewhat more delicate – it differs when doing translations for content as opposed to when translating strings of the UI. Additionally, it can differ when doing translations for a desktop environment like, for example, Sugar. The known user model of such a desktop provides a reference, a context that can be used easily when thinking through the context of the words/strings that need to be translated. A trivial example is the need to favour terms that are more prevalent or commonly used. A pitfall, of course, is that it might make the desktop “colloquial”. And yet, that would perhaps be what makes it more user-friendly. This paradox of whether to be source-centric or target-friendly is amplified when it comes to terms which are yet to evolve local equivalents in common usage. “Emulator”, “Tooltip” and “Iconify” are some trivial and quick examples.

I can pick up the recent example of “Unmove” from PDFMod to illustrate a need to appreciate the evolution of English as a language and, to point to the need for the developers to listen to the translators and localization communities. The currently available tools and, processes do not allow a proper elaboration of the context of the word. In English, within the context of an action word “move” it is fairly easy to take a guess at what “Unmove” would mean. In languages where the usage of the action word “move” in the context of an operation on a computer desktop (here’s a quirk – the desktop is a metaphor that is being adopted to be used within the context of a computation device) is evolving, Unmove itself would not lend itself well to translation. Such “absent contexts” are the ones which create a “loss in translation”.

The singularity here is that the source language strings can evolve beautifully if feedback is obtained from the translated language in terms of what does improve the software. The trick is perhaps how best to document the context of the words and phrases to enable a much richer and useful translated UI. And, work on tooling that can include and incorporate such feedback. For example, there are enormous enhancements that can be trivially (and sometimes non-trivially) made to translation memory or, machine translation software so as to enable a much sharper equivalence.
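
As an illustration of what “documenting the context” could look like in practice: gettext PO files do have a slot for a disambiguating context (`msgctxt`) and for comments extracted from the source code (lines starting with `#.`). Whether a given project fills them in is another matter. The Python sketch below simply renders such an entry; the helper names are hypothetical, and the context and note attached to “Unmove” are invented for illustration, not taken from PDFMod.

```python
def escape_po(text):
    """Escape backslashes, double quotes and newlines for a PO string."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def po_entry(msgid, context=None, note=None):
    """Render one untranslated PO entry with optional context and note.

    `note` becomes an extracted comment ("#. ..."), `context` becomes
    the msgctxt line. Both are meant to be written by the developer.
    """
    lines = []
    if note:
        for line in note.splitlines():
            lines.append("#. " + line)
    if context:
        lines.append('msgctxt "%s"' % escape_po(context))
    lines.append('msgid "%s"' % escape_po(msgid))
    lines.append('msgstr ""')
    return "\n".join(lines)


# The context and note below are made up for this example.
entry = po_entry(
    "Unmove",
    context="Edit menu",
    note="Undo label: puts the page back where it was before the last move.",
)
print(entry)
```

A translator who sees the note does not have to guess at a coinage like “Unmove”, and the same channel could carry questions back to the developer.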

(The above is a somewhat blog representation of what I planned to talk about at GNOME.Asia had my travel agent not made a major mess of the visa papers.)

Tools of the translation trade

sankarshan — Wed, 19 Aug 2009 05:53:13 +0000

I begin with a caveat – I am a dilettante translator and hence the tools of my trade (these are the tools I have used in the past or, use daily) or, the steps I follow might not reflect reality or, how the “real folks” do translation. I depend to a large extent on folks doing translation-localization bits for my language and, build heavily on their works.

KBabel

I used it only infrequently when it was around in Fedora (it is still available in Red Hat Enterprise Linux 5) but once I did get over the somewhat klunky interface, it was a joy to work with. Seriously rugged and, well formed into the ways of doing translations, KBabel was the tool of choice. However, it was replaced by Lokalize (more on that later) and so, I moved on to Lokalize.

Lokalize

This has so much promise and yet, there is so much left to be desired in terms of stability. For example, a recent quirk that I noticed is that in some cases, translating the files using Lokalize and then viewing them in a text editor shows the translated strings. However, loading them in KBabel or another tool shows the lines as empty. The KBabel -> Lokalize transformation within KDE could perhaps have done with a bit of structured requirements definition and testing (I am unaware as to whether such things were actually done and would be glad to read any existing content on that). Then there’s this quirk with the files in the recent GNOME release – copying across the content when it is in the form Address leaves the copied form as empty space. The alternative is to input the tags again, which is a cumbersome process. There are a number of issues reported against the Lokalize releases, which actually gives me enough hope, because more issues mean more consumers and hence a need to have a stable and functional application.

Virtaal

I have used it very infrequently. The one reason for that is that it takes some time to get used to the application/tool itself. I guess sometimes too much sparseness in the UI is a factor in shying away from the tool. The singular good point which merits a mention is the “Help” or documentation in Virtaal – it is very well done and actually demonstrates how best to use the application for day to day translation work. This looks to be a promising tool and, with the other parts like translation memory, terminology creator etc. tagged on, it will have the makings of a strong toolchain.

Pootle

I was initially reluctant to use a web-based tool for translation. That, however, might have been a factor of the early days of Pootle. With the recent Pootle releases, having a web-based translation tool is a definite plus. It isn’t without its queer flaws, though. For example, it doesn’t allow one to browse to a specific phrase to translate: in a 290 line file, if you last left off at 175, the choices are either to traverse from the start in bunches of 10 or 7, or to traverse from the end until one reaches the 176th line. Also, the instances of Pootle that I have used don’t use any translation memory or terminology add-ons to provide suggestions.
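
The “browse to a specific phrase” part is mostly arithmetic. As a rough sketch in Python (the function names are my own and this is not Pootle code), jumping straight to the page that holds a given unit could be as simple as:

```python
def page_for_unit(unit_number, page_size):
    """Return the 1-based page that contains the given 1-based unit."""
    if unit_number < 1 or page_size < 1:
        raise ValueError("unit_number and page_size must be positive")
    return (unit_number - 1) // page_size + 1


def units_on_page(page, page_size, total_units):
    """Return the first and last unit numbers shown on a page."""
    first = (page - 1) * page_size + 1
    last = min(page * page_size, total_units)
    return first, last


# Picking up at unit 176 of a 290 unit file, shown 10 units at a time.
page = page_for_unit(176, 10)
print(page, units_on_page(page, 10, 290))
```

With bunches of 10 the 176th unit sits on page 18; with bunches of 7 it is the first unit on page 26. Either way it is one jump instead of a long traversal.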

I have this evolving feeling that having a robust web-based tool would provide a better way of handling translations and, help manage content. That is perhaps one of the reasons I have high expectations from the upcoming Pootle releases and, of course, Lotte.

Irrespective of the tools, some specific things that I’d see being handled include the following. I hope that someone who develops tools to help get translations done takes some time out to talk with the folks doing it daily to understand the areas which can do with significant improvements.

the ability to provide a base glossary of words (for a specific language) and, the system allowing it to be consumed during translation so as to provide a semblance of consistency
the ability to take as input a set of base glossaries across languages (for example, a couple of Indic languages do check how other Indic languages have handled the translation) and, the system allowing the translator/reviewer to exercise the option of choosing any of the glossaries to consult
provide robust translation suggestions facilitating re-use and, increasing consistency
a higher level of handling terminology than what is present now
a stronger set of spell checking plumbing
store and display the translation history of a file
the ability to browse to a specific string/line which helps a lot when doing review sprints or, just doing translation sprints
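
To make the glossary and suggestion items a little more concrete, here is a small sketch in Python using only the standard library. Everything in it is an assumption for illustration: the tiny translation memory, the glossary, the function names and the similarity cutoff are all made up, and the bracketed target strings are stand-ins rather than real translations. Real tools use far richer matching than `difflib`.

```python
import difflib

# Stand-in data: source string -> previously accepted translation.
translation_memory = {
    "Connect to VPN": "[target] Connect to VPN",
    "Disconnect from network": "[target] Disconnect from network",
    "Open file": "[target] Open file",
}

# Stand-in base glossary: lower-cased source term -> agreed target term.
glossary = {
    "vpn": "[term] VPN",
    "network": "[term] network",
}


def suggest(source, memory, limit=3, cutoff=0.6):
    """Return (source, translation) pairs whose source resembles `source`."""
    candidates = difflib.get_close_matches(
        source, list(memory), n=limit, cutoff=cutoff
    )
    return [(candidate, memory[candidate]) for candidate in candidates]


def glossary_hits(source, terms):
    """Return glossary entries for the words that occur in `source`."""
    words = {word.strip(".,…!?").lower() for word in source.split()}
    return {term: terms[term] for term in terms if term in words}


new_string = "Disconnect from VPN…"
print("suggestions:", suggest(new_string, translation_memory))
print("glossary   :", glossary_hits(new_string, glossary))
```

Consulting the glossaries of several languages, as in the second item above, would only mean passing a different `terms` dictionary to `glossary_hits`.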

Update: Updated the first line to ensure that it isn’t implied that these are the only tools anyone interested in translation can use. These are tools I have used or, use daily.

Update: Updated the “wish-list” to reflect the needs across tools, as opposed to the implication that they were being requested only of Pootle.

Lost in translation ?

sankarshan — Sun, 16 Aug 2009 00:40:46 +0000

From a recent mail on the Foundation list, here’s an interesting quote:

Collaboration among advisory board members: Now that we have a sys admin team in place would like to find ways that we can collaborate better. Mentioned an article by J5 that talked about that RH, Novell and others are less involved because of the maintenance burden. They spend time and money on things like translations. No process to get them upstream and so they do it all over again next year.

It is the last line that I find a bit off-key and, out of context.

The post is brought to you by lekhonee v0.7

For the win !

sankarshan — Sat, 04 Jul 2009 01:56:02 +0000

“What does it take to be good at something at which failure is so easy, so effortless ?” : a quote from Better: A Surgeon’s Notes on Performance by Atul Gawande, which is highly recommended reading for those who have not read it yet (that’s a link to the flipkart.com entry for those who are local).

Last evening over dinner, among other things, Runa and I got talking about translations and translation quality. That is one of our favorite shop-talk items and, since the morning blog had bits about my performance with spellings, it was a bit more significant. It is a somewhat known issue that most translation teams measure the length of the sprint, that is, how many strings were completed or the percentage of coverage for a particular project. Some projects attach badges like “supported” / “unsupported”, “main” / “beta” to the coverage and thus make the rush to the tape more important. At some point in time, it is important for the teams to sit down, understand and make notes about the quality of translations. Left to itself, the phrase “quality of translations” doesn’t mean anything, does it ? For example, if the phrase was “Disconnect from VPN…” and you were required to translate it – how wrong can you go ?

It seems you can go wrong, and, most often do.

One of the reasons that I have observed is that translating strings in an application and translating content like documentation/release_notes/guides require different kinds of mind patterns.
The second reason is the lack of fluency in the source language. So, if you are a translator/reviewer for any language and you are using English source files (as most of us do), you need to be extremely proficient in English. The way the sentences, phrases and sub-phrases arrange themselves in English may or may not lend itself to direct translation.
The third reason is that most translators do not take time out to first use the application in English (or, read the documentation completely in English) and, use it again (or, read it again) after translation. That is a recipe for disaster. English is a funny language and, sometimes, due to the structure of the source files, the context of the content is lost. What does look like a simple word might have a funny implication if the comprehension about how it is placed within the UI or, the user-interaction flow is not made a note of.

Now that most projects have some kind of “localization steering committee”, it would be a good small project to observe which locales are coming up with the highest quality of translations and to attempt to understand what they are doing. Asking the language teams about the reasons that inhibit them from maintaining a high quality would also enable a deeper understanding of how a project can help itself become a better one (in a somewhat strange loop way). Such discussions would enable coming up with Guidelines for Quality, which are important to have. I firmly believe that all developers desire that their applications be consumed by the largest possible audience and, at heart, they are willing to sit down and listen to constructive suggestions about how best they can help the localization teams make it happen. That is the sweet spot the “LSCo” folks need to converge on and get going. In fact, for projects like OLPC, where a lot of new paradigms are being created, understanding translation processes and chipping away at improving translation quality is much needed.

Translation is still an activity that requires a fanatical attention to detail and that little bit of ingenuity. There is something not right about committing a translation that smacks of “letting go of the disciplined focus on detail” and does not contain anything new. The job is somewhat harder when it comes to documentation. One cannot (and perhaps should not) go beyond what the author has written and yet, it has to be made available in the local language after “stepping into the shoes” (or “getting into the mind”) of the original author while aligning it with the natural flow of the target language. This is also the place where the “translator memory”, as opposed to the “Translation Memory”, becomes important. The mind should be supple enough to recall how similar idioms were translated earlier or whether an error that was already reported has cropped up again. Translators have a significant bit to contribute towards making the translation source files better, cleaner, well-maintained and well documented. And they have to do it right every time.

All this would come together to produce high quality translations and, wider usage of applications and documentation. Collaboration for the win !

The post is brought to you by lekhonee v0.6