Tag Archives: L10n

Lost in translation?

From a recent mail on the Foundation list, here’s an interesting quote:

Collaboration among advisory board members: Now that we have a sys admin team in place would like to find ways that we can collaborate better. Mentioned an article by J5 that talked about that RH, Novell and others are less involved because of the maintenance burden. They spend time on money on things like translations. No process to get them upstream and so they do it all over again next year.

It is the last line that I find a bit off-key and out of context.

The post is brought to you by lekhonee v0.7

For the win!

“What does it take to be good at something at which failure is so easy, so effortless?”: a quote from Better: A Surgeon’s Notes on Performance by Atul Gawande, which is highly recommended reading for those who have not read it yet (that’s a link to the flipkart.com entry for those who are local).

Last evening over dinner, among other things, Runa and I got talking about translations and translation quality. That is one of our favorite shop-talk items and, since the morning’s blog had bits about my performance with spelling, it was a bit more significant. It is a somewhat known issue that most translation teams measure the length of the sprint, that is, how many strings were completed or what percentage of a particular project was covered. Some projects attach badges like “supported”/“unsupported” or “main”/“beta” to the coverage figures and thus make the rush to the tape more important. At some point, it is important for the teams to sit down, understand and make notes about the quality of translations. Left to itself, the phrase “quality of translations” doesn’t mean anything, does it? For example, if the phrase was “Disconnect from VPN…” and you were required to translate it, how wrong can you go?

It seems you can go wrong and, most often, you do.

  • One of the reasons that I have observed is that translating strings in an application and translating content like documentation/release_notes/guides require different kinds of mind patterns.
  • The second reason is a lack of fluency in the source language. If you are a translator/reviewer for any language and you are using English source files (as most of us do), you need to be extremely proficient in English. The way sentences, phrases and sub-phrases arrange themselves in English may or may not lend itself to direct translation.
  • The third reason is that most translators do not take time out to first use the application in English (or read the documentation completely in English) and then use it again (or read it again) after the translation. That is a recipe for disaster. English is a funny language and, sometimes, due to the structure of the source files, the context of the content is lost. What looks like a simple word might have a funny implication if one does not note how it is placed within the UI or within the user-interaction flow.
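Some of these gaps can even be caught mechanically before a human review. Below is a minimal sketch (assuming gettext PO files and the third-party polib library; the file name is hypothetical) of how a team could look beyond the coverage percentage and flag translations where UI-significant punctuation, like the trailing ellipsis in “Disconnect from VPN…”, was lost:

```python
# A minimal sketch, assuming gettext PO files and the third-party
# polib library (pip install polib); the file name is hypothetical.
import polib

po = polib.pofile("bn_IN.po")

# Coverage is the number everyone chases...
print("Coverage: %d%%" % po.percent_translated())

# ...but quality needs extra checks. Flag translated entries where
# punctuation that carries UI meaning did not survive translation.
for entry in po.translated_entries():
    # A trailing ellipsis ("Disconnect from VPN…") signals that a
    # dialog follows; the translation should keep it.
    if entry.msgid.rstrip().endswith(("…", "...")) and \
            not entry.msgstr.rstrip().endswith(("…", "...")):
        print("Lost ellipsis:", repr(entry.msgid))
    # A trailing colon usually labels an input field.
    if entry.msgid.endswith(":") and not entry.msgstr.endswith(":"):
        print("Lost colon:", repr(entry.msgid))
```

Checks like these are no substitute for actually using the translated application, but they catch the cheap mistakes early.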

Now that most projects have some kind of “localization steering committee”, it would be a good small project to observe which locales are coming up with the highest-quality translations and attempt to understand what they are doing. Asking the language teams about the reasons that inhibit them from maintaining high quality would also enable a deeper understanding of how a project can help itself become a better one (in a somewhat strange-loop way). Such discussions would enable coming up with Guidelines for Quality, which are important to have. I firmly believe that all developers desire that their applications be consumed by the widest audience possible and, at heart, they are willing to sit down and listen to constructive suggestions about how best they can help the localization teams make it happen. That is the sweet spot the “LSCo” folks need to converge on and get going. In fact, for projects like OLPC, where a lot of new paradigms are being created, understanding translation processes and chipping away at improving translation quality is much requested.

Translation is still an activity that requires a fanatical attention to detail and that little bit of ingenuity. There is something not right about committing a translation that smacks of a letting go of the disciplined focus on detail and does not contain anything new. The job is made somewhat harder when it comes to documentation. One cannot (and perhaps should not) go beyond what the author has written and yet, it has to be made available in the local language after “stepping into the shoes” (or “getting into the mind”) of the original author, while keeping it aligned with the natural flow of the target language. This is also where the “translator memory”, as opposed to the “Translation Memory”, becomes important. The mind should be supple enough to recall how similar idioms were translated earlier or whether an error that was already reported has cropped up again. Translators have a significant bit to contribute towards making the translation source files better, cleaner, well maintained and well documented. And they have to do it right every time.
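That “translator memory” can be approximated in code, too. Here is a crude sketch (again assuming the polib library; the file names are hypothetical) that collects how each English string was translated across a set of PO files and flags phrases rendered differently in different places:

```python
# A crude consistency check across PO files, assuming the polib
# library; the file names below are hypothetical.
from collections import defaultdict

import polib

translations = defaultdict(set)  # msgid -> every msgstr seen for it

for path in ("evolution.bn_IN.po", "nautilus.bn_IN.po"):
    for entry in polib.pofile(path).translated_entries():
        translations[entry.msgid].add(entry.msgstr)

# The same English phrase translated two different ways is worth a
# second look: it may be legitimate, or it may confuse the user.
for msgid, seen in translations.items():
    if len(seen) > 1:
        print("Inconsistent:", repr(msgid), "->", sorted(seen))
```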

All this would come together to produce high-quality translations and wider usage of applications and documentation. Collaboration for the win!

The post is brought to you by lekhonee v0.6

Digital Content in Local Languages: Technology Challenges

I was reading through an article of the same name by Vasudeva Varma. Barring a whopper of a statement, the author does a reasonable job of pointing out some of the areas that need to be worked on. To begin with, however, let’s take that statement:

For example, Hindi is rendered properly only on Windows XP and beyond. Though there are some efforts to create Indic versions of the Linux, largely there is very little support for Indian languages.

It is a bit out of context, but it is nevertheless worth pointing out that one would have expected a bit more accuracy from the author, especially because the availability of Indian languages and their ease of use on Linux distributions have improved significantly. And folks who use the Indian-language Linux desktop on a regular basis for their usual workflow are somewhat unanimous that “things do work”. In fact, it would have been nicer if the author had taken the time to test out a few Linux distributions in native-language mode to identify the weak points. Most of the upstream projects do have very active native-language projects with a significant quantum of participants from Indian language communities. For example, translate.fedoraproject.org, l10n.gnome.org and l10n.kde.org are the ones that come to mind immediately.

At a larger level, I would wholeheartedly agree with the author that there exist gaps which need to be filled. For example, with the desktop and applications getting localized, there is an urgent need for “Cookbook”-like documentation in native languages, primarily for desktop applications. There is a greater need to improve existing work on the following:

  • spell checkers
  • dictionaries
  • OCR

for the various Indic languages, so as to enable a more wholesome usage of desktop applications. Sadly, the bulk of the work around the above three bits is still “in captivity” at the various R&D initiatives across institutes in India, with not much hope of being made available under an appropriate license allowing integration into FOSS applications.
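For what it is worth, consuming such work from the application side is straightforward once a dictionary ships under a usable license. A minimal sketch, assuming the pyenchant bindings and an installed bn_IN dictionary (the availability of that dictionary is precisely the gap described above):

```python
# A minimal spell-check sketch using the pyenchant bindings
# (pip install pyenchant). It assumes a bn_IN hunspell/aspell
# dictionary is installed, which is exactly the availability
# gap noted above.
import enchant

d = enchant.Dict("bn_IN")

word = "বাংলা"  # a sample Bengali word
if d.check(word):
    print("Spelled correctly.")
else:
    # Offer suggestions from the dictionary's word list.
    print("Suggestions:", d.suggest(word))
```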

The other part of the equation is the folks who create content or collate content, i.e. the writers and the publishers. To a large extent, there is a dearth of a large volume of local-language content on the Internet. And while it could once have been said that the difficulty with Linux and Indian languages was a show-stopper, it isn’t really so any more. “Better search” has been a buzzword for a while but, until that “better” is quantified, it is entirely possible to get along with what is available right now. The primary barriers to input methods, display/rendering and printing have been largely overcome, and the tools that allow content to be created in Indian languages are somewhat more encoding-aware than before. With projects like Firefox taking an active interest in getting things going around Indic, I would hazard a guess that things will get better.

Which brings us to the Desktop Publishing folks. I have talked about them and the need to figure out their requirements many times. Suffice it to say, the DTP tools need to be able to handle Indic stuff far better than they do now. And we probably do have our work cut out there.

Global <-> local

In all the years that I have been interacting with the various upstream FOSS projects, reasoning with and convincing various groups to take a ‘local’ view of issues that complements the global strategy has been an uphill task. Sometimes interpersonal relations have been able to overcome the curve. At other times, it has been a constant pegging away with facts, data points and a regular representation of issues that validate the need to approach and integrate local issues within the fold of the greater goals of the project. Either way, it makes me happy to see another project realize the need to align with the views and inputs of the local participants and figure out ways and means to respect their inputs and listen to their feedback.

The Regional Groups aspect of OpenOffice.org has gone a bit unnoticed and somewhat unloved (and it has been my fault, since I do not recall talking much about this). This would be one area where it would be good to have a few folks stand up and take ownership as stewards.

In other news of the day, I have an @gnome.org alias for myself (thanks, SysAdmins). Sadly, it comes with the usual pain of a botched job of my actual name and, by now, I am so used to folks chopping up my first name every way they feel like that I am more amused than bewildered at the lack of appreciation of names.

2009 would be the Year of …

Og Maciel writes about the possibility of 2009 being the Year of Translations. With the arrival of awesome tools like Transifex, Damned Lies, Vertimus etc, it sure feels good to be even marginally involved in the process of translations.

Infrastructural pieces coming together ensure that a translation workflow that appeals to all and is easy for the end user can be put in place with ease. It would also mean a disruptive playing field for startups like Indifex. Wide-open spaces for innovation in translation workflow and infrastructure are bound to be welcomed by the folks who spend countless hours making applications, desktops and operating systems available in their local languages/locales. They don’t get appreciated often. They get recognized at release time in release notes and the like, but they do keep the engines running and the lights on. This is going to be their year.

I would venture so far as to state that, in the trend of “2009 would be the Year of <insert_your_favorite_prediction>”, it would be a Year of Content: free and open content unencumbered by restrictive rights and legalese, content that would be redistributable, informative, educational and able to bring about a change. Over the last 24 months, the methods and tools that enable content creation on Linux desktops have been simplified, especially when it comes to Indian languages. So there are fonts available (some of them quite elegant), keyboard layouts, on-screen keyboards (like the Indic OnScreen Keyboard or iok, and even Quillpad), input methods, word-lists and similar bits that complete the user experience when using a Linux desktop to compose content. In short, the traditional problems in the fields of input-display-printing have been substantially addressed, bringing the end-user experience to a level where it should be easy to just plug and create.
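On the fonts point, it is easy to sanity-check whether a given font actually covers a script by inspecting its character map. A small sketch, assuming the fontTools library and a hypothetical font file path:

```python
# A sketch that checks a font's coverage of the Bengali Unicode block
# (U+0980 to U+09FF), assuming the fontTools library (pip install
# fonttools); the font file path is hypothetical.
from fontTools.ttLib import TTFont

font = TTFont("Lohit-Bengali.ttf")
cmap = font.getBestCmap()  # maps codepoint -> glyph name

bengali_block = range(0x0980, 0x0A00)
missing = [cp for cp in bengali_block if cp not in cmap]
# (A finer check would skip codepoints unassigned in Unicode.)
print("Bengali codepoints not covered: %d" % len(missing))
```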

There is a wealth of content in Indian languages, starting right from folk tales that are part of the oral tradition to commercially generated content, which needs to start moving into the UTF-8 encoding space. Projects like the OLPC can benefit from the availability of such material. Work on Indic OCR needs to move forward at a much more aggressive pace than it currently does, but there are signs of good things coming out of it. Digitizing data would also enable a lot of content to be archived and made available for consumption.

This is the year that should see a large part of such things happening. The marriage of content creators with the infrastructure developers is something that needs to happen as well. And this needs to include folks from the fields of comparative literature, media studies and the like. Anyone who really does generate content should be met with and talked to regarding the need to exert themselves to become part of the process. Content already takes up a large chunk of the investment outlay for the mobile players and, with easy means of generating content available, it would not be long before one needs to consolidate, find patterns and predict trends.

The convergence of the computing and application prowess of mobile devices, content-creation workflows and the upswing in the production of Indic-language content for the web promises to make 2009 an interesting year of innovations.

Season’s Greetings to all.

A gathering of the Fedora faithful

I spent a day and a half at the Freedom in Computer Technology 2008 convention on the 26th and 27th of this month. Susmit has already blogged about it. Some pictures:

(Pictures: people waiting to get the Fedora media; the stall)

More pictures are available at the usual location. I missed taking a group picture of the volunteers and the stall before we went into business. My bad. Noted that down as a mandatory picture for next time.

For various reasons it had been a while since I had been at a stall, so the “buzz” of seeing folks lining up to hear about F10 and L10n and to get their media was exciting enough. Somewhat strangely, not many questions were around the proprietary codec stuff (read: “I want to play mp3”). Having GLUG-NIT Durgapur at the next stall (and I met Debayan as well) meant that we had converging streams of interested audience. It is always a good feeling to finally meet up with a lot of names from IRC and the mailing lists. The F10 artwork got rave reviews 🙂

A big round of “Thank You” to all the volunteers (Gopal and his student, Dipanjan, Sarbartha, Ravi, Susmit and Indranil) who made time over the weekend to turn up, tirelessly stand around and answer questions with a big smile. A sizeable quantity of the media and leaflets/handouts was given away. Names of those interested in being on the list have been taken down, and Susmit plans to get back in touch with them. Another good thing that came out of it was the ad-hoc sit-down with the colleges that desire to have an “Activity Day”.

I gave a small talk on the “Community Model” and how FOSS businesses should begin by looking at getting their act together on it. There were a couple of questions. However, given the audience profile, most questions were around FOSS software and licensing vis-a-vis “freeware”.

It would have been really nice to have network connectivity so as to show off a few things – well, next time perhaps. The LiveUSB station also got some love 🙂 so I guess that made up for the trouble taken to set it up. The next time IOTA organizes a convention like this, it would be good to have a segment for workshops as well as an Expo area for stalls to be set up. Casting the net a bit wider in the industry does help in getting things talked about.

ps: I don’t know if the Stallman speech will have a transcript available, but it would be good to have one.

pps: It was nice to learn that Gopal’s student has been using Fedora since F7 and is proficient with a Linux desktop. It was obvious in the way he helped manage the stall at times.

bn_IN moves out of beta for Firefox

Just read off Seth’s blog that, for Firefox 3.0.5, bn_IN has moved out of beta. Thanks to Runa for making that happen. The mandatory download link.

Incidentally, the same post from Seth provides pointers to what is required to move a locale out of beta. That’s a good list to have handy and a page that deserves to be bookmarked.

…and here we go again

In an article on l10n and i18n published in this month’s edition of LinuxForYou (the article isn’t available online), Kenneth Gonsalves makes a statement (italics are mine):

The vast majority of applications today are internationalised – the need of the hour is to provide translations in Indian languages. Except for some major applications, very little work is being done in this field. I don’t know whether it is because people are not aware of the need, are too lazy or they do not know how to!

This thought seems to be the new black, adding to the pre-existing notions that:

  • translations are very easy to do
  • translations are for hobbyists

On some days I am surprised at why such perceptions prevail. If any language team/community works on

  • a single distribution
  • two desktop environments
  • a browser
  • a mail client
  • an office suite
  • web-site content translation
  • release notes translation

then, assuming that most projects end up following a six-month release cycle, it leaves folks with around 3+ months (on an optimistic schedule) to work with. In fact, with the “string freeze” (the time when the developers hand off the English versions to the translators), the effective window to actually translate is around one month. And I have seen that for whatever little translation I have done for GNOME and OLPC.
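To put rough numbers on that window (every figure below is a hypothetical illustration, not measured data), the arithmetic looks something like this:

```python
# Back-of-the-envelope capacity math; every number here is a
# hypothetical illustration, not measured data.
components = {
    "distribution-specific tools": 5000,   # strings per release
    "two desktop environments": 30000,
    "browser": 4000,
    "mail client": 3000,
    "office suite": 25000,
    "web-site + release notes": 2000,
}
total_strings = sum(components.values())  # 69,000 strings

window_days = 30   # effective window after string freeze
team_size = 5      # a typical small language team

per_person_per_day = total_strings / (window_days * team_size)
print("Strings per translator per day: %.0f" % per_person_per_day)
# Roughly 460 strings a day, every day, for a month; "lazy" is not
# the obvious explanation.
```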

The fact that schedules are tight can be seen on the mailing lists around desktop environment release times. So, if we can assume that the teams aren’t lazy and that they know what they are doing, adding any more applications to be translated (and localized) would require capacity to be added. Which means that those who go about FOSS evangelism and FOSS advocacy have to comprehend the following:

  • translation is not easy. Idiomatic English does not lend itself easily to translation and, more importantly, message strings are sometimes not constructed with translation in mind (see the sketch after this list). For example, read this blog entry.
  • translation is not for hobbyists. It is a process of ensuring that newer applications and releases are available in the local language. It means that teams working on translations improve the quality of existing translations, check for consistency and still manage to work on newer releases. It is serious business and folks take pride in a job well done.
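Here is the sketch promised above: a contrast, using Python’s standard gettext module (the translation domain and locale directory are hypothetical), between a message string that is hostile to translators and one that is not:

```python
# Why string construction matters, using Python's standard gettext
# module; the translation domain and locale directory are hypothetical.
import gettext

t = gettext.translation("myapp", localedir="locale",
                        languages=["bn_IN"], fallback=True)
_ = t.gettext
ngettext = t.ngettext

n = 3

# Hostile to translators: the sentence arrives in fragments, so the
# translator never sees it whole and cannot reorder the words.
msg = _("Found ") + str(n) + _(" files")

# Translator-friendly: one complete string with a placeholder, and
# plurals delegated to the target language's own plural rules.
msg = ngettext("Found %d file", "Found %d files", n) % n
print(msg)
```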

If there are more applications that require translation/l10n, it would be a good effort to start coordinating with the language teams (via the IndLinux mailing list, perhaps) rather than assuming that the teams don’t know about such applications or are lazy.

Still searching for l10n process documentation for Thunderbird

Around the beginning of this month, Simon Paquet wrote to a few folks off-list about working on the localization of Thunderbird.

Since the bn_IN team hasn’t had a cheery experience working with the Mozilla team, Runa requested that a few bits be cleared up. These relate to documentation, especially around processes.

Quoting from the mail:

1. A pointed list of to-dos (preferably sequentially) that would allow even a seemingly new translator to complete most of the work without constant supervision/querying

2. An explanation of the automated processes (e.g. the scripts to create local workable copies) that would allow translators to troubleshoot

3. Segregating the inherent processes (like translation of UI, web-parts, getting started pages) and posting them in the aforementioned list of to-dos as part of the standard template.

4. Clear identification of the tools and methodologies (like tinderbox, dashboard, litmus etc.) used and mapping them to the monitoring processes to be followed by the translation group.

5. Identifying the process of querying about context of translatable strings. (I generally ask on the mailing list, have not seen many queries in there)

I haven’t seen a response about this from either Simon or others at Mozilla. And I don’t really find that strange.

A post of no importance

In recent times I have blogged about a ‘non-community-based approach to l10n’ (mail from Gora Mohanty). There is a particular mail on the gnome-i18n mailing list that provides some input towards formulating a plan for avoiding repeat incidents. To quote:

CDAC is a government funded agency and takes up projects from Government which are based on deadlines which are sometimes strict and harsh. We work towards deadlines and are answerable to the funding agency on things we commit. Localisation activity happens to be one of them.

I tend to hold on to the theory that the two distributions in question (BOSSLinux and Baishakhi Linux) should, in essence, be no different from other Linux distributions that work ‘within’ the community in harmony and collaborate to innovate. Expanding on what I already wrote about working with existing L10n communities, one means of making this possible is to have a release plan available for public view. All the major distributions have a release schedule available in public, and an immediate effect of this is that it makes it possible for potential contributors and existing communities to comprehend how the pieces fit in.

Having a release schedule also makes it easy to assess how much work would be required for localization into a particular language, since the components of the distribution in terms of GNOME/KDE/Xfce etc would be targeting particular releases of the desktop environments. The bits that are specific to the distribution, viz. the installer and configuration toolkits, can then be done by the team in charge of the distribution or the community around it. Taking an example nearer to home, there is much to learn about how to work ‘in’ the community if one takes a long, hard look at how Fedora operates.

The reason that the community got lumped with a huge load of translated files was that there was a lack of communication and synchronization with the folks doing L10n, and a lack of transparency in the infrastructure that produces the distribution. These are not insurmountable problems, but solving them is required in order to work within the community and collaborate to produce high-quality Free Software.

Here is an organisation which is willing to make crucial contributions to the community at its own defined speed. I would want to believe that this is one of the larger contributions any government agency has made to the localisation efforts. Government has its interest in the effort and so has its own temporal goals. We need to meet those goals and so sometimes we need to take a path which satisfies our funding agency.

Every distribution has its own defined velocity of releases and its own logic for cherry-picking components from upstream to integrate. Taking a path that satisfies the funding agency should not be at odds with the community at large, within whose framework the work is being undertaken. If they are at odds, the onus is on the Program Manager for the distribution to talk with both the funding agencies and the community and provide accurate and transparent communication.

Should a major chunk of contribution go unnoticed just because we did not satisfy the egos of those in ‘power’?

In the realm of FOSS, contributions are not merely contributions of code or content. Contributions define the nature of the group that is contributing and whether it desires to learn the civic rules of the community into which it seeks to integrate. Through learning comes awareness, and via awareness one transforms into a good citizen of the FOSS world.