Tag Archives: Indic

Digital Content in Local Languages: Technology Challenges

I was reading through an article of the same name by Vasudeva Varma. Barring a whopper of a statement, the author does a reasonable job of pointing out some of the areas that need to be worked on. To begin with, however, let’s take that statement:

For example, Hindi is rendered properly only on Windows XP and beyond. Though there are some efforts to create Indic versions of the Linux, largely there is very little support for Indian languages.

It is a bit out of context, but nevertheless it is worth pointing out that one would have expected a bit more accuracy from the author, especially because the availability of Indian languages and their ease of use on Linux distributions have improved significantly. And, folks who use the Indian language Linux desktop on a regular basis for their usual workflow are somewhat unanimous that “things do work”. In fact, it would have been nicer if the author had taken the time to test out a few Linux distributions in the native language mode to identify the weak points. Most of the upstream projects do have very active native language projects with a significant quantum of participants from Indian language communities. For example, translate.fedoraproject.org, l10n.gnome.org, l10n.kde.org etc. are the ones that come to mind immediately.

At a larger level, I would wholeheartedly agree with the author that there exist gaps which need to be filled. For example, with the desktop and applications getting localized, there is an urgent need to have “Cookbook”-like documentation in native languages, primarily for desktop applications. There is a greater need to improve existing work on the following:

  • spell checkers
  • dictionaries
  • OCR

for the various Indic languages so as to enable a more wholesome usage of desktop applications. Sadly enough, a large bulk of the work around the above three bits is still “in captivity” at the various R&D initiatives across institutes in India, with not much hope of being made available under an appropriate license allowing integration into FOSS applications.

The other part of the equation is the folks who create or collate content, i.e. the writers and the publishers. To a large extent, there is a dearth of local language content on the Internet. And while it could once have been said that the difficulty with Linux and Indian languages was a show stopper, it isn’t really so any more. “Better search” has been a buzzword that has been around for a while, but until “better” is actually quantified, it isn’t impossible to get along with what is available right now. The primary barriers around input methods, display/rendering and printing have been largely overcome, and the tools that allow content to be created in Indian languages are somewhat more encoding-aware than before. With projects like Firefox taking an active interest in getting things going around Indic, I would hazard a guess that things will get better.

Which brings us to the Desktop Publishing folks. I have talked many times about them and the need to figure out their requirements. Suffice it to say, the DTP tools need to be able to handle Indic stuff far better than they do now. And we probably have our work cut out there.

…and here we go again

In an article on l10n and i18n published in this month’s edition of LinuxForYou (the article isn’t available online), Kenneth Gonsalves makes the following statement (italics are mine):

The vast majority of applications today are internationalised – the need of the hour is to provide translations in Indian languages. Except for some major applications, very little work is being done in this field. I don’t know whether it is because people are not aware of the need, are too lazy or they do not know how to!

This thought seems to be the new black. Adding on to the pre-existing notions of:

  • translations are very easy to do
  • translations are for hobbyists

Some days I am surprised that such perceptions prevail. If any language team/community works on

  • a single distribution
  • two desktop environments
  • a browser
  • a mail client
  • an office suite
  • web-site content translation
  • release notes translation

then, assuming that most projects end up following a 6-month release cycle, it leaves folks with around 3+ months (on an optimistic schedule) to work with. In fact, with “string freeze” (the point at which the developers hand off the English versions to the translators), the effective window to actually translate is around 1 month. And I have seen that in whatever little translation work I have done for GNOME and OLPC.
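To put a rough shape on that squeeze, here is a back-of-the-envelope sketch. Every number in it (the string counts per project, the size of the team, the 30-day window) is a made-up placeholder for illustration and not data from any real language team; the function name is my own as well.

```python
def strings_per_translator_per_day(strings_by_project, window_days, translators):
    """Average number of strings each translator must handle per day."""
    if window_days <= 0 or translators <= 0:
        raise ValueError("window_days and translators must be positive")
    total_strings = sum(strings_by_project.values())
    return total_strings / (window_days * translators)


# Hypothetical counts of new or changed strings in one release cycle.
# These are placeholders, not measurements.
hypothetical_workload = {
    "distribution": 1500,
    "two desktop environments": 4000,
    "browser": 1000,
    "mail client": 800,
    "office suite": 3000,
    "web-site content": 500,
    "release notes": 200,
}

if __name__ == "__main__":
    # Assumed: a window of about one month after string freeze, 4 volunteers.
    rate = strings_per_translator_per_day(
        hypothetical_workload, window_days=30, translators=4
    )
    print(f"{rate:.1f} strings per translator per day")
```

Plugging in a team's own numbers shows quickly how adding one more application to the list pushes the daily load up unless more translators are added.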

And, the fact that schedules are tight can be seen on the mailing lists during desktop environment release times. So, if we can assume that the teams aren’t lazy and they know what they are doing, adding any more applications to be translated (and localized) would require capacity to be added. Which means that those who do go about FOSS evangelism and FOSS advocacy have to comprehend the following:

  • translation is not easy. Idiomatic English does not lend itself easily to translations and more importantly, message strings are sometimes not well constructed to be translated. For example, read this blog entry.
  • translation is not for hobbyists. It is a process of ensuring newer applications and releases are available in the local language. Thus, it means that teams working on translations improve quality of existing translations, check for consistency and still manage to work on newer releases. It is a serious business and folks take pride in a job well done.
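As an illustration of what a poorly constructed message string looks like, here is a minimal sketch using Python's standard gettext module. The messages and function names are hypothetical and not taken from any real application; `NullTranslations` is used so the sketch runs without any translation catalog and simply returns the English strings.

```python
import gettext

# NullTranslations returns the English msgid unchanged, so no .mo
# catalog is needed to run this sketch.
_catalog = gettext.NullTranslations()
_ = _catalog.gettext
ngettext = _catalog.ngettext


def copied_message_fragile(count, folder):
    # Hard to translate: the sentence is split into fragments, so the
    # translator cannot reorder the words, and the plural is hard-coded.
    return _("Copied ") + str(count) + _(" files to ") + folder


def copied_message_translatable(count, folder):
    # Easier to translate: one complete sentence per plural form, with
    # named placeholders that can be moved around in the translation.
    template = ngettext(
        "Copied %(count)d file to %(folder)s",
        "Copied %(count)d files to %(folder)s",
        count,
    )
    return template % {"count": count, "folder": folder}


if __name__ == "__main__":
    print(copied_message_fragile(1, "Documents"))
    print(copied_message_translatable(1, "Documents"))
```

In the first form the translator only ever sees the fragments “Copied ” and “ files to ”, which is of little use for a language where the verb goes at the end of the sentence. The second form hands over the whole sentence and lets the translation supply as many plural forms as the language needs.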

If there are more applications that require translations/l10n, it would be a good effort to start coordinating with the language teams (via the IndLinux mailing list perhaps) rather than assuming that teams don’t know about such applications or, are lazy.

Still searching for l10n process documentation for Thunderbird

Near the beginning of this month, Simon Paquet wrote to a few folks off-list about working on localization of Thunderbird.

Since the bn_IN team hasn’t had a cheery experience of working with the Mozilla team, Runa had requested that a few bits be cleared up. These relate to documentation, especially around processes.

Quoting from the mail:

1. A pointed list of to-dos (preferably sequentially) that would allow even a seemingly new translator to complete most of the work without constant supervision/querying

2. An explanation of the automated processes (e.g. the scripts to create local workable copies) that would allow translators to troubleshoot

3. Segregating the inherent processes (like translation of UI, web-parts, getting started pages) and posting them in the aforementioned list of to-dos as part of the standard template.

4. Clear identification of the tools and methodologies (like tinderbox, dashboard, litmus etc.) used and mapping them to the monitoring processes to be followed by the translation group.

5. Identifying the process of querying about context of translatable strings. ( I generally ask on the mailing list, have not seen many queries in there)

I haven’t seen a response about this from either Simon or others from Mozilla. And, I don’t really find that strange.

A post of no importance

In recent times I have blogged about the ‘non community based approach to l10n’ (mail from Gora Mohanty). There is a particular mail on the gnome-i18n mailing list that provides some inputs towards formulating a plan for avoiding a repeat of such incidents. To quote:

CDAC is a government funded agency and takes up projects from
Government which are based on deadlines which are sometimes strict and
harsh. We work towards deadlines and are answerable to the funding
agency on things we commit. Localisation activity happens to be one of
them.

I tend to hold on to the theory that both distributions (BOSSLinux and Baishakhi Linux) should, in essence, be no different from other Linux distributions that work ‘within’ the community in harmony and collaborate to innovate. Expanding on what I already wrote about working with existing L10n communities, a means to make this possible is to have a release plan available for public view. All the major distributions have a publicly available release schedule, and an immediate effect of this is that it makes it possible for potential contributors and existing communities to comprehend how the pieces fit in.

Having a release schedule also makes it easy to assess how much work would need to be put into localization for a particular language, since the components of the distribution (GNOME, KDE, Xfce etc.) would be targeting a particular release of the desktop environments. The bits that are specific to the distribution, viz. the installer and configuration toolkits, can then be done by the team in charge of the distribution or the community around the distribution. Taking an example nearer to home, there is much to learn about how to work ‘in’ the community if one takes a long hard look at how Fedora operates.

The reasons that the community got lumped with a huge load of translated files were a lack of communication and synchronization with the folks doing L10n and a lack of transparency in the infrastructure that produces the distribution. These are not insurmountable problems, but solving them is required in order to work within the community and collaborate to produce high-quality Free Software.

Here is an organisation which is willing to make crucial contributions
to the community at its own defined speed. I would want to believe
that this is one of the larger contributions any government agency has
made to the localisation efforts. Government has its interest in the
effort and so has its own temporal goals. We need to meet those goals
and so sometimes we need to take a path which satisfies our funding
agency.

Every distribution has its own defined velocity of releases and its own logic for cherry-picking components from upstream to integrate. Taking a path to satisfy the funding agency should not be at odds with the community at large within whose framework the work is being undertaken. If they are at odds, the onus is on the Program Manager for the distribution to talk with both the funding agencies and the community towards providing accurate and transparent communication.

Should a major chunk of contribution go unnoticed just because we did
not satisfy the egos of those in ‘power’?

In the realm of FOSS, contributions are not merely contributions of code or content. Contributions define the nature of the group that is contributing, and, whether they desire to learn about the civic rules into which they desire to integrate themselves. Through learning comes awareness and via awareness one transforms into a good citizen in the FOSS world.