Digital Content in Local Languages: Technology Challenges

I was reading through an article of the same name by Vasudeva Varma. Barring a whopper of a statement, the author does a reasonable job of pointing out some of the areas that needs to be worked on. To begin with however, let’s take that statement:

For example, Hindi is rendered properly only on Windows XP and beyond. Though there are some efforts to create Indic versions of the Linux, largely there is very little support for Indian languages.

It is a bit out out of context but nevertheless it is worth pointing out that one would have expected a bit more accuracy from the author. Especially because availability of Indian languages and their ease of use on Linux distributions have improved significantly. And, folks who use the Indian language Linux desktop on a regular basis for their usual workflow are somewhat unanimous that “things do work”. In fact, it would have been nicer if the author had taken the time to test out a few Linux distributions in the native language mode to identify the weak points. Most of the upstream projects do have very active native language projects with a significant quantum of participants from Indian language communities. For example, translate.fedoraproject.org, l10n.gnome.org, l10n.kde.org etc are the ones that come to mind immediately.

At a larger level, I would whole heartedly agree with the author that there exists gaps which need to be filled up. For example, with the desktop and applications getting localized, there is an urgent need to have “Cookbook” like documentation in native languages primarily for desktop applications. There is a greater need to improve existing work on the following:

  • spell checkers
  • dictionaries
  • OCR

for the various Indic languages so as to enable a more wholesome usage of desktop applications. Sadly enough, a large bulk of the work around the above three bits are still “in captivity” at the various R&D initiatives across institutes in India with not much hope of being made available under an appropriate license allowing integration into FOSS applications.

The other part of the equation are folks who create content or, collate content ie. the writers and the publishers. To a large extent, there is a dearth of large volume of local language content on the Internet. And while it could have been said that the difficulty with Linux and Indian languages was a show stopper, it isn’t really so any more. “Better search” has been a buzzword that has been around for a while, but till the time a quantification of better does happen, it isn’t impossible to get along with what is available right now. The primary barriers to input methods, display/rendering and printing have been largely overcome and, the tools that allow content to be created in Indian languages are somewhat more encoding aware than before. With projects like Firefox taking an active interest in getting things going around Indic, I would hazard a guess that things would get better.

Which brings us to the Desktop Publishing folks. I have talked about them and the need to figure out their requirements a lot of times. Suffice to state, the DTP tools need to be able to handle Indic stuff far better than they do now. And, probably we do have the work cut out there.