I have written on and off about Indic bits, including how DTP is important when talking about a potential user base. The story so far is that while the basic blocks that make up the Indic experience on Linux, i.e. desktop UI l10n, input methods and fonts, are somewhat in place, there is a large body of work that still needs to be attacked to ensure "experience completeness" when it comes to Indic on the desktop. Unless the needs and requirements of the potential users are looked into, understood and translated into tasks, nothing much can happen in terms of uptake. Or we can howl till the heavens come down about how l10n is important for ICT4D, but without commits to SCMs nothing much is going to happen.
The saddest bit for me is that year after year I observe a colossal waste of money, manpower and talent re-inventing the wheel, or doing re-search in the truest ironic sense of the term. Such ego massages include work done on pieces of the puzzle which, if done in the FOSS way (in the open, with a community around it), would have led to faster results and greater adoption. Of course, the most glamorous bits of the puzzle include Speech-to-Text and Text-to-Speech. Show me a re-search center (by any name) which is not doing the exact same thing as its peer and I'll show you an R&D lead who has no intention of being part of high-powered-vacuum-tasked committees. Let's take a few example areas where the "FOSS way" and not the "LOSS way" (read "loss" as laughably obtuse, secretly shameful) would have provided real support:
Fonts: As of today, most distributions use a limited set of fonts, or fonts arising from the same family (I hazard a guess that most of the time they are Lohit-based anyway). Yet there has been extensive work on Indic fonts whose creators adamantly refuse to release them under appropriate OSI-compliant licenses. Who benefits from that stance? Surely not the end user, who is denied the chance to have a portfolio of fonts.
Spellcheckers: An indispensable part of the user experience, the current work done around aspell and hunspell is more the result of an obstinate push against roadblocks than of respect for the user. As Gora (who should blog and is determined not to) pointed out recently, there are two aspects to the spellchecking issue: [i] building a comprehensively proofread dictionary which also includes insights into language usage and common spelling errors, and [ii] creating stand-alone spellchecking applications and plugins for applications like Scribus. But take pragmatic stock of the current situation: what's happening? A bucketful of research is claimed to be under way in the various glitzy research bodies, and yet there is nary a single instance where the results can be tracked, integrated, tested and tuned.
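To make the two aspects concrete, here is a toy sketch of what sits underneath any spellchecker: a word list plus a closeness-based suggester. The handful of Hindi words below are purely illustrative stand-ins for a proofread dictionary (a real hunspell .dic file would be vastly larger and carry affix rules); the suggestion logic uses Python's stdlib difflib rather than hunspell's actual algorithm, just to show the shape of the problem.

```python
import difflib

# Illustrative stand-in for a comprehensively proofread dictionary;
# a real one (e.g. a hunspell .dic) would hold tens of thousands of entries.
DICTIONARY = ["नमस्ते", "भारत", "भाषा", "शब्द", "वर्तनी"]

def suggest(word, n=3):
    """Return the word itself if it is in the dictionary,
    otherwise up to n close matches as spelling suggestions."""
    if word in DICTIONARY:
        return [word]
    return difflib.get_close_matches(word, DICTIONARY, n=n, cutoff=0.6)

print(suggest("भारत"))  # correctly spelled: returned as-is
print(suggest("भरत"))   # misspelling: close matches from the dictionary
```

The hard part, as the paragraph above notes, is not this mechanism but aspect [i]: compiling and proofreading the word list and encoding real-world usage and common errors, which is exactly the data that never escapes the research bodies.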
DTP: Not much remains to be said about this beyond what I hear: the Scribus folks are aware and responsive, but they await contributions in the domain of Indic enablement.
OCR: Tesseract could form the way out, given that some of the well-known, GoI-funded Indic OCR projects have stayed behind closed doors, with the odd data point produced in public like a rabbit out of a hat (most magicians' hats are black, though; something to do with closed-source bits, I guess).
TTS and STT: This year's Sarai FLOSS Fellowship seems to hold the promise of something coming out on the STT front. TTS kind of works with festival, Dhvani and espeak.