A small matter of language

Mike McGrath has an interesting post about “What will Fedora be?”. It makes the point that “Fedora will become extremely popular in non-english speaking places” in addition to “In the near future speaking English will not be a requirement to join Fedora”. Both are relevant when talking about the l10n (or the ever-present i18n) bits around the OS. It might be a good idea to figure out how many users are actually being helped by the localisation bits present in the OS. Currently,

– there are no means of knowing specific numbers (smolt helps but not much)

– there are no means of figuring out how they use l10n or how bugs impact them

The last point is somewhat telling. There are not many Bugzilla entries related to l10n from a usage model or even from a document consumption model. So, even though a significant number of installation, administration and deployment guides are available in the local language (at least for RHEL), there is little or nothing in terms of feedback that can improve the documents.

Some time back I had written about users of localised operating systems. This is an area which Indic l10n might need to address to get to a seamless experience. The current method of asking for guidance on multiple lists, some of which mostly end up at “change the operating system to get better experience”, is not really a way forward. Rather, it is a huge step backward. For Fedora to become extremely popular in non-English speaking places, we need to reach out and record the current issues that high-touch users of non-English systems have. The current process/workflow generally includes folks who are fluent in their mother tongue and English and generally use English as the language of choice for daily work. It is the other huge group, those who are perhaps fluent in one or two languages but explicitly use the mother tongue for daily work, that should be the focus now.
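On the question of numbers raised above, here is a rough sketch of how such a count could be made. It assumes a hypothetical opt-in report that records each system's LANG value (nothing here is an existing smolt feature), and the sample data is made up purely for illustration.

```python
from collections import Counter


def language_of(locale_value):
    """Return the language part of a POSIX locale string such as 'bn_IN.UTF-8'."""
    # Drop the optional '@modifier' and '.codeset', then the '_TERRITORY' part.
    base = locale_value.split("@", 1)[0].split(".", 1)[0]
    return base.split("_", 1)[0] or "C"


def tally_languages(reported_locales):
    """Count how many reports use each language."""
    return Counter(language_of(value) for value in reported_locales)


# Hypothetical sample reports, not real data.
sample_reports = ["en_US.UTF-8", "bn_IN.UTF-8", "hi_IN", "bn_IN", "C", "sr_RS@latin"]
counts = tally_languages(sample_reports)

for language, total in counts.most_common():
    print(language, total)
```

A count like this only says which locale a system runs in; it says nothing about how people use the localised bits or which bugs hurt them, which is the harder half of the problem.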

What would it take to insist that each language in Fedora have its own “Fedora Tour”?

Dell’s customer service hell

Sayamindu writes about how he’s part of the Dell Customer Service hell. Having been there and faced that, I can only empathise with him. If anyone knows someone-who-matters at Dell, please pass on the message that customer service representatives are not supposed to slam down calls with a curt “we will tell you when your machine is being built”, irrespective of the blog.

The GSoC roundup of OPYUM

This is the mandatory final post on GSoC and the project where I was a mentor. I think Debarshi will be writing about his own experiences, but here’s an update from my side. The submitted proposal for GSoC had the following goals:

+ a process to import/export different profiles on a system and switch between them so that “YumPacks” could be created for different target systems

+ ability to create YumPacks, containing all the dependencies for installing/updating a set of packages on a particular system, which can be easily carried around in any offline media

+ ability to install/update a bunch of packages from a YumPack

+ package the program for inclusion into the Fedora repository

The current release has achieved the first two objectives. Installing/updating packages from a YumPack is performed by extracting the RPMs from the YumPack and using system-install-packages to install/update them. The package has also been submitted for review.
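The idea of a YumPack carrying every dependency for a particular target system can be illustrated with a small sketch. This is not Opyum's code: it assumes a hypothetical in-memory dependency map instead of real Yum metadata, and the package names and the function name are invented for the example.

```python
def packages_for_pack(requested, depends_on, already_installed):
    """Return the set of packages a pack would need to carry.

    requested         -- package names the user wants on the target system
    depends_on        -- hypothetical map: package name -> list of direct dependencies
    already_installed -- package names present in the target system's profile
    """
    needed = set()
    pending = list(requested)
    while pending:
        name = pending.pop()
        # Skip anything already collected or already on the target system.
        if name in needed or name in already_installed:
            continue
        needed.add(name)
        # Queue the direct dependencies so the closure is followed transitively.
        pending.extend(depends_on.get(name, ()))
    return needed


# Made-up dependency data for illustration only.
depends_on = {
    "editor": ["toolkit", "spellcheck"],
    "toolkit": ["graphics-lib"],
    "spellcheck": ["dictionary-bn"],
}
target_profile = {"graphics-lib"}

pack = packages_for_pack(["editor"], depends_on, target_profile)
print(sorted(pack))
```

The target profile matters here: the same request produces a different pack for a system that already has some of the dependencies, which is why being able to import and export profiles goes hand in hand with creating the packs.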

Coming up next are:

+ getting YumPacks to be directly recognised by system-install-packages

+ backing up /var/cache/yum to be reused by Yum

+ improvements to the UI, better error handling

So, what were the “wish we could have done” things? Well,

– more testing: Opyum needs to be more aggressively tested across various UseCases

– issue/bug/rfe tracker: should perhaps be handled through Bugzilla if the package review is a +

– handling of non en_US locales: there’s a bug on this if you are interested in tracking

– better UI: the current UI is just functional

– localization

All summed up, it’s been good to see Opyum develop and get to the package review stage. Thanks, Debarshi, for the wonderful ride.

Dear Lazyweb…

Dear Lazyweb,

An old friend of mine, who’s teaching a few kids how to solve problems using computers, called in with a query for which I had no answer. The question was “what is the alternative to Microsoft Agent in Linux or even applications on Linux?” More on Microsoft Agent is here. It would be awesome if someone could point me towards a proper answer on this one. The mail address is sankarshan dot mukhopadhyay at gmail dot com.

ps: Thanks to a wonderful person my blog now gets syndicated here too.

Could one get GOOG to do a GSoC BoF at foss.in?

It might be a good idea if someone from GOOG proposed a GSoC BoF at foss.in this year. Given the large push in the media about the 50+ GSoC candidates this time around, it would be worth trying to get as many of them as possible to trade war stories, share their experiences of pushing code into communities and talk about how they plan to continue. And, of course, have loads of fun. Some of the things that can be done at the GSoC BoF are:

+ discussion on how to ensure that the prospective GSoC applicants are aligned with their chosen organisation’s projects and roadmap

+ what are the GoodThings(TM) to do when one is a GSoC applicant/participant

+ what are the GoodThings(TM) to do when one is a GSoC mentor

+ how to get more folks to test the GSoC project during the release stages

+ how to get contributors to continue contributing to the GSoC project even beyond the tenure of that year’s GSoC

@home does not make one feel “at home”

The store in question is @home.

Incident 02:

1. Order # 6114, delivery assured for 23rd Aug 2007
2. On 23rd, delivery time confirmed at 1308 to be between 1500 and 1600
3. At 1749, the delivery time changed to 1900
4. At 1920, the delivery time changed again to 2000
5. Consignment turns up at 2020

Between steps 2 and 4 above, I get to listen to:

+ Delivery got stuck because of traffic snarl on road
+ Delivery person does not have a cell phone, i.e. no means of contact
+ Delivery person and store have been trying to reach me for 45 mins
+ Delivery person has just confirmed to store about delivery in 10 mins

Contradictory statements topped by very rude post-sales handling

Incident 01:

1. Orders SOMH001002293 and SOM001002292, delivery assured for 26th November 2006 at 1230
2. Assured that time will be confirmed by 1100
3. At 1130, in response to a call, delivery confirmed by 1330
4. Till 1600, no one calls up to provide delivery status
5. At 1650, the store manager assures a call back in 5 mins
6. Store manager never does call back
7. The consignment arrives post 1700

The shop is good, but the post-sales behaviour is so off-track it really does not make sense to be a repeat customer.

UPDATE: Here is how the shelf looks now. By the way, barring a really off-key sorry, @home never did revert.

Notes on L10n and Language Technology recommendations

The Localisation and Language Technology Standards Recommendation for eGovernance had been put up for public review here. I have not seen anyone post observations on it, so I thought it would be nice to collect my small notes in one place.

My notes follow the dash in each item.

It states that the existing standards and resources for Indian Language computing are not all complete. Some of the gaps are in

  • Keyboard layouts and character formations (of conjunct characters) – the way I see it, these are actually two separate issues: [i] standardisation of keyboard layouts for Indic languages for conjunct characters and [ii] standardisation of character formations for conjunct characters
  • Terminologies for Indian languages (both technical and non-technical) – at this point in time it would be good to make the available work on standard/government-accepted terminology available for public download. A substantial amount of work has been completed in terms of providing acceptable localisation terms, and having them available for download and usage (both commercial and non-commercial) would be of great help
  • Unicode points for some Indian scripts (such as Santhali and Kashmiri) – these, I believe, would need to be pushed through the Unicode Consortium process and thus would require involvement of the Ministry of Information Technology (and thus TDIL). It would be good to have the status of all such Unicode-related issues that are currently being handled collated at one of the sites. Additionally, the C-DAC GIST unit out of Pune has linguistic experience in dealing with languages that are “new”; a way to get status updates on that work would go a long way.
  • Transliteration for Indian names – again it would be good to know the accepted recommendations for geographical names including standardization of their localised forms
  • A small group of experts shall be constituted for each of the 22 Official languages which will make a thorough study of the current status of all aspects of technology support (including character encoding schemes, input methods, OS and browser support, interconversion between different formats such as PDF and PostScript, search and processing etc) for the concerned language script, identify gap areas and suggest necessary action plan for bridging gaps quickly. The study may be completed within a time frame of 3 months – this is a very large chunk of a very big pie. It would be good to get this study/assessment done in [i] a transparent fashion and [ii] a way that its output can be tracked in terms of accuracy and relevance
  • A small group of experts shall be constituted for each of the 22 Official Languages which will make a thorough study of the current status of all aspects of lexical resources (including corpora, dictionaries, morphological analyzers, thesauri and wordnets, spellcheckers etc) for the concerned language/script, identify gap areas and suggest necessary action plan for bridging the gaps quickly. The study may be completed within a time frame of 3 months – again, this is a very large piece; is the “may be” in the time frame indicative of possible slippage? For over half a decade now, research institutes in the field of language technology have been tracking all these things along with trying to push the envelope of Machine Translation forward. It is possible that they already have such assessment reports in place. Is it possible to make them public so that an inclusive process can hasten the study?
  • A pilot study in the localisation of a selected G2C e-Governance application shall be carried out within 6 months. This will help formulate guidelines and priorities for further research and development in relevant areas – [i] 6 months from when? [ii] are the details of the selected application, or of what is desired in the application for the pilot, available? [iii] what are the acceptance criteria for the pilot?
  • Local language support may reduce the language barrier to some extent but using keyboard-mouse-screen interface is still too complex and cumbersome for most people. Future lies in speech technologies. Speech technologies can be used for input, as well as for output taking technology directly to the people. Emphasis may be laid on relevant R&D in this direction – is the assessment of the current work done, including relevant OPEN tasks, available to the general public?
  • You can buy any computer from any vendor anywhere in India and expect to be able to type in a letter, save it, print it and do all such basic operations in English without having to buy or install any specialized hardware or software or font. The case is not so with Indian scripts. Localization and specialized solutions are explicitly called for – this reads like a gross generalization of issues of de facto and de jure standards. For those on reasonably modern Linux distributions, the input-storage-printing-display cycle really does not require explicitly calling for specialized solutions
  • Specification for non INSCRIPT keyboard layouts should be made available by either TDIL/CDAC – the relevant question is “why” this suggestion is being made. The specification should have been available to the general public for a long time, yet it has not been.
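On the conjunct characters mentioned in the first item: in Unicode a conjunct is not a single code point but a sequence of code points, which is part of why keyboard layouts and rendered character formations are separate questions. A small sketch using Python's standard unicodedata module shows this; the Devanagari conjunct is chosen purely as an illustration, and the helper name is made up for the example.

```python
import unicodedata


def describe(text):
    """List each code point in text together with its Unicode character name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# A Devanagari conjunct written as consonant + virama + consonant.
conjunct = "\u0915\u094D\u0937"

for code_point, name in describe(conjunct):
    print(code_point, name)
```

What the user sees as one shape is stored as three code points: the input method decides how those get typed, while the font and rendering engine decide what the joined form looks like.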

Additionally, it would be nice to track how ICU is dealing with the OPEN Indic issues, if any. Does anyone have pointers to that?