The Sanskrit Library | Vedic Unicode Proposal

Extending the Unicode Standard to accommodate Vedic

Unicode is a standard for encoding the world's scripts that, in the last few years, has become the minimum implementation standard universally recognized by software and hardware developers. Although the Unicode Standard encodes several Indic scripts, it has lacked characters necessary for the adequate representation of Vedic texts, the most ancient texts of India, which are of the greatest interest to linguists and of enormous cultural importance to Hindus. The International Digital Sanskrit Library Integration project, funded by the U.S. National Science Foundation's Division of Information and Intelligent Systems under grant number 0535207, developed a successful proposal to include Vedic characters in the Unicode Standard. It did so by coordinating its activities with those of the Script Encoding Initiative at the University of California at Berkeley; with Evertype, a typesetting firm that serves as the Irish delegate to the International Organization for Standardization's Working Group 2 (ISO WG2); with the Ministry of Communications & Information Technology, Department of Information Technology, Government of India; and with the Government of India's Centre for Development of Advanced Computing (C-DAC) in Mumbai.

After undertaking a study of Indian phonetic treatises, the Sanskrit Library hosted a workshop at Brown in January 2007 to draft a Vedic character proposal to be presented to the International Organization for Standardization and the Unicode Technical Committee (UTC) at meetings during the ensuing year. Significant progress in collaborating with C-DAC was achieved in meetings that Peter Scharf, the director of the Sanskrit Library, held with Swaran Lata, the Government of India representative to Unicode, at the UTC meeting in Redmond, Washington in August 2007, and with Professor R. K. Joshi, C-DAC's Vedic encoding team leader, in Versailles in October 2007, on the occasion of the First International Sanskrit Computational Linguistics Symposium. Owing to close collaboration in the intervening months, the UTC recommended accepting 59 characters in our joint Vedic Unicode Proposal at its meeting in Cupertino, California, 4-8 February 2008 (see N3383R = L2/08-050). At the WG2 meeting in Seattle, 21-25 April 2008, these characters were moved onto Amendment 6.2 of ISO/IEC 10646:2003 and slated for balloting by the International Organization for Standardization's JTC 1/SC2 Working Group 2 (see N3456R = L2/08-176). Six additional characters that complete the set of characters required for Vedic (4 Gomukhas, Yajurvedic Kashmiri svarita, and anusvāra ubhayato mukha) were accepted by the UTC at its meeting in San Jose, 12-16 May 2008, in which Scharf participated by telephone (see L2/08-218). Three further characters were accepted by the UTC at its meeting in Redmond, 11-15 August 2008, in which Scharf also participated by telephone: a single pṛṣṭhamātrā e character, used as a combining character for four vowels in pṛṣṭhamātrā notation, and two characters (headstroke and gap filler) necessary for the proper representation of primary cultural heritage documents (manuscripts).
The UTC has thus approved 9 additional characters, and the US is requesting that ISO Working Group 2 add them to the amendment at the WG2 meeting in October 2008. A total of 68 new characters for Vedic and historical Indic are expected to become part of the Unicode Standard 5.2, tentatively slated for Fall 2009, and of Amendment 6 of ISO/IEC 10646:2003 (see N3488R3 = L2/08-273R3).

Vedic scholars and Indologists worldwide are hereby invited to examine the proposal and supporting documents. Comments, criticisms, arguments, and evidence are welcome. Evidence of additional characters not slated for encoding by this proposal, and evidence demonstrating the significance and use of characters, are especially welcome. Please e-mail your comments to the Sanskrit Library Director.

Vedic Unicode proposal

Supporting documentation

Supplementary supporting documentation