I'm interested in applying digital approaches to research in Buddhist Studies. These tools aren't intended to replace specialist knowledge or language skills, but to assist scholars in reading and analysing texts more efficiently and accurately. The tools below are just some tentative first steps towards digital Buddhology:
Learning Pali
Along with classmates at Cornell University, I've been compiling notes on Pali grammar from textbooks into an online, searchable Google Doc. This cuts down the time students spend looking up declensions and conjugations in Pali textbooks or grammars, although it lacks the detail of something like Collins' Pali Grammar for Students. Note that several entries are still in the "to be completed" section.
PaliPal is a tool I'm developing that helps students analyse Pali words by suggesting possible case endings or declensions based on user input. It's written in Python, and building it is helping me better understand both programming languages and Pali grammar. The project was inspired by the parallels I saw between programming languages and the highly structured grammar of classical languages, in which single words can express incredible depth of meaning through case endings and conjugations. PaliPal is currently at v0.3, and it's still a while off being useful!
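To illustrate the kind of ending-based analysis described above, here is a minimal sketch in Python. It is not PaliPal's actual code: the names `A_STEM_ENDINGS` and `suggest_parses` are hypothetical, and the table covers only a simplified paradigm for masculine a-stem nouns such as "buddha" (some alternative forms are omitted). It matches the word against known endings, rebuilds a candidate stem, and optionally filters the candidates against a set of known stems, since bare suffix matching over-generates.

```python
import unicodedata

# Hypothetical, simplified ending table for masculine a-stem nouns (e.g. "buddha").
# Each ending maps to every (case, number) it can express. Check against a grammar.
_RAW_ENDINGS = {
    "o": [("nominative", "singular")],
    "aṃ": [("accusative", "singular")],
    "ena": [("instrumental", "singular")],
    "āya": [("dative", "singular")],
    "assa": [("dative", "singular"), ("genitive", "singular")],
    "ā": [("ablative", "singular"), ("nominative", "plural"), ("vocative", "plural")],
    "amhā": [("ablative", "singular")],
    "asmā": [("ablative", "singular")],
    "e": [("locative", "singular"), ("accusative", "plural")],
    "amhi": [("locative", "singular")],
    "asmiṃ": [("locative", "singular")],
    "a": [("vocative", "singular")],
    "ehi": [("instrumental", "plural"), ("ablative", "plural")],
    "ebhi": [("instrumental", "plural"), ("ablative", "plural")],
    "ānaṃ": [("dative", "plural"), ("genitive", "plural")],
    "esu": [("locative", "plural")],
}
# Normalise keys to NFC so precomposed and decomposed diacritics compare equal.
A_STEM_ENDINGS = {unicodedata.normalize("NFC", k): v for k, v in _RAW_ENDINGS.items()}


def suggest_parses(word, known_stems=None):
    """Return (stem, case, number) guesses for a word read as an a-stem noun.

    Without a lexicon this over-generates (e.g. "buddhena" is also "parsed" as a
    vocative of a stem "buddhena"); passing a set of known stems filters these out.
    """
    word = unicodedata.normalize("NFC", word.strip().lower())
    guesses = []
    # Try longer endings first so the most specific matches are listed first.
    for ending in sorted(A_STEM_ENDINGS, key=len, reverse=True):
        if not word.endswith(ending) or len(word) <= len(ending):
            continue
        stem = word[: -len(ending)] + "a"
        if known_stems is not None and stem not in known_stems:
            continue
        for case, number in A_STEM_ENDINGS[ending]:
            guesses.append((stem, case, number))
    return guesses
```

For example, `suggest_parses("buddhassa", {"buddha"})` returns the dative singular and genitive singular readings of "buddha", reflecting the genuine ambiguity of the ending -assa. A fuller tool would need tables for other stem classes and genders, and for verbs.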
Digitising Pali literature
While there are some excellent searchable databases of earlier canonical and commentarial texts (such as Tipitaka.org), most later literary Pali has yet to be fully digitised. At best, we have non-searchable PDF scans of printed editions; for the most part, we rely on costly and bulky physical volumes. Full-text digitisation is difficult because most OCR (optical character recognition, or text recognition) software isn't trained to recognise the many diacritics of Roman-script Pali, and mistakes them for punctuation or other characters. Training OCR software to recognise the characters of Roman-script printed Pali will allow us to quickly convert our various PDF scans and printed editions into searchable plain text (copyright allowing).
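Training OCR is beyond a short example, but a simple post-processing check can flag likely diacritic misreadings in OCR output for a human to review. The sketch below assumes a character inventory for Roman-script Pali (including both common spellings of niggahīta, ṃ and ṁ); the names `PALI_LETTERS` and `flag_suspect_chars` are hypothetical, and the inventory would need adjusting to each edition's conventions.

```python
import string
import unicodedata

# Assumed inventory of lower-case Roman-script Pali letters (aspirates are
# written with "h"); adjust to the conventions of the edition being digitised.
PALI_LETTERS = set(unicodedata.normalize("NFC", "aāiīuūeokgṅcjñṭḍṇtdnpbmyrlḷvshṃṁ"))
ALLOWED = PALI_LETTERS | set(string.digits) | set(string.punctuation)


def flag_suspect_chars(text):
    """Return (index, char) pairs for characters outside the expected set.

    Indices refer to the NFC-normalised text; whitespace is never flagged.
    """
    text = unicodedata.normalize("NFC", text)
    return [
        (i, ch)
        for i, ch in enumerate(text)
        if not ch.isspace() and ch.lower() not in ALLOWED
    ]
```

Run on an OCR line such as "Bhagavä", this flags the "ä" (a plausible misreading of "ā") at index 6, while "evaṃ me sutaṃ" passes cleanly. English editorial matter or typographic quotation marks would also be flagged, so the allowed set would need extending for real pages.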
One immediate benefit of this conversion will be the ability to search for text strings in and between texts (think "Ctrl+F" in Word), rather than spending considerable time manually cross-referencing words or phrases. It will allow us to compare texts on a far larger scale than any human could manage manually, as scholars are beginning to do for the Chinese corpus with tools like TACL. It will also allow us to apply even more powerful text-analysis methods, many of which are already used productively on literary corpora from other traditions.
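One common way to compare digitised texts at scale is to find the word sequences (n-grams) they share. The following is a minimal sketch of that technique in Python, not TACL or any existing tool; all function names are hypothetical. It includes optional diacritic folding, on the assumption that different editions may spell the same word differently (for instance ṃ versus ṁ).

```python
import re
import unicodedata


def fold_diacritics(text):
    """Drop combining marks, so 'sutaṃ' and 'sutaṁ' both become 'sutam'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenise(text, fold=False):
    """Lower-case, normalise and split a text into word tokens."""
    text = unicodedata.normalize("NFC", text.lower())
    if fold:
        text = fold_diacritics(text)
    return re.findall(r"\w+", text)


def ngrams(tokens, n):
    """Return the set of contiguous n-token sequences in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngrams(text_a, text_b, n=3, fold=False):
    """Return the word n-grams that occur in both texts."""
    return ngrams(tokenise(text_a, fold), n) & ngrams(tokenise(text_b, fold), n)
```

Without folding, "evaṃ me sutaṃ" and "evaṁ me sutaṁ" share no trigrams; with `fold=True` both reduce to ("evam", "me", "sutam"). Folding trades precision for recall, since it also merges genuinely distinct words that differ only in diacritics.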