
Guest Post by Harry Diakoff, Alpheios Project

Thoughts on The internet as teaching and learning medium and The democratization of the internet

One of the most powerful pedagogical resources recently identified as especially appropriate and effective for language e-learning is the treebank: a corpus of texts that has been annotated to indicate the syntactic relations among the words in each sentence.

(Reppen, R. (2010). Using corpora in the language classroom. Cambridge: Cambridge University Press.)
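To make the idea of "syntactic relations among the words" concrete, here is a minimal sketch of how one annotated sentence might be held in memory. The `Token` class, the sample sentence, and the relation labels (`SBJ`, `OBJ`, `PRED`) are illustrative assumptions, not the format of any particular treebank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    index: int     # 1-based position of the word in the sentence
    form: str      # the word as it appears in the text
    head: int      # index of the governing word; 0 marks the root
    relation: str  # syntactic relation to the head (labels are hypothetical)


# "puella rosam amat" ("the girl loves the rose"), annotated by hand
# for illustration only.
SENTENCE = [
    Token(1, "puella", 3, "SBJ"),
    Token(2, "rosam", 3, "OBJ"),
    Token(3, "amat", 0, "PRED"),
]


def dependents(sentence, head_index):
    """Return the tokens directly governed by the word at head_index."""
    return [t for t in sentence if t.head == head_index]


def describe(sentence):
    """Render each word with its relation and the word that governs it."""
    by_index = {t.index: t for t in sentence}
    lines = []
    for t in sentence:
        governor = by_index[t.head].form if t.head else "ROOT"
        lines.append(f"{t.form} --{t.relation}--> {governor}")
    return lines


if __name__ == "__main__":
    print("\n".join(describe(SENTENCE)))
```

A student puzzling over the sentence could be shown exactly this kind of listing: each word, what it depends on, and in what role.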

Such annotated texts can be used in a variety of ways. As an annotated corpus, a treebank is of course an essential resource for the many kinds of linguistic research in which identifying specific constructions and syntactic relations is important: synchronic structure and diachronic change, attribution and stylistics, and so on.

But it is also a unique resource for pedagogy. While it can be used simply as an explanatory resource for a student trying to understand a sentence, it can also be used to automatically create learning and assessment tools, including ones that reflect the specific requirements of individual authors or texts as well as the current proficiency of the student.

Passages from authentic target texts can be automatically extracted and sorted by the frequency of the vocabulary or constructions in the target texts, creating a series of graded reading passages that can be automatically enriched with links into grammars and lexical resources.
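The grading step can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the method used by any of the projects mentioned here: it counts surface word forms rather than the lemmas or constructions a treebank would supply, and the function name `grade_passages` and the scoring rule (mean relative frequency of a passage's words) are hypothetical.

```python
from collections import Counter


def grade_passages(passages):
    """Order passages so those built from the most frequent words come first.

    Assumption: "difficulty" is approximated by how common a passage's
    words are across all passages taken together.
    """
    counts = Counter(word for p in passages for word in p.lower().split())
    total = sum(counts.values())

    def mean_frequency(passage):
        words = passage.lower().split()
        if not words:
            return 0.0  # empty passages sort last
        return sum(counts[w] for w in words) / (len(words) * total)

    return sorted(passages, key=mean_frequency, reverse=True)


if __name__ == "__main__":
    sample = [
        "puella rosam amat",
        "puella amat",
        "agricola equum verberat",
    ]
    for passage in grade_passages(sample):
        print(passage)
```

With a real treebank the same ordering could be driven by lemma or construction counts instead of raw word forms, which is where the syntactic annotation becomes useful.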

Despite their great utility for both research and pedagogy, the most widely used treebanks have mostly been created manually, and the expense and effort of creation have limited their use to the most popular languages.

The Perseus Project at Tufts University, the Open Philology Project at Leipzig, and the private Alpheios Project have attempted to explore new pedagogical uses of treebanks for teaching the classical languages Latin, Greek and Arabic, but their efforts have been constrained by the absence of adequate treebanks. A private gift of more than half a million US dollars financed the creation and initial expansion of a treebank of ancient Greek in 2008. The treebank was subsequently crowdsourced and has now reached some four hundred thousand words.

The United States National Science Foundation financed the start of a similar treebank for Latin, but its support has not extended beyond an initial 50 thousand words. Although the value of pedagogical tools using treebanks has already been demonstrated with classical languages, their use with Latin has been limited by the absence of any substantial Latin treebank. Preliminary research by a group of graduate students at the University of Graz, led by Gernot Hoeflechner, now suggests that contemporary technology will permit the automatic creation of syntactic diagrams for Latin at about 85 percent accuracy for syntactic dependency relations. At this level of accuracy, it becomes practicable to encourage crowd-sourced correction on the model of the Greek Treebank.

This result has been achieved with quite minimal resources, and it appears that significantly better results could be achieved with only a modest increase in support. Further resources would also allow the developers to establish a workflow that integrates the output of their automatic diagramming with the variety of pedagogical tools currently available or under development at the Leipzig Open Philology Project. These tools are designed to be used by the independent learner as well as in the classroom and could be of considerable assistance to both student and teacher.