Tocharian B manuscript, c. 7th century AD
|Native to||Agni, Kucha, Turfan and Krorän|
|Extinct||9th century AD|
Tocharian, also spelled Tokharian, is an extinct branch of the Indo-European language family. It is known from manuscripts dating from the 5th to the 8th century AD, which were found in oasis cities on the northern edge of the Tarim Basin (now part of Xinjiang in northwest China) and the Lop Desert. The discovery of this language family in the early 20th century contradicted the formerly prevalent idea of an east-west division of the Indo-European language family on the centum-satem isogloss, and prompted reinvigorated study of the family. Identifying the authors with the Tokharoi people of ancient Bactria (Tokharistan), early authors called these languages "Tocharian". Although this identification is now generally considered mistaken, the name has remained.
The documents record two closely related languages, called Tocharian A (also East Tocharian, Agnean or Turfanian) and Tocharian B (West Tocharian or Kuchean). The subject matter of the texts suggests that Tocharian A was more archaic and used as a Buddhist liturgical language, while Tocharian B was more actively spoken in the entire area from Turfan in the east to Tumshuq in the west. A body of loanwords and names found in Prakrit documents from the Lop Nor basin has been dubbed Tocharian C (Kroränian). A claimed find of ten Tocharian C texts written in Kharoṣṭhī script has been discredited.
The oldest extant manuscripts in Tocharian B are now dated to the 5th or even late 4th century AD, making Tocharian a language of Late Antiquity contemporary with Gothic, Classical Armenian and Primitive Irish.
The existence of the Tocharian languages and alphabet was not even suspected until archaeological exploration of the Tarim basin by Aurel Stein in the early 20th century brought to light fragments of manuscripts in an unknown language, dating from the 6th to 8th centuries AD.
It soon became clear that these fragments were actually written in two distinct but related languages belonging to a hitherto unknown branch of Indo-European, now known as Tocharian.
Prakrit documents from 3rd-century Krorän and Niya on the southeast edge of the Tarim Basin contain loanwords and names that appear to come from a closely related language, referred to as Tocharian C.
The discovery of Tocharian upset some theories about the relations of Indo-European languages and revitalized their study. In the 19th century, it was thought that the division between centum and satem languages was a simple west-east division, with centum languages in the west. The theory was undermined in the early 20th century by the discovery of Hittite, a centum language in a relatively eastern location, and Tocharian, which was a centum language despite being the easternmost branch. The result was a new hypothesis, following the wave model of Johannes Schmidt, which suggests that the satem isogloss represents a linguistic innovation in the central part of the Proto-Indo-European home range, and that the centum languages along the eastern and western peripheries did not undergo that change.
Tocharian probably died out after 840 when the Uyghurs, expelled from Mongolia by the Kyrgyz, moved into the Tarim Basin. The theory is supported by the discovery of translations of Tocharian texts into Uyghur.
Some modern Chinese words may ultimately derive from a Tocharian or related source, e.g. Old Chinese *mjit (蜜; mì) "honey", from Proto-Tocharian *ḿət(ə) (where *ḿ is palatalized; cf. Tocharian B mit), cognate with English mead.
A colophon to a Buddhist manuscript in Old Turkish from 800 AD states that it was translated from Sanskrit via a twγry language. In 1907, Emil Sieg and Friedrich W. K. Müller guessed that this referred to the newly discovered language of the Turpan area. Sieg and Müller, reading this name as toxrï, connected it with the ethnonym Tócharoi (Ancient Greek: Τόχαροι, Ptolemy VI, 11, 6, 2nd century AD), itself taken from Indo-Iranian (cf. Old Persian tuxāri-, Khotanese ttahvāra, and Sanskrit tukhāra), and proposed the name "Tocharian" (German Tocharisch). Ptolemy's Tócharoi are often associated by modern scholars with the Yuezhi of Chinese historical accounts, who founded the Kushan empire. It is now clear that these people actually spoke Bactrian, an Eastern Iranian language, rather than the language of the Tarim manuscripts, so the term "Tocharian" is considered a misnomer.
In 1938, Walter Henning found the term "four twγry" used in early 9th-century manuscripts in Sogdian, Middle Iranian and Uighur. He argued that it referred to the region on the northeast edge of the Tarim, including Agni and Karakhoja but not Kucha. He thus inferred that the colophon referred to the Agnean language.
Although the term twγry or toxrï appears to be the Old Turkic name for the Tocharians, it is not found in Tocharian texts. The apparent self-designation ārśi appears in Tocharian A texts. Tocharian B texts use the adjective kuśiññe, derived from kuśi or kuči, a name also known from Chinese and Turkic documents. The historian Bernard Sergent compounded these names to coin an alternative term Arśi-Kuči for the family, recently revised to Agni-Kuči, but this name has not achieved widespread usage.
Tocharian is documented in manuscript fragments, mostly from the 8th century (with a few earlier ones) that were written on palm leaves, wooden tablets and Chinese paper, preserved by the extremely dry climate of the Tarim Basin. Samples of the language have been discovered at sites in Kucha and Karasahr, including many mural inscriptions.
Most attested Tocharian was written in the Tocharian alphabet, a derivative of the Brahmi alphabetic syllabary (abugida) also referred to as North Turkestan Brahmi or slanting Brahmi. However, a smaller amount was written in the Manichaean script. It soon became apparent that a large proportion of the manuscripts were translations of known Buddhist works in Sanskrit, and some of them were even bilingual, facilitating decipherment of the new language. Besides the Buddhist and Manichaean religious texts, there were also monastery correspondence and accounts, commercial documents, caravan permits, medical and magical texts, and one love poem.
Tocharian A and B are significantly different, to the point of being mutually unintelligible. A common Proto-Tocharian language must precede the attested languages by several centuries, probably dating to the late 1st millennium BC.
Tocharian A is found only in the eastern part of the Tocharian-speaking area, and all extant texts are of a religious nature. Tocharian B, however, is found throughout the range and in both religious and secular texts. As a result, it has been suggested that Tocharian A was a liturgical language, no longer spoken natively, while Tocharian B was the spoken language of the entire area. On the other hand, it is possible that the lack of a secular corpus in Tocharian A is simply an accident, due to the smaller distribution of the language and the fragmentary preservation of Tocharian texts in general.
The hypothesized relationship of Tocharian A and B as liturgical and spoken forms, respectively, is sometimes compared with the relationship between Latin and the modern Romance languages, or Classical Chinese and Mandarin. However, in both of these latter cases the liturgical language is the linguistic ancestor of the spoken language, whereas no such relationship holds between Tocharian A and B. In fact, from a phonological perspective Tocharian B is significantly more conservative than Tocharian A, and serves as the primary source for reconstructing Proto-Tocharian. Only Tocharian B preserves the following Proto-Tocharian features: stress distinctions, final vowels, diphthongs, and the o vs. e distinction. In turn, the loss of final vowels in Tocharian A has led to the loss of certain Proto-Tocharian categories still found in Tocharian B, e.g. the vocative case and some of the noun, verb and adjective declensional classes.
In their declensional and conjugational endings, the two languages innovated in divergent ways, with neither clearly simpler than the other. For example, both languages show significant innovations in the present active indicative endings but in radically different ways, so that only the second-person singular ending is directly cognate between the two languages, and in most cases neither variant is directly cognate with the corresponding Proto-Indo-European (PIE) form. The agglutinative secondary case endings in the two languages likewise stem from different sources, showing parallel development of the secondary case system after the Proto-Tocharian period. Likewise, some of the verb classes show independent origins, e.g. the class II preterite, which uses reduplication in Tocharian A (possibly from the reduplicated aorist) but long PIE ē in Tocharian B (possibly from the long-vowel perfect found in Latin lēgī, fēcī, etc.).
Tocharian B shows an internal chronological development; three linguistic stages have been detected. The oldest stage is attested only in Kucha. The other two are the middle ("classical") stage and the late stage.
Based on 3rd-century Loulan Gāndhārī Prakrit documents containing Tocharian loanwords such as kilme 'district', ṣoṭhaṃga 'tax collector', and ṣilpoga 'document', T. Burrow suggested in the 1930s the existence of a third Tocharian language, which has been labelled Tocharian C or "Kroränian", "Krorainic", or "Lolanisch".
In 2018, ten texts from Loulan written in the Kharoṣṭhī alphabet were published and analyzed in the posthumous papers of the Tocharologist Klaus T. Schmidt as being written in Tocharian C. Phonetically, Tocharian C shows preservation of the Proto-Indo-European labiovelar *kʷ in the word okuson- "ox", compared to more divergent reflexes in B okso and A ops-. Based on morphology, Tocharian C is more closely related to Tocharian B than to Tocharian A, as shown by endings such as the ablative (A -Vṣ, B -meṃ, C -maṃ) and the 3rd person singular present suffix (A -ṣ, B -ṃ, C -ṃ). These similarities suggest that there may have been a continuum of Tocharian dialects north of the Tarim River ranging from Tocharian B around Kucha to Tocharian C around Loulan/Kroraina. On September 15 and 16, 2019, a group of linguists led by Georges Pinault and Michaël Peyrot met in Leiden to examine Schmidt's transcriptions and the original texts, and concluded that they had all been transcribed entirely incorrectly. While a full report of what languages these texts represent is not yet available, their conclusions appear to have discredited Schmidt's Tocharian C claims.
Phonetically, Tocharian languages are "centum" Indo-European languages, meaning that they merge the palatovelar consonants (*ḱ, *ǵ, *ǵʰ) of Proto-Indo-European with the plain velars (*k, *g, *gʰ) rather than palatalizing them to affricates or sibilants. Centum languages are mostly found in western and southern Europe (Greek, Italic, Celtic, Germanic). In that sense, Tocharian (to some extent like Greek and the Anatolian languages) seems to have been an isolate in the "satem" (i.e. palatovelar to sibilant) phonetic regions of Indo-European-speaking populations. The discovery of Tocharian contributed to doubts that Proto-Indo-European had originally split into western and eastern branches; today, the centum-satem division is not seen as a real familial division.
Tocharian A and Tocharian B have the same set of vowels, but they often do not correspond to each other. For example, the sound a did not occur in Proto-Tocharian. Tocharian B a is derived from former stressed ä or unstressed ā (reflected unchanged in Tocharian A), while Tocharian A a stems from Proto-Tocharian /ɛ/ or /ɔ/ (reflected as /e/ and /o/ in Tocharian B), and Tocharian A e and o stem largely from monophthongization of former diphthongs (still present in Tocharian B).
Diphthongs occur in Tocharian B only.
|Opener component is unrounded||ai /əi/||au /əu/|
|Opener component is rounded||oy /oi/|
The following table lists the reconstructed phonemes in Tocharian along with their standard transcription. Because Tocharian is written in an alphabet used originally for Sanskrit and its descendants, the transcription of the sounds is directly based on the transcription of the corresponding Sanskrit sounds. The Tocharian alphabet also has letters representing all of the remaining Sanskrit sounds, but these appear only in Sanskrit loanwords and are not thought to have had distinct pronunciations in Tocharian. There is some uncertainty as to the actual pronunciation of some of the letters, particularly those representing palatalized obstruents (see below).
|Plosive||p /p/||t /t/||c /tɕ/||k /k/|
|Fricative||s /s/||ś /ɕ/||ṣ /ʂ/|
|Nasal||m /m/||n ṃ /n/||ñ /ɲ/||ṅ /ŋ/|
|Approximant||y /j/||w /w/|
|Lateral approximant||l /l/||ly /ʎ/|
Tocharian has completely reworked the nominal declension system of Proto-Indo-European. The only cases inherited from the proto-language are nominative, genitive, accusative, and (in Tocharian B only) vocative; in Tocharian the old accusative is known as the oblique case. In addition to these primary cases, however, each Tocharian language has six secondary cases formed by adding an invariant suffix to the oblique case, although the set of six cases is not the same in each language and the suffixes are largely non-cognate. For example, the Tocharian word yakwe (Toch B), yuk (Toch A) "horse" < PIE *h₁éḱwos is declined as follows:
|Case||Tocharian B||Tocharian A|
The Tocharian A instrumental case rarely occurs with humans.
When referring to humans, the oblique singular of most adjectives and of some nouns is marked in both varieties by an ending -(a)ṃ, which also appears in the secondary cases. An example is eṅkwe (Toch B), oṅk (Toch A) "man", which belongs to the same declension as above but has oblique singular eṅkweṃ (Toch B), oṅkaṃ (Toch A), and corresponding oblique stems eṅkweṃ- (Toch B), oṅkn- (Toch A) for the secondary cases. This is thought to stem from the generalization of n-stem adjectives as an indication of determinative semantics, seen most prominently in the weak adjective declension in the Germanic languages (where it co-occurs with definite articles and determiners), but also in Latin and Greek n-stem nouns (especially proper names) formed from adjectives, e.g. Latin Catō (genitive Catōnis) literally "the sly one" < catus "sly", Greek Plátōn literally "the broad-shouldered one" < platús "broad".
In contrast, the verbal conjugation system is quite conservative. The majority of Proto-Indo-European verbal classes and categories are represented in some manner in Tocharian, although not necessarily with the same function. Some examples: athematic and thematic present tenses, including null-, -y-, -sḱ-, -s-, -n- and -nH- suffixes as well as n-infixes and various laryngeal-ending stems; o-grade and possibly lengthened-grade perfects (although lacking reduplication or augment); sigmatic, reduplicated, thematic and possibly lengthened-grade aorists; optatives; imperatives; and possibly PIE subjunctives.
In addition, most PIE sets of endings are found in some form in Tocharian (although with significant innovations), including thematic and athematic endings, primary (non-past) and secondary (past) endings, active and mediopassive endings, and perfect endings. Dual endings are still found, although they are rarely attested and generally restricted to the third person. The mediopassive still reflects the distinction between primary -r and secondary -i, effaced in most Indo-European languages. Both root and suffix ablaut are still well represented, although again with significant innovations.
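Tocharian verbs are conjugated for tense and mood (present indicative, subjunctive, preterite, imperative, optative and imperfect), voice (active and mediopassive), person, and number (singular, dual and plural).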
A given verb belongs to one of a large number of classes, according to its conjugation. As in Sanskrit, Ancient Greek and (to a lesser extent) Latin, there are independent sets of classes in the indicative present, subjunctive, perfect, imperative, and to a limited extent optative and imperfect, and there is no general correspondence among the different sets of classes, meaning that each verb must be specified using a number of principal parts.
The most complex system is the present indicative, consisting of 12 classes, 8 thematic and 4 athematic, with distinct sets of thematic and athematic endings. The following classes occur in Tocharian B (some are missing in Tocharian A):
Palatalization of the final root consonant occurs in the 2nd singular, 3rd singular, 3rd dual and 2nd plural in thematic classes II and VIII-XII as a result of the original PIE thematic vowel e.
The subjunctive likewise has 12 classes, denoted i through xii. Most are conjugated identically to the corresponding indicative classes; indicative and subjunctive are distinguished by the fact that a verb in a given indicative class will usually belong to a different subjunctive class.
In addition, four subjunctive classes differ from the corresponding indicative classes: two "special subjunctive" classes with differing suffixes and two "varying subjunctive" classes with root ablaut reflecting the PIE perfect.
The preterite has 6 classes:
All except preterite class VI have a common set of endings that stem from the PIE perfect endings, although with significant innovations.
The imperative likewise shows 6 classes, with a unique set of endings, found only in the second person, and a prefix beginning with p-. This prefix usually reflects Proto-Tocharian *pä- but unexpected connecting vowels occasionally occur, and the prefix combines with vowel-initial and glide-initial roots in unexpected ways. The prefix is often compared with the Slavic perfective prefix po-, although the phonology is difficult to explain.
Imperative classes i through v tend to co-occur with preterite classes I through V, although there are many exceptions. Class vi is not so much a coherent class as an "irregular" class containing all verbs that do not fit the other categories. The imperative classes tend to share the same suffix as the corresponding preterite (if any), but to have root vocalism that matches the vocalism of a verb's subjunctive. This includes the root ablaut of subjunctive classes i and v, which tend to co-occur with imperative class i.
The optative and imperfect have related formations. The optative is generally built by adding i onto the subjunctive stem. Tocharian B likewise forms the imperfect by adding i onto the present indicative stem, while Tocharian A has 4 separate imperfect formations: usually ā is added to the subjunctive stem, but occasionally to the indicative stem, and sometimes either ā or s is added directly onto the root. The endings differ between the two languages: Tocharian A uses present endings for the optative and preterite endings for the imperfect, while Tocharian B uses the same endings for both, which are a combination of preterite and unique endings (the latter used in the singular active).
As suggested by the above discussion, there are a large number of sets of endings. The present-tense endings come in both thematic and athematic variants, although they are related, with the thematic endings generally reflecting a theme vowel (PIE e or o) plus the athematic endings. There are different sets for the preterite classes I through V; preterite class VI; the imperative; and in Tocharian B, in the singular active of the optative and imperfect. Furthermore, each set of endings comes with both active and mediopassive forms. The mediopassive forms are quite conservative, directly reflecting the PIE variation between -r in the present and -i in the past. (Most other languages with the mediopassive have generalized one of the two.)
The present-tense endings are almost completely divergent between Tocharian A and B. The following shows the thematic endings, with their origin:
|Person||Original PIE||Tocharian B: PIE source||Tocharian B: actual form||Tocharian A: PIE source||Tocharian A: actual form||Notes|
|1st sing||*-o-h₂||*-o-h₂ + PToch -u||-?u||*-o-mi||-am||*-mi < PIE athematic present|
|2nd sing||*-e-si||*-e-th?e?||-'t||*-e-th₂e||-'t||*-th₂e < PIE perfect; previous consonant palatalized; Tocharian B form should be -'ta|
|3rd sing||*-e-ti||*-e-nu||-'(ä)ṃ||*-e-se||-'ṣ||*-nu < PIE *nu "now"; previous consonant palatalized|
|1st pl||*-o-mos?||*-o-m??||-em(o)||*-o-mes + V||-amäs|
|2nd pl||*-e-te||*-e-t?-r + V||-'cer||*-e-te||-'c||*-r < PIE mediopassive?; previous consonant palatalized|
|3rd pl||*-o-nti||*-o-nt||-eṃ||*-o-nti||-eñc < *-añc||*-o-nt < PIE secondary ending|
In traditional Indo-European studies, no hypothesis of a closer genealogical relationship between Tocharian and any other branch has been widely accepted by linguists. However, lexicostatistical and glottochronological approaches suggest that the Anatolian languages, including Hittite, might be the closest relatives of Tocharian. For example, the same Proto-Indo-European root *h₂wrg(h)- (but not a common suffixed formation) can be reconstructed as underlying the words for 'wheel': Tocharian A wärkänt, Tocharian B yerkwanto and Hittite ḫurkis.
Also arguing against equating the Tocharians with the Tocharoi is the fact that the actual language of the Tocharoi, when attested to in the second and third centuries of our era, is indubitably Iranian.
In fact, we know that the Yuezhi used Bactrian, an Iranian language written in Greek characters, as an official language. For this reason, Tocharian is a misnomer; no extant evidence suggests that the residents of the Tocharistan region of Afghanistan spoke the Tocharian language recorded in the documents found in the Kucha region.