The Mainland Southeast Asia (MSEA) linguistic area is a sprachbund including languages of the Sino-Tibetan, Hmong-Mien (or Miao-Yao), Kra-Dai, Austronesian and Austroasiatic families spoken in an area stretching from Thailand to China. Neighbouring languages across these families, though presumed unrelated, often have similar typological features, which are believed to have spread by diffusion. James Matisoff referred to this area as the "Sinosphere", contrasted with the "Indosphere", but viewed it as a zone of mutual influence in the ancient period.
The Austroasiatic languages include Vietnamese and Khmer, as well as many other languages spoken in scattered pockets as far afield as Malaya and eastern India. Most linguists believe that Austroasiatic languages once ranged continuously across southeast Asia and that their scattered distribution today is the result of the subsequent migration of speakers of other language groups from southern China.
Chinese civilization and the Chinese language spread from their home in the North China Plain into the Yangtze valley and then into southern China during the first millennium BC and first millennium AD. Indigenous groups in these areas either became Chinese, retreated to the hill country, or migrated to the south. Thus the Kra-Dai languages, today including Thai, Lao and Shan, were originally spoken in what is now southern China, where the greatest diversity within the family is still found, and possibly as far north as the Yangtze valley. With the exception of Zhuang, most of the Kra-Dai languages still remaining in China are spoken in isolated upland areas. Similarly, the Hmong-Mien languages may originally have been spoken in the middle Yangtze. Today they are scattered across isolated hill regions of southern China. Many Hmong-Mien speakers migrated to southeast Asia in the 18th and 19th centuries, after the suppression of a series of revolts in Guizhou.
The upland regions of the interior of the area, as well as the plains of Burma, are home to speakers of other Sino-Tibetan languages, the Tibeto-Burman languages. The Austronesian languages, spoken across the Pacific and Indian Oceans, are represented in MSEA by the divergent Chamic group.
Mark Post (2015) observes that the Tani languages of Arunachal Pradesh, Northeast India, typologically fit with the Mainland Southeast Asia linguistic area, which typically has creoloid morphosyntactic patterns, rather than with the languages of the Tibetosphere. Post (2015) also notes that Tani culture resembles the cultures of Mainland Southeast Asian hill tribes, and is not particularly adapted to cold montane environments.
David Gil (2015) considers the Mainland Southeast Asia linguistic area to be part of the larger Mekong-Mamberamo linguistic area, which also includes languages in Indonesia west of the Mamberamo River.
A characteristic of MSEA languages is a particular syllable structure involving monosyllabic morphemes, lexical tone, a fairly large inventory of consonants, including phonemic aspiration, limited clusters at the beginning of a syllable, and plentiful vowel contrasts. Final consonants are typically highly restricted, often limited to glides and nasals or unreleased stops at the same points of articulation, with no clusters and no voice distinction. Languages in the northern part of the area generally have fewer vowel and final contrasts but more initial contrasts.
Most MSEA languages tend to have monosyllabic morphemes, but there are exceptions. Some polysyllabic morphemes exist even in Old Chinese and Vietnamese, often loanwords from other languages. A related syllable structure found in some languages, such as the Mon-Khmer languages, is the sesquisyllable (from Latin: sesqui- meaning "one and a half"), consisting of a stressed syllable with approximately the above structure, preceded by an unstressed "minor" syllable consisting only of a consonant and a neutral vowel /ə/. That structure is present in many conservative Mon-Khmer languages such as Khmer (Cambodian), as well as in Burmese, and it is reconstructed for the older stages of a number of Sino-Tibetan languages.
Phonemic tone is one of the best-known characteristics of southeast Asian languages. Many of the languages in the area have strikingly similar tone systems, which appear to have developed in the same way.
The tone systems of Middle Chinese, proto-Hmong-Mien, proto-Tai and early Vietnamese all display a three-way tonal contrast in syllables lacking stop endings. In traditional analyses, syllables ending in stops have been treated as a fourth or "checked tone", because their distribution parallels that of syllables with nasal codas. Moreover, the earliest strata of loans display a regular correspondence between tonal categories in the different languages:
| Vietnamese[a] | proto-Tai | proto-Hmong-Mien | Middle Chinese | suggested origin |
|---|---|---|---|---|
| *A (ngang-huyền) | *A | *A | 平 píng "level" | - |
| *B (sắc-nặng) | *C | *B | 上 shǎng "rising" | *-ʔ |
| *C (hỏi-ngã) | *B | *C | 去 qù "departing" | *-h < *-s |
The incidence of these tones in Chinese, Tai and Hmong-Mien words follows a similar 2:1:1 ratio. Thus rhyme dictionaries such as the Qieyun divide the level tone between two volumes while covering each of the other tones in a single volume. Vietnamese has a different distribution, with tone B four times more common than tone C.
It was long believed that tone was an invariant feature of languages, suggesting that these groups must be related. However, a grouping based on tone cut across groups of languages that share basic vocabulary. In 1954 André-Georges Haudricourt resolved this paradox by demonstrating that Vietnamese tones corresponded to certain final consonants in other (atonal) Austroasiatic languages. He thus argued that the Austroasiatic proto-language had been atonal, and that tone in Vietnamese had developed under the conditioning of these consonants, which subsequently disappeared, a process now known as tonogenesis. Haudricourt further proposed that tone in the other languages had a similar origin. Other scholars have since uncovered transcriptional and other evidence for these consonants in early forms of Chinese, and many linguists now believe that Old Chinese was atonal. A smaller amount of similar evidence has been found for proto-Tai. Moreover, since the realization of tone categories as pitch contours varies so widely between languages, the regular correspondence observed in early loans suggests that the conditioning consonants were still present at the time of borrowing.
A characteristic sound change (a phonemic split) occurred in most southeast Asian languages around 1000 AD. First, syllables with voiced initial consonants came to be pronounced with a lower pitch than those with unvoiced initials. In most of these languages, with a few exceptions such as Wu Chinese, the voicing distinction subsequently disappeared, and the pitch contour became distinctive. In tonal languages, each of the tones split into two "registers", yielding a typical pattern of six tones in unchecked syllables and two in checked ones. Pinghua and Yue Chinese, as well as neighbouring Tai languages, have further tone splits in checked syllables, while many other Chinese varieties, including Mandarin Chinese, have merged some tonal categories.
Many non-tonal languages instead developed a register split, with voiced consonants producing breathy-voiced vowels and unvoiced consonants producing normally voiced vowels. Often, the breathy-voiced vowels subsequently went through additional, complex changes (e.g. diphthongization). Examples of languages affected this way are Mon and Khmer (Cambodian). Breathy voicing has since been lost in standard Khmer, although the vowel changes triggered by it still remain.
Many of these languages have subsequently developed some voiced obstruents. The most common such sounds are /b/ and /d/ (often pronounced with some implosion), which result from former preglottalized /ʔb/ and /ʔd/, which were common phonemes in many Asian languages and which behaved like voiceless obstruents. In addition, Vietnamese developed voiced fricatives through a different process (specifically, in words consisting of two syllables, with an initial, unstressed minor syllable, the medial stop at the beginning of the stressed major syllable turned into a voiced fricative, and then the minor syllable was lost).
Most MSEA languages are of the isolating type, with mostly mono-morphemic words, no inflection and little affixation. New nouns are typically formed by compounding; Mandarin Chinese, for example, is rich in polysyllabic compound words. Grammatical relations are typically signalled by word order, particles and coverbs or prepositions. Modality is expressed using sentence-final particles. The usual word order in MSEA languages is subject-verb-object. Chinese, Bai and Karen are thought to have changed to this order from the subject-object-verb order retained by most other Sino-Tibetan languages. The order of constituents within a noun phrase varies: noun-modifier order is usual in Tai and Hmongic languages, while in Chinese varieties and Mienic languages most modifiers are placed before the noun. Topic-comment organization is also common.
MSEA languages typically have well-developed systems of numeral classifiers. Bengali, spoken just to the west of Southeast Asia, also has numeral classifiers, even though it is an Indo-European language that does not share the other MSEA features. Bengali also lacks grammatical gender, unlike most Indo-European languages.