This blogpost will give an overview of my popular lecture earlier this week on the role of patterns in syntax, grammar and literature for the deciphering of ancient languages (link to the lecture below, in Swedish).

My own experience of deciphering ancient languages is basically restricted to Tocharian. Tocharian texts, however, can be very difficult to understand, in particular when parallel texts in Sanskrit, Khotanese, or Uighur (the most frequent translation languages for Tocharian) are absent.

Deciphering ancient languages basically relies on three instruments: script, language (lexicon and grammar), and literature. Reading the script is fundamental to understanding the content. Even when the content of a manuscript is known, there is often reason to go back to the manuscript and check the reading, which may open the way to new interpretations and a renewed understanding of the text. In the case of Tocharian, the script (North-Turkestanic Brahmi) is relatively well known, even though some Tocharian B texts are written in a cursive script that is very complex and difficult to interpret. On the other hand, almost all Tocharian texts are fragmentary in some respect (burned, broken, etc.), which means that lacunae have to be filled in and reconstructed. Part of this reconstruction is to interpret the characters at the manuscript edges, which may be cut or damaged. So even if the script is known, the work of a philologist still involves a substantial amount of manuscript reading.

Interpreting lexicon and grammar may pose substantial problems if the language is not well known. In the case of Tocharian, the broken contexts, again, create large difficulties when we study syntax. Morphology is easier: paradigms can be established and reconstructed from forms found in texts, and few forms are missing from the Tocharian paradigms. Syntactic constructions, however, require a larger corpus of complete sentences, and in a language such as Tocharian it is often hard to find enough complete (unrestored) sentences for a given construction, for instance in combination with a specific verb.
The lexicon has its own difficulties. In a language like Tocharian, the absence of close relatives is a problem (Tocharian descends immediately from the Indo-European proto-language). If an unknown word is found in a text, we may assume a meaning based on the meaning of a presumed cognate in another Indo-European language. However, the connection to the presumed cognate may be a complete mistake, and both the meaning and the etymology of the lexeme may turn out to be something entirely different.

This brings us to the third category, literature. Besides script, literature is probably the most important of the instruments mentioned at the beginning of this text. The exact meaning of words, which forms the basis for a correct interpretation of a text, depends heavily on the possibility of "proving" the content by means of a parallel or bilingual text. Most Tocharian texts are translations from Sanskrit, but Tocharian also had its own literary tradition. The exact source of a text can therefore be difficult to trace, and some texts have no source text at all. Since Tocharian, like any other literary language, is constrained by its literary tradition, the identification of parallel patterns in, e.g., Sanskrit literary sources is highly important for a proper understanding of the content and a correct translation of the lexical meanings and the syntax.

Link to a public lecture at Filosoficirkeln, Lund, about deciphering ancient languages.

I am currently travelling, so this blogpost will only very briefly discuss the topic of my current research in grammar reconstruction: the role of marking hierarchies in language change.
The notion of marking hierarchies has its roots in the markedness theory of Roman Jakobson and implies that grammatical categories (e.g., singular - plural) typically stand in a mutual, hierarchical relation, where one of the categories is morphologically unmarked, whereas the other is morphologically marked. The unmarked category thus has a higher position within a hierarchy of grammatical properties (singular < plural). These grammatical relations are, according to some authors, general, or "universal", anchored in our inborn grammatical system. However, we know that this is a problematic notion: there is a substantial number of languages where the actual morphological marking contradicts the proposed markedness hierarchies. Further, not all languages have morphology. Morphological marking alone cannot be the identifier of marking hierarchies.
On the other hand, there is an obvious connection between the observed marking hierarchies and frequency. Superior categories, "unmarked" in traditional markedness theory, are used more frequently in speech and in text. Again, the definition may be problematic, since not all languages have corpora that enable a detailed study of category frequency. Also, marking hierarchies based on frequency may contradict marking hierarchies based on general observations of morphological marking.
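The frequency-based view of a marking hierarchy can be illustrated with a small Python sketch. It assumes a hypothetical list of tokens that have already been annotated for number. The toy data and the function name `rank_by_frequency` are invented for illustration and do not come from any real corpus.

```python
from collections import Counter


def rank_by_frequency(tags):
    """Return the category labels ordered from most to least frequent."""
    counts = Counter(tags)
    # most_common() sorts the labels by descending count.
    return [label for label, _ in counts.most_common()]


# Hypothetical number annotations for ten noun tokens (invented toy data).
toy_tags = ["sg", "sg", "pl", "sg", "sg", "pl", "sg", "sg", "pl", "sg"]

# The first label is the candidate for the "superior" (unmarked) category.
ranking = rank_by_frequency(toy_tags)
print(ranking)
```

On a real corpus, the resulting ranking would then have to be compared with the morphological marking actually found in the language, since the two criteria may disagree.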
My current study on grammar reconstruction, which I have written about in several blogposts, indicates a clear correlation between change rates and marking hierarchies: superior categories, which are more frequent in grammar and most likely to be grammatically unmarked, have substantially lower change rates (and a slower pace of change) than inferior categories, which have higher change rates (and a faster pace of change). I will follow up on this topic in a coming blogpost.

Bickel, Balthasar (2008), 'On the scope of the referential hierarchy in the typology of grammatical relations', in Greville G. Corbett and Michael Noonan (eds.), Case and Grammatical Relations: Studies in honor of Bernard Comrie (Amsterdam - Philadelphia: John Benjamins), 191-210.
Comrie, Bernard (1981), Language universals and linguistic typology: syntax and morphology (Oxford: Blackwell).
Croft, William (2003), Typology and universals (Cambridge Textbooks in Linguistics; Cambridge: Cambridge University Press).
Dixon, Robert M. W. (1994), Ergativity [Electronic resource] (Cambridge: Cambridge University Press).
--- (1997), The Rise and Fall of Languages [Electronic resource].
--- (2010a), Basic linguistic theory. Vol. 2, Grammatical topics (Oxford: Oxford University Press).
--- (2010b), Basic linguistic theory [Electronic resource]. Vol. 2, Grammatical topics (Oxford: Oxford University Press).

I have decided to move the updating of this blog to even weekends instead of Thursdays. Thursday is very often an extremely busy day, with no time left to update or complete blogposts for publication.

In this blogpost I will continue the previous topic of principles of language change. In historical linguistics, the principle that the most frequent words and grammatical forms of a language have a particular status is well known. The most frequent lexemes and grammatical categories are more resistant to change. Lexemes such as kinship words, body parts, numerals, fire, water, liver, and so forth typically preserve more archaic paradigms, which may resist change for millennia. The most frequent adverbials and particles even resist phonological erosion and change. The most frequent verbs, such as 'to be' or 'to become', are typically irregular, and archaic inflection patterns and archaic categories, such as tenses, modalities, and aspectual categories, survive in these verbal stems. On the other hand, less frequent words, such as various verbs, nouns, and adjectives, are much more often affected by analogy and other types of change that harmonize and simplify language structures, making them easier to memorize.

However, few studies investigate this from an evolutionary perspective, using phylogenetic methods. As shown by Pagel et al. (2007), there is a correlation between lexical substitution and frequency in basic vocabulary: the most frequent words generally have lower substitution rates.

Frequency is very important in explaining cross-linguistic universal patterns, among them the morphological marking hierarchies found in languages. More frequent categories, such as singular (in relation to plural), agent (in relation to object), and present (in relation to past), are unmarked relative to their less frequent counterparts, which are marked. This theory, known as markedness theory (which has a lot of exceptions in languages), can to a large degree be explained by frequency (Greenberg 1966, Croft 1993, 2003).

In a current study I wanted to investigate the correlation between frequency and change rates in grammar, focusing on the Indo-European family. I compiled a sample of grammatical categories of word order, nominal morphology, verbal morphology and tense, and organised the properties into hierarchical pairs according to the relations present < past, pronoun < noun, agent < object, and masculine/feminine < neuter, which are well-known, universal, hierarchical relations observed in a large number of languages. By means of an evolutionary model (run by Chundra Cathcart), in which transition rates between property states over a tree were reconstructed, we extracted the average number of transitions (per 1000 years) for each grammatical property in our data.
When the results were split up into the marking-hierarchy pairs mentioned above, it turned out that the rates of change in the lower categories (i.e., the less frequent ones from a universal perspective) were higher, while the rates in the higher categories (i.e., the more frequent ones from a universal perspective) were lower. The difference was statistically significant (p=>0.005). Even though this study is based on only one family (Indo-European), 149 languages and about 100 properties, it seems likely that frequency also affects language change in grammar. This would explain why more frequent grammatical categories preserve more archaic patterns over time.
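The comparison of rates within hierarchy pairs can be sketched in Python. This is only an illustration under stated assumptions: the rates below are invented toy numbers, not results from the study, and the exact sign-flip test is a stand-in chosen for simplicity, not the phylogenetic model or the significance test actually used. The function name `paired_sign_flip_test` is likewise hypothetical.

```python
from itertools import product


def paired_sign_flip_test(higher, lower):
    """Exact one-sided sign-flip test on the paired differences (lower - higher).

    Returns the mean difference and the share of sign assignments whose
    summed difference is at least as large as the observed one.
    """
    diffs = [lo - hi for hi, lo in zip(higher, lower)]
    observed = sum(diffs)
    count = 0
    total = 0
    # Under the null hypothesis, each difference is equally likely to be
    # positive or negative, so we enumerate every possible sign assignment.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if sum(s * d for s, d in zip(signs, diffs)) >= observed:
            count += 1
    return observed / len(diffs), count / total


# Invented transition rates (changes per 1000 years) for four hierarchy pairs,
# e.g. present/past, pronoun/noun, agent/object, masculine-feminine/neuter.
higher_rates = [0.10, 0.20, 0.15, 0.05]  # superior (more frequent) member
lower_rates = [0.30, 0.25, 0.40, 0.20]   # inferior (less frequent) member

mean_diff, p_value = paired_sign_flip_test(higher_rates, lower_rates)
print(mean_diff, p_value)
```

With only four pairs, the smallest p-value this test can produce is 1/16, so a comparison of this kind only becomes informative with a larger number of property pairs.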

Text has been updated 2019-03-11