Evolutionary reconstruction of grammar and the Neogrammarians

This post is about something I am working on right now: the reconstruction of grammar. In comparative linguistics, grammar can be reconstructed to a proto-language on the basis of the forms and functions in daughter languages. For instance, if several languages have a dative case with a specific marker that can be reconstructed to their joint proto-language, and this form has the function of dative in all of them, then it is likely that the function of this marker was dative in the proto-language as well. However, the reality is often much more complex than that. Often, the function of a marker differs between daughter languages: in the case above, we may find genitive or ablative instead of dative, and since we don't know whether a genitive is more likely to become a dative or the other way round, we cannot reconstruct the original proto-language function of this specific marker. The problem is known as the "correspondence problem" and is a matter of controversy in syntactic reconstruction in general (Roberts 2007) (see picture below).
The issue is particularly prominent in the reconstruction of Proto-Indo-European syntax, where many categories of the ancient languages, such as Sanskrit, Tocharian, and Greek, are absent in Anatolian, which, on the other hand, has a high number of other categories considered to be highly archaic.

In recent years, scholars have tried to approach this problem by using evolutionary and phylogenetic methods (Maurits and Griffiths 2014, Dunn et al. 2017, Cathcart et al. 2018). The probability of presence of a specific feature at ancestral nodes is estimated, based on gains (0 -> 1) and losses (1 -> 0) of features over a reference tree (lexical or hand-crafted). As expected, the method requires some adjustment to get reliable and reproducible results. One adjustment is to treat grammatical properties as logically dependent (which is a very tricky and complex matter); another is to use ancestry and clade constraints on the trees, in order to avoid unnecessary noise in the results.
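To make the gain/loss idea concrete, here is a minimal sketch in Python. It is not the model used in the studies cited above: it assumes one binary feature, a two-state continuous-time Markov model with fixed gain and loss rates, a fixed reference tree with branch lengths, and the stationary distribution as the prior at the root. The function names, the rates and the toy tree are all made up for illustration.

```python
import math

def transition_probs(gain, loss, t):
    """2x2 matrix p[s][u]: probability of going from state s to state u
    along a branch of length t (0 = feature absent, 1 = feature present)."""
    total = gain + loss
    decay = math.exp(-total * t)
    pi0, pi1 = loss / total, gain / total  # stationary probabilities
    return [[pi0 + pi1 * decay, pi1 * (1 - decay)],
            [pi0 * (1 - decay), pi1 + pi0 * decay]]

def partial_likelihoods(node, gain, loss):
    """Likelihood of the data below `node` for each state of `node`
    (Felsenstein's pruning algorithm)."""
    if isinstance(node, int):  # leaf: observed absence (0) or presence (1)
        return [1.0 if node == s else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child, length in node:  # internal node: list of (child, branch length)
        p = transition_probs(gain, loss, length)
        below = partial_likelihoods(child, gain, loss)
        for s in (0, 1):
            result[s] *= sum(p[s][u] * below[u] for u in (0, 1))
    return result

def root_presence_probability(tree, gain, loss):
    """Posterior probability that the feature was present at the root,
    assuming the stationary distribution as the root prior."""
    l0, l1 = partial_likelihoods(tree, gain, loss)
    pi0, pi1 = loss / (gain + loss), gain / (gain + loss)
    return pi1 * l1 / (pi0 * l0 + pi1 * l1)

# Hypothetical toy tree: three ingroup languages with the feature (1)
# and one outgroup language without it (0). Branch lengths are invented.
ingroup = [(1, 0.5), (1, 0.5), (1, 0.5)]
toy_tree = [(ingroup, 0.5), (0, 1.0)]
print(root_presence_probability(toy_tree, gain=1.0, loss=1.0))
```

In real analyses the rates are estimated rather than fixed, and the calculation is repeated over many features and often over a sample of trees; the sketch only shows how observations at the tips propagate to a probability at the root.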

However, even if evolutionary and phylogenetic methods are much more sophisticated than traditional methods in terms of the amount of data and the number of calculations, the programs rest on the same principle that underlies the correspondence problem. If most of the daughter languages have a specific property, then it is likely that this property was also there in the proto-language. If there is a rooted outgroup with another function, then the probability that this other function was present in the proto-language increases.
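The effect of an outgroup can be illustrated with a small parsimony count. The sketch below is a simplification, not what the phylogenetic programs actually do: it counts the minimum number of changes needed under each possible root state (Sankoff-style parsimony with unit costs, no branch lengths), and the two trees are hypothetical.

```python
import math

STATES = ("A", "B")  # two competing functions of the same marker

def min_changes(node, state):
    """Minimum number of changes below `node` if `node` itself has `state`."""
    if isinstance(node, str):  # leaf: the function observed in a language
        return 0 if node == state else math.inf
    total = 0
    for child in node:  # internal node: tuple of children
        # Either the child keeps the parent's state (no cost)
        # or it switches to the other state (one change).
        total += min(min_changes(child, s) + (s != state) for s in STATES)
    return total

# Four languages, three with function A and one with function B.
flat = ("A", "A", "A", "B")      # all four descend directly from the root
nested = (("A", "A", "A"), "B")  # the B language is an outgroup

flat_costs = {s: min_changes(flat, s) for s in STATES}
nested_costs = {s: min_changes(nested, s) for s in STATES}
print(flat_costs)
print(nested_costs)
```

In the flat tree, reconstructing A at the root needs one change and B needs three, so the majority wins. Once the B language is an outgroup, both reconstructions need exactly one change, and the numerical majority of A no longer decides the matter.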

Currently, I am working with a dataset for Indo-European, which reconstructs the probabilities of grammatical features being present at the ancestral state of Proto-Indo-European (the statistical analysis was performed by Chundra Cathcart, University of Zurich). The results are astonishing: with very few exceptions, the program reconstructs high probabilities for grammatical features that were reconstructed to Proto-Indo-European by the Neogrammarians (Brugmann & Delbrück 1893, 1897, 1900). The reconstruction of Proto-Indo-European grammar by the Neogrammarians was done before the discovery of Hittite and Tocharian, which substantially changed the preconditions for the typological reconstruction of the proto-language grammar. Even though Tocharian and Anatolian are included in the data, this does not alter the Neogrammarian reconstruction of Proto-Indo-European grammar. I will have reason to come back to this issue in further blog posts.

References:
Brugmann, Karl and Delbrück, Berthold (1893), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 3, Vergleichende Syntax der indogermanischen Sprachen, T. 1 (Strassburg: Trübner).
--- (1897), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 4, Vergleichende Syntax der indogermanischen Sprachen, T. 2 (Strassburg: Trübner).
--- (1900), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 5, Vergleichende Syntax der indogermanischen Sprachen, T. 3 (Strassburg: Trübner).
Cathcart, Chundra, et al. (2018), 'Areal pressure in grammatical evolution', Diachronica, 35 (1), 1-34.
Dunn, Michael, et al. (2017), 'Dative Sickness: A Phylogenetic Analysis of Argument Structure Evolution in Germanic', Language: Journal of the Linguistic Society of America, 93 (1), e1-e22.
Harris, Alice C. and Campbell, Lyle (1995), Historical syntax in cross-linguistic perspective (Cambridge studies in linguistics, 0068-676X ; 74; Cambridge: Cambridge Univ. Press).
Maurits, Luke and Griffiths, Thomas L. (2014), 'Tracing the roots of syntax with Bayesian phylogenetics', Proceedings of the National Academy of Sciences, 111 (37), 13576-81.
Roberts, Ian G. (2007), Diachronic syntax (Oxford textbooks in linguistics, 99-2380132-2; Oxford: Oxford University Press).


The principle of evolutionary reconstruction. Gains and losses are measured against a reference tree (lexical/hand-crafted), resulting in a probability of presence at ancestral nodes.


Representation of the correspondence problem. In the figure at the top, A is more likely than B, but in the figure below, B is more likely, despite A being more frequent. This principle is applied by evolutionary methods.