The very concept of ‘secret’ languages sounds as if it were taken out of a crime novel. We may think of military secret codes, the jargons of prison inmates, or suburban youth slang. However, are not all languages (except for, let us say, standard English) in some sense ‘secret’, as long as they are spoken by a closed group of people and are unintelligible to outsiders? This is true in many cases: for instance, minority languages, immigrant languages, local languages or dialects, youth jargons, or ethnolects all represent communication systems that are restricted to a closed group of speakers and not shared by outsiders. So what makes a language a ‘secret’ language? The answer is complex.
A secret language is no one’s mother tongue – this is probably the most important distinction from a ‘normal’ language. Rather, secret languages traditionally represent a jargon that was passed on from father to son, together with an occupation or a lifestyle, and whose purpose was to keep out not just outsiders but also members of one’s own family. Secret languages connected to various occupations are found in Europe as well as in Africa and South America. They are very often the idiom of occupations with a distinct social function, most typically occupations that are accepted within society but have a special, often low, status. In Europe, pedlars, dealers, chimney sweepers or circus people, but also various types of low-status occupations, such as the executioner's henchman or skinners, used to have their own secret languages. In Africa, to mention an example, we have documented secret languages among healers, skinners, and sandal flickers.
Linguistically, secret languages do not possess their own grammar, as ‘normal’ languages do. Their grammatical system relies on the grammar of another language, most commonly the majority language of the country where they occur. The grammar is often simplified and syntactic patterns are replaced by pidgin-like structures. A frequently occurring phenomenon is to borrow the ‘appearance’ of a language, by means of stress patterns, prosody, dialectal variation and gestures, but to switch all content words, sometimes the entire lexicon. The lexicon is either taken more or less completely from another language, or it is an ad hoc conglomerate of words from various adjacent source languages. Very often, secret languages ‘distort’ their words by various complex patterns of morphological transformation; for instance, they truncate words and add heavy suffixes, they reverse syllables or letters, or they add epenthetic vowels within words. The result is a language that ‘melts in’ – from a distance it appears to be a native or indigenous idiom, but not one single word is understandable to outsiders.
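To make the three distortion patterns mentioned above more concrete, here is a minimal Python sketch. It is only an illustration: the function names, the suffix "-ingus" and the choice of the Swedish word broder ‘brother’ as input are assumptions made for the example, not attested formations from any documented secret language, and the syllable division is supplied by hand rather than computed.

```python
VOWELS = set("aeiouyåäö")

def reverse_syllables(syllables):
    """Reverse the order of hand-segmented syllables."""
    return "".join(reversed(syllables))

def truncate_and_suffix(word, keep, suffix):
    """Keep the first `keep` letters and attach a heavy suffix."""
    return word[:keep] + suffix

def insert_epenthetic(word, vowel="a"):
    """Insert a vowel between any two adjacent consonants."""
    out = []
    for i, ch in enumerate(word):
        out.append(ch)
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if nxt and ch not in VOWELS and nxt not in VOWELS:
            out.append(vowel)
    return "".join(out)

# Hypothetical input: Swedish "broder", segmented by hand as bro-der.
print(reverse_syllables(["bro", "der"]))        # derbro
print(truncate_and_suffix("broder", 3, "ingus"))  # broingus (invented suffix)
print(insert_epenthetic("broder"))              # baroder
```

Each transformation keeps the sound inventory of the source language, which is why the output still ‘sounds’ native while being opaque to outsiders.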

On Scandinavian soil, there are several traditional secret languages. One is the pedlars’ language, which in fact is two: one in the isolated county of Dalecarlia, and gråmål ‘grey language’ or monsing, the main pedlars’ secret language, which during the 20th century transformed into a prisoners’ language. The vocabulary of monsing is based on multiple languages. Many words are borrowed from Scandoromani, the language of the indigenous Swedish Romani speakers; other words are from Low German, from Rotwelsch (the medieval secret jargon of European outsiders), from Finnish and Russian, as well as from Swedish. Swedish words are completely altered by linguistic distortion. Sources of monsing go back to the 17th century and give us a glimpse of the type of communication that monsing speakers had. Besides communication related to their occupation, much of the content is rude, such as talk about the farmers (who are supposed to be stupid) and in particular their wives and daughters (who are the target of the speakers’ sexual interest).
Even though there are no ‘real’ speakers of these languages in Sweden anymore, monsing is still, together with Scandoromani and knoparmoj, the secret language of chimney-sweepers, a very important source for words in the Scandinavian vernacular languages.

Carling, Gerd, Lenny Lindell & Gilbert Ambrazaitis (2014) Scandoromani. Remnants of a Mixed Language. Boston: Brill

Samples of the Swedish secret language Monsing (from 18th and 19th ct. sources). Most of them have found their way into Swedish slang.

All languages contain loanwords. Some languages, such as English, have more of them; other languages, such as German or Icelandic, have few; and others again, such as Swedish, have a moderate level of loans. Why is this the case? The answer is very complex and is a combination of past and present history, geography, language size and power, and language structure. As a rule, languages are affected by contact with their neighbours. No language – and no part of language – is totally “loan-proof”. Any word in a language can potentially be replaced by a word from another language. But why? Why would a language replace a word for ‘mother’, ‘finger’ or ‘sky’, when it already has a word for this concept (which all languages actually have)?
There are large differences between languages. Languages with more grammar, such as German or Icelandic, are more reluctant to borrow – the grammar system of these languages runs a risk of breaking down if the influx of loans is too large. Languages with less grammar are more open towards borrowing.
There are large differences between words. Words for modern cultural phenomena, such as computer, tea, or latte, are loanwords in almost all languages. There are languages that are exceptions, and these are typically minor languages, which naturally do not like to be overwhelmed by foreign words: these languages run a risk of disappearing anyhow. An example is the word for ‘radio’, for which Swedish borrowed the English term radio, whereas Icelandic rendered the word as útvarp ‘out-throw’. The word for ‘airplane’ in Scandoromani comes out as sasster-tjirklo ‘iron-bird’, and ‘webpage’ as khereske-rigg ‘house-of-side’.
On the other side, we have words that are almost never borrowed. Here, we find kinship terms, body parts, numerals (mostly), words for basic bodily functions and sense perception terms, words for natural phenomena (‘sky’, ‘stone’, ‘ground’). Linguists are particularly fond of these words, so-called basic vocabulary: these are good for almost everything in language, from investigating human cognition to establishing language families.
When it comes to cultural words apart from the modern ones, the issue is more tricky. Words that are inherent from a Eurasian perspective, meaning that they are part of the vocabulary for the cultural system of farming and pastoralism that has been present in Eurasia since the Neolithic or the Chalcolithic, are typically borrowed in languages outside of Eurasia (due to colonization).
We wanted to look at these words and compare them inside and outside of Eurasia. We looked at 100 words from the vocabulary of farming, pastoralism, hunting, and technology, which we compiled in 160 languages of different periods (4,000 BP-now) in Eurasia (Europe, Caucasus, Central and South Asia). We selected words that represented items that had been in use at least since the Chalcolithic, such as ‘wheel’, ‘axe’, ‘cow’, ‘horse’. It turned out that words were borrowed to a degree that was relatively high in Eurasia (12%), but that this level was highly diverging between languages. Compared to languages outside of Eurasia (as concluded in the WOLD project), the percentage of loans was low.  
The most interesting result was a significant correlation between language size and tendency to lend or borrow words in our corpus. Large languages were donor languages to all other languages, including other large languages. The level of mutual borrowing between languages decreased all the way down to the smallest languages, which were very frequent as recipients of loans but almost never sources of loans, not even to other small languages. This demonstrates one of the most crucial causes for borrowing: power, defined not only by population size, but also by economic, material, cultural, or political power. Occasional cases of reverse borrowing of specialized words – think of lingonberry, fartlek, smorgasbord or ombudsman (Swedish loans in English) – remain infrequent exceptions in the larger perspective.

Presentation by Gerd Carling & Sandra Cronhamn at SLE 2018, Tallinn.

Heat map demonstrating frequency of co-occurrence as Target (x) and Source (y) language, defined by language size (1-5, with 5=largest and 1=smallest), based on 1308 loan events in 160 Eurasian languages spanning over 4,000 years of documentation (graph by Johan Frid).
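For readers who want to see how a heat map of this kind can be built, here is a minimal Python sketch of the underlying count matrix. It assumes that every loan event has been reduced to a pair (source size, target size) on the 1-5 scale; the function name and the seven events are made up for illustration and are not taken from the 1308 loan events of the study.

```python
from collections import Counter

def size_matrix(loan_events, n_classes=5):
    """Count loan events per (source size, target size) cell.

    Rows are source sizes 1..n_classes, columns are target sizes 1..n_classes.
    """
    counts = Counter(loan_events)
    return [
        [counts[(source, target)] for target in range(1, n_classes + 1)]
        for source in range(1, n_classes + 1)
    ]

# Hypothetical loan events as (source size, target size) pairs.
events = [(5, 1), (5, 1), (5, 3), (5, 5), (4, 2), (3, 1), (1, 5)]
matrix = size_matrix(events)

for source_size, row in enumerate(matrix, start=1):
    print(source_size, row)
```

In a matrix like this, the pattern described in the text would show up as high counts in the rows for large source languages and near-empty rows for the smallest ones.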

In Greek and Roman mythology, carefully described by Ovid, there is a tale about the four ages, which are named after metals: the ‘golden age’, the ‘silver age’, the ‘bronze age’, and the ‘iron age’. Of the four ages, the golden age represents the perfect state, where gods dwell among humans, all humans are equal and respect each other, food and clothes are found in abundance, and war is absent. Thereupon, each of the subsequent ages represents a decay in the conditions and in the morality of humans, and during the final state, the iron age, humans are at constant war, friends and brothers kill each other, families are separated, and hunger dominates. In the end, evil rules the entire earth. This myth is paralleled in the Old Indic myth about the yuga, the world ages, which are also described in terms of metals. Even today, for instance in a competition or a race, or in the counting of wedding anniversaries, we value the metals in the order gold, silver, bronze, iron, even though bronze, for instance, has been out of industrial use for a long time.
So, the question remains: does this reflect a historical development of metallurgy or is it simply a mirror of humans’ value of the various metals? And how is this ancient evaluation of the metals reflected in language?
Like many other things, metallurgy emerged in the area of West Asia and Anatolia 10-8,000 years ago, together with the development of agriculture. In the earliest phase, metals were used to decorate pottery, and only later did a proper industrial use of metals emerge. The earliest metal objects, which were made of copper, were various tools, in particular small knives or daggers. Around 6-5,000 years ago, the technique of smelting copper in furnaces was found in Mesopotamia, Egypt, Anatolia, Central Europe, Caucasus, and the Steppe area, and from the Balkans the new technique spread quickly over all of Europe. This marks the beginning of a new era, labelled Chalcolithic, which also implied the emergence of many other important techniques, such as the wheel, the plow, and the domestication of the horse. An important result of metallurgy, not just in mythology, was the emergence of large-scale war parties – more like ‘wars’ in a modern sense, with organized armies and massive killings on battlegrounds.
Around 4,000 years before present in Anatolia, the processing of a new metal would imply a historical change in the entire area and pave the way for the industrial revolution millennia later: iron. Since this metal requires higher temperatures for smelting, a technological improvement of the furnaces was necessary to enable the smelting and preparation of objects of iron. This marked the beginning of a new era, the Iron Age, where tools, ploughs and weapons were hardier, sharper, and more efficient. Again, iron affected both agriculture and warfare, something that we should suppose lies behind the mythical interpretation of the Iron Age.
Gold and silver, which are soft and shiny metals, are worthless for industrial use. Nevertheless, they are more appreciated than copper and iron. Why? The answer is complex. Iron and copper are associated with warfare and technology, gold and silver with shiny objects and wealth. Already in the Neolithic, and increasingly so during the Chalcolithic, gold was used for ornaments on weapons and objects, typically hammered out into very thin layers – something that indicates a high value and exclusiveness of this metal already at an early stage. Only during early antiquity did gold and silver become what they still are (at least until the abandonment of the gold standard after WWI): a standard for measuring the value of all other objects.
What does language tell us about the history of metals? Actually, relatively much. All the basic metals can more or less be reconstructed to the Indo-European proto-language, just as to Caucasian families, and the reconstructed forms, as well as the meaning changes of the metal words, give us important information about their emergence and early use.
Gold in Proto-Indo-European is derived from a root with the meaning ‘yellow, green’ as well as ‘shining’ (PIE  *ǵhelh₃-to- ‘gold’ < PIE *ǵhlh₃- ‘green, yellow’). In languages, gold words typically change to meanings ‘shining’, but also to ‘coin’, ‘wealth’, and ‘currency’.
Silver has a different story. The main word for silver in most languages, including non-Indo-European ones, is a migratory root with an obscure origin, *silubhr-, which spread secondarily at a very early stage (probably at proto-language level) and became the standard word in most Eurasian languages. Again, some languages formed their silver words from roots meaning ‘white, shining’ (PIE *h₂erǵ- ‘brilliant, white’, which among other things underlies the Tocharian A speakers’ word for themselves, ārśi). This indicates that gold was an indigenous metal in Indo-European and Caucasian, whereas silver spread secondarily through the early stages of proto-languages.
Copper, bronze and iron have different stories. A term for copper can be reconstructed to Proto-Indo-European (PIE *h₂éi-es- ‘metal, copper, bronze’) as well as to Caucasian proto-languages, indicating that these people were familiar with copper and likely produced copper industrially. Words for copper and bronze are completely intertwined in various languages, indicating that people of the past did not distinguish these metals by their names. Iron cannot be reconstructed to any proto-language. In languages, iron-words are derived either from a root ‘red, bloody’ (Proto-Indo-European *h₁ēsh₂r-no- ‘bloody, red’, underlying, e.g., English iron, Swedish järn), or they are initially designations for other metals, which have received the meaning ‘iron’ (Proto-Uralic *waśke ‘copper, metal’). The Proto-Finnic word for iron is a loan from Proto-Germanic, *rauðan- ‘bog iron’, which originally meant ‘red’.
The value and cultural function of the various metals emerge from the patterns of how words for metals change their meaning. Gold and silver words change their meaning to (or colexify with) ‘wealth’, ‘currency’, ‘coin’, ‘shining’, ‘bright’, gold with the colours ‘yellow’, ‘blonde’, and silver with ‘white’. Words for copper change their meaning to (or colexify with) ‘tin’, ‘zinc’, ‘lead’ or other metals, iron with ‘sword’, ‘weapon’, ‘tool’, ‘toughness’, but also ‘red’ and ‘blood’.

Semantic network of colexifications (blue) and coetymologizations (red) or both (purple) of Indo-European metallurgy words (green). Graph by Niklas Johansson.

Liquids are not just vital to our survival, they also form a central part of our culture. Most human gathering has the procedure of drinking as its common denominator, be it water, wine, beer, tea, or coffee. This post is about ancient drinking and words for drinking in languages (coffee and tea will be in a later blog).
The two most vital liquids to humans – as well as to mammals in general – are water and milk. Water we drink all our lives; without water we cannot survive. Milk we drink our first year; during this period, milk represents our entire need for nourishment. In many cultures, individuals continue to drink milk from cows, goats or sheep, either in the form of fresh milk or as cheese or yoghurt. In other cultures, milk is not a natural part of the diet later on in life.
Looking at the words for water and milk, they are both highly conservative words, which belong to languages’ basic vocabulary. In Indo-European, both words can be reconstructed to the proto-language, and the form has not changed much during the family’s history. The Proto-Indo-European word for water, *wód-r-/wéd-n-, in its earliest attestation, Hittite watar, looks strikingly similar to English water several millennia later. The root for milk, Proto-Indo-European *h₂melǵ-, is not very different from the form in Russian molokó, Tocharian B malkwer, or in Old Norse mjǫlk, English milk. Fresh milk as a drink is most frequent in Europe and less frequent in other parts of Eurasia, and the ability to drink milk, lactose tolerance, is a genetic mutation that goes back 6,000 years in Central Europe. The mutation is not unique to Europe; other independent epicentres are found in Saudi Arabia and Western Africa.
At least in Europe, there is a popular generalization about ‘drinking belts’, which sometimes are used to generalize about various peoples’ mentality, typically the ‘wine belt’ and the ‘beer belt’, often also the ‘vodka belt’ and sometimes also a ‘milk drinking zone’. Beer and wine are both very ancient and central drinks in all of Eurasia. Another important drink is mead, which is tightly connected with bee keeping. Mead has lost its importance in the last millennia, probably due to the more efficient production of beer and wine. Vodka, whiskey and other distilled drinks have a short history: they are a result of distillation, which is a relatively modern process.
Of beer and wine, beer is the more archaic drink, and it appears in many lexical forms. The preparation of an intoxicating, fermented drink based on cereals was invented already by the earliest Neolithic farmers in West Asia and Anatolia 10,000 years ago. With the preparation of beer came also the practice of the cultic feast: occasions where people worshipped the gods, ate, drank, sacrificed, and got (probably very) drunk. A common word for beer can be reconstructed to Indo-European *h₂el-u-, but it is frequently substituted (like in English beer): likely, the production of beer differed between cultures, with many local variations, and for this reason, many languages substituted their beer words.
Wine has a different story. The production of wine is related to farming of the domesticated grape, a practice that began in the Caucasian region about 8-7,000 years ago. The word for wine is also the same in all languages, and it is most likely that the word spread through the languages at an early stage, together with the invention of wine. Grapes cannot be grown in the northern parts of Eurasia; still, all languages have a word for wine. The ultimate source of the wine-root is not clear. Often, Proto-Semitic or Proto-Kartvelian are believed to be the sources of the word (PIE *woh₁i-no- ‘wine’ < PIE *weh₁-i- ‘to turn, wind’; Proto-Kartvelian *ɣwin- ‘wine’, Proto-North-West Caucasian *ωwə- ‘wine; alcoholic drink’, Proto-Dagestanian *ωun- ‘wine; one-year-old vine shoot’, also found in early Semitic languages, Old Testament Hebrew yayin, Ugaritic yn). The Indo-European root, on the other hand, is derived from a verb meaning ‘to wind’ (referring to the vine), which to some indicates an Indo-European origin. However, this may be a secondary adaptation in Indo-European, so we cannot be certain about the origin of the word wine.

Cognacy map of words for WINE in modern (top) and ancient (bottom) languages. One root dominates almost the entire map.

I have not shared anything in a month, since I have been on a 'road-trip', first to Arizona for the CES conference, and then to Beijing and Changsha (Hunan Province) for a lecture series on historical and evolutionary linguistics.
In Arizona, we (with Harald Hammarström and Sandra Cronhamn) presented some results of evolutionary semantic studies on culture vocabularies of our corpus, including data from Indo-European, Caucasian families, Turkic, Uralic, Basque and ancient Semitic (book of abstracts is found here). This study has two aspects: one being the causalities of change rates, the second directionality of semantic change.
In this post, I will focus on the first aspect, causalities of change rates. As our data, we used the 100-list of cultural words of farming, pastoralism, hunting, war, technology, and industry that we have in our database DiACL. We built an evolutionary model, where we measured gain and loss rates of 21,874 meaning tokens (6,224 types) within cognate trees, contrasted against Glottolog reference trees. After adjustment for transition frequency, 3,442 meanings remained. We tested the gain and loss rates (given as probabilities) against various metrics. We had some preliminary results, but the issue is still being researched. Previous research on lexical change rates (e.g., Pagel et al, Nature 449, Vejdemo et al, PLOS 2016 11,1) has indicated a connection to word frequency (the more frequent a word is, the lower its change rate), as well as to age of acquisition, synonyms, arousal, imageability and average mutual information. However, this research has been performed on basic vocabulary only, and we expect most of these causalities to be less relevant to a vocabulary such as ours. Frequency, for instance, showed no correlation at all to our results. However, we found a negative correlation to borrowability, which is highly noteworthy: apparently, lexemes that are frequently borrowed have slower change rates. Further, we found a correlation to colexification tendency, as well as cognacy productivity, which is to be expected (words that change their meaning often and which are diverse in geography are expected to have high change rates). Currently, we are testing various semantic properties of the lexemes, and this is where the interesting part begins: it is evident that inherent properties that are said to impact gender and classifiers, such as animacy, shape, mass/count etc, have no correlation to change rates. However, cultural aspects, such as labour intensity, processability, and possibility to control and change, do have an impact.
I am still testing various properties and aspects, and hopefully, results can soon be made ready for submission.   
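As a rough illustration of what testing change rates against a metric can look like, here is a minimal Python sketch of a rank correlation between per-lexeme change rates and borrowability scores. The choice of Spearman's rank correlation is an assumption made for this example (the statistic actually used in the study is not stated above), and the five value pairs are invented, not data from DiACL.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented toy values: one change rate and one borrowability score per lexeme.
change_rate = [0.9, 0.7, 0.5, 0.3, 0.1]
borrowability = [0.1, 0.2, 0.4, 0.6, 0.9]
rho = spearman(change_rate, borrowability)
print(rho)  # close to -1 for this perfectly inverse toy data
```

A negative value of this kind is what a finding such as "frequently borrowed lexemes have slower change rates" would look like; real data would of course give a weaker coefficient and would need a significance test.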

This post is related to what I am currently busy with: preparing an introductory course on Tocharian. There is a long-debated dilemma in Tocharian studies, which concerns the position of Tocharian within the Indo-European language tree. Due to its status as a kentum-language, most scholars of the early 20th century regarded Tocharian as a western Indo-European language (together with Celtic, Germanic, Italic and so forth) rather than an eastern language. This view is not supported anymore, but the position of Tocharian still remains an enigma. Today, most scholars agree that Tocharian branched off from the Indo-European proto-language directly (and is thus not more closely related to any other branch). The disagreement among contemporary scholars concerns whether Tocharian branched off second, after Anatolian and before the other Indo-European branches, or not. There are several arguments in favor of the second-to-branch-off theory. One argument is the occurrence of lexical archaisms in Tocharian, meaning that a handful of etymologies have preserved a more general meaning in Tocharian, whereas the other branches show a more specialized meaning. Examples are: Toch. AB yäp- ‘enter’, Skt. yabh-, Greek oíphō, Russ. ebu ‘have intercourse’ < PIE *yebh- ‘enter’ (LIV:309). The original meaning of the verb is preserved in Tocharian. TB kärweñe ‘stone, rock’, Skt. grāvan- ‘stone for pressing out soma’, Welsh breuan ‘handmill’, Old Ch. Slav. žrǔny ‘handmill’. TB śrān-* ‘(adult) man’ < PIE *ģerh₂-ōn, Skt. járant- ‘old, fragile’, Gr. géront- ‘geriatric’, Oss. zärond ‘old’ < PIE *ģerh₂- ‘mature, grow’ (LIV:165). The meaning ‘old’, ‘geriatric’ is an innovation of the non-Tocharian languages. The idea of lexical archaisms is not totally irrelevant; as I wrote in my previous blog, we know by statistical testing that specialization is more frequent than generalization.
The other argument is from phylogenetics. In phylogenetic trees, Tocharian consistently branches off second, after Anatolian. Again, this argument is based on lexical data, but from a completely different angle.
What about grammar? The arguments in favor of Tocharian being second to branch off are complicated, in particular since they are dependent on which type of system we reconstruct for Proto-Indo-European. Without going too much into detail, we have two types of reconstructions: one relatively simple system, more similar to Anatolian, from which the other branches developed their system, and one more complex reconstruction, more similar to Sanskrit and Classical Greek, in which Anatolian lost most of its grammar. The position of Tocharian here is not clear. It is obvious that Tocharian rearranged and rebuilt most of its nominal - and partly also verbal - system, and this complicates the picture. The Tocharian reformation of the system was partly done with morphological material which is found in the other branches, partly Anatolian but also Old Indic and Classical Greek.
The enigma waits to be solved.

Currently, at least if you are in the northern hemisphere, the darkest time of the year is approaching. This is also when we celebrate one of our most awaited festivities, which in English goes by the name Christmas. How old is this custom? It is highly likely that a festival during the darkest time of the year, the winter solstice, has a very long history, earlier than the introduction of Christianity, probably all the way back into Neolithic times, when the return of the sun was important for the preparation of the growing season. The festival has many forms in various cultures, among Jews it is represented by Chanukka, a feast of light, which is celebrated somewhat earlier than Christmas.

In some northern cultures, the winter solstice marks the beginning of the winter, in other Central European cultures, the winter period begins earlier. In Indo-European languages, winter, the cold and rainy season goes by the name of *ǵh(e)im- 'winter', also 'snow', a root that is found with the meaning 'cold season' in most languages, including Indo-Aryan. Germanic languages use another word for the cold season, *wintru-, which has two possible origins, either it is related to Latin unda 'wave', referring to 'the wet time of the year', or it is related to Gaulish vindo 'white', meaning 'the white time of the year'.

The festival that marks the winter solstice, 'Christmas' goes by different names in different languages. However, the symbols and the cultural habits show striking similarities between cultures. Important components in festivities are, besides excessive eating and drinking and giving of gifts, also the presence of death and the return of dead ancestors, equality of humans, and a celebration of light. In ancient Rome and other parts of the Mediterranean, the winter solstice festival had the name Saturnalia, which was a festival devoted to the god of the earth, Saturnus. An important component of the festival, besides excessive eating, drinking, visiting of friends and giving of gifts, was that the slaves were supposed to sit and eat in company of their masters. This is paralleled by the habit in northern cultures, where servants and houseowners were supposed to eat together in the kitchen during Christmas.

The words for 'Christmas' are different in various languages. Even though we have little information about celebrations of the winter solstice in older culture without written sources, the words may give us important indications of the purpose of the feast.

Many Germanic languages have preserved an ancient and obscure word for the feast, jul: Swedish jul, Old Swedish iūl, Icelandic jól, Danish jul, Old English geohhol, géol, English yule, Gothic (fruma) jiuleis 'the month of Christmas'. From Proto-Norse, the word has also been borrowed into Finnish joulu, Estonian jõulud. The meaning of this word is uncertain, but there are two alternatives: either the word is derived from a root related to Old Icelandic él 'storm', referring to the time of winter storms, or it is derived from a root of Indo-European *jek- 'speak out loud', which in many languages, such as Latin iocus 'joke', has the meaning of 'joke, amusement'.
The Indo-European word for 'winter', *ǵh(e)im-, recurs in Latvian Ziemassvētki.

Another group of words relate to meanings of 'holiness', such as German Weihnacht, Middle High German wīhenahten (known since the 12th century), meaning 'holy night', or to the word for 'God', in Slavic languages bȏgъ: Polish Boże Narodzenie, Bosnian Božić, Croatian and Serbian Božić, Macedonian Božiḱ. Lithuanian has preserved an ancient word in its term Kalėdos, which is from the name of the pagan god Koliada, who personifies the newborn winter.

An important set of Christmas words relate to meanings of 'new' and 'birth' or 'rebirth'. We have derivations of Latin natīvitas in Spanish Navidad, Latin nātalīs in French Noël, Portuguese Natal, Italian Natale, borrowed into many languages, such as Marathi Nātāḷa, or Turkish Noel, also Irish nollaig, Welsh nadolig, Scots-Gaelic nollaig (borrowed from Latin natalicia 'nativity'). Alternatively, we have Russian rozhdestvo, Belorussian roždiestvo, derived from ród 'birth' and borrowed into, e.g., Kazakh Rojdestvo, Uzbek Rojdestvo.

Another group - to which we count the English Christmas - refers immediately to the birth of Christ: Greek Χριστούγεννα, Dutch Kerstmis, Frisian Kryst, Luxembourgish Chrëschtdag, Albanian Krishtlindje. From English, the word has been borrowed into many languages, such as Hindi krisamas, Nepali Krisamasa, Malayalam krismas, Japanese kurisumasu, Samoan Kerisimasi, Tamil Kiṟistumas, Telugu Krismas, Swahili Krismasi, Thai Khris̄t̒mās̄, Xhosa Krisimesi, and so forth.

And with this little overview of Christmas words, I would like to wish you all a Merry Christmas!

-The text has been updated 2018-12-15-


The current post is about something that I am involved in right now: the reconstruction of grammar. In comparative linguistics, grammar can be reconstructed to a proto-language on the basis of the forms and functions in daughter languages. For instance, if there is a dative case in several languages with a specific marker that can be reconstructed to the joint proto-language, and this form has the function of dative in all languages, then it is likely that the function of this marker was dative also in the proto-language. However, the reality is often much more complex than that. Often, the function of a marker is different in various daughter languages: in our case above, we may have genitive or ablative instead of dative, and since we don't know if a genitive is more likely to become a dative or the other way round, we cannot reconstruct the original, proto-language function of this specific marker. The problem is known as the "correspondence problem" and is a matter of controversy in syntactic reconstruction in general (Roberts 2007) (see picture below).
The issue is particularly prominent in the reconstruction of Proto-Indo-European syntax, where many categories of the ancient languages, such as Sanskrit, Tocharian, and Greek, are absent in Anatolian, which, on the other hand, has a high number of other categories considered to be highly archaic.

In recent years, scholars have tried to approach this problem by using evolutionary and phylogenetic methods (Maurits and Griffiths 2014, Dunn et al 2014, Cathcart et al 2018). The probability of presence of a specific feature at ancestral nodes is estimated, based on gains (0 -> 1) and losses (1 -> 0) of features over a reference tree (lexical or hand-crafted). As expected, the method requires some adjustment to get reliable and reproducible results. One adjustment is to treat grammatical properties as logically dependent (which is a very tricky and complex matter); the other is to use ancestry and clade constraints on trees, in order to avoid unnecessary noise in the results.

However, even if evolutionary and phylogenetic methods are much more sophisticated than traditional methods in terms of amounts of data and number of calculations, the programs rest on the same principle as the one behind the correspondence problem. If most of the daughter languages have a specific property, then it is likely that this property was also there in the proto-language. If there is a rooted outgroup with another function, then the probability of presence of this function at the proto-language state is increased.
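
To make the principle concrete, here is a minimal sketch of how a probability of presence at the root could be computed for a single binary feature. It is not the procedure of the studies cited above: it assumes a simple two-state continuous-time Markov model with fixed gain and loss rates (real analyses infer these from the data), a stationary prior at the root, and Felsenstein's pruning algorithm over a fixed reference tree. The tree, the branch lengths and the language names "Outgroup", "A", "B" and "C" are hypothetical.

```python
import math


def transition_matrix(gain, loss, t):
    """Transition probabilities of a two-state Markov model over a branch of length t.

    State 0 = feature absent, state 1 = feature present.
    gain is the rate of 0 -> 1, loss is the rate of 1 -> 0 (assumed, not inferred).
    """
    total = gain + loss
    decay = math.exp(-total * t)
    p01 = gain / total * (1 - decay)  # absent -> present
    p10 = loss / total * (1 - decay)  # present -> absent
    return [[1 - p01, p01], [p10, 1 - p10]]


def partial_likelihoods(node, states, gain, loss):
    """Felsenstein pruning: likelihood of the data below node, for each state at node.

    A leaf is a language name (str); an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        observed = states[node]
        return [1.0 if observed == 0 else 0.0, 1.0 if observed == 1 else 0.0]
    result = [1.0, 1.0]
    for child, length in node:
        child_l = partial_likelihoods(child, states, gain, loss)
        p = transition_matrix(gain, loss, length)
        for s in (0, 1):
            result[s] *= p[s][0] * child_l[0] + p[s][1] * child_l[1]
    return result


def root_presence_probability(tree, states, gain, loss):
    """Posterior probability that the feature was present at the root."""
    l0, l1 = partial_likelihoods(tree, states, gain, loss)
    prior1 = gain / (gain + loss)  # stationary probability of presence
    weighted1 = prior1 * l1
    return weighted1 / ((1 - prior1) * l0 + weighted1)


# Hypothetical reference tree: one outgroup and a clade of three languages.
tree = [
    ("Outgroup", 1.0),
    ([("A", 0.5), ("B", 0.5), ("C", 0.5)], 0.5),
]

# The feature is found in the outgroup only, or in no language at all.
outgroup_only = {"Outgroup": 1, "A": 0, "B": 0, "C": 0}
nowhere = {"Outgroup": 0, "A": 0, "B": 0, "C": 0}

print(root_presence_probability(tree, outgroup_only, gain=0.5, loss=0.5))
print(root_presence_probability(tree, nowhere, gain=0.5, loss=0.5))
```

Comparing the two printed values shows the outgroup effect described above: a feature attested only in the outgroup receives a higher probability of presence at the root than a feature attested nowhere, even though the ingroup languages agree in both cases.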

Currently, I am working with a dataset for Indo-European, which reconstructs probabilities of grammatical features being present at the ancestral state of Proto-Indo-European (the statistics have been performed by Chundra Cathcart, University of Zurich). The results are astonishing: with very few exceptions, the program reconstructs high probabilities for grammar features that were reconstructed to Proto-Indo-European by the Neogrammarians (Brugmann & Delbrück 1893, 1897, 1900). The reconstruction of Proto-Indo-European grammar by the Neogrammarians was done before the discovery of Hittite and Tocharian, which changed the preconditions for the typological reconstruction of the proto-language grammar to a high degree. Even though Tocharian and Anatolian are included in the data, this does not change the Neogrammarian reconstruction of Proto-Indo-European grammar. I will have reason to come back to this issue in further blogposts.

Brugmann, Karl and Delbrück, Berthold (1893), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 3, Vergleichende Syntax der indogermanischen Sprachen, T. 1 (Strassburg: Trübner).
--- (1897), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 4, Vergleichende Syntax der indogermanischen Sprachen, T. 2 (Strassburg: Trübner).
--- (1900), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 5, Vergleichende Syntax der indogermanischen Sprachen, T. 3 (Strassburg: Trübner).
Cathcart, Chundra, et al. (2018), 'Areal pressure in grammatical evolution', Diachronica, 35 (1), 1-34.
Dunn, Michael, et al. (2017), 'Dative Sickness: A Phylogenetic Analysis of Argument Structure Evolution in Germanic', Language: Journal of the Linguistic Society of America, 93 (1), e1-e22.
Harris, Alice C. and Campbell, Lyle (1995), Historical syntax in cross-linguistic perspective (Cambridge studies in linguistics, 0068-676X ; 74; Cambridge: Cambridge Univ. Press).
Maurits, Luke and Griffiths, Thomas L. (2014), 'Tracing the roots of syntax with Bayesian phylogenetics', Proceedings of the National Academy of Sciences, 111(37), 13576-81.
Roberts, Ian G. (2007), Diachronic syntax (Oxford textbooks in linguistics, 99-2380132-2; Oxford: Oxford University Press).

The principle of evolutionary reconstruction. Gains and losses are measured against a reference tree (lexical/hand-crafted), resulting in a probability of presence at ancestral nodes.

Representation of the correspondence problem. In the figure at the top, A is more likely than B, but in the figure below, B is more likely, despite A being more frequent. This principle is applied by evolutionary methods.

This week's blogpost will continue the thread about grammatical reconstruction, with some thoughts on lineage versus areality in grammar change. 

In general, change of grammar is supposedly cyclic (or spiralic according to some researchers): over time, typological organizations of features in systems recur or are re-established. We may look at this issue both from a long-term and a short-term perspective. One question for a feature is its inherent possibility of being homologous (a similarity may depend on inheritance only) or homoplastic (a similarity may depend on internal or areal pressure, caused by various factors). Another question is whether a similarity is caused by areal pressure or by lineage. A construction or a feature may be indicative of all of these processes. For instance, a feature like word order is by nature homoplastic (similarities in word order may be due to areal or internal pressure, such as change in order of meaningful elements), but even then, a word order feature may be due to lineage: it has been inherited generation after generation, or it is a critical innovation restricted to a specific sub-branch of a tree. Take for instance the verb-initial order in Celtic languages: it is likely that this feature is caused by internal pressure in the verbal paradigm (McCone 1987). Because of this, verb-initiality is a feature which is restricted to the Celtic sub-branch and therefore a homologous innovation of this specific branch, not caused by areal pressure. The feature is entirely independent of other Eurasian verb-initiality. Another example is the Germanic have-perfect. It is a homoplastic typological feature (expressing perfect by an auxiliary construction), which still uses the same cognate root as the auxiliary, the verb *haban. The process took place independently in all Germanic languages, due to parallel drift and possible areal pressure. As before, it is difficult to distinguish areality from lineage.

Very interesting is the process of Indo-European alignment change, from the proto-language to the daughter branches. It is quite evident that the reconstructed language bears morphological traces of a semantically based system, similar to active-stative systems, as has been suggested by several scholars (Bauer 2000). But does it mean that Proto-Indo-European was an active language? Probably not. This concerns the question of the stability of systems in general versus language-internal variation in tendencies towards other systems. Indo-European alignment took three pathways of change: towards ergativity in the South-East, towards neutral marking in the West, and a preservation of the ancient system in between (roughly). What is the areal pressure component here, which changes depend on language-internal processes, and what is the role of the residual morphology? These are questions that remain to be answered.

McCone, Kim (1987), The early Irish verb (Kildare: Maynooth). 
Bauer, Brigitte (2000), Archaic syntax in Indo-European : the spread of transitivity in Latin and French (Trends in linguistics. Studies and monographs, 99-0115958-X ; 125; Berlin: Mouton de Gruyter).