This blogpost will briefly introduce a highly interesting phenomenon in the history of Eurasian languages, namely the emergence of definiteness. Most ancient attested Indo-European languages do not have definiteness marking, but the phenomenon appears relatively early on in several languages, in various forms. The emergence of the various types of definiteness marking does not seem to be areally caused; rather, most of the variants emerge through internal pressure and grammaticalization. In addition, definiteness is not restricted to the Indo-European languages but also occurs in various forms in Caucasian families, in Turkic, as well as in some Uralic languages.
There are several types of definiteness marking, which typically co-occur in languages. One type is a non-bound definite article (a special word class), as in German or English:
das Haus
DEF house ‘the house’

Another type is a bound definite marker, as in Scandinavian (here Swedish):
hus-et
house-DEF ‘the house’

A further type is definiteness marked on the adjective, as in Swedish:
det stor-a hus-et
DEF large-DEF house-DEF
‘the large house’

Definiteness marking can be obligatory either at the end or at the beginning of a noun phrase, as in Bulgarian:
xubava-ta kniga
nice-DEF book
‘the nice book’

The ancient Indo-European languages lack definiteness marking, and this state has been preserved in a huge area of predominantly Slavic and Indo-Aryan languages. The emergence of the various forms of definiteness began already in ancient times, apparently independently and with large variation even within branches of the families, and escalated during the medieval period. A large part of the existing variation seems to be caused by parallel evolution. Still, the exact causes of the variation remain obscure.

Bauer, Brigitte. 2007. "The definite article in Indo-European. Emergence of a new grammatical category?" In Nominal Determination. Typology, context constraints, and historical emergence, edited by Elisabeth Stark, Elisabeth Leiss and Werner Abraham, 103-139. Amsterdam-Philadelphia: John Benjamins.


Variation in definiteness marking in historical Eurasian languages. Legend: see map of modern languages above.

Probability levels of different types of definiteness marking in protolanguages, based on an evolutionary test using the data of the DiACL database.

Of the 3672 entries of the Tocharian A dictionary (Carling and Pinault to appear), 772 lemmas have been marked as “from Sanskrit”, which represents 21% of the entire vocabulary (of 1508 nouns, 338 are from Sanskrit, representing 22%). Of these 772 lemmas, 39 are marked as “via Middle Indic”, which represents 5% of the words borrowed from Sanskrit. Compared to Sanskrit loans, other source languages are marginal: there are 22 words marked as “from Middle Iranian”, 5 “from Chinese”, 10 “from Uighur”, 10 “from Prakrit”, and 4 “from Pali”.
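The proportions quoted above can be re-derived from the raw counts. The following is only a small sketch for checking the arithmetic: the counts are the ones given in the text, while the variable and function names are made up for this illustration and do not come from the dictionary project.

```python
# Counts as quoted in the text for the Tocharian A dictionary.
# All names below are illustrative, not part of any published dataset.
TOTAL_ENTRIES = 3672
FROM_SANSKRIT = 772
TOTAL_NOUNS = 1508
NOUNS_FROM_SANSKRIT = 338
VIA_MIDDLE_INDIC = 39

# Entries attributed to the remaining source languages.
OTHER_SOURCES = {
    "Middle Iranian": 22,
    "Chinese": 5,
    "Uighur": 10,
    "Prakrit": 10,
    "Pali": 4,
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100 * part / whole


sanskrit_share = percent(FROM_SANSKRIT, TOTAL_ENTRIES)
noun_share = percent(NOUNS_FROM_SANSKRIT, TOTAL_NOUNS)
middle_indic_share = percent(VIA_MIDDLE_INDIC, FROM_SANSKRIT)
other_shares = {
    source: percent(count, TOTAL_ENTRIES)
    for source, count in OTHER_SOURCES.items()
}

print(f"from Sanskrit, all entries: {sanskrit_share:.1f}%")
print(f"from Sanskrit, nouns only:  {noun_share:.1f}%")
print(f"via Middle Indic, of the Sanskrit loans: {middle_indic_share:.1f}%")
for source, value in other_shares.items():
    print(f"from {source}: {value:.2f}% of all entries")
```

Rounded to whole numbers, the three main shares agree with the figures in the text, and each of the other source languages accounts for less than one percent of all entries.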
What does this imply? First, and foremost, of course, that Sanskrit, or rather Buddhist Hybrid Sanskrit, plays a fundamental role in Tocharian literature. “From Sanskrit” means that a word has been borrowed from Classical Sanskrit (Monier Williams 1899) or Buddhist Hybrid Sanskrit (Edgerton 1953, Bechert, Waldschmidt, and Bongard-Levin 1996) with no other change than an adaptation to the morphological system according to the languages’ rules for adapting loans (Krause and Thomas 1960). “From Prakrit” or “from Pali” means that the word can be traced back to a source attested in Pali or Prakrit texts, which apparently is much more unusual than the other way round.
So, what type of changes are we talking about when we define words as “via Middle Indic” instead of just “from Sanskrit”? (Note that the examples below are from Tocharian A: there are also similar patterns in Tocharian B (Carling 2005)). Let us look at a couple of examples.
Some of the words are almost identical to the Sanskrit word, with little change:
A pāruṣak (n.) ‘name of a mythical garden’, via Middle Indic from Sanskrit pāruṣyaka- ‘n. of one of the groves of trāyastriṃśa gods’ (BHSD:343b), as in Pali phārusaka- ‘name of one of Indra's groves’ (PED:478b).
A kās* ‘Kāśa, a species of grass’, via Middle Indic from Sanskrit kāśa- ‘a species of grass’ (MW:280b).
In other lexemes, there is more far-reaching phonological change, which either was taken over from the Middle Indic source word or took place in Tocharian. Which of the two applies remains unclear. Examples are:
A kurkal (n.) ‘bdellium, a medical ingredient’, via Middle Indic from Sanskrit gulgulu- ‘bdellium’ (MW:360b).
A klawe (n.m.) ‘die, throw of the die’, via Middle Indic from Sanskrit glaha-, originally ‘throw of the dice’, and individually ‘die’ (MW:374b).
A jar (n.m.) ‘topknot’, via Middle Indic from Sanskrit jaṭā- ‘the hair twisted together (as worn by ascetics)’ (MW:408a).
A tāpātriś (n.m.) ‘name of a class of gods’, via Middle Indic from Sanskrit trāya(s)-triṃśa- ‘name of a class of gods’, cf. Pali tāvatiṃsa (BHSD:257b).
A patatam (adv.) ‘fortunate, gifted’, via Middle Indic from Sanskrit pradattam, neuter adv. from pradatta- ‘granted, bestowed, gifted’ (MW:679c).
A nātäk (n.m.) ‘lord’, via Middle Indic from Sanskrit nāthaka-, derived from Sanskrit nātha- ‘protector, patron, owner, lord’ (MW:534c).
This vocabulary, both in Tocharian A and B (which has a larger vocabulary), is very interesting. The lexemes were apparently not borrowed from the literary standard of Prakrit and Pali or from Buddhist Hybrid Sanskrit directly. Rather, they were borrowed from one or several local Indo-Aryan dialects, which became extinct, but which may be part of a general change in Middle Indo-Aryan leading to the dialectal diversity of Modern Indo-Aryan languages.
In addition, the boundaries between Indo-Aryan and Iranian in some of these lexemes are not sharp: the words may have been borrowed from Iranian, but since Indo-Aryan is much better attested (via Classical Sanskrit), an Indo-Aryan source becomes more likely.
A systematization of the sound changes in these words would likely add to our knowledge of the evolution of Middle Indo-Aryan into Modern Indo-Aryan. It would also help us to tease apart Iranian from Indo-Aryan borrowings in Tocharian.

Bechert, Heinz, Ernst Waldschmidt, and Grigorij Maksimovic Bongard-Levin. 1996. Sanskrit-Wörterbuch der buddhistischen Texte aus den Turfan-Funden. Beih. 6, Sanskrit-Texte aus dem buddhistischen Kanon: Neuentdeckungen und Neueditionen, 3. Folge. Göttingen: Vandenhoeck und Ruprecht.
Carling, Gerd. 2005. "Proto-Tocharian, Common Tocharian, and Tocharian – on the value of linguistic connections in a reconstructed language." In Proceedings of the Sixteenth Annual UCLA Indo-European Conference, edited by Karlene Jones-Bley, Martin E. Huld, Angela Vella Volpe and Miriam Robbins Dexter, 47-70. Washington: Institute of Man.
Carling, Gerd, and Georges-Jean Pinault. to appear. A Dictionary and Thesaurus of Tocharian A. Wiesbaden: Harrassowitz.
Edgerton, Franklin. 1953. Buddhist hybrid Sanskrit grammar and dictionary, William Dwight Whitney linguistic series: Yale U.P.; Oxford U.P.
Krause, Wolfgang, and Werner Thomas. 1960. Tocharisches Elementarbuch. B. 1, Grammatik. Heidelberg.
Monier Williams, Monier. 1899. A Sanskrit-English Dictionary : Etymologically and philologically arranged with special Reference to Cognate Indo-European Languages. Oxford: At the Clarendon Press.

During the first centuries BCE, the impact of Iranian becomes important in Tocharian. We know this from the relatively large number of loanwords in Tocharian from various Iranian languages, beginning with one or several unknown Old Iranian dialects (which are not Avestan or Old Persian) and continuing with loans from various known Middle Iranian languages, such as Khotanese, Sogdian, and Bactrian. As usual with loans, an exact match for the source word is seldom found, meaning that the exact source language cannot be identified.
Iranian loans in Tocharian are interesting from the viewpoint of their semantic domains, which are indicative of the cultural impact of the Iranians on the Tocharians in Central Asia.

A majority of the words refer to administrative concepts, e.g., titles or specific concepts of merchandise or administration, indicating that the Iranians influenced the Tocharians by imposing an administrative infrastructure. Examples are:
Tocharian B waipecce 'possession', from Old Iranian, Avestan xʷaēpaiθya- 'own'.
Tocharian B waipte 'separately, apart' < Common Tocharian *wai-pätæ, borrowed probably from an adjective, Old Iranian *hwai-pati in the sense of 'independent, oneself'.
Tocharian A pärko, B pärkau 'advantage, profit, interest' < Common Toch. *pärkāwV, borrowed from Old Bactrian, Bactrian φρογαοο 'profit', Old Iranian *fragāwa-, Sogdian prγ'w, βry'w, Parthian frg'w 'treasure'.
Tocharian A pare, B peri 'debt' < Common Tocharian *pæräī is borrowed from Old Bactrian *pāra > Bactrian paro 'debt, obligation, loan, amount, due'.
Tocharian A āpṣātrik* ‘citizen of a borough or market-town’, borrowed from Old Iranian *αβþαρο < *api-xšaθra- ‘borough, sub-district (of a city)’.
Other words clearly refer to military concepts, such as values or terms for weapons:
Tocharian B tsain 'arrow' from an Old Iranian *dzaina-, Avestan zaēna- 'weapon'.
Tocharian A āmāṃ B amāṃ ‘pride, arrogance’, loan from Middle Iranian, cf. Buddhist Sogdian ’’m’n ‘power’.
Tocharian A āṣāṃ B aṣāṃ ‘worthy’, borrowed from Middle Iranian, cf. Khotanese āṣaṇa- ‘worthy’.
Tocharian A āṣānik B āṣānike ‘venerable, worthy of respect’, loan from Middle Iranian, with the same source as A āṣāṃ B aṣāṃ.
Tocharian A senik ‘care, pledge, guarantee’, from Middle Iranian *zēnik (Khot. ysīnīta, Sogd. zynyh, Kroraina Prakrit jheniya-).
A number of words refer to farming and the household. Examples are:
Tocharian AB ās ‘she-goat’, borrowed from Middle Iranian.
Tocharian A kātak* B kattāke ‘master of the house, householder’, from Common Tocharian *kāttākǝ borrowed from Middle Iranian, cf. Khotanese ggāṭhaa, itself borrowed from Middle Indic, cf. Gāndhārī Prakrit *ghahaṭha, from Sanskrit gṛhastha-.
Tocharian A miṣi B miṣṣe, miṣṣi ‘field’, borrowed from Khotanese mäṣṣa, miṣṣa ‘field for seed’.
A small number of words are Buddhist terms (normally, the impact of Sanskrit is enormous on both Tocharian languages here). Examples are:
Tocharian A pissaṅk ‘community of monks’, from Middle Iranian from Skt. bhikṣusaṃgha- ‘Mönchsgemeinde, Mönchsorden’ (SWTF III:298b), cf. Khotanese bisaṃga-.
Finally, we have a group of words referring to plants and ingredients which are unfamiliar to the Tocharian flora (also here, Sanskrit loans are much more common). Examples are:
Tocharian A kārāś B karāśe*, via Tocharian B from Khotanese karāśśa ‘climbing plant’.
Tocharian A kuñcit B kwäñcit, kuñcit, from Khotanese kuṃjsata- ‘sesame’.
In conclusion, the Iranian impact on Tocharian is mainly pre-Buddhist, referring to concepts of administration, warfare, and farming. With the change to Buddhism, the impact of Old and Middle Indo-Aryan becomes completely dominant in both Tocharian languages.
The words have been extracted from these sources:
Carling Gerd (to appear). A Dictionary and Thesaurus of Tocharian A. Complete Edition. In collaboration with Georges-Jean Pinault. Wiesbaden: Harrassowitz (610p.).
Carling, Gerd (2005). Proto-Tocharian, Common Tocharian, and Tocharian – on the value of linguistic connections in a reconstructed language. In: Jones-Bley, Karlene, Huld, Martin E., Volpe, Angela Vella,  Dexter, Miriam Robbins Proceedings of the Sixteenth Annual UCLA Indo-European Conference. Journal of Indo-European Studies. Monograph Series 50, 47-70.
These sources have many references to works by, e.g., Georges-Jean Pinault, K T Schmidt, Werner Winter, Nicholas Sims-Williams, Harold Bailey, L Isebaert, Jörundur Hilmarsson.

This week, also known as Holy Week, is part of the holiday that in English goes by the name of Easter. Easter, which is celebrated throughout all of the Judaeo-Christian world, is one of the most important festivities of the year, marking the beginning of spring or summer and the resurrection of Christ. Like most Christian holidays, the roots of Easter go back into pagan times. In Northern Europe in particular, many of the mysterious customs of an ancient spring festival have survived until today. Children chase an invisible Easter hare, which puts candy-filled eggs in the grass. Birch twigs are collected, taken indoors, and decorated with painted eggs and feathers. Children also dress as witches or 'easterhubbies' (the difference is whether you wear a scarf or a hat), paint their faces with red dots, and go from door to door asking for candy. Afterwards, they are supposed to fly on their brooms to Brocken. Fires and fireworks are lit, and, most importantly, enormous quantities of eggs, fish, meat, and candy are consumed.

So, which terms do we use for this festival? Most languages have a form of the Greek (via Latin) word paskha, itself borrowed from Aramaic (Hebrew Pesach), meaning 'passover'. The West Germanic terms, such as English Easter and German Ostern, go back to a Common Germanic goddess of spring, Old English Eastre, which is identical to the Indo-European goddess of dawn *h2éus-ōs (Sanskrit uṣās, Latin aurōra). Other languages have words that in various ways relate to the basically biblical rituals of Easter, including 'sacrificial animal', 'taking of the meat', 'resurrection', 'great day' or 'great night', or 'liberation'.

Just as with the Christmas words, the map of meanings of Easter unveils important information about various cultural spheres, as well as exceptions in the form of islands of different usage.

With this little etymological overview I would like to wish you all a Happy Easter!

Lubotsky, Alexander. Brill Online Dictionaries: Indo-European Etymological Dictionaries Online. Accessed 2019-04-17.
Troels-Lund 1932. Dagligt liv i Norden på 1500-talet. VII Årets fester. Stockholm: Bonniers.
Andersson et al 1968. Kulturhistoriskt Lexikon för Nordisk Medeltid XIII. Malmö: Allhems förlag.

I thank Ante Petrović for assistance with compiling/checking data for the Easter map.

Wikipedia has an excellent overview of names of Easter.

In the previous blogpost, I started a compilation of safe loans from and into Tocharian. I will continue this work in the next post. In this post, I will talk about loan directionality, since I am currently completing a paper (with several co-authors) on lexical borrowability in Eurasian languages. I want to say a few words about this project.

We have compiled and extracted all loan events in the lexical database, and tested various statistical measures on this data. Worth noting are the directionality of loans in relation to language power and the different source languages of the families. As I have described in recent posts, our lexical data set compiles culture concepts, i.e., words for farming, technology, hunting, and war, which have a presumed age that goes back at least to the Chalcolithic. This means that this vocabulary is not representative of the entire lexicon, only of these specific domains. Loans are also spread over long periods, going back at least to antiquity. If we look at the source languages, we notice that they differ between families. In Indo-European, Latin is most frequent, followed by Middle Low German, French, Old French, Slavic, and Classical Greek. In Caucasian, Turkic languages dominate, followed by Persian, Georgian, and Arabic. In Uralic, Scandinavian languages dominate, which is mainly due to the fact that Fenno-Ugric languages dominate in our data (see pictures below).

The correlation between loan directionality, language power, and population size is also noteworthy. We define the power of languages by a quantitative rank based on several features, including literary power, economic power, and population size. This we plot against the occurrence as source and target language in loan events (see graph above). All languages are equally likely to be target languages, but the most powerful languages are more likely to be source languages. This is a significant correlation. The most frequent loan event is from a very powerful language to a very weak one. The second most frequent loan event is from a medium powerful language to a weak one. The third most frequent loan event is from a medium powerful to a medium powerful language. In scrutinizing the data, we observe that this third type of loan event is almost entirely restricted to the Middle Ages, which is also an interesting result. Inequality between languages seems to be specific to the antique and modern periods, whereas language contact in the Middle Ages was more evenly distributed between languages of equal power.

Graph illustrating the most frequent source languages in Indo-European (top), Caucasian (middle), and Uralic (bottom) families.

Deixis refers to pointing by using language. Deixis seems to be universal – all languages have a system for denoting at least two dimensions of deixis: ‘here’ and ‘there’. Deixis is marked either by deictic markers without person reference ‘here’, ‘there’, or deictic markers with person reference, ‘s/he /that here’, ‘s/he /that there’. Almost without exception, deictic words are accompanied by gestures.
Deictic systems are very interesting – their purpose is clearly communicative and they are deeply rooted in our cognitive system. Think of a hunting situation: a speaker wants to communicate to a companion that a game animal is hiding among the bushes. Or that a dangerous snake has been seen among the rocks.

-Where? asks the second speaker.
-Over there! answers the first speaker, pointing in the direction of the presumed hiding animal.
-Where over there? Did you really see it yourself?
-No, I am not sure … I thought I saw something...

In situations such as these, languages have developed different effective ways to standardize the communication, often by means of intricate and complex systems of deixis. But even if the preconditions for deixis are imprinted in our brains, the ways in which the systems come out are highly diverse and markedly cultural.

Deictic systems – at least the ones we are used to – typically distinguish two or three dimensions of deixis: ‘here’, ‘there’ and ‘over there’. In language, these dimensions are also mirrored in the sound structure of their words – a phenomenon that seems to be almost universal. Forms for ‘here’ are expressed by sounds that have higher frequency, e.g., the vowels i, e or consonants such as s, t. In contrast, forms for ‘there’ are expressed by sounds with lower frequency, such as the vowels a, o, u, and the consonants m, b. This has to do with our apprehension of our surrounding environment: we associate closeness with familiarity, safety, smallness, and a higher voice or pitch, whereas we associate distance with unfamiliarity, threat, large size, and a lower voice or pitch. To fully understand this phenomenon, think about the sound of a cat versus the growl of a tiger. Which one do we want to have closer? This opposition between high and low frequency in here- and there-forms is stable in languages. If it becomes distorted by change in the sound structure, the opposition is restored within generations.

Besides forms for the basic deictic distinctions, some languages have expanded their deictic systems in various directions, introducing a large amount of additional information.
A system such as this is found in Kamaiurá, a Tupí-Guaraní language spoken in Upper Xingu, Brazil. Kamaiurá is a prototypical Amazonian language: it has a mother-in-law language, evidentiality (linguistic ‘truth-marking’), male versus female speech, and nominal tense. In the system of deictic terms, there are four basic dimensions of deixis: ‘s/he/that here’ (close to speaker), ‘s/he/that there’ (close to listener), ‘s/he/that over there’ (away from both speaker and listener), and ‘s/he/that over there’ (far away from both speaker and listener). Besides these four basic dimensions, there is a large set of forms, in total around 20. In normal speech, such as when someone tells a story or reports an event, these deictic forms are highly important: they communicate a number of dimensions of an event: the time, the place, the role of the speaker, what may come next, or what the speaker or the participants know or don’t know, as well as modalities, feelings, and so forth.
One deictic form denotes ‘s/he/that, close to speaker but invisible’ – a form used for instance about someone talking inside a house, who is heard through the door. Another form is used to mark that the referent is more or less close, heard but not seen, and yet another form marks that the referent is over there, neither heard nor seen, and the speaker is uncertain about its status – the source is secondary, ‘hearsay’. There is also a form that refers to ‘that guy I don’t know the name of’ or ‘the guy I don’t remember’, and one form notes that someone is close but not visible: this is used for instance when talking about an absent son. Further, forms may denote that someone is moving away or is located close to something else of importance. The system is impossible to learn for an outsider: since the use of the forms is consolidated in each and every situation in which the language is used, only native speakers can learn to master the system in full.
References: (Carling et al. 2017; Diessel 2011; Johansson and Carling 2015)
Coming up next: The Tocharians, the mysterious people who travelled more than 4000 km and ended up in a desert
Carling, Gerd, et al. (2017), 'Deixis in narrative: a study of Kamaiurá, a Tupí-Guaraní language of Upper Xingu, Brazil', Revista Brasileira de Linguística Antropológica, 9 (1), 13-48.
Diessel, Holger (2011), 'Deixis and demonstratives', in Claudia Maienborn, Klaus von Heusinger, and Paul Portner (eds.), Semantics: An International Handbook of Natural Language Meaning (Handbooks of Linguistics and Communication Science: 33 (1-3); Berlin, Germany: de Gruyter Mouton), 2407-32.
Johansson, Niklas and Carling, Gerd (2015), 'The de-iconization and rebuilding of iconicity in spatial deixis', Acta Linguistica Hafniensia: International Journal of Linguistics, 47 (1), 4.

The Takla Makan desert in Western China is in the middle of nowhere. Being there feels more like having landed on a deserted Tatooine than on earth; most villages are very sparsely populated, and sand rocks, red desert sand, and dried salt rivers dominate the surroundings. The climate is horrible: winters are freezing, summers extremely hot and dry; springs and autumns are endurable, but temperatures between day and night often differ by 30 °C. In a village called Subashi I met a villager who had spent 20 years digging a well (by hand, I assume, considering the many years he had spent on the project). The well was obviously very deep, but it contained no water.
Nevertheless, French and German expeditions 100 years ago found the remnants of an Indo-European language in the sand-filled grottoes of this desert. The language, which was wrongly labelled ‘Tocharian’, after an Iranian tribe mentioned by the ancient Greeks, turned out to represent a branch of its own on the large Indo-European tree. In recent years, research has revealed new and interesting knowledge of this mysterious people, how they lived, where they came from, and what their language looked like.
During the first millennium CE, the Tocharian civilization flourished along the Silk Road. By that time, Tocharian had split into two languages, which for the sake of simplicity are labelled Tocharian A and Tocharian B. The Tocharian culture was in important aspects not very different from other early Eastern medieval civilizations: they possessed a warrior class, a nobility, royals, farmers, and a religious class of monks, who lived on alms given by the working population. The Tocharians were Buddhists and learned to write from Buddhist missionaries from India, and the system they used to write their language was an adaptation of the Indic Brahmi script. Accordingly, most texts, which date between 300 and 1100 CE, are of Buddhist content. A large part of the literary sources represent Tocharian adaptations of the Indian Buddhist canon – parallels in Sanskrit cannot always be found. After the Islamic conquest of Central Asia and the closing of the Silk Road, the Tocharian kingdoms collapsed, the Tocharian language died out, the area was depopulated, and the desert sand quickly buried all traces of the Tocharian people and their language.
Even though our texts in Tocharian are of a relatively late date, at least compared to the ancient civilizations of the Mediterranean or the Fertile Crescent, archeology, archaeogenetics and – most of all – language give us rich information about the prehistory of the Tocharians.
It is evident that the Tocharians left the Indo-European homeland very early and migrated towards the East. Even though Tocharian is a centum language and actually has more similarities with western than eastern Indo-European languages, it clearly forms its own branch on the Indo-European tree. The long absence from the Indo-European proto-language, together with a long period in isolation from other Indo-European languages, has resulted in two languages with very weird and complex structures. The languages have many case forms, like Uralic and Caucasian languages, and they have double causatives, like Turkic languages. But even though the Tocharian categories clearly show non-Indo-European impact in the typological structure, the inflectional forms themselves are all of Indo-European descent: the setup of verbs easily matches Greek or Sanskrit in its complexity and variety of forms. Most forms and categories reconstructed to Indo-European are there, but often in a reorganized structure and with changed use and meaning.
Even though most preserved texts are of Buddhist content, the language and the specific Tocharian version of Buddhism show many traces of a pre-Buddhist, pagan faith, not very different from what we assume was present in early Indo-European. We have a sun-god and a moon-god, as well as remnants of the so-called heroic myths and the concept of ‘eternal glory’, which is well represented in epic tales such as the Iliad, the Odyssey, or the Mahabharata.
Tocharians borrowed words from the Turkic Uighur language, from Chinese, and from Sanskrit; the latter in large amounts – almost half of the Tocharian lexicon has its source in Sanskrit. Uighur also borrowed from Tocharian. However, if we move back in time, Tocharian also borrowed a substantial amount of vocabulary, often administrative terms, from Iranian. From 500 BCE onwards, Tocharian seems basically to have been a recipient language, something that indicates that Tocharian during this period was a less important regional language than, for instance, Chinese (in the East) or various Iranian languages (in the West). If we look earlier than that, we find interesting and striking language contacts of Tocharian. Early forms of Tocharian are found in Uralic languages, and very likely, a pre-form of Tocharian is responsible for the Indo-European borrowings into Early Chinese. Therefore we may assume that Tocharians had a more important cultural role in the archaic period than in the antique period, when they basically were the target of language borrowing.

The archaeological record in the Tocharian-speaking area is astonishingly rich: most famous are the well-preserved mummies, which look like Celts with their pointy hats, tattoos and red braids. Studies of their DNA indicate several origins, in the earlier layers mainly Western European haplogroups, in later layers predominantly Central Asian or Eastern haplogroups. The patrilineal DNA is mainly R1a1, a haplogroup associated with the Proto-Indo-European migration out of Eastern Europe.
However, there are many enigmas that still await a solution. One of the most complex issues is the large number of obscure lexemes in Tocharian. Even though the core vocabulary of Tocharian is completely Indo-European, most words of the lexicon (except for the many Sanskrit borrowings, of course) have either no etymology or a very uncertain etymology. It is possible that the Tocharians borrowed words from a long-lost substrate language – but what would that be? There are few traces of significantly different cultures in the area preceding the Tocharians. Alternatively, Tocharian picked up words from several extinct, unrelated languages of Eurasia on its way from Eastern Europe to the Takla Makan desert. Very few reliable etymologies in Tocharian can be sourced in any of the living language families of Asia.
Coming up next: Heroic, lethal, or filthy animal? The history of pig words
References: (Adams 2013; Carling 2005; Carling et al. 2009; Mallory and Mair 2000; Malzahn 2011-2018; Pinault 2008)
Adams, Douglas Q. (2013), Dictionary of Tocharian B: Revised and Greatly Enlarged (Amsterdam: Rodopi).
Carling, Gerd (2005), 'Proto-Tocharian, Common Tocharian, and Tocharian – on the value of linguistic connections in a reconstructed language', in Karlene Jones-Bley, et al. (eds.), Proceedings of the Sixteenth Annual UCLA Indo-European Conference (Journal of Indo-European Studies - Monograph Series; Washington: Institute of Man), 47-70.
Carling, Gerd, Pinault, Georges-Jean, and Winter, Werner (2009), Dictionary and thesaurus of Tocharian A (Wiesbaden: Otto Harrassowitz).
Mallory, J. P. and Mair, Victor H. (2000), The Tarim mummies : ancient China and the mystery of the earliest peoples from the West (London: Thames & Hudson).
Malzahn, Melanie (2011-2018), CEToM - A Comprehensive Edition of Tocharian Manuscripts.
Pinault, Georges-Jean (2008), Chrestomathie tokharienne : textes et grammaire (Leuven: Peeters).

In Scandinavian folklore, there is a story about a lethal pig, the Gloso (‘glowing sow’), which kills lonely hikers on their way home on Christmas Eve. The pig is black with glowing, red eyes, and its back is a sharp saw: running between people’s legs, the creature cuts them in two. The only way to survive a Gloso is to jump into the ditch as soon as you spot the animal’s glowing eyes from a distance. Stories about lethal pigs are also found in Celtic mythology, in the tale about Mag Mucrime, pigs from the underworld, which haunt and ravage the lands, killing people and destroying the fields. However, the ancient Celts were also very fond of their pigs. Typically, helmets and shields of Celtic warriors were decorated with boars – and we should not forget that a pig is at the centre of one of the most important Celtic epic tales, the story of Mac Da Thó’s pig. Likewise, in Germanic mythology, the pig is the animal of the god of fertility, Frey, and the boar Sæhrímnir, which can be eaten again and again, plays a central role as provider of meat to the dead warriors and the gods of Valhalla.
How come the most important protein source of ancient Neolithic farmers had such different roles in various cultures of the Eurasian continent? Banned in some cultures, worshipped in others, and in others associated with death and the netherworld – apparently, ancient people were far from indifferent to the pig. Our answers are partly found in language.

Like the cow, goat and sheep, the pig belongs to the earliest domesticated animals, dating back to 10-11,000 BP in Anatolia and West Asia. Very likely, the first pigs were wild pigs attracted to human settlements by the waste. The early farmers, who must very quickly have understood the value of pigs as a protein source, gradually domesticated them by killing the males and keeping the females for reproduction. In fact, even today, pigs are, alongside chickens, the most efficient protein source in farming. The great danger associated with hunting wild boars must have contributed to the early farmers’ high esteem of pig domestication.
Domestication of pigs spread with the spread of farming, but for some reason – maybe because pigs are useless for herding or because they are prone to disease – it did not reach as far as the domestication of cow, goat, and sheep. Pigs are extremely unusual in Ancient Egypt, and pig domestication never reached Central Asia. In parts of West Asia and Anatolia, there was a decline in pig domestication already in early antiquity, something that was later transformed into a complete ban through religion, as in Judaism and later on also in Islam. In cultures where pig domestication continued (Eastern and Western Europe, and the Mediterranean), the pig received a dual role: it was both an animal associated with death and the underworld, worshipped in chthonic sacrifices, and an animal symbolizing fertility and prosperity. This is found in Graeco-Roman, Celtic, and Germanic mythology alike.

Can linguistics help us solve this enigma? There are several ways of investigating cultural patterns through language: either to look at the origin of words and their etymology down to the proto-language, or to consider the colexification patterns (meanings that co-occur in a single word of a language) and the meaning change patterns of words in genetically related languages. The stability and spread of cognates, as well as borrowing tendencies, are important sources of evidence as well.

If we look at linguistic reconstructions, the picture is complex and interesting. Pig words, including a general word for ‘pig’ (generic), which is often the same as ‘sow’, as well as ‘piglet’, can be reconstructed to Proto-Indo-European (PIE *suH- ‘pig’, PIE *porḱo- ‘young pig, piglet’). These lexical roots, which had the meaning of ‘pig’ and ‘piglet’ already in the proto-language, indicate that the Indo-Europeans had domesticated pigs. They are represented in a vast majority of pig words in Indo-European languages. In addition, some sub-branches replaced the forms or added new words for the pig terms. In Germanic languages, the word for the male pig was derived from a root meaning ‘infertile’ (PGm *galtan- ‘boar’ < *gald(j)a- ‘infertile’ < PIE *ghol-tó-), indicating that male pigs or boars were gelded rather than killed. Several languages created new lexemes by referring to the grunting sound of pigs, such as Lithuanian čiūkà, kūkà ‘pig’ (Balto-Slavic *kyaw-, *kyū- < PIE *kew-, *kū- 'to howl') or Old and Modern Irish cráin ‘sow’ (Proto-Celtic *krākni- 'sow'). Some languages used the widespread Indo-European root for ‘young of animal’ (PIE *wetso- 'young of animal' < *wet- 'year').
The wild boar has its own root in Proto-Indo-European (PIE *h₁pr-o- '(wild) boar'), e.g., Latin aper, but very often, this root comes to represent both the wild and the domesticated male pig, as in Croatian vȅpar, German Eber. Several languages use the root PIE *h₂wŕ̥s-en- 'male' for the wild boar, such as Sanskrit varāha-, Hindi varāh, bā̆rāh. Otherwise, a combination of a root meaning ‘wild’ and the root PIE *suH- 'pig' is very frequent, as in Bulgarian díva svinjá, German Wildschwein. In general, words with the meaning ‘wild boar’ also frequently mean ‘(domesticated) boar’, something that indicates that the wild boar was represented by the (male) boar, in contrast to the (female) sow, which represented the domesticated pig.
The Caucasian proto-languages (Proto-Kartvelian, Proto-North-West-Caucasian, Proto-Nakh, and Proto-Dagestanian) all have reconstructed words with the meaning ‘pig’ (PKv *ɣor- ‘pig’, PNWC *ɣaw- ‘pig, piglet’, PN *eɣ-ə ‘pig’; PD *bol’- ‘pig’, PKv *burw- ‘gilt (female pig, 3-12 months old); suckling pig’, PNWC *bl˜’-ə ‘sow, female pig’, PN *borl’- ‘colourful’). This clearly indicates that the early Caucasians domesticated the pig, something that we know they did early on.
The Uralic languages, on the other hand, borrowed their pig words from Indo-European or Iranian (Proto-Finnic *sika ‘pig’, Proto-Finno-Ugric *porśas, *porćas, loan from Indo-Iranian), indicating that the early Uralic tribes did not domesticate the pig themselves – they adopted pig domestication from Indo-European tribes.

However, the patterns of meaning change and colexification of pig words give an interesting picture. First, pig words often shift to denote other animals, typically large and ‘chubby’ ones, such as ‘elephant’, ‘stallion’, or ‘camel’. This is the case in Caucasian languages in particular. Words for the domestic pig occasionally develop negative connotations, such as ‘filthy person’, ‘immoral person’, ‘fat’, or ‘greedy’. However, meaning changes and colexifications in the direction of power and fertility are frequent, such as 'bull', ‘hero’, ‘powerful’, 'king', ‘manly’, ‘chieftain’ and ‘husband’, in particular with the (wild) boar.
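To make the notion of colexification a little more concrete, here is a minimal Python sketch of how such patterns could be counted. It is only an illustration: the language names, word forms, and meaning sets below are invented placeholders (not data from the atlas or from the graph shown in this post), and the function name `colexification_counts` is my own. For each pair of meanings, it counts the number of languages in which one and the same word form expresses both.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon (placeholder languages and forms, NOT real data):
# (language, word form) -> set of meanings expressed by that form.
LEXICON = {
    ("lang_a", "word_1"): {"PIG", "SOW"},
    ("lang_a", "word_2"): {"WILD BOAR", "BOAR"},
    ("lang_b", "word_3"): {"WILD BOAR", "BOAR", "HERO"},
    ("lang_b", "word_4"): {"PIG", "FILTHY PERSON"},
    ("lang_c", "word_5"): {"PIG", "SOW"},
}


def colexification_counts(lexicon):
    """For each unordered pair of meanings, count the languages in which
    a single word form expresses both meanings."""
    languages_per_pair = {}
    for (language, _form), meanings in lexicon.items():
        # Sort so that each pair always appears in the same order.
        for pair in combinations(sorted(meanings), 2):
            languages_per_pair.setdefault(pair, set()).add(language)
    return Counter({pair: len(langs) for pair, langs in languages_per_pair.items()})


counts = colexification_counts(LEXICON)
for (meaning_1, meaning_2), n_languages in counts.most_common():
    print(f"{meaning_1} ~ {meaning_2}: {n_languages} language(s)")
```

Counting languages rather than word forms keeps a single language with many synonyms from dominating the picture; a real study would also need to control for the genealogical relatedness of the languages.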

It is obvious that ancient people both worshipped and admired their pigs, but language indicates that they most of all respected the wild boars, probably because they were dangerous and hard to hunt. The domestic pigs were highly valued but also, apparently, looked down upon. The dangerous pigs we know from mythology have left little imprint on language.
(Carling To appear (2019); Gamkrelidze et al. 1995; Larson and Fuller 2014; Mallory and Adams 1997)
Carling, Gerd (To appear (2019)), Mouton Atlas of Languages and Cultures. Vol. 1: Europe, Caucasus, Western and Southern Asia (Berlin - New York: Mouton de Gruyter).
Gamkrelidze, Tamaz Valerianovič, Ivanov, Vjačeslav Vsevolodovič, and Winter, Werner (1995), Indo-European and the Indo-Europeans: a reconstruction and historical analysis of a proto-language and a proto-culture (Trends in Linguistics. Studies and Monographs, 80; Berlin: Mouton de Gruyter).
Larson, Greger and Fuller, Dorian Q. (2014), 'The Evolution of Animal Domestication', Annual Review of Ecology, Evolution & Systematics, 45, 115-36.
Mallory, James P. and Adams, Douglas Q. (1997), Encyclopedia of Indo-European culture (London: Fitzroy Dearborn).

Hyllested, Adam 2017. Again on ancient pigs in Europe.

Semantic network of colexifications (blue, purple) and meaning change in etymologies (red) of the core concepts (green) PIG, WILD BOAR, and PIGLET in 85 Indo-European languages. Graph by Niklas Johansson.

I have decided to move the updating of this blog to even weekends instead of Thursdays. Thursday is very often an extremely busy day, with no time left to update or complete blogposts for publication.

In this blogpost I will continue the previous topic of principles of language change. In historical linguistics, the special status of the most frequent words and grammatical forms of a language is well known: the most frequent lexemes and grammatical categories are more resistant to change. Lexemes such as kinship words, body parts, numerals, fire, water, liver, and so forth typically preserve more archaic paradigms, which may resist change for millennia. The most frequent adverbials and particles even resist phonological erosion and change. The most frequent verbs, such as 'to be' or 'to become', are typically irregular, and archaic inflection patterns and archaic categories, such as tenses, modalities, and aspectual categories, survive in these verbal stems. On the other hand, less frequent words, such as various verbs, nouns, and adjectives, are much more frequently affected by analogy and other types of change that harmonize and simplify language structures, making them easier to memorize.

However, few studies investigate this from an evolutionary perspective, using phylogenetic methods. As shown by Pagel et al. (2007), there is a correlation between lexical substitution and frequency in basic vocabulary: the most frequent words generally have lower substitution rates.

Frequency is very important in explaining cross-linguistic universal patterns, among others the morphological marking hierarchies of languages. More frequent categories, such as singular (in relation to plural), agent (in relation to object), and present (in relation to past), are unmarked, whereas their less frequent counterparts are marked. This theory, known as the markedness theory (which has a lot of exceptions in languages), can to a large degree be explained by frequency (Greenberg 1966, Croft 1993, 2003).

In a current study I wanted to investigate the correlation between frequency and change rates of grammar, focusing on the Indo-European family. I compiled a sample of grammatical categories of word order, nominal morphology, verbal morphology and tense, and organised the properties into hierarchical pairs according to the relations present < past, pronoun < noun, agent < object, and masculine/feminine < neuter, which are well-known, universal, hierarchical relations, observed in a large number of languages. By means of an evolutionary model (run by Chundra Cathcart), in which transition rates between property states over a tree were reconstructed, we extracted the average number of transitions (per 1000 years) between each grammatical property in our data.
When the results were split up into pairs of the marking hierarchy, as mentioned above, it turned out that the rates of change in the lower categories (i.e., the less frequent ones from a universal perspective) were higher. The rates of the higher categories (i.e., the more frequent ones from a universal perspective) were lower. The difference was statistically significant (p=>0.005). Even if this study is based on only one family (Indo-European), 149 languages and about 100 properties, it seems likely that frequency impacts language change also in the grammar. This explains why more frequent grammatical categories preserve more archaic patterns over time.
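For readers who want to see what such a paired comparison can look like in practice, here is a small Python sketch. It is not the analysis used in the study: this post does not say which significance test was applied, so the sketch swaps in a simple exact sign-flip permutation test, and all rate values are invented placeholders rather than results. Each pair holds the rate of the higher (more frequent) category first and the rate of the lower (less frequent) category second.

```python
from itertools import product

# Hypothetical transition rates (per 1000 years); invented for illustration,
# NOT results from the study. Tuple order: (higher category, lower category).
PAIRED_RATES = {
    "present < past": (0.10, 0.25),
    "pronoun < noun": (0.05, 0.20),
    "agent < object": (0.15, 0.30),
    "masculine/feminine < neuter": (0.20, 0.40),
}


def paired_sign_flip_test(pairs):
    """Exact one-sided sign-flip permutation test on paired differences.

    Returns the mean difference (lower minus higher category) and the share
    of sign assignments whose mean difference is at least as large."""
    diffs = [lower - higher for higher, lower in pairs]
    observed = sum(diffs) / len(diffs)
    n_extreme = 0
    n_total = 0
    # Under the null hypothesis, the sign of each difference is arbitrary.
    for signs in product((1, -1), repeat=len(diffs)):
        mean = sum(s * d for s, d in zip(signs, diffs)) / len(diffs)
        n_total += 1
        if mean >= observed - 1e-12:  # small tolerance for float rounding
            n_extreme += 1
    return observed, n_extreme / n_total


mean_diff, p_value = paired_sign_flip_test(list(PAIRED_RATES.values()))
print(f"mean difference: {mean_diff:.3f}, one-sided p: {p_value:.4f}")
```

With only four pairs, the smallest p-value this test can give is 1/16, which is one reason why a real analysis needs many more properties, as in the roughly 100 properties of the actual sample.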

Text has been updated 2019-03-11

I am currently travelling, so this blogpost will only very briefly discuss the topic of my current research in grammar reconstruction: the role of marking hierarchies in language change.
The notion of marking hierarchies has its roots in the markedness theory of Roman Jakobson and implies that grammatical categories (e.g., singular - plural) typically are in a mutual, hierarchical relation, where one of the categories is morphologically unmarked, whereas the other is morphologically marked. The unmarked category thus has a higher position within a hierarchy of grammatical properties (singular < plural). These grammatical relations are, according to some authors, general, or "universal", anchored in our inborn grammatical system. However, we know that this is a problematic notion: there is a substantial number of languages where the actual morphological marking contradicts the proposed markedness hierarchies. Further, not all languages have morphology. Morphological marking alone cannot be the identifier of marking hierarchies.
On the other hand, there is an obvious connection between the observed marking hierarchies and frequency. Superior categories, "unmarked" in the traditional markedness theory, are more frequently used in speech and in text. Again, the definition may be problematic, since not all languages have corpora that enable a detailed study of category frequency. Also, marking hierarchies based on frequency may contradict marking hierarchies based on general observations of morphological marking.
My current study on grammar reconstruction, which I have been writing about in several blogposts, indicates a clear correlation between change rates and marking hierarchies: superior categories, which are more frequent in grammar and most likely to be unmarked grammatically, have substantially lower change rates (and a slower pace of change) than inferior categories, which have higher change rates (and a faster pace of change). I will follow up this topic in a coming blogpost.

Bickel, Balthasar (2008), 'On the scope of the referential hierarchy in the typology of grammatical relations', in Greville G. Corbett and Michael Noonan (eds.), Case and Grammatical Relations. Studies in honor of Bernard Comrie (Amsterdam - Philadelphia: John Benjamins), 191-210.
Croft, William (2003), Typology and universals (Cambridge Textbooks in Linguistics; Cambridge: Cambridge University Press).
Comrie, Bernard (1981), Language universals and linguistic typology : syntax and morphology (Oxford: Blackwell).
Dixon, Robert M. W. (1994), Ergativity [Electronic resource] (Cambridge: Cambridge University Press).
--- (1997), The Rise and Fall of Languages [Electronic resource].
--- (2010a), Basic linguistic theory. Vol. 2, Grammatical topics (Oxford: Oxford University Press).
--- (2010b), Basic linguistic theory [Electronic resource]. Vol. 2, Grammatical topics (Oxford: Oxford University Press).