I am taking up this blog again after a summer intermission. During the summer, I attended the 24th International Conference on Historical Linguistics (ICHL 24) in Canberra and the 52nd Annual Meeting of the Societas Linguistica Europaea in Leipzig. In both places I talked about one specific topic, which has attracted my interest recently: gender evolution and gender assignment, specifically in Indo-European.
In a couple of coming blogposts, I will talk specifically about this issue. The first post deals with the morphosyntactic reconstruction of the Indo-European gender system.

First of all, how do we define gender? The typical approach is to define it through agreement, which is visible on an agreeing article, adjective, or verb. Normally, the gender system of a language is described in its grammars and reflected in its dictionaries. However, this definition does not work for pronominal gender, which is trickier. To define pronominal gender, it is necessary to look at the occurrence of gendered forms in pronominal systems.

Gender is prototypically a property of nouns, and once the gender has been identified for all nouns in a language, an important issue is to define the underlying causes of gender assignment. There is plenty of research on this issue, both from a general typological perspective and with respect to individual languages. According to the canonical gender literature (Corbett 1991, 2013, Corbett and Fraser 2000), there are three basic principles according to which gender is assigned in languages. These are phonological, morphological, and semantic. A fundamental problem is that these rules typically compete within a language.

What is the situation in Indo-European?

Most languages have gender (masculine, feminine, neuter). No language has “purely” phonological, morphological, or semantic assignment. Diachrony apparently plays a role: many languages inherit larger or smaller parts of their gender system and of the gender assignment on nouns. Most languages have competing rules for assignment.

The next issue is the reconstruction of Indo-European gender. For the reconstruction of the Indo-European gender system, based on a morphological reconstruction of the systems in the various branches, there are three proposals in the literature. The option suggested by Hermann Hirt in the 1930s (Hirt 1934, 1937) was that Indo-European had no gender and that a three-gender system developed later by means of grammaticalization. The reconstruction of Delbrück and Brugmann (Brugmann & Delbrück 1893, 1897, 1900) contained three genders, as in Sanskrit, Classical Greek, and Latin, a system which was later either preserved or collapsed into a masculine-feminine or a common-neuter system. However, Brugmann and Delbrück were uncertain about the feminine gender, basically due to the formal correspondence in the reconstructed state between the feminine and the neuter (the -h2- suffix). Based on this formal similarity between the collective/neuter and the feminine, as well as the shape of the Anatolian system with a commune and a collective/neuter, later Indo-European scholars agree that Indo-European had a two-gender animate-inanimate system (which is reflected in the Anatolian system), which later developed into a sex-based gender system with an additional collective gender, the neuter (see Table 1) (Luraghi 2011, Matasović 2004).
Basically, the model of Hirt implies that gender evolved by grammaticalization, the Delbrück model that the three-gender system of Indo-European either remained or collapsed. However, we must remember that both these models were constructed before the discovery of Anatolian.
The mainstream model is based on the idea of a typological evolution of gender systems, which moves from an animate-inanimate system to a sexus-based system. The latter retains the animacy distinction in the masculine-feminine contrast and the distinction between abstract and concrete in the feminine-neuter contrast (Table 1).

In brief, the mainstream model supposes that there is:

Traces of the old system in languages
Emergence of a human~non-human distinction after the proto-language
Emergence of an abstract~concrete distinction of non-human gender after the proto-language
Later mapping into a sexus-based system with retention of the concrete inanimate (neuter)
Continuation of the ancient assignment principles in various languages

Table 1. The developmental phases of the Indo-European gender system according to the mainstream model (after Luraghi 2009).
Stage 1: ANIMATE | INANIMATE
Stage 2: HUMAN | ABSTRACT | CONCRETE
Stage 3: MASCULINE/FEMININE | FEMININE | NEUTER

The next issue in this process is to find out what happens if an evolutionary model is used for the reconstruction (Cathcart, Carling et al 2018, Carling 2019). Gender reconstruction is an important question for evolutionary models, since the system reconstructed for Proto-Indo-European has changed in most living languages (see Table 1).

I will discuss this issue in the next blogpost.

Brugmann, Karl and Delbrück, Berthold (1893), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 3, Vergleichende Syntax der indogermanischen Sprachen, T. 1 (Strassburg: Trübner).
--- (1897), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 4, Vergleichende Syntax der indogermanischen Sprachen, T. 2 (Strassburg: Trübner).
--- (1900), Grundriss der vergleichenden Grammatik der indogermanischen Sprachen : kurzgefasste Darstellung der Geschichte des Altindischen, Altiranischen (Avestischen u. Altpersischen), Altarmenischen, Altgriechischen, Albanesischen, Lateinischen, Oskisch-Umbrischen, Altirischen, Gotischen, Althochdeutschen, Litauischen und Altkirchenslavischen. Bd 5, Vergleichende Syntax der indogermanischen Sprachen, T. 3 (Strassburg: Trübner).
Carling, Gerd (2019), Mouton Atlas of Languages and Cultures. Vol. 1: Europe, Caucasus, Western and Southern Asia (Berlin - New York: Mouton de Gruyter).
Cathcart, Chundra, et al. (2018), 'Areal pressure in grammatical evolution.', Diachronica, 35 (1), 1-34.
Corbett, Greville G. (1991), Gender (Cambridge Textbooks in Linguistics; Cambridge: Cambridge University Press).
Corbett, Greville G. (2013), 'Gender typology', The Expression of Gender.
Corbett, Greville G. and Fraser, Norman M. (2000), 'Gender assignment: a typology and a model', in Gunter Senft (ed.), Systems of Nominal Classification (Cambridge: Cambridge University Press), 293-325.
Hirt, Hermann Alfred (1934), Indogermanische Grammatik. T. 6, Syntax, 1 : syntaktische Verwendung der Kasus und der Verbalformen (Heidelberg: Carl Winter).
Luraghi, Silvia (2011), 'The origin of the Proto-Indo-European gender system: Typological considerations', Folia Linguistica, 45 (2), 435-64.
Matasović, Ranko (2004), Gender in Indo-European (Heidelberg: Winter).


Pronominal gender systems in Indo-European languages.

The Swedish summer vacation is approaching, and I will go to Australia, among other things to attend the International Conference on Historical Linguistics in Canberra, 1-5 July. I will give two talks, one about the evolution and tendencies of gender assignment in Indo-European, and one about the evolution and change of alignment in Indo-European. After the summer intermission I will return and write more about these two topics in different posts.
However, I will try (if I have time and possibility) to make an overview of some of the interesting talks from the ICHL conference. Therefore, stay tuned! Thanks to all readers and have a nice summer!

Wordcloud of texts from the blogposts of autumn 2018.

This blogpost will briefly introduce a highly interesting phenomenon in the history of Eurasian languages, namely the emergence of definiteness. Most anciently attested Indo-European languages do not have definiteness marking, but the phenomenon appears relatively early on in several languages, in various forms. The emergence of the various types of definiteness marking does not seem to be areally caused; rather, most of the variants emerge through internal pressure and grammaticalization. In addition, definiteness is not restricted to the Indo-European languages but also occurs in various forms in Caucasian families, in Turkic, and in some Uralic languages.
There are several types of definiteness marking, which typically co-occur in languages. One type is a non-bound definite article (as a special word class), as in German or English:
das Haus
DEF house
‘the house’

Another type is a bound definite marker, as in Scandinavian (e.g., Swedish):
hus-et
house-DEF
‘the house’

A further type is definiteness marked on the adjective, as in Swedish:
det stor-a hus-et
DEF large-DEF house-DEF
‘the large house’

Definite marking can be obligatory, either at the end or at the beginning of a Noun Phrase, as in Bulgarian:
xubava-ta kniga
nice-DEF book
‘the nice book’

The ancient Indo-European languages lack definiteness, and this state has been preserved in a huge area of predominantly Slavic and Indo-Aryan languages. The emergence of the various forms of definiteness began, apparently independently and with large variation even within branches of the families, already in ancient times, and escalated during the medieval period. A large part of the existing variation seems to be caused by parallel evolution. Still, the exact causes of the variation remain obscure.

Bauer, Brigitte. 2007. "The definite article in Indo-European. Emergence of a new grammatical category?" In Nominal Determination. Typology, context constraints, and historical emergence, edited by Elisabeth Stark, Elisabeth Leiss and Werner Abraham, 103-139. Amsterdam-Philadelphia: John Benjamins.


Variation in definiteness marking in historical Eurasian languages. For the legend, see the map of modern languages above.

Probability levels of different types of definiteness marking in protolanguages, based on an evolutionary test using the data of the DiACL database.

Of the 3672 entries of the Tocharian A dictionary (Carling and Pinault to appear), 772 lemmas have been marked as “from Sanskrit”, which represents 21% of the entire vocabulary (of 1508 nouns, 338 are from Sanskrit, representing 22%). Of these 772 lemmas, 39 are marked as “via Middle Indic”, which represents 5% of the words borrowed from Sanskrit. Compared to Sanskrit loans, other source languages are marginal: there are 22 words marked as “from Middle Iranian”, 5 “from Chinese”, 10 “from Uighur”, 10 “from Prakrit”, and 4 “from Pali”.
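The shares quoted above can be rechecked with a few lines of arithmetic. This is only a minimal sketch: the counts are the ones given in the paragraph, while the variable names and the helper `share` are my own labels for illustration, not fields of the dictionary.

```python
# Counts as quoted in the text for the Tocharian A dictionary.
total_entries = 3672        # all entries
from_sanskrit = 772         # lemmas marked "from Sanskrit"
total_nouns = 1508          # all nouns
nouns_from_sanskrit = 338   # nouns marked "from Sanskrit"
via_middle_indic = 39       # Sanskrit loans marked "via Middle Indic"


def share(part: int, whole: int) -> int:
    """Return part/whole as a percentage, rounded to the nearest integer."""
    return round(100 * part / whole)


shares = {
    "Sanskrit loans among all entries": share(from_sanskrit, total_entries),
    "Sanskrit loans among nouns": share(nouns_from_sanskrit, total_nouns),
    "'via Middle Indic' among Sanskrit loans": share(via_middle_indic, from_sanskrit),
}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

The rounded values agree with the 21%, 22%, and 5% given in the text.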
What does this imply? First and foremost, of course, that Sanskrit, or rather Buddhist Hybrid Sanskrit, plays a fundamental role in Tocharian literature. “From Sanskrit” means that a word has been borrowed from Classical Sanskrit (Monier Williams 1899) or Buddhist Hybrid Sanskrit (Edgerton 1953, Bechert, Waldschmidt, and Bongard-Levin 1996) with no other change than an adaptation to the morphological system according to the language's rules for adapting loans (Krause and Thomas 1960). “From Prakrit” or “from Pali” means that the word can be traced back to a source attested in Pali or Prakrit texts, which apparently is much more unusual than the other way round.
So, what type of changes are we talking about when we define words as “via Middle Indic” instead of just “from Sanskrit”? (Note that the examples below are from Tocharian A; there are similar patterns in Tocharian B (Carling 2005).) Let us look at a couple of examples.
Some of the words are almost identical to the Sanskrit word, with little change:
A pāruṣak (n.) ‘name of a mythical garden’, via Middle Indic from Sanskrit pāruṣyaka- ‘n. of one of the groves of trāyastriṃśa gods’ (BHSD:343b), as in Pali phārusaka- ‘name of one of Indra's groves’ (PED:478b).
A kās* ‘Kāśa, a species of grass’, via Middle Indic from Sanskrit kāśa- ‘a species of grass’ (MW:280b).
In other lexemes, there are more far-reaching phonological changes, which were either taken over from the Middle Indic source word or took place in Tocharian. Which of the two is the case remains unclear. Examples are:
A kurkal (n.) ‘bdellium, a medical ingredient’, via Middle Indic from Sanskrit gulgulu- ‘bdellium’ (MW:360b).
A klawe (n.m.) ‘die, throw of the die’, via Middle Indic from Sanskrit glaha-, originally ‘throw of the dice’, and individually ‘die’ (MW:374b).
A jar (n.m.) ‘topknot’, via Middle Indic from Sanskrit jaṭā- ‘the hair twisted together (as worn by ascetics)’ (MW:408a).
A tāpātriś (n.m.) ‘name of a class of gods’, via Middle Indic from Sanskrit trāya(s)-triṃśa- ‘name of a class of gods’, cf. Pali tāvatiṃsa (BHSD:257b).
A patatam (adv.) ‘fortunate, gifted’, via Middle Indic from Sanskrit pradattam, neuter adv. from pradatta- ‘granted, bestowed, gifted’ (MW:679c).
A nātäk (n.m.) ‘lord’, via Middle Indic from Sanskrit nāthaka-, derived from Sanskrit nātha- ‘protector, patron, owner, lord’ (MW:534c).
This vocabulary, both in Tocharian A and B (which has a larger vocabulary), is very interesting. The lexemes were apparently not borrowed from the literary standard of Prakrit and Pali or from Buddhist Hybrid Sanskrit directly. Rather, they were borrowed from one or several local Indo-Aryan dialects, which became extinct, but which may be part of a general change in Middle Indo-Aryan leading to the dialectal diversity of Modern Indo-Aryan languages.
In addition, the boundaries between Indo-Aryan and Iranian in some of these lexemes are not sharp: the words may have been borrowed from Iranian, but since Indo-Aryan is much better attested (via Classical Sanskrit), an Indo-Aryan source becomes more likely.
A systematization of sound changes in these words would likely add to our knowledge of the sound changes in Middle Indo-Aryan leading to Modern Indo-Aryan. This would also help us to tease apart Iranian from Indo-Aryan borrowings in Tocharian.

Bechert, Heinz, Ernst Waldschmidt, and Grigorij Maksimovic Bongard-Levin. 1996. Sanskrit-Wörterbuch der buddhistischen Texte aus den Turfan-Funden. Beih. 6, Sanskrit-Texte aus dem buddhistischen Kanon: Neuentdeckungen und Neueditionen, 3. Folge. Göttingen: Vandenhoeck und Ruprecht.
Carling, Gerd. 2005. "Proto-Tocharian, Common Tocharian, and Tocharian – on the value of linguistic connections in a reconstructed language." In Proceedings of the Sixteenth Annual UCLA Indo-European Conference, edited by Karlene Jones-Bley, Martin E. Huld, Angela Vella Volpe and Miriam Robbins Dexter, 47-70. Washington: Institute of Man.
Carling, Gerd, and Georges-Jean Pinault. to appear. A Dictionary and Thesaurus of Tocharian A. Wiesbaden: Harrassowitz.
Edgerton, Franklin. 1953. Buddhist hybrid Sanskrit grammar and dictionary, William Dwight Whitney linguistic series: Yale U.P.; Oxford U.P.
Krause, Wolfgang, and Werner Thomas. 1960. Tocharisches Elementarbuch. B. 1, Grammatik. Heidelberg.
Monier Williams, Monier. 1899. A Sanskrit-English Dictionary : Etymologically and philologically arranged with special Reference to Cognate Indo-European Languages. Oxford: At the Clarendon Press.

During the period of the first centuries BCE, the impact of Iranian becomes important in Tocharian. This is something we know from the relatively large number of loanwords in Tocharian from various Iranian languages, beginning with one or several unknown Old Iranian dialects (which are not Avestan or Old Persian) and continuing with loans from various known Middle Iranian languages, such as Khotanese, Sogdian, and Bactrian. As usual with loans, the exact match of the source word is seldom found, meaning that the exact source language cannot be identified.
Iranian loans in Tocharian are interesting from the viewpoint of their semantic domains, which are indicative of the cultural impact of the Iranians on the Tocharians in Central Asia.

A majority of the words refer to administrative concepts, e.g., titles or specific concepts of merchandise or administration, indicating that the Iranians influenced the Tocharians by imposing an administrative infrastructure. Examples are:
Tocharian B waipecce 'possession', from Old Iranian, Avestan xʷaēpaiθya- 'own'.
Tocharian B waipte 'separately, apart' < Common Tocharian *wai-pätæ, probably borrowed from an adjective, Old Iranian *hwai-pati, in the sense of 'independent, oneself'.
Tocharian A pärko, B pärkau 'advantage, profit, interest' < Common Toch. *pärkāwV, borrowed from Old Bactrian, Bactrian φρογαοο 'profit', Old Iranian *fragāwa-, Sogdian prγ'w, βry'w, Parthian frg'w 'treasure'.
Tocharian A pare, B peri 'debt' < Common Tocharian *pæräī, borrowed from Old Bactrian *pāra > Bactrian paro 'debt, obligation, loan, amount, due'.
Tocharian A āpṣātrik* ‘citizen of a borough or market-town’, borrowed from Old Iranian *αβþαρο < *api-xšaθra- ‘borough, sub-district (of a city)’.
Other words clearly refer to military concepts, such as values or terms for weapons:
Tocharian B tsain 'arrow', from an Old Iranian *dzaina-, Avestan zaēna- 'weapon'.
Tocharian A āmāṃ B amāṃ ‘pride, arrogance’, loan from Middle Iranian, cf. Buddhist Sogdian ’’m’n ‘power’.
Tocharian A āṣāṃ B aṣāṃ ‘worthy’, borrowed from Middle Iranian, cf. Khotanese āṣaṇa- ‘worthy’.
Tocharian A āṣānik B āṣānike ‘venerable, worthy of respect’, loan from Middle Iranian, with the same source as A āṣāṃ B aṣāṃ.
Tocharian A senik ‘care, pledge, guarantee’, from Middle Iranian *zēnik (Khot. ysīnīta, Sogd. zynyh, Kroraina Prakrit jheniya-).
A number of words refer to farming and the household. Examples are:
Tocharian AB ās ‘she-goat’, borrowed from Middle Iranian.
Tocharian A kātak* B kattāke ‘master of the house, householder’, from Common Tocharian *kāttākǝ, borrowed from Middle Iranian, cf. Khotanese ggāṭhaa, itself borrowed from Middle Indic, cf. Gāndhārī Prakrit *ghahaṭha, from Sanskrit gṛhastha-.
Tocharian A miṣi B miṣṣe, miṣṣi ‘field’, borrowed from Khotanese mäṣṣa, miṣṣa ‘field for seed’.
A small number of words are Buddhist terms (normally, the impact of Sanskrit is enormous on both Tocharian languages here). Examples are:
Tocharian A pissaṅk ‘community of monks’, from Middle Iranian, from Skt. bhikṣusaṃgha- ‘Mönchsgemeinde, Mönchsorden’ (SWTF III:298b), cf. Khotanese bisaṃga-.
Finally, we have a group of words referring to plants and ingredients which are unfamiliar to the Tocharian flora (also here, Sanskrit loans are much more common). Examples are:
Tocharian A kārāś B karāśe*, via Tocharian B from Khotanese karāśśa ‘climbing plant’.
Tocharian A kuñcit B kwäñcit, kuñcit, from Khotanese kuṃjsata- ‘sesame’.
In conclusion, the Iranian impact on Tocharian is mainly pre-Buddhist, referring to concepts of administration, warfare, and farming. With the change to Buddhism, the impact of Old and Middle Indo-Aryan becomes completely dominant in both Tocharian languages.
The words have been extracted from these sources:
Carling, Gerd (to appear). A Dictionary and Thesaurus of Tocharian A. Complete Edition. In collaboration with Georges-Jean Pinault. Wiesbaden: Harrassowitz (610p.).
Carling, Gerd (2005). Proto-Tocharian, Common Tocharian, and Tocharian – on the value of linguistic connections in a reconstructed language. In: Jones-Bley, Karlene, Huld, Martin E., Volpe, Angela Vella, Dexter, Miriam Robbins (eds.), Proceedings of the Sixteenth Annual UCLA Indo-European Conference. Journal of Indo-European Studies. Monograph Series 50, 47-70.
These sources have many references to works by, e.g., Georges-Jean Pinault, K T Schmidt, Werner Winter, Nicholas Sims-Williams, Harold Bailey, L Isebaert, Jörundur Hilmarsson.

This week, also known as Holy Week, is part of the holiday that in English goes by the name of Easter. Easter, which is celebrated throughout all of the Judaeo-Christian world, is one of the most important festivities of the year, marking the beginning of spring or summer and the resurrection of Christ. Like most Christian holidays, the roots of Easter go back into pagan times. In Northern Europe in particular, many of the mysterious customs of an ancient spring festival have survived until today. Children chase an invisible Easter hare, which puts candy-filled eggs in the grass. Birch twigs are gathered, taken indoors, and decorated with painted eggs and feathers. Children also dress as witches or 'easterhubbies' (the difference is whether you wear a scarf or a hat), painting their faces with red dots, and go from door to door asking for candy. Afterwards, they are supposed to fly on their brooms to Brocken. Fires and fireworks are lit, and, most importantly, enormous quantities of egg, fish, meat, and candy are consumed.

So, which are the terms we use for this festival? Most languages have a form of the Greek (via Latin) word paskha, itself borrowed from Aramaic (Hebrew Pesach), meaning 'passover'. The West Germanic terms, such as English Easter and German Ostern, go back to a Common Germanic goddess of spring, Old English Eastre, which is identical to the Indo-European goddess of dawn *h2éus-ōs (Sanskrit uṣās, Latin aurōra). Other languages have words that in various ways relate to the basically biblical rituals of Easter, including 'sacrificial animal', 'taking of the meat', 'resurrection', 'great day' or 'great night', or 'liberation'.

Just as with the Christmas words (see http://www.gerdcarling.se/i/a32842142/2018/12/), the map of meanings of Easter unveils important information about various cultural spheres, as well as exceptions in the form of islands of different usage.

With this little etymological overview I would like to wish you all a Happy Easter!

Lubotsky, Alexander. Brill Online Dictionaries: Indo-European Etymological Dictionaries Online (https://dictionaries-brillonline-com.ludwig.lub.lu.se/iedo). Accessed 2019-04-17.
Troels-Lund 1932. Dagligt liv i Norden på 1500-talet. VII Årets fester. Stockholm: Bonniers.
Andersson et al 1968. Kulturhistoriskt Lexikon för Nordisk Medeltid XIII. Malmö: Allhems förlag.

I thank Ante Petrović for assistance with compiling/checking data for the Easter map.

Wikipedia has an excellent overview of names of Easter: https://en.wikipedia.org/wiki/Names_of_Easter

In the previous blogpost, I started a compilation of safe loans from and into Tocharian. I will continue this work in the next post. In this post, I will talk about loan directionality, since I am currently completing a paper (with several co-authors) on lexical borrowability in Eurasian languages. I want to say a few words about this project.

We have compiled and extracted all loan events in the lexical database and tested various statistical measures on this data. Worth noticing are the directionality of loans in relation to language power and the different source languages of the families. As I have described in recent posts, our lexical data set compiles culture concepts, i.e., words for farming, technology, hunting, and war, which have a presumed age that goes back at least to the Chalcolithic. This means that this vocabulary is not representative of the entire lexicon, only of these specific domains. Loans also extend over long periods, at least back to antiquity. If we look at the source languages, we notice that they differ between families. In Indo-European, Latin is most frequent, followed by Middle Low German, French, Old French, Slavic, and Classical Greek. In Caucasian, Turkic languages dominate, followed by Persian, Georgian, and Arabic. In Uralic, Scandinavian languages dominate, which is mainly due to the fact that Fenno-Ugric languages dominate in our data (see pictures below).

The correlation between loan directionality, language power, and population size is also noteworthy. We define the power of languages by a quantitative rank based on several features, including literary power, economic power, and population size. This we plot against the occurrence as source and target language in loan events (see graph above). All languages are equally likely to be target languages, but the most powerful languages are more likely to be source languages. This is a significant correlation. The most frequent loan event is from a very powerful language to a very weak one. The second most frequent loan event is from a medium powerful language to a weak one. The third most frequent loan event is from a medium powerful language to another medium powerful language. In scrutinizing the data, we observe that this third type of loan event is almost entirely restricted to the Middle Ages, which is also an interesting result. Inequality between languages seems to be specific to the ancient and modern periods, whereas language contact in the Middle Ages was more evenly distributed between languages of equal power.
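To make the ranking of loan directions concrete, here is a minimal sketch of how such a frequency ranking can be computed. Everything in it is an assumption for illustration: the three power tiers, the toy list of loan events, and the function name `rank_directions` are invented and do not come from the database or from the statistical tests used in the paper.

```python
from collections import Counter

# Hypothetical toy data (not real counts): one tuple per loan event,
# giving the power tier of the source language and of the target language.
loan_events = [
    ("high", "low"), ("high", "low"), ("high", "low"),
    ("medium", "low"), ("medium", "low"),
    ("medium", "medium"),
]


def rank_directions(events):
    """Return (source tier, target tier) pairs with counts, most frequent first."""
    return Counter(events).most_common()


ranking = rank_directions(loan_events)
for (source, target), count in ranking:
    print(f"{source} -> {target}: {count}")
```

With real data, each event would additionally carry a period label, so that a pattern such as medium-to-medium loans can be checked against the period in which it occurs.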

Graph illustrating the most frequent source languages in Indo-European (top), Caucasian (middle), and Uralic (bottom) families.

In Scandinavian folklore, there is a story about a lethal pig, the Gloso (‘glowing sow’), which kills lonely hikers on their way home on Christmas Eve. The pig is black with glowing, red eyes, and its back is a sharp saw: running between humans’ legs, the creature cuts them in two. The only way to survive a Gloso is to jump into the ditch as soon as you spot the animal’s glowing eyes from a distance. Stories about lethal pigs are also found in Celtic mythology, in the tale about Mag Mucrime, pigs from the underworld, which haunt and ravage the lands, killing people and destroying the fields. However, the ancient Celts were also very fond of their pigs. Typically, helmets and shields of Celtic warriors were decorated with boars, and we should not forget that a pig is at the centre of one of the most important Celtic epic tales, the story of Mac Da Thó’s pig. Likewise, in Germanic mythology, the pig is the animal of the god of fertility, Frey, and the boar Sæhrímnir, which can be eaten again and again, plays a central role as provider of meat to the dead warriors and the gods of Valhalla.
How come the most important protein source of ancient Neolithic farmers had such different roles in various cultures of the Eurasian continent? Banned in some cultures, worshipped in others, and in others associated with death and the netherworld: apparently, the pig did not leave ancient people indifferent. Our answers are partly found in language.

Like the cow, goat and sheep, the pig belongs to the earliest domesticated animals, dating back to 10-11,000 BP in Anatolia and West Asia. Very likely, the first pigs were wild pigs attracted to human settlements by the waste. The early farmers, who very quickly must have understood the value of pigs as a protein source, successively domesticated them by killing the males and keeping the females for reproduction. In fact, even today, pigs are the most effective protein source of farming, besides chicken. The great danger associated with the hunting of wild boars must have contributed to the early farmers’ high esteem of pig domestication.
Domestication of pigs spread with the spread of farming, but for some reason (maybe because pigs are useless for herding or because they are easily infected by disease) it did not reach as far as the domestication of cow, goat, and sheep. Pigs are extremely unusual in Ancient Egypt, and pig domestication never reached Central Asia. In parts of West Asia and Anatolia, there was a decline in pig domestication already in early antiquity, something that was later transformed into a complete ban through religion, as in Judaism and later on also in Islam. In cultures where pig domestication continued (Eastern and Western Europe, and the Mediterranean), the pig received a dual role: it was both an animal associated with death and the underworld, worshipped in chthonic sacrifices, and an animal symbolizing fertility and prosperity. This is found in Graeco-Roman, Celtic, and Germanic mythology.

Can linguistics help us solve this enigma? There are several ways of investigating cultural patterns through language: either to look at the origin of words and their etymology down to the proto-language, or to consider the colexification patterns (meanings that co-occur in a language) and the meaning change patterns of words in genetically related languages. The stability and spread of cognates, as well as borrowing tendencies, are important sources of evidence as well.
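As an illustration of what counting colexifications involves, here is a minimal sketch. It assumes that a colexification is recorded whenever one word form in one language carries two meanings; the toy lexicon, the language and form labels, and the function name `colexifications` are hypothetical and are not taken from the database behind this post.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy lexicon (invented, not real data):
# one row per (language, word form, meaning).
entries = [
    ("LangA", "form1", "PIG"),
    ("LangA", "form1", "SOW"),
    ("LangA", "form2", "WILD BOAR"),
    ("LangB", "form3", "WILD BOAR"),
    ("LangB", "form3", "BOAR"),
    ("LangB", "form4", "PIG"),
    ("LangB", "form4", "SOW"),
]


def colexifications(rows):
    """Count, for each pair of meanings, how many word forms express both."""
    meanings_by_word = defaultdict(set)
    for language, form, meaning in rows:
        meanings_by_word[(language, form)].add(meaning)

    counts = defaultdict(int)
    for meanings in meanings_by_word.values():
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(meanings), 2):
            counts[pair] += 1
    return dict(counts)


pairs = colexifications(entries)
for (first, second), count in sorted(pairs.items()):
    print(f"{first} ~ {second}: {count}")
```

The resulting pair counts are the kind of edge weights that a semantic network of colexifications can be built from.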

If we look at linguistic reconstructions, the picture is complex and interesting. Pig words, including a general word for ‘pig’ (generic), which is often the same as ‘sow’, as well as ‘piglet’, can be reconstructed to Proto-Indo-European (PIE *suH- ‘pig’, PIE *porḱo- ‘young pig, piglet’). These lexical roots, which had the meaning of ‘pig’ and ‘piglet’ already in the proto-language, indicate the Indo-Europeans had domesticated pigs. They are represented in a vast majority of pig words in Indo-European languages. Besides, some sub-branches replaced the forms or added new words for the pig terms. In Germanic languages, the male pig was derived from a root meaning ‘infertile’ (PGm *galtan- ‘boar’ < *gald(j)a- ‘infertile’ < PIE *ghol-tó-), indicating that male pigs or boars were gelded rather than killed. Several languages created new lexemes by referring to the grunting sound of pigs, such as Lithuanian čiūkà, kūkà ‘pig’ (Balto-Slavic *kyaw-, *kyū- < PIE *kew-, *kū- 'to howl') or Old and Modern Irish cráin ‘sow’ (Proto-Celtic *krākni- 'sow'). Some languages used the wide-spread Indo-European root for ‘young of animal’ (PIE *wetso- 'young of animal' < *wet- 'year').
The wild boar has its own root in Proto-Indo-European (PIE *h₁pr-o- '(wild) boar'), e.g., Latin aper, but very often, this root comes to represent both the wild and domesticated male pig, such as Croatian vȅpar, German Eber. Several languages use the Proto-Indo-European root PIE *h₂wŕ̥s-en- 'male' for the wild boar, such as Sanskrit varāha-, Hindi varāh, bā̆rāh. Else, a combination of a root meaning ‘wild’ and the root PIE *suH- 'pig' is very frequent, as in Bulgarian díva svinjá, German Wildschwein. In general, words with the meaning ‘wild boar’ also frequently mean ‘(domesticated) boar’, something that indicates that the wild boar was represented by the (male) boar, in contrast to the (female) sow, which represented the domesticated pig.     
Caucasian proto-languages, Proto-Kartvelian, Proto-North-West-Caucasian, Proto-Nakh, and Proto-Dagestanian all have reconstructed words with the meaning ‘pig’ (PKv *ɣor- ‘pig’, PNWC *ɣaw- ‘pig, piglet’, PN *eɣ-ə ‘pig’; PD *bol’- ‘pig’, PKv *burw- ‘gilt (female pig, 3-12 months old); suckling pig’, PNWC *bl˜’-ə ‘sow, female pig’, PN *borl’- ‘colourful’). This clearly points out that the early Caucasians domesticated the pig, something that we know they did early on.
Uralic, on the other hand, borrowed its pig words from Indo-European or Iranian (Proto-Finnic *sika ‘pig’, Proto-Finno-Ugric *porśas, *porćas, loan from Indo-Iranian), indicating that the early Uralic tribes did not domesticate the pig; they adopted pig domestication from Indo-European tribes.

However, the patterns of meaning change and colexification of pig words give an interesting picture. First, pig words often change to the meaning of other animals, often large and ‘chubby’ animals, such as ‘elephant’, ‘stallion’, or ‘camel’. In particular, this is the case in Caucasian languages. The domestic pig occasionally points in the direction of negative connotations, such as ‘filthy person’, ‘immoral person’, ‘fat’, or ‘greedy’. However, meaning changes and colexifications in the direction of power and fertility are frequent, such as 'bull', ‘hero’, ‘powerful’, 'king', ‘manly’, ‘chieftain’ and ‘husband’, in particular with the (wild) boar.

It is obvious that ancient people both worshipped and admired their pigs, but language indicates that they respected the wild boars most of all, probably because they were dangerous and hard to hunt. The domestic pigs were highly valued but also, apparently, looked down upon. The dangerous pigs we know from mythology have left little imprint on language.
(Carling To appear (2019); Gamkrelidze et al. 1995; Larson and Fuller 2014; Mallory and Adams 1997)
Carling, Gerd (To appear (2019)), Mouton Atlas of Languages and Cultures. Vol. 1: Europe, Caucasus, Western and Southern Asia (Berlin - New York: Mouton de Gruyter).
Gamkrelidze, Tamaz Valerianovič, Ivanov, Vjačeslav Vsevolodovič, and Winter, Werner (1995), Indo-European and the Indo-Europeans: a reconstruction and historical analysis of a proto-language and a proto-culture (Trends in Linguistics. Studies and Monographs, 80; Berlin: Mouton de Gruyter).
Larson, Greger and Fuller, Dorian Q. (2014), 'The Evolution of Animal Domestication', Annual Review of Ecology, Evolution & Systematics, 45, 115-36.
Mallory, James P. and Adams, Douglas Q. (1997), Encyclopedia of Indo-European culture (London: Fitzroy Dearborn).

Hyllested, Adam 2017. Again on ancient pigs in Europe.

Semantic network of colexifications (blue, purple) and meaning change in etymologies (red) of the core concepts (green) PIG, WILD BOAR, and PIGLET in 85 Indo-European languages. Graph by Niklas Johansson.

I was asked by my friend and colleague Victor Mair (University of Pennsylvania) to come up with my 'safe list' of loans from and into Tocharian. This is a very interesting and challenging topic, which I will continue working on in a couple of coming posts. I will start with the trickiest one: Tocharian loan contacts with Chinese.
Establishing Tocharian loans from and into Chinese is particularly complex for two reasons: first, the reconstruction of Chinese phonology at various stages in the Chinese prehistory, which involves many uncertainties and a large amount of debate, and second, the reconstruction of Tocharian phonology, which is particularly tricky and complex. The fundamental question is: How can we be certain that a specific word was borrowed at a certain stage from one reconstructed language to another? The prehistory of both languages can be stratified into various stages, Pre-Proto and Proto-Chinese, Old Chinese (Early and Late) Middle Chinese, and Pre-Proto- and Proto-Tocharian, Common Tocharian, Pre-A and Pre-B, and Tocharian A and B. Beyond that, we have the proto-languages Proto-Indo-European and Proto-Sino-Tibetan, which can be further stratified into stages on their way to Proto-Chinese and Proto-Tocharian.
How can we know that a word that obviously looks as if it was borrowed from Indo-European was borrowed specifically from Tocharian? The answer is that we have to show that specific Tocharian sound changes have taken place in the borrowed lexeme. These changes also have to be identified in the target language from the corresponding period. The process is very tricky, and the result is very few certain loans, a larger number of less certain loans, and a huge number of uncertain loans.

Tocharian loans from Old Chinese (before the 2nd c. BCE)
Toch. AB klu ‘rice’ was borrowed from Old Chinese: Mod. Ch. dào, Mid. Ch. *dawX, Old Ch. *C-luu-? ‘rice, rice-paddy’ (GSR 1078). In Middle Chinese, the initial cluster Old Ch. *gl- was simplified to d-.
Toch. B rapaññe ‘of the last month of the year’ (LP 12 a2 rapaññe meṃne ikäṃ-wine ‘on the second day of the month rapaññe’) is an adjective formed on a noun *rāp, which was borrowed from Old Chinese: Mod. Ch. là, Mid. Ch. *lap, Old Ch. *raap (GSR 637j) ‘winter sacrifice’. It is likely that an earlier meaning of the Chinese word is reflected in Tocharian.
Toch. A ri, Toch. B rīye 'town' < Common Toch. *riye matches the Old Chinese reconstruction of Mod. Ch. lǐ, Mid. Ch. *liX, Old Ch. *r̯ə-? (GSR 978a) ‘walled city’. The word may also be a Tocharian loan in Old Chinese.
Further loans include Toch. A truṅk, Toch. B troṅk 'cave'.

Tocharian loans from Early Middle Chinese (possibly 3rd-4th c. CE)
Toch. A ṣoṣtäṅk ‘tax collector, banker’ (Skt. śreṣṭhin-) corresponds to Niya ṣoṭhaṃga ‘tax collector’, Bactr. σωταγγο < *šoštaṅgV. A possible source is Mod. Ch. shōucáng, Mid. Ch. *syuw+dzang, Old Ch. *xiw-N-s-(h)raŋ (GSR 1103a+727g´) ‘receive, accept, gather’ + ‘conceal, store’.
Toch. A ṣukṣ ‘(smaller) village’, Toch. B kwaṣo* ‘village’. A parallel is Mod. Ch. sù, Mid. Ch. *sjuwk, Old Ch. *suk (GSR 1029a) ‘lodge, mansion’. Itō & Takashima (1996:401) reconstruct Old Ch. *sjəkw-s with a final *-s (which has the function of marking location, forming nomina actionis, etc.).
Toch. A āṅk* ‘seal, stamp’: compare Mod. Ch. yìn, Mid. Ch. *ʔjinH, Old Ch. *ʔin-s (GSR 1251f), *ʔi̯əɳ (Takashima) ‘seal, stamp’.

Further loans include:
Toch. B cāk, tau '(dry measures)', Toch. B cāne 'money', Toch. B śakuse 'brandy', Toch. B ṣaṅk '(measure of volume)', Toch. A yāmutsi, Toch. B yāmuttsi 'waterfowl' < 'parrot', Toch. B ṣitsok 'millet alcohol', Toch. B ṣipāṅkiñc 'abacus', Toch. A, Toch. B cok 'lamp', Toch. A lyäk, Toch. B lyak 'thief', Toch. A < Toch. B tseṃ 'blue', Toch. A nkiñc, Toch. B ñkante 'silver'.

These words give important indications of the impact of Chinese culture on Tocharian. I will follow this track further in coming posts.

Carling, Gerd. Proto-Tocharian, Common Tocharian, and Tocharian – on the value of linguistic connections in a reconstructed language. In: Jones-Bley, Karlene, Huld, Martin E., Volpe, Angela Vella, Dexter, Miriam Robbins (eds.), Proceedings of the Sixteenth Annual UCLA Indo-European Conference. Journal of Indo-European Studies. Monograph Series (Institute for the Study of Man) 50, 47-70.
Kim, Ronald. (1999). Observations on the absolute and relative chronology of Tocharian loanwords and sound changes. Tocharian and Indo-European studies, 8, p. 111–138.
Lubotsky, Alexander, & Starostin, Sergei. (2003). Turkic and Chinese loan words in Tocharian.
Židek, Jan. (2017). Tocharian Loanwords in Chinese [Dissertation]. Praha: Univerzita Karlova.

I found a fun map on Twitter, from the Foreign Service Institute (see below), which ranks the difficulty of learning a language by the number of weeks it takes. According to the map (which applies to (American) English speakers), Swedish and French are supposed to be very easy to learn, whereas, e.g., Russian is found among the more difficult languages. Even though it applies to English speakers, the map would not be very different for a speaker of Swedish or German. Why is that so? If you ask ordinary people (i.e., people without a degree in linguistics), the answer would naturally be that languages like English and Swedish “have no grammar”. If you ask what they mean by “grammar”, many would come up with the answer “they have no cases”.
In learning a language like Russian, we have to start learning many case forms early on, and then learn the rules for how to apply them. This is difficult for most of us who use a language with prepositions (in, on, on top of, towards) rather than cases. But why do some languages have cases instead of prepositions? Or, to reverse the question, why do some languages have prepositions instead of cases? And are the uses of prepositions really easier to learn than the uses of cases? Very few languages (such as Hungarian or Ossetian or other exotic languages in the Caucasus mountains) have as many cases as any normal language such as Swedish or English has prepositions. The rules of English prepositions are also hard to learn, and speakers of, e.g., Swedish often make mistakes in the use of prepositions.

However, if we compare the map of learning difficulties with the map of case system types below (data from the DiACL database), the correspondence between the two maps (in the parts that overlap) is striking. Analytical systems are the easiest, followed by fusional, and finally by agglutinating and other more complex systems. It would be very interesting to see what the map would look like for native speakers of Finnish or Russian.
Case systems are interesting, since they indicate that languages are circular in their evolution. Case systems are basically of three types:

isolating (or analytic), with no cases, where relations between participants in an event are expressed by prepositions (or postpositions),
agglutinating, with cases expressed by affixes with a simple function (plural, dative), which are attached to the noun stem,
fusional, where paradigms are built by cases which may mark several functions, such as feminine + dative + plural.

Case systems are typically of one of these types, where isolating systems are small, with 0-2 distinctions (e.g., English, Swedish, Danish), fusional systems are medium-large (e.g., Russian, German), often with many different forms in the language, whereas agglutinating systems tend to be large (e.g., Finnish, Hungarian, Turkic languages). Agglutinating systems are ruled by the principle of one suffix, one function (e.g., plural, dative).
Systems are seldom ‘pure’: most languages have case systems that are partly isolating, partly agglutinating, partly fusional. That is what makes them difficult to learn.

Why is the case situation the way it is? The structure of case systems has multiple explanations, and linguists are not yet aware of all the details in the process and development of case systems. One important reason for the outcome is language change and the cyclical behaviour of case systems: fusional systems (e.g., Russian) tend to break down or erode to isolating systems (e.g., English), which in turn may merge their combinations of noun + adposition into an agglutinating system (e.g., Turkish). And agglutinating systems, again, may fuse their forms to become fusional. However, in this cycle, languages may become stuck for millennia between states, where various types of mixed, weird and complex systems, with many and irregular forms, become standard.
Besides time and cyclic change, geography and language contact shape case systems. The situation is complex: case systems show clear tendencies to share similarities across language, branch and family boundaries. For instance, systems with no cases are more frequent in Western Europe, fusional cases are more frequent in Eastern Europe and in various conservative pockets (islands, forests) such as Iceland, the Faroe Islands, Germany, and Dalecarlia, and agglutinating cases are more frequent on the Asian landmass (except in the east, in China). But the map is complex: historical explanations compete with geographic explanations, which in turn compete with explanations based on typological cyclic behaviour, when we try to explain the structure of case systems.
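The cycle described above can be sketched as a tiny state machine. This is only an illustration under strong simplifying assumptions: three idealised stages with one transition each, and names (`CYCLE`, `next_stage`, `walk`) that are made up for this sketch. Real languages, as noted, may sit in mixed states for millennia.

```python
# Idealised stages of the case-system cycle described in the text.
# Each stage maps to the stage that the cycle predicts next.
CYCLE = {
    "fusional": "isolating",       # forms erode (Russian type -> English type)
    "isolating": "agglutinating",  # noun + adposition merge (Turkish type)
    "agglutinating": "fusional",   # affixes fuse into multi-function endings
}


def next_stage(stage: str) -> str:
    """Return the stage that the idealised cycle predicts after `stage`."""
    return CYCLE[stage]


def walk(start: str, steps: int) -> list[str]:
    """Follow the cycle for `steps` transitions, starting from `start`."""
    path = [start]
    for _ in range(steps):
        path.append(next_stage(path[-1]))
    return path


print(walk("fusional", 3))
# ['fusional', 'isolating', 'agglutinating', 'fusional']
```

After three transitions the walk is back where it started, which is the sense in which the development is cyclical.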


From https://twitter.com/AmericanGeo/status/1010364347502059520

Distribution of types of nominal case systems in modern (top) and ancient (bottom) languages. Dark red (1) marks no cases, green represents fusional types, and pink/purple shades represent agglutinating systems.

Illustration of the morphological cycle of case systems. Tocharian is an example of a mixed system which has moved in the opposite direction, from fusional to agglutinating.