How Scandinavian languages got two genders

Most Scandinavian languages, including Swedish and Danish, distinguish two genders. Nouns are either common or neuter, which is marked, for example, by the indefinite article: en cykel ‘a bike’ but ett träd ‘a tree’. Speakers have to learn the gender of each noun in the language. Scandinavian languages once had more genders: a masculine, a feminine, and a neuter. This was the case in Old Norse, the ancestor of the Scandinavian languages, spoken about a thousand years ago in large parts of Scandinavia. In fact, some dialectal varieties of Scandinavian have kept the three-gender system, and in a new paper we study how these varieties are gradually drifting towards a two-gender system. Over time, the feminine weakens: feminine endings are lost, and fewer and fewer words are feminine. In addition, less frequent words are unstable in their gender, and new loans become masculine. The process can be traced as far back as Old Norse, which indicates that the decay and eventual loss of the feminine can be predicted before it actually takes place.

The study is published in the latest issue of the Journal of Germanic Linguistics.

Why a nose is a nose

In most of the world’s languages, a number of basic words have a similar sound structure. A word for ‘nose’ typically has a nasal sound, an /n/ or /m/; words for ‘mother’ an /m/ or /n/; and various bone words, such as ‘knee’, a /k/. It is a mystery how these connections emerge and why they are maintained as languages evolve over generations of speakers. Are we born to pronounce words in a specific way? Or does every new generation of speakers reinvent similar-sounding words for ‘mother’, ‘father’, ‘knee’, ‘blow’, and so forth? A new study from Lund (in collaboration with Tübingen) finds that these sound-meaning mappings are more stable than average as words evolve over time. This tendency is strongest for the sounds that children acquire earliest when learning a language. Our results indicate that, across languages, new generations uphold these sound-symbolic associations and therefore keep pronouncing basic concepts similarly.

The study is published in Philosophical Transactions of the Royal Society B and can be accessed at

A previous study by the Lund group, identifying basic concepts that have a similar sound structure in all of the world's languages, was published in Linguistic Typology (2020) and can be accessed at

Since I began by posting a picture of the Eurasian diversity of words for WHEEL, my first post is lexical: I will talk about terms for vehicles. In Indo-European studies, vehicle-related terms are an important issue. It is generally believed that the invention of the wheel as a means of transport during the early Chalcolithic was, together with the domestication of the horse for traction, the innovation that spread the Indo-European family across Eurasia. However, several enigmas surround the origin of vehicles and wheeled transport. First, archaeology does not help us very much: the early wheels, hubs, and naves were made of wood, a non-durable material. Further, the wheel spread so swiftly that we cannot know where it first appeared. Before wheeled transport, the wheel had other uses: millstones for grinding, the potter's wheel, and spindles for spinning, so the word for wheel in the Indo-European proto-language had several potential functions. More important is the entire complex of wheel- and transport-related lexemes in Indo-European and its neighbors.
For Indo-European, a set of forms for wheel and transport can be reconstructed for the proto-language. Beginning with WHEEL, we have at least three common terms (PIE *h₂wērg-wn̥t-ōn 'wheel, circle', PIE *h₂urg-i- 'wheel, circle'; PIE *kʷekʷlo-, *kʷel-o- 'wheel, circle' < PIE *kʷel(H)- 'to turn'; PIE *Hróth₂o- 'wheel, circle' < PIE *(H)reth₂- 'to run'). In addition, we have terms for HUB or NAVE, which also mean 'navel' (PIE *h₃enbh-, *h₃nebh- 'navel, nave, hub', PIE *h₃nobh-li- 'navel, nave'), and a reconstructed lexeme for WAGON (PIE *weǵhno- 'wagon' < PIE *uoǵh- 'to carry, drive'). The process of creating a word for 'wheel' from a verb meaning 'to roll' is also found outside Indo-European, for instance in the Caucasian languages (Proto-Kartvelian *gor- 'wheel; to roll', Proto-Nakh *gur- 'wheel', Proto-Dagestanian *gur- 'to whirl, to roll; wheel' (Georgian gor-gor-a 'wheel', Chechen gur-ma 'wheel for plough'); Proto-Kartvelian *bor- 'rotation', Proto-Nakh *bor-a 'mill's wheel', Proto-Dagestanian *bor-a 'wheel' (Georgian borbali 'wheel', Laz bor-bol-ia 'wheel', Laz bur-in-i 'rotation; spinning', Bezhta örræ 'wheel', Avar ber 'wheel')).
It is evident that the Indo-Europeans knew the wheel and also used wheeled transport. Whether this transport carried them over large areas is questionable: the wagons were heavy, the wheels were of solid wood, and there were no roads. Wagons were more likely used for loading and traction, such as for carrying hay from the field to the barn. The Caucasians also had a word for WAGON (Proto-Kartvelian *sa-kʰum- 'carriage', Proto-Northwest Caucasian *kwə 'carriage, cart', Proto-Dagestanian *hankʰwə- 'carriage, vehicle' (Megrelian o-kʰim-o 'carriage', Adyghe kʷə, kʰwə 'wagon', Ubykh kʰwə 'cart', Dargwa urkʰura 'carriage', Lezgian akʰur 'carriage')). Apparently, these wagons were not fit to cross the high Caucasus Mountains and spread the languages over the open plains.
Proto-Indo-European also had several words for YOKE (e.g., PIE *yug-o- 'yoke'). YOKE is a highly stable word in Indo-European: it has hardly changed its form and has rarely been replaced in the daughter languages. Where the root was replaced, the new forms were derived from roots meaning 'to bind' or 'to join' (Proto-Slavic *arь̀mъ, *arьmò 'yoke, ox-yoke' < PIE *h₂er- 'join', Proto-Celtic *wedo- 'yoke, harness' < PIE *wedh- 'bind'). Interestingly enough, the Caucasian languages use the same root for YOKE (Proto-Kartvelian *uɣ-el- 'yoke', Proto-Northwest Caucasian *ɣəw 'yoke', Proto-Dagestanian *ur- 'yoke' (Georgian uɣeli 'yoke', Megrelian uɣeli 'yoke', Ubykh ɣawə 'yoke', Tabasaran uɣ-in 'cart (drawn by a single ox)', Udi ọq' 'yoke')). The yoke, regardless of whether it was put on a bull, horse, donkey, or human, had a very simple and straightforward function, which did not change over the millennia: a device placed over the neck to facilitate traction and carrying.
Vehicle words are highly interesting. Words for the parts of vehicles, such as the wheel or the hub, are seldom borrowed and remain stable in most languages. The words for WAGON and AXIS change more frequently: they are more often borrowed, and they often change or colexify their meanings, in particular with meanings referring to the sky and the firmament, e.g., ‘Pole Star’, ‘axis’, or ‘firmament’. This tells us something about the cultural importance of the wheel and of transport: the words are frequently projected onto the firmament, which has a natural motivation, since the night sky appears to rotate around the Pole Star like a wheel around its axle.
Density heatmaps indicating the frequency of languages as source (y) and target (x) language in loan events, with languages ranked by the Language Power Index.

A study in PLOS ONE shows that borrowing is hierarchical: borrowings are most likely to go from a more prestigious language to a less prestigious one. In addition, borrowing increases with cultural labour intensity.

All languages borrow words from other languages. Some languages are more prone to borrowing, while others borrow less, and different domains of the vocabulary are unequally susceptible to borrowing. Languages typically borrow words when a new concept is introduced, but they may also borrow a new word for an already existing concept. Linguists describe two causes of borrowing: need, i.e., the internal pressure to borrow a new term for a concept in the language, and prestige, i.e., the external pressure to borrow a term from a more prestigious language. We investigate lexical loans in a dataset of 104 concepts in 115 Eurasian languages from 7 families occupying a coherent contact area of the Eurasian landmass, of which Indo-European languages from various periods constitute a majority. We use a cognacy-coded dataset, which identifies loan events including a source and a target language. To exclude loans for newly introduced concepts, we use a list of lexical concepts that have been in use at least since the Chalcolithic (4000–3000 BCE). We observe that the rates of borrowing are highly variable among concepts, lexical domains, languages, language families, and time periods. We compare our results to those of a global sample and observe that our rates are generally lower, but that the rates of the two samples are significantly correlated. To test the causes of borrowing, we use two different rankings. First, to test need, we use a cultural ranking of concepts by their mobility (for nature items) or by their labour intensity and “distance-from-hearth” (for culture items). Second, to test prestige, we use a power ranking of languages by their socio-cultural status. We conclude that the borrowability of concepts increases with increasing mobility (nature) and with increasing labour intensity and “distance-from-hearth” (culture).
We also conclude that language prestige is not correlated with borrowability in general (all languages borrow, independently of prestige), but prestige predicts the direction of borrowing, from a more prestigious language to a less prestigious one. The process is not constant over time: inequality is larger during the ancient and modern periods, but this result may depend on the state of the data (non-prestigious languages often remain unattested). In conclusion, we observe that need and prestige compete as causes of lexical borrowing.
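The directionality result can be illustrated with a toy count. This is a minimal sketch, not the study's statistical analysis (which worked with density heatmaps and rankings over the full dataset): it assumes that each loan event is a (source, target) pair and that each language has a numeric rank in a power index where a lower number means higher prestige. The function names, language names, and ranks are hypothetical placeholders, not data from the study.

```python
def directionality(loans, power_rank):
    """Return the share of loan events that go from a more prestigious
    (lower-ranked) language to a less prestigious (higher-ranked) one.

    loans: iterable of (source, target) language pairs.
    power_rank: dict mapping language -> rank; 1 = most prestigious
    (a hypothetical convention for this sketch).
    Events between languages of equal rank are ignored.
    """
    downhill = uphill = 0
    for source, target in loans:
        if power_rank[source] < power_rank[target]:
            downhill += 1  # more prestigious -> less prestigious
        elif power_rank[source] > power_rank[target]:
            uphill += 1  # less prestigious -> more prestigious
    total = downhill + uphill
    return downhill / total if total else float("nan")


def loans_received(loans):
    """Count loan events per target language, i.e. how much each language borrows."""
    counts = {}
    for _, target in loans:
        counts[target] = counts.get(target, 0) + 1
    return counts


# Hypothetical toy data: three languages with invented prestige ranks.
ranks = {"A": 1, "B": 2, "C": 3}
events = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("B", "A")]
print(directionality(events, ranks))
print(loans_received(events))
```

A share above 0.5 means most loan events run from more to less prestigious languages, while `loans_received` shows that even the top-ranked toy language receives loans, mirroring the point that all languages borrow regardless of prestige.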

The large Scandinavian languages, such as Swedish and Danish, have replaced their three-gender system with a system of common and neuter gender. However, several smaller dialects or languages, such as Jamtlandic and Elfdalian, have preserved the three-gender system. In a new study from our research group, by Briana Van Epps and me, we investigate the principles of gender assignment in Jamtlandic. The dialect shows an instability of the feminine gender, which is visible, among other things, in the gender assignment of loanwords.

DOI to the paper (Nordic Journal of Linguistics (2019), 1-33):

Abstract: In this study, we present an analysis of gender assignment tendencies in Jamtlandic, a language variety of Sweden, using a word list of 1029 items obtained from fieldwork. Most research on gender assignment in the Scandinavian languages focuses on the standard languages (Steinmetz 1985; Källström 1996; Trosterud 2001, 2006) and Norwegian dialects (Enger 2011, Kvinlaug 2011, Enger & Corbett 2012). However, gender assignment principles for Swedish dialects have not previously been researched. We find generalizations based on semantic, morphological, and phonological principles. Some of the principles apply more consistently than others, some ‘win’ in competition with other principles; a multinomial logistic regression analysis provides a statistical foundation for evaluating the principles. The strongest tendencies are those based on biological sex, plural inflection, derivational suffixes, and some phonological sequences. Weaker tendencies include non-core semantic tendencies and other phonological sequences. Gender assignment in modern loanwords differs from the overall material, with a larger proportion of nouns assigned masculine gender.

Continuing my blog posts about gender, I will say a few words about gender stability. Over time, words often change their gender. This is well known: for instance, in Germanic languages the words for 'sun' and 'moon' are feminine and masculine, respectively (as in German die Sonne and der Mond), whereas in other branches of Indo-European the situation is reversed (Italian sole 'sun' is masculine and luna 'moon' is feminine).
The important and interesting question is what causes gender stability or instability. Is it connected to a specific gender? To specific words? Or is gender stability a matter of frequency? There are still very few, if any, studies that investigate gender stability using large-scale datasets.
If we first consider gender instability in our culture dataset for Indo-European, we notice that there is little difference between the genders when it comes to stability within cognate sets. We distinguished three classes: cognate sets in which more than 90% of the words have the same gender (stable class), cognate sets with 50–90% same gender (dominant class), and cognate sets with under 50% same gender (change class). We notice that all three genders, masculine, feminine, and neuter, have approximately the same distribution across the stable, dominant, and change classes (see picture below). However, the masculine is slightly overrepresented in the stable class, the feminine in the dominant class, and the neuter in the change class, meaning that the masculine is the most stable, the feminine slightly less stable, and the neuter the least stable. The differences are, however, small.
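The three stability classes can be made concrete with a short sketch. It assumes that "same gender" means the share of words in a cognate set that carry the set's most frequent gender, and it draws the boundaries as stable (above 90%), dominant (50–90%), and change (below 50%); how values of exactly 90% or 50% were handled in the study is not stated, so these cut-offs are an assumption. The function names and the cognate sets are invented for illustration.

```python
from collections import Counter


def stability_class(genders):
    """Classify one cognate set by how consistently its members share a gender.

    genders: list of gender labels ("m", "f", "n", ...) for the words in the set.
    Returns (most frequent gender, its share, class label).
    Thresholds are an assumption: > 0.9 stable, 0.5-0.9 dominant, < 0.5 change.
    """
    if not genders:
        raise ValueError("a cognate set needs at least one word")
    # most_common(1) picks the most frequent gender; ties go to the first seen.
    gender, count = Counter(genders).most_common(1)[0]
    share = count / len(genders)
    if share > 0.9:
        label = "stable"
    elif share >= 0.5:
        label = "dominant"
    else:
        label = "change"
    return gender, share, label


def tally(cognate_sets):
    """Count how often each majority gender falls into each stability class."""
    counts = Counter()
    for genders in cognate_sets.values():
        gender, _, label = stability_class(genders)
        counts[(gender, label)] += 1
    return counts


# Hypothetical toy cognate sets (not real data).
toy = {
    "set_1": ["m"] * 10,
    "set_2": ["f"] * 7 + ["m"] * 3,
    "set_3": ["n", "m", "f", "m", "n", "n", "f"],
}
for name, genders in toy.items():
    print(name, stability_class(genders))
print(tally(toy))
```

Tallying the majority gender per class is what would reveal tendencies such as the masculine being slightly overrepresented among stable sets; on real data one would also want a significance test before reading anything into small differences.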
What is more interesting, though, and probably also promising for future research on gender stability, is that the stability of different semantic classes varies greatly. Crops, metals, trees, vegetables, and products are all highly stable, whereas drink & drugs, small cattle, tillage, etc., are highly unstable. Whether there is a connection to general frequency remains to be tested for the entire Indo-European family, but a study on gender in the Scandinavian languages alone (Van Epps, Carling & Sapir 2019) found a correlation between frequency and gender instability.

Van Epps, Briana, Gerd Carling & Yair Sapir. To appear. “Gender assignment in six North Scandinavian languages: Patterns of variation and change.” To appear in a journal.

Heatmap of frequency of occurrence of various semantic classes in the different categories stable (

This week's blog post will deal with a complex topic: gender assignment.
As I have described in a previous post, gender involves a classification of nominal entities in language. Gender can generally be defined as classes of nouns which are reflected in the behaviour of associated words (Corbett 1991: 1). That is, gender is indicated by the agreement of various elements. Gendered languages vary in the number of genders they have and in assignment, that is, how individual lexical items receive a gender (Audring 2014, 2017). Some languages assign gender based on semantic principles (semantic assignment systems), in which gender reflects categories such as biological sex or animacy. Other languages have formal assignment systems, which can be divided into morphological and phonological assignment (Corbett 1991: 7-8). Thus, gender assignment may be guided by semantic qualities (e.g., male/female, level of abstractness, shape), by morphological criteria (e.g., stem formation, inflection class, derivational suffixes), or by phonological criteria (e.g., word-final vowels or consonants). Languages may use semantic factors only, or a combination of semantic and formal factors, but all gender languages have some semantic core (Corbett 1991: 8).

When we look at gender assignment in the Indo-European culture vocabulary (the 100-item culture list of our database, consisting of 8,500 gender- and cognacy-coded lexical items), some interesting tendencies emerge. We cannot investigate phonological and morphological assignment principles on the data in its current shape (the words have not been coded for morphology or phonology), but many other interesting tendencies can be extracted from the data.
First, the total distribution of genders of lexical items in the data is straightforward: masculine<feminine<neuter<alternans (see below). This is also reflected in the timeline of the evolution of genders (see below), where the masculine dominates in the early period, weakens during the antique period, and then regains strength during the first and in particular the second millennium CE, at the expense of the feminine and in particular the neuter.
We code all concepts for various semantic properties listed in the literature as important for gender assignment, such as animacy, collectiveness, countability, sexus, concreteness, and form/shape. In addition, we group concepts into semantic classes, which we infer from patterns of colexification and semantic change in the data.
We find that animate concepts (animals in our data) are significantly associated with the masculine gender (we include both male and female forms of animals, but the overrepresentation of the masculine among the general terms is important in the data). Further, we find that collectives as well as concepts coded as materials are significantly associated with the neuter gender. Our data do not contain abstract nouns, but, surprisingly, we find that sharp and piercing implements are significantly associated with the feminine gender.
These tendencies for semantic properties underlie the overrepresentation of particular genders in certain semantic classes, which can be seen in the heatmap of gender distribution across classes above. In this heatmap, which divides concepts into classes, we can observe that the neuter is overrepresented for metals and materials and for drink and drugs, the masculine for all animals, and the feminine for weapons, trees, and insects (honeybee). This indicates that assignment is not only caused by semantic properties but very likely also by semantic class, though more research and data are needed to confirm this assumption.

Distribution of the genders alternans, commune, neuter, feminine, and masculine in the dataset (lexemes of 104 concepts in 105 Indo-European languages)

Timeline of gender distribution in the lexical dataset (by Briana Van Epps).

Evolutionary reconstruction of gender in Indo-European is a highly interesting field, and the subject is a perfect testbed for how well evolutionary methods work in general. The core issue is this: the system we reconstruct for Proto-Indo-European, with a commune/neuter distinction, has developed into a sexus-based system (masculine/feminine/neuter) in most daughter branches and is preserved only in Anatolian (Hittite, Luwian), the oldest attested Indo-European branch. However, in Scandinavian and Dutch/Frisian, a commune/neuter system has re-emerged through a merger of the masculine and feminine of a previous three-gender system. Therefore, on the surface, Anatolian and Scandinavian are similar, as we see from the MCA plot above, which shows the synchronic similarities of Indo-European gender systems based on attested languages. However, the similarity between Scandinavian, Frisian/Dutch, and Hittite/Luwian is an illusion, or, to use evolutionary terminology, an example of homoplasy. The background and the functionality of the different systems are completely different. How can we make evolutionary methods account for this difference in the reconstruction?
This is where we can test how well different models perform. Experiments (performed by our colleagues Chundra Cathcart, Harald Hammarström, and Marc Tang) indicate that the results of an evolutionary reconstruction are similar to those of a comparative reconstruction (even if the method, of course, is completely different). What we want the evolutionary reconstruction to produce is a high probability of masculine/neuter at the root (i.e., Proto-Indo-European) and a lower probability of a feminine.
In experimenting with the data and different models, we find that the most important factor is the shape of the tree. For Indo-European, we get different results depending on whether we use a branched or a non-branched tree, an Indo-Anatolian or a non-Indo-Anatolian tree, and ancestry constraints or no ancestry constraints (ancestry constraints place ancestral languages on the branches of the tree rather than as 'cousins' of the living languages). As for the model, results also differ depending on whether we use a Markov Chain Monte Carlo model, which constructs a Markov chain that has the desired distribution as its equilibrium distribution, so that a sample of that distribution can be obtained by recording states from the chain, or a Dollo model, which has as its precondition that a system never returns exactly to a previous state but keeps traces of the intermediate stages through which it passes. A Dollo model with an Indo-Anatolian tree produces a reconstruction that looks very similar to the Anatolian system. However, more experimenting is needed: obviously, a correct tree of a family is required before an evolutionary reconstruction can be performed. But some models of reconstruction may be better than others, depending on how they deal with the problem of homoplasy and parallel drift.
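To illustrate why the shape of the tree matters, here is a deliberately simplified sketch using Fitch parsimony, a much simpler technique than the MCMC and Dollo models discussed above and not the method used in the experiments. The seven-language toy tree and the two-state coding (CN for a common/neuter two-gender system, MFN for a masculine/feminine/neuter system) are hypothetical simplifications chosen only for illustration.

```python
def fitch(tree, states):
    """Bottom-up pass of Fitch's small-parsimony algorithm.

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    states: dict mapping each leaf name to its observed state.
    Returns (candidate state set at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # No overlap: one change is required somewhere below this node.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical coding: CN = common/neuter, MFN = masculine/feminine/neuter.
gender_system = {
    "Hittite": "CN", "Luwian": "CN",
    "Latin": "MFN", "Greek": "MFN", "German": "MFN",
    "Swedish": "CN", "Danish": "CN",
}
rest = (("Latin", "Greek"), ("German", ("Swedish", "Danish")))
indo_anatolian = (("Hittite", "Luwian"), rest)  # Anatolian as the first split

print(fitch(indo_anatolian, gender_system))  # candidate root states, changes
print(fitch(rest, gender_system))
```

With Anatolian as the first split, the root is ambiguous between CN and MFN, and the Scandinavian common/neuter system is counted as a separate change inside Germanic, i.e., a homoplasy; without Anatolian, the root comes out as MFN. Parsimony only counts changes and gives no probabilities, which is why model-based approaches such as those described above are attractive for this problem.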


I am taking up this blog again after a summer intermission. During the summer, I attended the 24th International Conference on Historical Linguistics (ICHL 24) in Canberra and the 52nd Annual Meeting of the Societas Linguistica Europaea in Leipzig. In both places I talked about a topic that has attracted my interest recently: gender evolution and gender assignment, specifically in Indo-European.
In the coming blog posts, I will talk specifically about this issue. The first post will deal with the morphosyntactic reconstruction of the Indo-European gender system.

First of all, how do we define gender? The typical way is to use agreement, which is visible on an agreeing article, adjective, or verb. Normally, the gender system of a language is described in its grammars and reflected in its dictionaries. However, this definition does not work for pronominal gender, which is trickier. To define pronominal gender, it is necessary to look at the occurrence of gendered forms in pronominal systems.

Gender is prototypically a property of nouns, and once gender has been identified for all nouns in a language, an important issue is to define the underlying causes of gender assignment. There is plenty of research on this issue, both from a general typological perspective and for individual languages. According to the canonical gender literature (Corbett 1991, 2013, Corbett and Fraser 2000), there are three basic principles according to which gender is assigned in languages: phonological, morphological, and semantic. A fundamental problem is that these principles typically compete within a language.

What is the situation in Indo-European?

Most Indo-European languages have gender (masculine, feminine, neuter). No language has "purely" phonological, morphological, or semantic assignment. Diachrony apparently plays a role: many languages inherit larger or smaller parts of their gender system and of the gender assignment of nouns. Most languages have competing rules for assignment.

The next issue is the reconstruction of Indo-European gender. For the reconstruction of the Indo-European gender system, based on a morphological reconstruction of the systems in the various branches, three proposals have been put forward in the literature. The option suggested by Hermann Hirt in the 1930s (Hirt 1934, 1937) was that Indo-European had no gender, which later developed into a three-gender system through grammaticalization. The reconstruction of Delbrück and Brugmann (Brugmann & Delbrück 1893, 1897, 1900) contained three genders, like Sanskrit, Classical Greek, and Latin, which were later either preserved or collapsed into a masculine-feminine or a common-neuter system. However, Brugmann and Delbrück were uncertain about the feminine gender, mainly due to the formal correspondence between the reconstructed feminine and neuter (the *-h₂- suffix). Based on this formal similarity between the collective/neuter and the feminine, as well as on the shape of the Anatolian system, with a commune and a collective/neuter gender, later Indo-European scholars agree that Indo-European had a two-gender animate-inanimate system (which is reflected in the Anatolian system), which later developed into a sex-based gender system with an additional collective gender, the neuter (see Table 1; Luraghi 2011, Matasović 2004).
Basically, Hirt's model implies that gender evolved through grammaticalization, and the Delbrück model that the three-gender system of Indo-European either remained or collapsed. However, we must remember that both of these models were constructed before the discovery of Anatolian.
The mainstream model is based on the idea of a typological evolution of gender systems, moving from an animate-inanimate to a sexus-based system, which retains the animacy distinction in the masculine/feminine and the distinction between abstract and concrete in the feminine/neuter (Table 1).

In brief, the mainstream model supposes that there is:

Trace of the old system in the languages
Emergence of a human~non-human distinction after the proto-language
Emergence of an abstract~concrete distinction in the non-human gender after the proto-language
Later mapping onto a sexus-based system with retention of the concrete inanimate (neuter)
Continuation of the ancient assignment principles in various languages
Table 1. The developmental phases of the Indo-European gender system according to the mainstream model (after Luraghi 2009).
Stage 1: ANIMATE | INANIMATE
Stage 2: HUMAN | ABSTRACT | CONCRETE
Stage 3: MASCULINE/FEMININE | FEMININE | NEUTER

The next question in this process is what happens if an evolutionary model is used for the reconstruction (Cathcart, Carling et al. 2018, Carling 2019). Gender reconstruction is an important test case for evolutionary models, since the system reconstructed for Proto-Indo-European has changed in most living languages (see Table 1).

I will discuss this issue in the next blogpost.

Pronominal gender systems in Indo-European languages.

The Swedish summer vacation is approaching, and I will go to Australia, among other things to attend the International Conference on Historical Linguistics in Canberra, 1–5 July. I will give two talks, one about the evolution and tendencies of gender assignment in Indo-European, and one about the evolution and change of alignment in Indo-European. After the summer intermission I will return and write more about these two topics in separate posts.
However, I will try (if time permits) to give an overview of some of the interesting talks from the ICHL conference, so stay tuned! Thanks to all readers and have a nice summer!

