Guide to finding and using RESEARCH MATERIALS
This tips-for-users page is a guide to downloading and using data from the infrastructures related to my research projects, including DiACL (Diachronic Atlas of Comparative Linguistics) and the DiACL Zenodo community, as well as other relevant sources. The page is updated continuously, so check back for new additions.

If you have questions about how to use the data of DiACL or DiACL Zenodo, do not hesitate to contact me at gerd.carling@ling.lu.se or gerd.carling@gmail.com.

UPDATES

2020-05-11: Guide to linguistic metadata, connection to glottocodes, information on datasets from the Mouton Atlas of Languages and Cultures (2019).




General rules for using data from the DiACL homepage or the DiACL Zenodo community: all data, material and code are free to use, but must be cited with the reference to the database:
Carling, Gerd (ed.) 2017. Diachronic Atlas of Comparative Linguistics Online. Lund: Lund University. (DOI/URL: https://diacl.ht.lu.se/. Accessed on: x.).

LINGUISTIC METADATA

Linguistic metadata is essential for working with linguistic data. The linguistic metadata of DiACL can be obtained in several ways. On the Language Index page, you can download all metadata in the database as XML (XML icon at the top of the page), or the metadata for a single language (XML icon in the right-most column). This metadata includes name, ISO 639-3 code, alternative names, location (focal point), time frame, language area, reliability, and associated tree node.
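
Before working with a downloaded metadata file, it can help to see which elements it actually contains. The following is a minimal Python sketch that counts element tags in an XML document. The tag names and values in the sample are made-up placeholders, not the actual DiACL schema, which this guide does not specify; replace the sample string with the content of the file you downloaded.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(xml_text: str) -> Counter:
    """Count how often each element tag occurs in an XML document."""
    root = ET.fromstring(xml_text)
    # root.iter() visits the root element and all of its descendants.
    return Counter(element.tag for element in root.iter())


# Hypothetical sample: the real DiACL export may use different tag names.
SAMPLE = """<languages>
  <language><name>Language A</name><code>code-1</code></language>
  <language><name>Language B</name><code>code-2</code></language>
</languages>"""

if __name__ == "__main__":
    for tag, n in count_tags(SAMPLE).most_common():
        print(tag, n)
```

To inspect a real download, read the file into a string first (for example with pathlib.Path("your-file.xml").read_text(encoding="utf-8")) and pass it to count_tags.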

Family information in the database is available in JSON format via the Language Tree page, either for all family trees (JSON icon at the top) or per family (in the drop-down menu for each family).
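
Since the tree files are nested JSON, a generic traversal is a convenient way to explore them. The sketch below, in Python, lists every leaf value together with its path, without assuming any particular key names. The sample structure (keys "name" and "children") is a hypothetical illustration, not the documented DiACL format.

```python
import json


def iter_leaves(node, path=()):
    """Yield (path, value) pairs for every non-container value in parsed JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_leaves(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_leaves(value, path + (index,))
    else:
        yield path, node


# Hypothetical sample: the real tree export may be structured differently.
SAMPLE = """{"name": "Family X",
             "children": [{"name": "Branch Y", "children": []}]}"""

if __name__ == "__main__":
    for path, value in iter_leaves(json.loads(SAMPLE)):
        print(path, value)
```

Printing the paths of a real file shows how deep the tree goes and which keys carry the node labels.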

Glottolog is the most important source for linguistic metadata. Unfortunately, glottocodes are not included in the DiACL database. However, the preparation of DiACL data for CLICS, published by Rzymski et al. (2019) and stored in a Zenodo library, gives the equivalent Glottolog names and glottocodes for most (but not all) DiACL languages.
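
Because the mapping does not cover every language, it is worth checking which languages in your selection lack a glottocode before merging data sets. A minimal Python sketch, assuming you have already read the mapping into a dictionary from language name to glottocode; the names and codes below are placeholders, and the format of the mapping file is not described in this guide.

```python
def split_by_glottocode(language_names, glottocodes):
    """Separate languages with a known glottocode from those without one."""
    matched = {}
    unmatched = []
    for name in language_names:
        code = glottocodes.get(name)
        if code:
            matched[name] = code
        else:
            unmatched.append(name)
    return matched, unmatched


# Placeholder data: not real DiACL language names or glottocodes.
GLOTTOCODES = {"Language A": "abcd1234", "Language B": "efgh5678"}
NAMES = ["Language A", "Language B", "Language C"]

if __name__ == "__main__":
    matched, unmatched = split_by_glottocode(NAMES, GLOTTOCODES)
    print("with glottocode:", matched)
    print("without glottocode:", unmatched)
```

Languages that end up in the second list need to be looked up in Glottolog by hand.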

GRAMMAR DATA

Grammar data in the DiACL database is divided by macro-region. Currently, there are three regions: Eurasia, Pacific, and South America. Of these, Eurasia has the best data coverage. The raw data sets can be downloaded here as an XML file containing all the data and the metadata for the languages.

The data differs from similar resources, such as WALS, mainly because it is organized according to hierarchical categorical features. The hierarchical model is described in several publications, such as Carling et al. (2018) and Carling (2019), Mouton Atlas of Languages and Cultures.

The prepared grammar files, used for the maps, graphs, and visualizations of the Mouton Atlas of Languages and Cultures (2019), are available in the DiACL Zenodo Library. Grammar data from this publication is given as Appendix 2b and 2c (Appendix 2a of the volume is a list of features, which can be downloaded from the webpage). Appendix 2b gives the grammar features as state combinations with labels, as they are used in the maps of the atlas. Appendix 2c gives the state combinations of 2b as they occur in the languages. See the atlas, pp. 211-225.