Abstract Euralex 2018

Abstract

The project VerbaAlpina, funded by the German Research Foundation (DFG), explores the Alpine region, selectively and analytically, as a historico-cultural and historico-linguistic unit. This region is characterised by ethnographic and topographic homogeneity and, at the same time, by strong linguistic heterogeneity (several languages, each in the form of numerous dialects). In three stages (stage 1: 2014 – 2017, stage 2: 2017 – 2020, stage 3: 2020 – 2023), the project examines the linguistic reality of three conceptual domains: alpine pasture (stage 1), flora and fauna (stage 2) and modern alpine life (stage 3). With this innovative approach, VerbaAlpina overcomes the traditional restriction of geolinguistic investigation to individual nation-states. To this end, it builds an extensive portal that transcends single languages and national borders, using modern media technology (databases, geocoding, the internet, social software).

One of the main functional areas of VerbaAlpina's virtual research environment is lexicographic. It offers new ways to collect, process, access and visualize lexical data. Whereas most dictionaries offer only one perspective on their subject, VerbaAlpina allows the language to be explored from both the onomasiological and the semasiological perspective. To achieve this, existing data from language atlases (e.g. AIS, ASLEF, ALF) and dictionaries (ALTR, WBÖE), in both digital and analogue form, as well as new data collected with the project's crowdsourcing tool, undergo a systematic process that fits them to the unified, structured data entities of a relational database. The process can be divided into three major steps:

Transcription/data transfer: The linguistic material is brought into the structure of the relational database. Depending on the type of source, the data are either transcribed or transferred digitally. From this point on, following the principle of Quellentreue (fidelity to the source), all input data can be traced back to the original raw data and to their source. This step also includes the elicitation of the concepts associated with the source data.
Tokenization: The collected utterances are automatically split into single tokens. In doing so, complex lexemes are split into single words, and grammatical and lexical units are separated from each other prior to typification. At this stage, all tokens that are phonetically transcribed attestations are also converted into IPA so that the data become comparable.
Typification: The last step is morpho-lexical typification, i.e. the assignment of single tokens to morpho-lexical types, which correspond roughly to the lemmas of a dictionary. A morpho-lexical type is defined by language family (Romance, Germanic, Slavic), part of speech, affix and grammatical gender. Additionally, the lexical base and its source language are determined.
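The three steps can be sketched in Python as a minimal illustration under stated assumptions. The function names `to_ipa`, `tokenize` and `typify`, the toy transcription table, the set of grammatical units and the single sample type entry are all hypothetical; they do not reproduce VerbaAlpina's actual conversion rules, schema or data. Each token keeps a reference to its source, reflecting the traceability required by the first step, and grammatical units are filtered out before lexical tokens are matched against a type inventory.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical toy table mapping characters of a source transcription to IPA;
# real conversion rules depend on each source's notation system.
TO_IPA = {"š": "ʃ", "č": "tʃ", "ž": "ʒ"}

# Hypothetical closed set of grammatical units (here: articles) that are
# separated from lexical units before typification.
GRAMMATICAL_UNITS = {"la", "il", "lo", "l'", "der", "die", "das"}


@dataclass(frozen=True)
class Token:
    form: str          # token as it appears in the source
    ipa: str           # token converted to IPA
    grammatical: bool  # True for grammatical units such as articles
    source: str        # reference to the source, kept for traceability


@dataclass(frozen=True)
class MorphoLexType:
    family: str         # language family: Romance, Germanic or Slavic
    pos: str            # part of speech
    affix: str          # affix ("" if none)
    gender: str         # grammatical gender ("" if not applicable)
    base: str           # lexical base
    base_language: str  # source language of the lexical base


# Illustrative type inventory keyed by IPA form (not VerbaAlpina data).
TYPE_LEXICON = {
    "alpe": MorphoLexType("Romance", "noun", "", "f", "Alpes", "Latin"),
}


def to_ipa(form: str) -> str:
    """Convert a token character by character using the toy table."""
    return "".join(TO_IPA.get(ch, ch) for ch in form)


def tokenize(utterance: str, source: str) -> list[Token]:
    """Split an utterance into tokens, detaching elided articles like l'."""
    tokens = []
    for word in utterance.split():
        cut = word.find("'")
        # "l'alpe" -> "l'" + "alpe"; words without an inner apostrophe stay whole
        parts = [word[: cut + 1], word[cut + 1 :]] if 0 < cut < len(word) - 1 else [word]
        for part in parts:
            tokens.append(Token(part, to_ipa(part), part.lower() in GRAMMATICAL_UNITS, source))
    return tokens


def typify(tokens: list[Token]) -> list[tuple[Token, Optional[MorphoLexType]]]:
    """Assign each lexical token to a type; None marks tokens still untyped."""
    return [(t, TYPE_LEXICON.get(t.ipa)) for t in tokens if not t.grammatical]
```

For example, `tokenize("l'alpe", "source A")` yields the grammatical token `l'` and the lexical token `alpe`, and `typify` then assigns only `alpe` to a type. Returning `None` for unmatched tokens, rather than guessing, mirrors the idea that typification is a separate, reviewable step.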

As a result of this process, the data can be accessed and analyzed in two ways: through database queries and through an interactive map. The advantage of direct database queries is easy, precise and structured access to all linguistic data gathered by the project. Although queries make it possible to "ask" the database any question, some knowledge of SQL is required. The second, more user-friendly way is the presentation of the data in a geographical context. The interactive map offers the user a number of options and filters for visualizing the linguistic data, again from both the semasiological and the onomasiological perspective. The interactive map is also powerful because it can bring together linguistic and non-linguistic data such as infrastructure or historical data; the latter can serve as additional interpretive information. Every map created by the user can be stored in the database as a so-called synoptic map and easily retrieved later.
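The two perspectives can be illustrated with SQL queries against a relational model of tokens, types and concepts. This is a minimal sketch assuming a hypothetical three-table schema (`concepts`, `types`, `tokens`) and invented toy rows; it is not VerbaAlpina's actual schema or data, and Python's built-in `sqlite3` module is used only to keep the example self-contained. The onomasiological query starts from a concept and asks which types express it and where; the semasiological query starts from a type and asks which concepts it denotes.

```python
import sqlite3

# Hypothetical schema: each token links a type to a concept at a location.
SCHEMA = """
CREATE TABLE concepts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE types    (id INTEGER PRIMARY KEY, name TEXT, family TEXT);
CREATE TABLE tokens   (
    id INTEGER PRIMARY KEY,
    ipa TEXT,
    location TEXT,
    type_id INTEGER REFERENCES types(id),
    concept_id INTEGER REFERENCES concepts(id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with invented toy rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO concepts VALUES (?, ?)",
                    [(1, "alpine pasture"), (2, "mountain")])
    con.executemany("INSERT INTO types VALUES (?, ?, ?)",
                    [(1, "alpe", "Romance"), (2, "Alm", "Germanic")])
    con.executemany("INSERT INTO tokens VALUES (?, ?, ?, ?, ?)", [
        (1, "alpe", "location A", 1, 1),
        (2, "alm", "location B", 2, 1),
        (3, "alpe", "location C", 1, 2),
    ])
    return con


def onomasiological(con: sqlite3.Connection, concept: str) -> list:
    """Concept -> which types express it, and at which locations."""
    return con.execute("""
        SELECT ty.name, tk.location FROM tokens tk
        JOIN types ty ON ty.id = tk.type_id
        JOIN concepts c ON c.id = tk.concept_id
        WHERE c.name = ? ORDER BY ty.name, tk.location
    """, (concept,)).fetchall()


def semasiological(con: sqlite3.Connection, type_name: str) -> list:
    """Type -> which concepts it denotes."""
    return con.execute("""
        SELECT DISTINCT c.name FROM tokens tk
        JOIN types ty ON ty.id = tk.type_id
        JOIN concepts c ON c.id = tk.concept_id
        WHERE ty.name = ? ORDER BY c.name
    """, (type_name,)).fetchall()
```

With the toy data, `onomasiological(build_demo_db(), "alpine pasture")` returns `[("Alm", "location B"), ("alpe", "location A")]`, and `semasiological(build_demo_db(), "alpe")` returns `[("alpine pasture",), ("mountain",)]`. The location column is what an interactive map would plot.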

The project is open to all kinds of cooperation and already works with numerous partners from the alpine region. Everyone is welcome to contribute to the data stock and to the growing network of projects and users interested in its research subjects or its technology. To this end, VerbaAlpina offers different tools for scientists and ordinary language users. The added value lies, on the one hand, in the multi-directionality of the project, which collects, documents and disseminates structured linguistic and ethnographic data. For example, existing online dictionaries can connect directly to the interactive map and let readers view the words on a dictionary page in their geographical context. On the other hand, the project makes every effort to provide an innovative online publishing platform with solutions focused on sustainability and citability.