Language Lab

Head: Prof. Dr. Caroline Sporleder
Team: Stefan Ziehe


The Language Lab is a platform for various activities in the field of digital language analysis. The working groups of Prof. Sporleder and Prof. Gipp (Computer Science) as well as Prof. Holler and Prof. Coniglio (Linguistics) are actively involved. The research training group "Form-Meaning-Mismatches" also works with digital methods of computational linguistics.

The Language Lab investigates how human language can be automatically processed and interpreted. It explores the mathematical and logical properties of natural language and develops algorithmic and statistical methods for automatic language processing based on collections of language and text data. These include text collections (e.g. newspaper and magazine texts or Twitter posts), speech recordings (e.g. speeches or interviews) and corresponding experimental or measurement data (e.g. EEG, eye tracking, surveys, reaction times, etc.).

Usually, a corpus (e.g. of newspaper articles or speeches) is compiled for this purpose according to specific criteria and with a particular research goal in mind. On this basis, one can, for example, carry out a lexical analysis of the word frequency distribution.
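Such a word frequency analysis can be sketched in a few lines. The following is a minimal illustration using only the Python standard library; the tokenisation (lowercasing and splitting on non-letter characters) is deliberately naive, and the two-sentence corpus is invented for demonstration:

```python
from collections import Counter
import re

def word_frequencies(corpus):
    """Count word occurrences across a small corpus of documents."""
    tokens = []
    for doc in corpus:
        # Naive tokenisation: lowercase and split on non-letter characters.
        tokens.extend(re.findall(r"[a-zäöüß]+", doc.lower()))
    return Counter(tokens)

corpus = [
    "The minister gave a speech.",
    "The speech was covered by the newspaper.",
]
freq = word_frequencies(corpus)
print(freq.most_common(3))  # the most frequent tokens, e.g. ("the", 3)
```

Real corpus studies would of course use proper tokenisers, lemmatisation and much larger collections, but the basic counting step remains the same.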
To do this, we use various Big Data techniques that combine natural language processing (NLP), different types of algorithms and statistical methods to convert unstructured text into structured, normalised data. This is done with the aim of grouping texts (clustering), extracting meaning, classifying topics (topic modelling), modelling relationships and generating hypotheses. This also includes finding information in large amounts of linguistic data (text mining, information extraction) or automatically searching for relevant text passages (information retrieval). The subject matter can also include analyses of the people speaking or sentiment detection (positive or negative) in texts such as reviews or tweets.

The automated analysis of texts with a political orientation has gained enormous importance in recent years. Such texts can be, for example, election programmes or government statements, but also social media data, such as tweets with political content. Typical analysis tasks are, for example, the automated detection of hate speech, the identification of extremely partisan (hyperpartisan) content, the detection of framing and agenda setting, the analysis of knowledge enforcement processes, or the evaluation of communication and argumentation structures.
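Tasks such as hyperpartisan or hate speech detection are usually framed as text classification. As a toy illustration of the underlying idea, the following sketch trains a minimal multinomial Naive Bayes classifier in pure Python; the four training snippets and their labels are invented for demonstration, and real systems are trained on large annotated corpora with far richer features:

```python
import math
from collections import Counter

class NaiveBayes:
    """Minimal multinomial Naive Bayes text classifier (toy sketch)."""

    def fit(self, docs, labels):
        self.classes = set(labels)
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, doc):
        scores = {}
        n_docs = sum(self.class_counts.values())
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # Log prior plus Laplace-smoothed log likelihoods.
            score = math.log(self.class_counts[c] / n_docs)
            for w in doc.lower().split():
                score += math.log(
                    (self.word_counts[c][w] + 1) / (total + len(self.vocab))
                )
            scores[c] = score
        return max(scores, key=scores.get)

# Invented toy training data for illustration only.
docs = [
    "the corrupt elite betrays the people",
    "traitors destroy our nation",
    "the committee published its annual report",
    "parliament debated the budget",
]
labels = ["hyperpartisan", "hyperpartisan", "neutral", "neutral"]

model = NaiveBayes().fit(docs, labels)
print(model.predict("the elite betrays our nation"))  # hyperpartisan
```

The test sentence shares its vocabulary with the hyperpartisan training examples, so the smoothed likelihoods favour that class; the same scheme scales, in principle, to any labelled political-text task mentioned above.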

These methods are not only of great relevance for linguistics, but also for all text-based disciplines such as literary studies, the philologies, philosophy and theology, as well as for language- and text-based research in history, the social sciences and political science.