Language Technology: Computational, linguistic and cognitive perspectives

We work in the interdependent areas of computational linguistics, natural language processing (NLP) and cognitive modelling with the long-term goal of enriching computational language models with linguistic, cultural and cognitive knowledge.

About the unit

The group investigates how current language models can be enriched with linguistic information, cognitive signals such as eye-tracking traces, and visual data (combining text with images, or speech with gesture). This research aims to ground NLP models in data from outside the textual domain and has important societal applications, for instance in mental healthcare.


Additional information

Another challenge concerns the language technology gap that exists for smaller languages, particularly Danish. Here we focus on methods for developing high-quality datasets and lexical-semantic resources that capture the linguistic and cultural traits typical of the Danish language community.

We also work on adapting NLP techniques to data relevant to the digital humanities, an emerging field in which a growing amount of material, including texts in historical precursors of modern languages, is being digitised and thus made amenable to computational processing.

Research areas

Computational Cognitive Modelling and Multimodality

In our approach to computational cognitive modelling, we focus on models of language and speech processing: the extent to which these models make use of cognitive signals and data from the visual-gestural modality, and how they combine them with linguistic knowledge.

One application we investigate in this area is the development of tools that support the detection and monitoring of mental health conditions, including psychosis, depression and cognitive decline.

Methodologies for Developing NLP Language Resources and Benchmark Data

We develop methods for compiling NLP resources and language models that reflect cultural and societal diversity, with a particular focus on Danish and other low- and medium-resourced languages.

We perform culture-aware evaluation of language models and provide benchmark data for Danish that spans linguistic and lexical variety. We work with annotated datasets, in particular lexical-semantic lexicons and corpora, both for Danish and across languages.

Representation Learning for NLP

We contribute to explainable artificial intelligence by exploring the internal representations and learning mechanisms of computational models of natural language.

In particular, through a multilingual approach, we investigate how language models cope with linguistic diversity at various architectural and linguistic levels.

Our research seeks to improve the transparency and performance of multilingual language models and to make multilingual language processing more robust and accurate.

NLP and Digital Humanities

This area covers NLP models for the analysis and generation of textual data in the widest sense, including poems, novels, letters, manuscripts, news articles, scientific articles and lyrics.

We continuously upgrade our NLP pipelines and corpus tools, particularly for Danish, and work to ensure that appropriate methods and gold standards are compiled for evaluating them.

This research opens up many opportunities for collaboration with researchers across the humanities, and at NorS in particular.

Associated researchers

  • Kasper Boye
  • Alexander Conroy
  • Philip Diderichsen
  • Dorthe Duncker
  • Ruben Schachtenhaufen
  • Daniel Hershcovich (DIKU)

Contact

Patrizia Paggio
Associate professor, group leader