Keynote Speakers - 2025 IEEE 21st International Conference on Intelligent Computer Communication and Processing

Vivi Nastase
Unraveling the multi-dimensional dot: Mapping linguistic information in transformer-based sentence embeddings

Abstract: At the heart of modern language processing lies a fascinating construct: word and sentence embeddings. They are arrays of real numbers that we can view as coordinates in a mathematical space where meaning is encoded through topological properties. This abstract geometry has powered some of the most impressive advances in AI, including the rise of generative models. But what is inside these embeddings? How do numbers capture the structure of sentences or the web of relations we express through them? What does the landscape of this sentence space look like? In this talk, I’ll share some of our explorations into these fascinating representations: what patterns we uncovered, where we tracked them down, and what this tells us about how large language models encode language.

Vivi Nastase is a researcher in natural language processing and machine learning at the Idiap Institute (Martigny)/University of Geneva. Vivi graduated from the Computer Science department of the Technical University of Cluj-Napoca many years ago, obtained a PhD in natural language processing from the University of Ottawa, and has been working on numerous projects around the world ever since. Since 2021, she has been part of the NCCR Evolving Language project funded by the Swiss government, which studies the evolution of language in human and non-human animals. She now works mainly on understanding how the linguistic properties of words and sentences are mapped onto small-ish arrays of real numbers.
