Vivi Nastase
Here be dragons: Looking for linguistic information in sentence embeddings
Abstract: At the heart of modern language processing lies a fascinating construct: word and sentence embeddings. These are arrays of real numbers that we can view as coordinates in a mathematical space, where meaning is encoded through topological properties. This abstract geometry has powered some of the most impressive advances in AI, including the rise of generative models. But what is inside these embeddings? How do numbers capture the structure of sentences, or the web of relations we express through them? What does the landscape of this sentence space look like? In this talk, I'll share some of our explorations into these representations: what patterns we uncovered, where we tracked them down, and what this tells us about how large language models encode language.
Vivi Nastase is a researcher in natural language processing and machine learning at the Idiap Research Institute (Martigny) and the University of Geneva. Vivi graduated from the Computer Science department of the Technical University of Cluj-Napoca many years ago, obtained a PhD in natural language processing from the University of Ottawa, and has since worked on numerous projects around the world. Since 2021, she has been part of the NCCR Evolving Language project, funded by the Swiss government, which studies the evolution of language in humans and non-human animals. She now works mainly on understanding how the linguistic properties of words and sentences are mapped onto small-ish arrays of real numbers.