45 - Beyond the Patterns - Karen Livescu (TTI Chicago): Learning Speech Models from Multi-modal Data - A Survey

Recording date: 2021-12-09

Course

Beyond the Patterns

Language

English

Institution

Lehrstuhl für Informatik 5 (Mustererkennung)

Producer

Friedrich-Alexander-Universität Erlangen-Nürnberg

We are honored to welcome Karen Livescu to our lab for an invited presentation!

Abstract: Speech is usually recorded as an acoustic signal, but it often appears in context with other signals. In addition to the acoustic signal, we may have available a corresponding visual scene, the video of the speaker, physiological signals such as the speaker’s movements or neural recordings, or other related signals. It is often possible to learn a better speech model or representation by considering the context provided by these additional signals, or to learn with less training data. Typical approaches to training from multi-modal data are based on the idea that models or representations of each modality should be in some sense predictive of the other modalities. Multi-modal approaches can also take advantage of the fact that the sources of noise or nuisance variables are different in different measurement modalities, so an additional (non-acoustic) modality can help learn a speech representation that suppresses such noise. This talk will survey several lines of work in this area, both older and newer. It will cover some basic techniques from machine learning and statistics, as well as specific models and applications for speech.

Short Bio: Karen Livescu is an Associate Professor at TTI-Chicago. She completed her PhD in electrical engineering and computer science at MIT. Her main research interests are in speech and language processing, as well as related problems in machine learning. Some specific interests include multi-view representation learning, visually grounded speech models, acoustic word embeddings, new models for speech recognition and understanding, unsupervised and weakly supervised models for speech and text, and sign language recognition from video. Her professional activities include serving as a program chair of ICLR 2019, ASRU 2015/2017/2019, and Interspeech 2022, and on the editorial boards of IEEE OJ-SP and IEEE TPAMI. She is an ISCA fellow and an IEEE SPS Distinguished Lecturer.

Register for more upcoming talks here!

References

Karen's Talk at Interspeech

This video is released under CC BY 4.0. Please feel free to share and reuse.

For reminders about new videos, follow us on Twitter or LinkedIn. Also, join our network for information about talks, videos, and job offers in our Facebook and LinkedIn groups.

Music Reference: 
Damiano Baldoni - Thinking of You (Intro)
Damiano Baldoni - Poenia (Outro)
