Yle Open Data for research, development and tinkering

Downloadable audiovisual data, subtitles datasets and word vector data.

Audiovisual data and subtitles datasets

Latest Release v1.1 (Thu November 4th, 2021)

Yle has released three datasets under an experimental license, available for a limited time, to support the development of language- and media-related technologies. The datasets were originally created by the MeMAD research and innovation project, a collaboration between media industry members and research groups.

Read more about Audiovisual data and subtitles datasets

Word Vector data

Latest Release v2 (Thu May 9th, 2019)

The word vectors may be used for commercial purposes, provided that Yle is credited as one of the sources. The credit must not be used in connection with selling, advertising or promoting your products.

Read more about Word Vectors