Broadcasts.com - "Building the howto100m Video Corpus" (Data Skeptic)

Building the howto100m Video Corpus

Published: Aug. 19, 2019, 8:12 p.m.

Video annotation is an expensive and time-consuming process. As a consequence, the available video datasets are useful but small. The availability of machine-transcribed explainer videos offers a unique opportunity to rapidly develop a useful, if dirty, corpus of videos that are "self-annotating", as hosts explain the actions they are taking on screen.

This episode is a discussion of the HowTo100m dataset, a project that has assembled a video corpus of 136M video clips with captions covering 23k activities.