Companies competing in the chatbot wars are using something known in the industry as “the Pile” to train their large language models. It’s a trove of open-source data made up of text scraped from all around the internet, including Wikipedia and the European Parliament. Annie Gilbertson, investigative reporter for Proof News, recently took a deep dive into the Pile and discovered something else: a dataset called “YouTube Subtitles.” Marketplace’s Lily Jamali spoke with Gilbertson about her investigation and how YouTube creators feel about their content being used without their consent.