107. Kevin Hu - Data observability and why it matters

Published: Dec. 15, 2021, 4:30 p.m.

Imagine for a minute that you’re running a profitable business, and that part of your sales strategy is to send the occasional mass email to people who’ve signed up to be on your mailing list. For a while, this approach leads to a reliable flow of new sales, but then one day, that abruptly stops. What happened?

You pore over logs, looking for an explanation, but it turns out that the problem wasn’t with your software; it was with your data. Maybe the new intern accidentally added a character to every email address in your dataset, or shuffled the names on your mailing list so that Christina got a message addressed to “John”, or vice versa. Versions of this story happen surprisingly often, and when they happen, the cost can be significant: lost revenue, disappointed customers, or worse, an irreversible loss of trust.

Today, entire products are being built on top of datasets that aren’t monitored properly for critical failures, and an increasing number of those products are operating in high-stakes situations. That’s why data observability is so important: it is the ability to track the origin, transformations and characteristics of mission-critical data so that problems are detected before they lead to downstream harm.

And it’s also why we’ll be talking to Kevin Hu, the co-founder and CEO of Metaplane, one of the world’s first data observability startups. Kevin has a deep understanding of data pipelines and of the problems that can pop up if they aren’t properly monitored. He joined me to talk about data observability, why it matters, and how it might be connected to responsible AI on this episode of the TDS podcast.

Intro music:

➞ Artist: Ron Gelinas

➞ Track Title: Daybreak Chill Blend (original mix)

➞ Link to Track: https://youtu.be/d8Y2sKIgFWc

Chapters

  • 0:00 Intro
  • 2:00 What is data observability?
  • 8:20 Difference between a dataset’s internal and external characteristics
  • 12:20 Why is data so difficult to log?
  • 17:15 Tracing back models
  • 22:00 Algorithmic analysis of data
  • 26:30 Data ops in five years
  • 33:20 Relation to cutting-edge AI work
  • 39:25 Software engineering and startup funding
  • 42:05 Problems on a smaller scale
  • 46:40 Future data ops problems to solve
  • 48:45 Wrap-up