Reflecting On The Past 6 Years Of Data Engineering

Published: Feb. 6, 2023, 1 a.m.

Summary

This podcast started almost exactly six years ago, and the technology landscape was much different than it is now. In that time there have been a number of generational shifts in how data engineering is done. In this episode I reflect on some of the major themes and take a brief look forward at some of the upcoming changes.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Your host is Tobias Macey and today I'm reflecting on the major trends in data engineering over the past 6 years

Interview
  • Introduction
  • 6 years of running the Data Engineering Podcast
  • Around the first time that data engineering was discussed as a role
    • Followed on from hype about "data science"
  • Hadoop era
  • Streaming
  • Lambda and Kappa architectures
    • Not really referenced anymore
  • The "Big Data" era of capturing everything has shifted to a focus on data that provides value
    • The regulatory environment increases risk; better tools make it easier to understand what data is useful
  • Data catalogs
    • Amundsen and Alation
  • Orchestration engine
    • Oozie, etc. -> Airflow and Luigi -> Dagster, Prefect, Flyte (from Lyft), etc.
    • Orchestration is now a part of most vertical tools
  • Cloud data warehouses
  • Data lakes
  • DataOps and MLOps
  • Data quality to data observability
  • Metadata for everything
    • Data catalog -> data discovery -> active metadata
  • Business intelligence
    • Read-only reports to metric/semantic layers
    • Embedded analytics and data APIs
  • Rise of ELT
    • dbt
    • Corresponding introduction of reverse ETL
  • What are the most interesting, unexpected, or challenging lessons that you have learned while running the podcast?
  • What do you have planned for the future of the podcast?

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
  • To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
