How Upsolver Is Building A Data Lake Platform In The Cloud with Yoni Iny - Episode 56

Published: Nov. 11, 2018, 9 p.m.

Summary

A data lake can be a highly valuable resource, as long as it is well built and well managed. Unfortunately, that can be a complex and time-consuming effort, requiring specialized knowledge and diverting resources from your primary business. In this episode Yoni Iny, CTO of Upsolver, discusses the various components that are necessary for a successful data lake project, how the Upsolver platform is architected, and how modern data lakes can benefit your organization.

Preamble

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
  • Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
  • Your host is Tobias Macey and today I’m interviewing Yoni Iny about Upsolver, a data lake platform that lets developers integrate and analyze streaming data with ease

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by describing what Upsolver is and how it got started?
    • What are your goals for the platform?
  • There are a lot of opinions on both sides of the data lake argument. When is it the right choice for a data platform?
  • What are the shortcomings of a data lake architecture?
  • How is Upsolver architected?
    • How has that architecture changed over time?
    • How do you manage schema validation for incoming data?
    • What would you do differently if you were to start over today?
  • What are the biggest challenges at each of the major stages of the data lake?
  • What is the workflow for a user of Upsolver and how does it compare to a self-managed data lake?
  • When is Upsolver the wrong choice for an organization considering implementation of a data platform?
  • Is there a particular scale or level of data maturity for an organization at which they would be better served by moving management of their data lake in house?
  • What features or improvements do you have planned for the future of Upsolver?

Parting Question

    • From your perspective, what is the biggest gap in the tooling or technology for data management today?

    The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
