How Upsolver Is Building A Data Lake Platform In The Cloud with Yoni Iny - Episode 56

Published: Nov. 11, 2018, 9 p.m.


Summary


A data lake can be a highly valuable resource, as long as it is well built and well managed. Unfortunately, that can be a complex and time-consuming effort, requiring specialized knowledge and diverting resources from your primary business. In this episode Yoni Iny, CTO of Upsolver, discusses the various components that are necessary for a successful data lake project, how the Upsolver platform is architected, and how modern data lakes can benefit your organization.


Preamble

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
  • Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
  • Your host is Tobias Macey and today I’m interviewing Yoni Iny about Upsolver, a data lake platform that lets developers integrate and analyze streaming data with ease

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by describing what Upsolver is and how it got started?
    • What are your goals for the platform?
  • There are a lot of opinions on both sides of the data lake argument. When is it the right choice for a data platform?

    • What are the shortcomings of a data lake architecture?
  • How is Upsolver architected?

    • How has that architecture changed over time?
    • How do you manage schema validation for incoming data?
    • What would you do differently if you were to start over today?
  • What are the biggest challenges at each of the major stages of the data lake?

  • What is the workflow for a user of Upsolver and how does it compare to a self-managed data lake?

  • When is Upsolver the wrong choice for an organization considering implementation of a data platform?

  • Is there a particular scale or level of data maturity for an organization at which they would be better served by moving management of their data lake in house?

  • What features or improvements do you have planned for the future of Upsolver?


Contact Info


Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links


The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast
