Episode 29: From Bioinformatics to Natural Language Processing with Leonard Apeltsin

Published: March 13, 2020, 3:30 p.m.

Show Notes:

  • (2:18) Leonard discussed his undergraduate experience at Carnegie Mellon, where he studied Biology and Computer Science.
  • (5:10) Leonard decided to pursue a Ph.D. in Bioinformatics at the University of California, San Francisco.
  • (6:27) Leonard described his Ph.D. research that focused on finding hidden patterns in genetically-linked diseases.
  • (9:42) Leonard went deep into clustering algorithms (Markov Clustering and Louvain) and their applications, such as measuring similarity among proteins and among news articles.
  • (13:21) Leonard shared his story of starting a data science consultancy with various client startups.
  • (17:58) Leonard discussed the interesting consulting projects that he worked on, from detecting plagiarism to predicting insurance bills.
  • (22:04) Leonard shared practical tips to learn technical concepts.
  • (23:23) Leonard reflected on his experience working at a string of startups, including Accretive Health, Quid, and Stride Health.
  • (26:06) Leonard recounted joining Primer AI as a founding team member in early 2015. Primer is a startup that applies state-of-the-art NLP techniques to build machines that read and write.
  • (30:31) Leonard discussed the technical challenges of developing the algorithms that power Primer’s products and scaling them to languages other than English.
  • (34:28) Leonard unpacked his technical post “Russian NLP” on Primer’s blog.
  • (38:17) Leonard talked about the advances in the NLP research domain that he is most excited about in 2020 (XLNet >>> BERT).
  • (41:10) Leonard discussed the challenges of scaling the data-driven culture across Primer AI as the company grows.
  • (46:20) Leonard mentioned different use cases of Primer for clients in the finance, government, and corporate sectors.
  • (51:41) Leonard talked about his decision to leave Primer and become a Data Science Health Innovation Fellow at the Berkeley Institute for Data Science.
  • (54:30) Leonard went over applications of data science in healthcare that will be adopted widely in the next few years.
  • (1:02:45) Leonard discussed his process of writing a book called “Data Science Bookcamp.”
  • (1:07:21) Leonard revealed how he chose the case studies to be included in the book.
  • (1:10:27) Closing segment.

His Contact Info:

His Recommended Resources:

You can read the completed chapters of "Data Science Bookcamp" using the codes below:

  • Permanent discount code: poddcast19
  • 5 free eBook codes: dcdsprf-B373, dcdsprf-CA3B, dcdsprf-299E, dcdsprf-6E5, and dcdsprf-9660 (activated and will last for 2 months)