Episode 54: Information Retrieval Research, Data Science For Space Missions, and Open-Source Software with Chris Mattmann

Published: Feb. 4, 2021, 7 p.m.

Timestamps

  • (2:55) Chris went over his experience studying Computer Science at the University of Southern California for undergraduate in the late 90s.
  • (5:26) Chris recalled working as a Software Engineer at NASA Jet Propulsion Lab in his sophomore year at USC.
  • (9:54) Chris continued his education at USC with an M.S. and then a Ph.D. in Computer Science. Under the guidance of Dr. Nenad Medvidović, his Ph.D. thesis is called “Software Connectors For Highly-Distributed And Voluminous Data-Intensive Systems.” He proposed DISCO, a software architecture-based systematic framework for selecting software connectors based on eight key dimensions of data distribution.
  • (16:28) Towards the end of his Ph.D., Chris started getting involved with the Apache Software Foundation. More specifically, he developed the original proposal and plan for Apache Tika (a content detection and analysis toolkit) in collaboration with Jérôme Charron to extract data in the Panama Papers, exposing how wealthy individuals exploited offshore tax regimes.
  • (24:58) Chris discussed his process of writing “Tika In Action,” which he co-authored with Jukka Zitting in 2011.
  • (27:01) Since 2007, Chris has been a professor in the Department of Computer Science at USC Viterbi School of Engineering. He went over the principles covered in his course titled “Software Architectures.”
  • (29:49) Chris touched on the core concepts and practical exercises that students could gain from his course “Information Retrieval and Web Search Engines.”
  • (32:10) Chris continued with his advanced course called “Content Detection and Analysis for Big Data” in recent years (check out this USC article).
  • (36:31) Chris also served as the Director of the USC’s Information Retrieval and Data Science group, whose mission is to research and develop new methodology and open source software to analyze, ingest, process, and manage Big Data and turn it into information.
  • (41:07) Chris unpacked the evolution of his career at NASA JPL: Member of Technical Staff -> Senior Software Architect -> Principal Data Scientist -> Deputy Chief Technology and Innovation Officer -> Division Manager for the AI, Analytics, and Innovation team.
  • (44:32) Chris dove deep into MEMEX — a JPL’s project that aims to develop software that advances online search capabilities to the deep web, the dark web, and nontraditional content.
  • (48:03) Chris briefly touched on XDATA — a JPL’s research effort to develop new computational techniques and open-source software tools to process and analyze big data.
  • (52:23) Chris described his work on the Object-Oriented Data Technology platform, an open-source data management system originally developed by NASA JPL and then donated to the Apache Software Foundation.
  • (55:22) Chris shared the scientific challenges and engineering requirements associated with developing the next generation of reusable science data processing systems for NASA’s Orbiting Carbon Observatory space mission and the Soil Moisture Active Passive earth science mission.
  • (01:01:05) Chris talked about his work on NASA’s Machine Learning-based Analytics for Autonomous Rover Systems — which consists of two novel capabilities for future Mars rovers (Drive-By Science and Energy-Optimal Autonomous Navigation).
  • (01:04:24) Chris quantified the Apache Software Foundation's impact on the software industry in the past decade and discussed trends in open-source software development.
  • (01:07:15) Chris unpacked his 2013 Nature article called “A vision for data science” — in which he argued that four advancements are necessary to get the best out of big data: algorithm integration, development and stewardship, diverse data formats, and people power.
  • (01:11:54) Chris revealed the challenges of writing the second edition of “Machine Learning with TensorFlow,” a technical book with Manning that teaches the foundational concepts of machine learning and the TensorFlow library's usage to build powerful models rapidly.
  • (01:15:04) Chris mentioned the differences between working in academia and industry.
  • (01:16:20) Chris described the tech and data community in the greater Los Angeles area.
  • (01:18:30) Closing segment.

His Contact Info

His Recommended Resources