Tan Vachiramon - Choosing the right algorithm for your real-world problem

Published: July 16, 2019, 3:04 a.m.


You import your data. You clean your data. You make your baseline model. 


Then, you tune your hyperparameters. You go back and forth between random forests and XGBoost, add feature selection, and tune some more. Your model’s performance goes up, and up, and up.


And eventually, the thought occurs to you: when do I stop?


Most data scientists struggle with this question on a regular basis, and from what I’ve seen working with SharpestMinds, the vast majority of aspiring data scientists get the answer wrong. That’s why we sat down with Tan Vachiramon, a member of the Spatial AI team at Oculus and a former data scientist at Airbnb.


Tan has seen data science applied in two very different industry settings: once as part of a team whose job was to understand its customer base in the middle of a whirlwind of out-of-control user growth (at Airbnb), and again in a context where he’s had the luxury of running far more rigorous data science experiments under controlled circumstances (at Oculus).


My biggest take-home from our conversation was this: if you’re interested in working at a company, it’s worth taking some time to think about its business context, because that’s the single most important factor driving the kind of data science you’ll be doing there. Specifically:

  • Data science at rapidly growing companies comes with a special kind of challenge that’s not immediately obvious: because they’re growing so fast, everything you look at seems correlated with growth! New referral campaign? “That definitely made the numbers go up!” New user onboarding strategy? “Wow, that worked so well!” Because the product is taking off, you need special strategies to make sure you don’t confuse the effect of the company initiative you’re interested in with the viral growth the product was already experiencing.
  • The amount of time you spend tuning or selecting your model, or doing feature selection, depends entirely on the business context. In some companies (like Airbnb in the early days), super-accurate algorithms aren’t as valuable as algorithms that help you understand what the heck is going on in your dataset. As long as business decisions don’t depend on accuracy down to the second decimal place, it’s okay (and even critical) to build a quick model and move on. In these cases, even logistic regression often does the trick!
  • In other contexts, where tens of millions of dollars depend on every decimal point of accuracy you can squeeze out of your model (investment banking, ad optimization), expect to spend more time on tuning and modeling. At the end of the day, it’s a question of opportunity cost: keep asking yourself whether you could create more value for the business by wrapping up your model tuning now and working on something else. If you think the answer could be yes, consider calling model.save() and walking away.
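One common way to keep an initiative’s effect separate from background growth, the trap described in the first point above, is to compare the group exposed to the initiative against a holdout group over the same period (a difference-in-differences comparison). The sketch below assumes that approach; it is not necessarily what Tan’s team did at Airbnb, and the function names and signup counts are hypothetical.

```python
# Hypothetical sketch: separating an initiative's effect from background growth
# by comparing against a holdout (control) group measured over the same period.


def naive_lift(treated_before: float, treated_after: float) -> float:
    """Before/after change in the exposed group only.

    This mixes the initiative's effect with whatever growth was already happening.
    """
    return treated_after - treated_before


def diff_in_diff(
    treated_before: float,
    treated_after: float,
    control_before: float,
    control_after: float,
) -> float:
    """Change in the exposed group minus change in the holdout group.

    Subtracting the holdout's change removes growth that both groups share,
    leaving an estimate of the initiative's own contribution.
    """
    return (treated_after - treated_before) - (control_after - control_before)


if __name__ == "__main__":
    # Made-up weekly signup counts, for illustration only.
    print(naive_lift(1000, 1500))                # 500
    print(diff_in_diff(1000, 1500, 1000, 1400))  # 100
```

With these made-up numbers, the naive before/after comparison credits the campaign with 500 extra signups, while the holdout comparison attributes only 100 to it; the other 400 would have arrived anyway from growth. The comparison is only meaningful if the holdout is randomly assigned (or otherwise comparable) and both groups are measured over the same window.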

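The opportunity-cost question in the last point can be made a little more concrete with a simple stopping rule: stop tuning once the best validation score has stopped improving by more than a threshold you judge to be worth your time. This is a sketch under assumptions; `should_stop_tuning`, `min_gain`, and `patience` are hypothetical names, and the threshold itself has to come from the business context.

```python
from typing import Sequence


def should_stop_tuning(
    scores: Sequence[float],
    min_gain: float = 0.001,
    patience: int = 3,
) -> bool:
    """Return True if the last `patience` tuning rounds raised the best
    validation score by less than `min_gain`.

    `scores` holds one validation score per tuning round (higher is better).
    `min_gain` is a hypothetical, business-specific threshold: the smallest
    improvement you consider worth more than moving on to other work.
    """
    if len(scores) <= patience:
        # Not enough history yet to judge recent progress.
        return False
    best_before = max(scores[:-patience])  # best score before the recent window
    best_overall = max(scores)             # best score including the recent window
    return best_overall - best_before < min_gain


if __name__ == "__main__":
    # Made-up validation scores from successive tuning rounds.
    history = [0.80, 0.84, 0.86, 0.8605, 0.8607, 0.8608]
    print(should_stop_tuning(history))  # True: the last 3 rounds added < 0.001
```

A loose `min_gain` fits the early-Airbnb situation, where a quick, interpretable model is enough; a much tighter one fits settings like ad optimization, where every decimal point is worth money.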
