Thoughts on machine learning, feature engineering, and data pipelines for informed prediction, analytics, and insight from a summer internship.
I had a blast working for Topo Solutions this summer.
My role was in data science, and I had the flexibility of using this time to learn, experiment, and apply modern machine learning to client data.
I ended up with some fantastic results that even surprised me. The models will go on to help the company push forward in applications of AI and predictive modeling. All this aside, taking the product-oriented approach to machine learning made me think a lot about business value…
Here are my two biggest takeaways on what is changing in the “data-driven” business:
1) Level 1 chaos is a growth strategy.
First-order (level 1) chaos doesn’t respond to prediction. If you predict the weather to some level of accuracy, that prediction will hold because the weather doesn’t adjust based on the prediction itself. Second-order (level 2) chaos is far less predictable because it *does* respond to prediction. Examples include things like stocks and politics.
Growth strategies often fall into buckets such as network effects, stickiness, pricing power, ecosystem lock-in, etc.
And almost universally today, clients generate data that can be looped back into the company’s product: feedback, A/B testing, feature requests, etc.
If a predictive model can add value to the business through either the product or sales, then a level 1 chaos system is desirable. It proves iterative growth is accessible, and this, in turn, means greater future cash flow.
Say a bakery generates client profiles and collects buying habits for everyone coming in to purchase baked goods (a sort of Silicon Valley mutant of a bakery). We understand that the chaos of this system is first-order: reading this data and predicting what a customer will buy next time will not influence future sales (and the primary market is wholly owned by the business, so information liquidity is low).
The bakery can iteratively use this data to improve (in a provable way). Some ideas might be using A/B testing to improve the taste of the most commonly sold pastries, or lowering the price of less desirable (but still worth producing) items.
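To make “in a provable way” a little more concrete, here is a minimal sketch of how the bakery might check whether a new recipe actually sells better. It assumes a simple two-proportion z-test on purchase rates; the function name and the counts are made up for illustration and are not from any real client data.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in purchase rates.

    conv_a / n_a: purchases and visitors for the current recipe (A).
    conv_b / n_b: purchases and visitors for the new recipe (B).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("No variation in the data; the test is undefined.")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 1000 visitors saw each version of the croissant.
z, p = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Because the system is first-order, running this test does not change how customers behave, so the result can be trusted to hold when the winning recipe is rolled out.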
The business product can improve and generate more cash.
2) Friction starting in the pipeline.
You could spend days reading about “zero friction” in business, but the underlying point is that we lose business value when both we and our customers waste time.
As an existing business becomes more data-driven, the collected data and the data collection policy will likely not scale perfectly (e.g. needing more columns, a better storage format, etc.). If this is left unaddressed, the post-pipeline stages are hindered.
Complex data sources have necessitated data collection standards, which have reduced friction at the software implementation level. Even so, inaccurate tagging, foreign importing, etc. will slow down a good system.
When it becomes slow, expensive, or time-consuming to work with the data, data-driven decision making becomes less likely as people become impatient, and thus irrational.
For example, imagine we want to produce real-time analytics for customers using a platform we built to track study habits. If we allow 100% free-rein customization of the platform for the customers, we cannot quickly ingest the aggregate data for analytics, as we have no clue how the customer is collecting their data. We’ll have to rely on clustered reporting and hand-tuned pipelines, slowing down the entire data operation of the business on our end. We can improve this by still allowing maximum customization, but offering some sort of pre-determined tagging as an opt-in to facilitate business goals with the data.
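The opt-in tagging idea can be sketched roughly like this. Everything here is hypothetical: the tag names, the `StudyEvent` record, and the `split_for_ingestion` helper are assumptions made up for illustration, not a description of any real platform.

```python
from dataclasses import dataclass, field

# Hypothetical set of pre-determined tags a customer can opt in to.
STANDARD_TAGS = {"subject", "duration_minutes", "session_start"}


@dataclass
class StudyEvent:
    customer_id: str
    # Whatever the customer chose to collect; we make no assumptions here.
    custom_fields: dict
    # Filled in only by customers who opted in to the standard tagging.
    standard_tags: dict = field(default_factory=dict)


def split_for_ingestion(events):
    """Route fully tagged events to the fast path, the rest to the slow path."""
    fast_path, slow_path = [], []
    for event in events:
        if STANDARD_TAGS.issubset(event.standard_tags):
            fast_path.append(event)  # ready for real-time analytics
        else:
            slow_path.append(event)  # needs a hand-tuned pipeline
    return fast_path, slow_path


events = [
    StudyEvent(
        "customer-a",
        {"mood": "focused"},
        {"subject": "math", "duration_minutes": 45, "session_start": "09:00"},
    ),
    StudyEvent("customer-b", {"notes": "read chapter 3"}),
]
fast, slow = split_for_ingestion(events)
print(len(fast), "event(s) on the fast path,", len(slow), "on the slow path")
```

Customers keep full freedom in `custom_fields`, while anyone who opts in to the standard tags gets analytics without us having to reverse-engineer how they collect their data.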
This can all seem obvious, but knowing where friction is easy to find and diagnose makes business growth more realistic to foresee and depend on.
Thank you for reading, hope you found some value!