Over the past two weeks I have presented two future-casts. The first involves the ubiquitous appearance of AGI within ten years. The second concerns the tipping-point appearance of a Virtual Patient for drug development. This week brings the third and final installment of the deep-tech incubation game.
I think we will have an Excel for Data Science within the next 3-5 years.
Big software
Stay with me for a moment. I know this is a deeply unsexy idea.
Once upon a time in software engineering it was common to build increasingly complex software. Microsoft’s Excel was the pinnacle achievement of that age. In recent years I met the person who was the main product owner of Excel during the 1990s, and I think he really achieved something.
It has become fashionable in the past 20 years to look down on ambitious software projects. It is a truism that you can capture 80% of the financial rewards by delivering only 20% of the features. We’ve become ‘agile’ and software is eating the world.
But we have also shut down the Space Shuttle program and have so far failed to return to the Moon.
Sometimes ambition requires excellence
Data science has reached the limits of its abilities in its current configuration.
Every data scientist I know runs the exact same thought experiment: what would their ideal setup look like inside a large company (or consultancy)? The answer is that there is no stable equilibrium.
Good data scientists are worth their weight in gold for a short period and then, in an ideal world, they should move on. But delivering on good data science is not compatible with this trajectory. The data scientist needs to feel bound to their employer in order to deliver.
So you have a misalignment of interests.
Delivering on data science beyond the level of basic business intelligence requires considerable technical expertise. But the return on this investment is incremental at best.
Outsource the engineering
The only solution that I can come up with is to outsource the engineering. The heavy lifting of data science must be done once, to a high standard. This will allow in-house or consultant data scientists to deliver insights much faster, without spending months on engineering.
To a certain extent this has been the purported offering of Tableau Software. They deliver graphical dashboards which knowledgeable insiders can manipulate and derive insights from. I’m on board with that, for what it is worth. But they seem to underdeliver. I don’t use their software so I don’t want to point fingers, but something is clearly missing.
AWS is trying the tech-heavy approach to what I am describing. For a long time they avoided the ‘data space’, but clearly they are now deeply involved in developing plug-and-play data backend offerings. I think the AWS approach is still a few steps away from the Excel-like ubiquity which I am claiming will conquer this space.
Excel became the one and only software which people doing basic math needed to understand. Whoever wants to conquer data needs to aim for this level of ubiquity. The technical commitment required is expensive and otherwise unsustainable. They also need to consider offering approaches which unlock data silos – I’m thinking data federation tools – otherwise they (AWS in particular) can only operate on niche data.
Without the appearance of an Excel for Data Science I think we are doomed to repeatedly reengineer the wheel, in this case the data science backend. This makes data science unacceptably expensive and thus unsustainable. By offloading the costs to a single provider, who owns the market, the ongoing use of data science becomes viable for considerably longer.