CIO

Steve Jones

Data science is easy; making it work is hard

In the world of data science, there are three core problems: acquiring data, doing the math and taking action. Two of those drive data scientists crazy; the other one they find easy.

“Doing the math” is what most people think of as data science. Algorithms, machine learning, cognitive tools, deep learning and the word stochastic are often not far away. That’s the easy bit.

Now let me define easy:

Data science is easy if you have the right data scientists.

I am not in any way saying that the complex discipline known as data science is easy or that becoming a proper data scientist is simple. It’s not, and the mathematics is well beyond my understanding.

Data scientists are smart people. If you get them the data, they can create a model that delivers value where there is value to be had. Nothing frustrates a data scientist more than being able to do the math but having neither the data to run it against nor any way for the results to be used.

That first bit, acquiring data, can be a big “if” for many organizations. On several occasions, I’ve worked with companies that truly wanted to practice data science but were hindered because they couldn’t get the right data. Or, if they had the right data for today, they lacked the data history needed to create models and undertake machine learning. It’s here that business data lakes are often created to provide that foundation of information.

The last step can often be the most challenging, however, because it’s there that the predictive has to be turned into the prescriptive: a view of the future has to become a decision in the present, and an analytical model has to become an outcome delivered via a transactional system. This requires modifying transactional systems and making them beholden to analytical systems. Many IT estates are not capable of doing this, because data estates are set up to run in batch, while transactional systems require fast responses. But that’s only the first challenge. The second challenge is how to turn a probabilistic model into a specific set of actions.

Imagine a crane operator loading a super container ship at a port. Because of the ship’s size, that’s a very hard challenge, and a huge amount of mathematics goes into producing a series of loadings that are probably optimal. Crane operators, however, need to know none of what is going on in the background and would understand none of the mathematics. They simply need to know “this container goes there, that container goes here.” Thus, complex math has to be distilled down to an action.
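To make that distillation concrete, here is a minimal sketch in Python of how a probabilistic score might be reduced to a single instruction for the operator. Everything in it is an assumption made for illustration: the `Candidate` fields, the cost figures, the container ID and the expected-cost rule are hypothetical, not a description of how any real port system works.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One possible placement, scored by an upstream model (hypothetical)."""
    slot: str             # where the container could go
    p_rehandle: float     # model's probability the container must be moved again
    rehandle_cost: float  # assumed cost if a re-handle happens
    move_cost: float      # assumed cost of making this placement now


def expected_cost(c: Candidate) -> float:
    # Certain cost now, plus the probability-weighted cost of a later re-handle.
    return c.move_cost + c.p_rehandle * c.rehandle_cost


def instruction_for(container_id: str, candidates: list[Candidate]) -> str:
    """Collapse the scored options into the one line the operator needs."""
    if not candidates:
        raise ValueError("no candidate placements supplied")
    best = min(candidates, key=expected_cost)
    return f"Container {container_id} goes to {best.slot}"


if __name__ == "__main__":
    options = [
        Candidate("bay 12, row 3, tier 2", p_rehandle=0.10, rehandle_cost=50.0, move_cost=4.0),
        Candidate("bay 04, row 1, tier 1", p_rehandle=0.40, rehandle_cost=50.0, move_cost=2.0),
    ]
    print(instruction_for("C-001", options))
```

The operator sees only the final sentence. The probabilities and costs stay in the background, which is the point of the crane example.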

With data science, it’s always smart to look backward from the point of action, from the decision being made, to understand whether your systems are yet capable of handling the interaction. By focusing first on the decisions, and how they are being made, it becomes possible to understand where data science can truly add value. There are few things I find more frustrating than data scientists identifying value that could be delivered, but then realizing that there is no way to actually deliver it. Another common mistake is failing to identify inefficiencies that are beyond your control, such as the delivery time from your logistics supplier. No analytics will improve that.

So data science is easy. Making it actionable is the hard part. To ensure that you can deliver value, follow this simple checklist:

  1. What are the crucial decision points that drive business effectiveness or inefficiency?
  2. How are those decisions made today?
  3. Can those decisions be replaced with analytical input?
  4. Could that analysis result in an improved outcome?
  5. Do we have the data required to create the analytical models?

If the answer to each of those questions is yes, then you have a chance not only to do data science, but also to make it actionable. Only then will you understand that data science is easy only when you have the right data scientists.

This article was written by Steve Jones from CIO and was legally licensed through the NewsCred publisher network.