If your data lake turned into a data swamp, it might be time to take the next step

Posted 21st January 2017 - Technology

As the leader of Deloitte’s analytics practice, Paul Roma directs the company’s analytics offerings across all businesses, so he sees companies struggling with a wide range of issues.  Network World Editor in Chief John Dix recently talked to Roma about everything from the analytics problems companies face (hint: the swamp reference above), to the tools that help extract more value (cognitive analytics and machine learning), to the executive management roles that are evolving (the title doesn’t matter much, but ownership of the problem does). 


Paul Roma, Chief Analytics Officer, Deloitte

What do customers typically bring you in to address?  Are they looking to solve a specific problem, or are they trying to address bigger picture, overarching analytic issues?

Most routinely we are brought in to work on a specific business outcome.  A customer may be looking to, say, improve their consumer Net Promoter Score (NPS), an industry-standard measure of a consumer’s relationship to a particular corporation and its products. It’s called a Net Promoter Score because it’s a heuristic that combines several factors into a single number you can judge yourself by.  Or a healthcare organization may come to us to help improve outcomes in certain healthcare protocols. So we’re usually talking about business outcomes.
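Roma describes NPS only loosely here; the standard published formula (not anything Deloitte-specific) is the percentage of promoters minus the percentage of detractors on a 0–10 survey scale. A minimal sketch:

```python
# Standard Net Promoter Score formula: respondents rate 0-10;
# 9-10 are promoters, 0-6 detractors, 7-8 passives (ignored).
# This is the generic industry definition, not Deloitte's methodology.
def net_promoter_score(ratings):
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)

survey = [10, 9, 8, 7, 6, 10, 3, 9, 7, 10]   # hypothetical survey responses
print(net_promoter_score(survey))  # 5 promoters, 2 detractors -> 30.0
```

The score ranges from -100 (all detractors) to +100 (all promoters), which is why a single trend line on it is easy for executives to judge themselves by.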

When you arrive do you find companies have the analytic tools they need, or are they looking for input on new tech as well?

Larger customers have the analytic tools.  There isn’t a company we go into that doesn’t have one of everything.  The question is more one of focused usage. It isn’t a shortage of data, either, because they’ve got tons of data.  It’s normal now to have data warehouses or data lakes that have been put together over the years.  But I’ve seen multiple millions of dollars spent on data lakes that become what I call data swamps.  They’ve spent all this money to put everything together, yet they can’t do anything with it. The major question now is how to use the data to get better outcomes.

Given there is so much data and so many different tools available to make sense of it, how do you go about helping clients move forward?

I offer up three ways to think about it.  First, if you’re grounded in an outcome, it leads you to certain questions to interrogate the problem.  If you want to improve consumer relations, or you want to improve outcomes in healthcare, you’re at least grounded in what you want to do.  As you analyze the data, that experience leads you to create certain domains, to take what is a very unstructured data lake and start to apply structured boundaries.

Once you’ve done that you can start to use more advanced tools like cognitive analytics to apply structure to the data lakes, and natural language processing and machine learning to give you a way of letting the data give you hypotheses. 

The advanced techniques have gone beyond putting out a report and then looking at the graph to see what it says.  Now machine learning can actually perform causal analysis and tell you which variables, or which data domains, are most influential to a particular outcome, and generate hypotheses about why.  In healthcare, for instance, the machine can indicate why readmissions on a given protocol are high.  The causal analysis leads to that type of analytic.
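The variable-influence ranking Roma describes can be sketched, in a deliberately simplified form, by scoring each variable’s correlation with the outcome. This is a toy stand-in, not the causal analysis a real engagement would use, and all the data below is hypothetical:

```python
# Toy influence ranking: score each variable by absolute Pearson
# correlation with the outcome. Correlation is NOT causation; real
# causal analysis requires far more than this sketch.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_influence(features, outcome):
    """features: dict of name -> values; outcome: 0/1 labels."""
    scores = {name: abs(pearson(vals, outcome)) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical readmission data: length of stay tracks readmission, age doesn't.
features = {"length_of_stay": [2, 9, 3, 8, 2, 10],
            "age":            [70, 56, 55, 69, 62, 63]}
readmitted = [0, 1, 0, 1, 0, 1]
print(rank_influence(features, readmitted))  # ['length_of_stay', 'age']
```

A ranking like this gives the analyst a starting hypothesis (longer stays are associated with readmission) to interrogate with domain experience, which mirrors the workflow described above.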

The advanced techniques are probably where we’re called in the most, to try to make sense of all the data.  Without advanced techniques there is no knife to cut through it.  Just running reports will create endless reams of paper that, frankly, you could never get anyone to interpret.

We bring custom-tuned algorithms to a lot of our engagements — whether it’s in healthcare, supply chain or customer marketing — and with machine learning algorithms and supervised learning cycles we can run against a client’s data and create hypotheses they can investigate with their experience.

Interesting.  Are those algorithms particular to vertical markets or is there a common base you build from?

We have horizontals and verticals.  The verticals are tuned to markets like supply chain in manufacturing or supply chain in consumer products, protocols in life sciences, etc., and horizontals are used throughout.  An example of the latter is a sparse matrix completion algorithm we patented.  If the data lake you have for a particular problem doesn’t fill in all of the variables you need, it runs predictive algorithms to fill them in and creates hypotheses as to what the trends would be.  We just ran it against a diabetes protocol with a large healthcare company and, with 93% accuracy, we can predict who isn’t compliant with their diabetes protocol without having any of the compliance data associated with it.
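The general idea behind matrix completion — predicting missing entries from the structure of the observed ones — can be illustrated with a toy rank-1 alternating least squares fit. This is a vastly simplified sketch of the technique in general, not Deloitte’s patented algorithm, and the matrix is made up:

```python
# Toy rank-1 matrix completion by alternating least squares.
# Illustrates the general technique only -- NOT the patented algorithm.
# M holds observed values; None marks missing entries to be predicted.
M = [[4.0, 2.0, None],
     [2.0, 1.0, 1.0],
     [None, 2.0, 2.0]]

rows, cols = len(M), len(M[0])
u = [1.0] * rows          # row factors
v = [1.0] * cols          # column factors

for _ in range(50):       # alternate closed-form least-squares updates
    for i in range(rows): # refit u[i] against observed entries in row i
        num = sum(v[j] * M[i][j] for j in range(cols) if M[i][j] is not None)
        den = sum(v[j] ** 2 for j in range(cols) if M[i][j] is not None)
        u[i] = num / den
    for j in range(cols): # refit v[j] against observed entries in column j
        num = sum(u[i] * M[i][j] for i in range(rows) if M[i][j] is not None)
        den = sum(u[i] ** 2 for i in range(rows) if M[i][j] is not None)
        v[j] = num / den

# Every entry, observed or missing, is predicted as u[i] * v[j].
completed = [[u[i] * v[j] for j in range(cols)] for i in range(rows)]
print(round(completed[0][2], 2), round(completed[2][0], 2))  # close to 2.0 and 4.0
```

Because the observed entries here are consistent with a rank-1 pattern, the fit recovers the two missing values almost exactly; real data needs higher ranks, regularization, and validation before any hypothesis is trusted.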

Meaning you can predict who isn’t doing what they’re supposed to be doing?

Right.  They’re not doing weigh-ins, they’re not doing their exercise.  It doesn’t predict exactly what they’re not doing, because we just started, but it predicts who isn’t compliant. We’re hoping to improve its accuracy into the high 90s, and then we’ll be able to scrutinize a whole hospital system, because it becomes predictive at that point.  Before you have compliance problems you can show trended scores.  This person is trending toward non-compliance.  And then you can have a nurse call and ask, “Are you having trouble taking your insulin?  Is there a reason you haven’t been doing your exercise?  Are you not getting to the doctor because you have a transportation problem?”  You can start to seek out particular issues in the protocol to try to help.

Is this something you leave behind after your engagement ends?

Deloitte has become a provider of products and software over the last four years.  That was my previous endeavor, creating the products and solutions portion of our company, so I can talk about it fairly in depth.  We now offer Software-as-a-Service products, and we also leave behind, if you will, installed solutions.  We do both.  It’s just a matter of the problem we’re solving as to which one makes the most sense and which one is most economical. 

Where does the push for this type of analytics come from within an organization?

I would say the strongest push is from the business, not the boardroom.  We do lots of dashboards for executives, but typically you start with a business owner, and then after it’s working, the business owner presents it to the CEO and the board. It becomes more viral and usually works its way back down to the next business unit.

I was talking to the chief data officer of a financial firm and he told me that when they started some of their big data efforts they had to reconcile a bunch of differences in their core customer data. Is that typical of many organizations? 

Source: http://www.cio.com.au/article/613025/your-data-lake-turned-into-data-swamp-it-might-time-take-next-step/ via http://www.cio.com.au/tax/news/