Lessons learned for the modern data strategy

Let’s face it, big data can be hard. Now, you may think this is a careless statement for a software vendor to make, but bear with me and I’ll explain.

Industry analyst Tony Baer (of Ovum Research fame) and I have debated this topic many times, so last week I asked him to join me for a webinar to explore it further. During our discussion we sought to answer a few questions, namely:

  • Is it true? Is big data indeed hard?
  • If the answer to the first question is yes (spoiler alert, it can be!), why is that the case?
  • And finally, what can we learn from those who have been successful so as to simplify the journey?

When I did my first triathlon many years ago, I didn’t just show up on race day and jump in the water. Instead, I planned out a months-long strategy for how I was going to achieve my goal (Step 1? Learn how to swim!). I broke my plan down into three main components: people (me and my support system), process (my weekly training and diet plan), and technology (my bike mostly, but shoes are worth a look too!). Thinking about it this way, I was able to break the problem down into more manageable pieces that, with the right focus, could each be improved in the spirit of finishing the race.

Tackling a big data project, it turns out, can be broken down in a similar way.

Through our combined experience of working with organizations that have taken a leap into the Apache Hadoop pool, Tony and I share the belief that people, process, and technology are all equally important for ensuring success with Hadoop. Many organizations, though, tend to overlook the people and process dimensions and look solely at the technology in isolation. Who can blame them? Who doesn’t want to get their hands on the shiny new toy and try new things? We caution against this, though. Let’s explore each component in more detail to understand why:

People


Ask yourself this: do you have the people on staff today to meet your objectives with Hadoop? It’s well known that data scientists are in short supply, but there are other things to consider as well. Who is going to administer the cluster? How will you enable your analysts to use it? In many cases, these questions are overlooked, and organizations jump in only to learn that their teams aren’t ready yet. Training is a great way to address this.

Process


During the webinar, we talked about the business impact of big data projects and explored the business processes that need to be taken into consideration to win overall business buy-in. Specifically, we asked whether attendees’ projects were aligned to driving customer insights, improving the efficiency of products and services, or lowering business risks. Here’s what we learned:

Two observations I have from this data: first, a decent number of people don’t know what the business impact is at all. Second, many others say they are solving every business problem, which leads me to believe they don’t really know either! Know this for certain: if you’re not thinking about the business impact, and partnering with the business executives most affected by it, your Hadoop projects are going to struggle more than they would if you were.

Technology


As the global provider of the fastest, easiest, and most secure data management and analytics platform built on Apache Hadoop, we talk a lot about an enterprise data hub (EDH) as the architectural centerpiece of your data center moving forward. An EDH allows you to store and process any data, at rest or in motion, on a fundamentally secure platform that lets you take advantage of the latest innovations such as Apache Spark or Apache Kafka, on premises or in the cloud. The tricky part is that it’s not an all-or-nothing proposition. To do this right, you need to determine how your EDH fits alongside your existing investments. Cloudera and our vast partner ecosystem make that as simple as possible.

So which one are most people worried about? Tony and I were curious about this too, so we asked attendees during our discussion, and here’s what we learned:


As we can see, most people are worried about their staffing levels and their ability to have people on the ground to get going. That’s not something to be taken lightly, but fortunately there is a wealth of resources available to help teams get ramped up.

There’s more to this topic than can be explained in just one blog post. As such, we’re planning a follow-on series to discuss each of these things in more detail. In the interim, we encourage you to listen to the webinar replay to hear the entire discussion and let us know what you think.

The post Lessons learned for the modern data strategy appeared first on Cloudera VISION.
