Hadoop Adoption – Where is your organization?

Make Use of Your Data

Organizations have always strived to make optimal use of their available data. Once an organization grows past a certain threshold, however, several factors limit that effort.

One of those factors is, of course, the ability, focus, and attention that an organization devotes to organizing its information to support ongoing growth and analysis.

Another factor is that for years, the technology available to most organizations was not easy to use. Perhaps it was too expensive, required too much reworking of current processes, or was too close to the bleeding edge and not stable enough to justify committing resources to a process change or an investment in the technology.

Hadoop is Available Now

What has changed over the past couple of years is that Hadoop is available to any organization willing to spin up a test environment to assess the technology. Whether an organization deploys to three old servers in its own environment or spins up several virtual servers with a provider such as AWS or Rackspace matters less than the fact that a smallish hardware budget lets it try the technology for a few months, without a huge commitment, to see whether it helps.

While Hadoop and data integration products are developing rapidly, there is a core available now that can be used to assess the usefulness of Hadoop. In addition, that core can be upgraded with reasonable effort and with respect for existing installations, which supports a decision to move ahead with Hadoop in your organization!

What Level is My Organization?

For the next section, I will borrow the concept of the Capability Maturity Model (CMM) from the Software Engineering Institute at Carnegie Mellon University, and apply it to the maturity of an organization with regard to data management and adoption of Hadoop.

Hadoop Adoption Level (HAL)

To find your starting point, which of these descriptions fits the status of your organization? Your answer determines the Hadoop Adoption Level (HAL) of your organization!

  • Level 1 – We stopped trying to collect certain information years ago, because there was no way to retain that information in a useful way. The difference between Level 1 and zero is that a Level 1 organization recognizes that better data management would be helpful.
  • Level 2 – We have been collecting data from various sources such as our ordering process, website access logs, accounting or ERP system, and Google Analytics. Unfortunately, we have no way to combine this mix of structured and unstructured data into something that could produce meaningful information.
  • Level 3 – We have a test cluster set up, but are struggling to get data from various sources ingested and cataloged within Hadoop in a consistent way that supports our analyst team.
  • Level 4 – We have several Hadoop clusters set up, and each team is pushing data into Hadoop their own way, from their own silo or department.
  • Level 5 – We have more than one Hadoop cluster set up, and they are split for a specific purpose or requirement related to critical business policy or compliance factors.

For each one of these levels, there are steps your organization can take to make better use of your data and the latest technology!

Please let me know your thoughts and suggestions about this article.

Sincerely,

Michael Blizman

 

2 Comments

  1. Lester Martin

    Good stuff, Michael. I can attest that most everyone can load a ton of information into Hadoop, but many do get stuck in your Level 3 block of leveraging the tool to produce meaningful information. I also think that many want to go from zero to hero in one quick sweep. Hadoop is like everything else you have to digest: you do it by taking bites of the (proverbial Hadoop) elephant one at a time.

    While I know of one organization we both are thinking about that has battling business units & technologies, I’ve also seen plenty of adopting organizations take the one-team approach (at least on the tech front), focusing on the core cluster while letting more and more departments and BUs become beneficiaries of this model, so I personally don’t see that you’d have to go from your level 3 to level 4. Also, I’m still optimistic that folks will see the benefits of leveraging a monstrously-sized cluster, with all the multi-tenancy knobs and levels adjusted properly, that serves up many, many different use cases.

    In general, I see the Hadoop Journey for most companies going from potential value to operational value to strategic value, and ideally on to becoming data-driven organizations.

    1. datafoam_wp_admin (Post author)

      Thanks for the thoughtful reply, Lester! I agree, not every organization needs to or should go from level 3 to level 4, as you pointed out. Every organization has to find its own sweet spot where Hadoop fits. The idea for this post is to try to grow a common vocabulary or approach as we start consulting around big data and Hadoop, so we can describe where our client is today and present options and roadmaps.

      As an analogy, consider relational database design. Many databases might approach third normal form, but there are fourth and fifth levels of normalization that one can attempt. Those levels of normalization will not fit the needs of most systems, especially if you have to run a report from them later :-).
