Practical Tips for a Successful Hadoop Journey
One of the really cool things about Hadoop is its flexibility. It’s a fantastic platform for data storage. It’s an active archive. It’s a system for doing ETL. It’s a way to explore and analyze new data. It’s a solution for reducing fraud, getting to know your customers better, and driving product innovation. The list goes on and on.
So where and how do you start? And what do data-driven organizations need to consider in order to successfully deploy Hadoop across the enterprise?
We tackled a number of topics like these on a recent panel-style webinar with experts from Intel and Red Hat, moderated by Tony Baer, principal analyst at Ovum. Check out the on-demand recording here.
The discussion was rich and deep, with topics ranging from how to identify an initial use case and foster collaboration between IT and business users, to why open source matters to customers. We also spent time on security, governance, and agility. In other words, we covered a lot of ground.
As a result, there were a number of good audience questions we were not able to answer. I’d like to address a few of those below:
How should we be thinking about migrating data from legacy systems?
Treat legacy data as you would any other complex data type. HDFS acts as an active archive, enabling you to cost-effectively store data in any form for as long as you like and access it whenever you wish to explore it. And with the latest generation of data wrangling and ETL tools, you can transform, enrich, and blend that legacy data with other, newer data types to gain a unique perspective on what's happening across your business.
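To make the "blend legacy with newer data" idea concrete, here is a minimal, tool-agnostic sketch in plain Python. The file contents, field names (`customer_id`, `region`, `segment`), and join logic are all hypothetical stand-ins for what a data wrangling or ETL tool would do at scale over files landed in HDFS:

```python
import csv
import io

# Hypothetical sample data standing in for two datasets in the archive:
# a legacy export and a newer CRM extract, both keyed by customer_id.
legacy_csv = """customer_id,region,lifetime_orders
C001,EMEA,42
C002,APAC,7
"""

crm_csv = """customer_id,segment
C001,enterprise
C002,smb
"""

def load(text):
    """Parse a CSV string into a dict of rows keyed by customer_id."""
    return {row["customer_id"]: row for row in csv.DictReader(io.StringIO(text))}

legacy = load(legacy_csv)
crm = load(crm_csv)

# Blend: enrich each legacy record with the newer CRM segment,
# defaulting to "unknown" when no match exists.
blended = [
    {**rec, "segment": crm.get(cid, {}).get("segment", "unknown")}
    for cid, rec in legacy.items()
]

for row in blended:
    print(row["customer_id"], row["region"], row["segment"])
```

The same enrich-and-join pattern is what the ETL tooling performs, just distributed across the cluster instead of in local memory.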
What are your thoughts on getting combined insights from the existing data warehouse and Hadoop?
Typically, one of the starter use cases for moving relational data off a warehouse and into Hadoop is active archiving. This is the opportunity to take data that might otherwise have gone to archive and keep it available for historical analysis. The clear benefit is being able to analyze data over the kinds of extended time periods that would not otherwise be cost-feasible (or even possible) in a traditional data warehouse. An example would be looking at sales not just in the current economic cycle, but going back three to five years or more, across multiple economic cycles.
You should look at Hadoop as a platform for data transformation and discovery, compute-intensive tasks that aren't a fit for a warehouse. Then consider feeding some of the new data and insights back into the data warehouse to increase its value.
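The division of labor described above can be sketched in miniature: run the heavy aggregation over long history on the Hadoop side, then feed only the compact summary back to the warehouse. A toy Python sketch, with entirely made-up sales figures:

```python
from collections import defaultdict

# Hypothetical archived sales records spanning several years, the kind
# of history that would typically have aged out of a warehouse.
sales = [
    {"year": 2019, "amount": 120.0},
    {"year": 2019, "amount": 80.0},
    {"year": 2021, "amount": 200.0},
    {"year": 2023, "amount": 150.0},
]

# The expensive full scan runs where the data lives; only the small
# per-year summary table would be loaded back into the warehouse.
totals = defaultdict(float)
for rec in sales:
    totals[rec["year"]] += rec["amount"]

summary = sorted(totals.items())  # [(year, total), ...] ready to feed back
print(summary)
```

The point is the shape of the flow, not the arithmetic: scan the long tail of history in Hadoop, return a result set small enough for the warehouse to serve cheaply.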
What's the value of putting Hadoop in the cloud?
The cloud presents a number of opportunities for Hadoop users:
- Faster time to benefit, through quicker deployment and no cluster infrastructure to maintain
- A good environment for running proofs of concept and experimenting with Hadoop
- Most Internet of Things data is cloud data; running Hadoop in the cloud enables you to minimize the movement of that data
- Elasticity that lets you rapidly scale your cluster to address new use cases or add more storage and compute
Thanks again to everyone who participated. And if you missed it, we encourage you to check out the replay.