How I Saved My Company Millions—Part Two

By Shawn Dolley, Vice President, Corporate Development and Strategy, Appfluent


Crossing the Chasm to Peak Savings

Part One of this series discussed what most large companies face with regard to data, analytic platforms, and the evolution toward Hadoop. Most large companies moving along the path to Hadoop get stuck when they need to decide which large batches of structured data, and the associated data management workloads, to move. In fact, a 2013 Cloudera poll indicated that the most commonly reported cause of a lack of progress was not knowing which data could and should be moved to Hadoop.

Several different terms are used in the market to describe the movement of existing structured data to Hadoop. The most common ones are: Data Warehouse Offload, Data Warehouse Modernization, Data Warehouse Optimization and Data Warehouse Rationalization. Regardless of the name, the steps involved in this continual process are the same:

1.   Identify offload candidates (data, transformations, applications)
2.   Obtain sign-off from business or stakeholders to perform the offload
3.   Implement new ETL on Hadoop
4.   Perform offload, test and optimize
5.   Repeat above, continuously!

How is each step accomplished? What are the keys to success?

1.   Identify offload candidates (data, transformations, applications)

This first step in some ways is the hardest. In a large analytic environment used by a variety of users, for many parts of the business, it is not clear what can be moved. Firms are left to get part of the picture using a mix of business intelligence or database tools not designed to interrogate the complete SQL logs/traffic on the data warehouse. This conundrum has caused many companies to turn to Appfluent. With Appfluent, an organization can get a complete picture of all usage on a data warehouse.

By semantically parsing all SQL hitting the data warehouse, Appfluent can easily show which tables are unused or underused, which hold extra history that no one is accessing, and where compute is hogged by SAS extracts or resource-intensive push-down transformations. All this is accomplished with no impact on performance. By moving the data identified by Appfluent to Hadoop, organizations can reclaim large amounts of data warehouse storage and control costs.
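To make the idea of usage-based candidate selection concrete, here is a minimal sketch in Python. It is not how Appfluent works: it uses a naive regular expression rather than semantic SQL parsing, and the catalog, the query log, and the names `table_usage` and `offload_candidates` are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical inputs: a catalog of warehouse tables and a sample of logged SQL.
CATALOG = {"sales", "customers", "orders_2009", "web_clicks"}
QUERY_LOG = [
    "SELECT c.name, SUM(s.amount) FROM sales s "
    "JOIN customers c ON s.cust_id = c.id GROUP BY c.name",
    "SELECT * FROM sales WHERE sale_date > '2013-01-01'",
    "select count(*) from Customers",
]

# Naive assumption: a table name is the identifier right after FROM or JOIN.
# Real SQL (subqueries, views, quoted names) needs a proper parser.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_.]*)", re.IGNORECASE)


def table_usage(queries):
    """Count, per table, how many logged queries reference it."""
    counts = Counter()
    for sql in queries:
        # Use a set so a table counts once per query, however often it appears.
        for name in {match.lower() for match in TABLE_REF.findall(sql)}:
            counts[name] += 1
    return counts


def offload_candidates(catalog, queries, max_queries=0):
    """Return catalog tables referenced by at most max_queries logged queries."""
    usage = table_usage(queries)
    return sorted(t for t in catalog if usage[t.lower()] <= max_queries)


if __name__ == "__main__":
    print(offload_candidates(CATALOG, QUERY_LOG))
```

With the sample log above, the tables never referenced (`orders_2009` and `web_clicks`) come back as candidates. Raising `max_queries` widens the list to include underused tables as well.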

2.   Obtain sign-off from business or stakeholders to perform the offload

Before data is moved, sign-off is needed from the business units that are ostensibly using that information. In the past, this has been a roadblock for IT. The business, without specific information and validation, is loath to sign off on any effort involving the movement of data that no one can prove is not in use. With Appfluent, the IT team can show the business users in-depth reports that illustrate low or no usage, timeframes indicating when data is used, and other supporting documentation.

Armed with actionable information, the business can feel confident that their concerns have been addressed and agree to move the data. And Hadoop, unlike tape or other colder archives, can produce the data for the business if it is needed in the future.

3.   Implement new ETL on Hadoop

Beyond some of the open-source approaches available to create Hadoop-ready extraction, transformation and load (ETL) code, there are select off-the-shelf products designed to bring all the sophistication of packaged ETL applications to the Hadoop environment. Whatever the approach, the post-load transformation code that was running on the data warehouse to support the unused or little-used data is not going to run on Hadoop as-is, so it has to be reimplemented there.

4.   Perform offload, test and optimize

It is now time to create the data management infrastructure needed to support this data in its new home. As data is ingested into Hadoop, you can leverage the power of high-performance distributed grid computing to parse, extract features, integrate, normalize, standardize, and cleanse data for analysis.
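As a rough illustration of the parse, standardize, and cleanse steps, the sketch below handles one raw record at a time in plain Python. The three-field pipe-delimited layout, the field names, and the `clean_record` function are assumptions made for this example, not part of any particular product. On a cluster, per-record logic of this kind would typically run inside a distributed job rather than a single process.

```python
from datetime import datetime


def clean_record(line, delimiter="|"):
    """Parse one raw delimited line into a standardized dict.

    Assumed (hypothetical) layout: customer_id | country | date as MM/DD/YYYY.
    Returns None when the record cannot be parsed, so callers can drop it.
    """
    # Parse: split into fields and trim stray whitespace.
    fields = [field.strip() for field in line.rstrip("\n").split(delimiter)]
    if len(fields) != 3:
        return None
    customer_id, country, raw_date = fields

    # Cleanse: reject records missing the key field.
    if not customer_id:
        return None

    # Standardize: convert the date to ISO 8601 and upper-case the country code.
    try:
        iso_date = datetime.strptime(raw_date, "%m/%d/%Y").date().isoformat()
    except ValueError:
        return None
    return {"customer_id": customer_id, "country": country.upper(), "date": iso_date}


if __name__ == "__main__":
    raw_lines = [" 42 | us |03/07/2013\n", "bad line", "| de |01/01/2013"]
    cleaned = [rec for rec in map(clean_record, raw_lines) if rec is not None]
    print(cleaned)
```

Returning `None` for unusable records keeps the function easy to test during the offload's test-and-optimize phase, since rejected rows can be counted and compared against the source.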

A successful move of underutilized and unused data will set the stage to migrate entire business intelligence applications to Hadoop sooner rather than later.

5.   Repeat above, continuously!

Offloading data to Hadoop can become addictive. Organizations that achieve success with Hadoop can cap the size of the data warehouse through ongoing monitoring and by moving out data and transformations that are wasteful. Ultimately, offloading to Hadoop, and saving millions in the IT budget, becomes commonplace. Just another day in the life of an IT hero.


The post How I Saved My Company Millions—Part Two appeared first on Cloudera VISION.
