Data Governance in Hadoop – Part 2
In the previous post, we examined what data governance is and why it’s so important, as well as why it can be complicated for Hadoop. In short, when you have petabytes of data from many different sources, with varying degrees of structure, it’s difficult to keep track of where your sensitive or confidential data is located. But that also makes doing so even more important!
In this post, we’ll take a look at Cloudera Navigator, the only native data governance solution for Hadoop, which is integrated into Cloudera Enterprise. We developed Cloudera Navigator to address the challenge of Hadoop data governance, and it’s one of the many reasons that so many organizations in financial services, healthcare, and pharmaceuticals rely on Cloudera Enterprise to store, analyze, and govern their most important data assets.
Cloudera Navigator is a turnkey data governance solution: once you install it, it automatically tracks everything you need to know about data usage in your cluster – in particular, what kind of data you have in your cluster, who’s accessing it, and what they’re doing with it. This includes the following:
Unified Auditing. Cloudera Navigator captures every data access attempt and logs it in a central location, whether the access comes through Apache Hive, Apache HBase, Cloudera Search, Impala, or just plain MapReduce. For each attempt, Cloudera Navigator tracks the user ID, IP address, resource name, and even the exact query that was run.
Comprehensive, Column-Level Lineage. Cloudera Navigator automatically captures lineage for all batch and interactive workloads, including Hive, Impala, MapReduce, Apache Oozie, Apache Pig, Apache Spark, and Apache Sqoop transformations – all the way down to the column level. It also integrates with the leading enterprise lineage frameworks.
Unified Metadata. Cloudera Navigator simplifies metadata access in Hadoop by consolidating all of the Hadoop technical metadata into a single, searchable interface. Additionally, Cloudera Navigator lets you classify all your data with custom tags and key-value pairs – for example, by clinical trial, customer type, degree of sensitivity, security clearance level, or anything else. Cloudera Navigator’s metadata features make it easy for data scientists and Hadoop administrators alike to find and trust the data that matters most to them.
Data Lifecycle Management and Policy Enforcement. Cloudera Navigator’s flexible policy management, built on top of its rich metadata foundation, lets you automate crucial data stewardship and curation activities, such as metadata classification, data archiving and retention, or even invoking partner products for additional data preparation and transformation.
Encryption and Key Management. Cloudera Navigator includes enterprise-grade encryption and key management through Navigator Encrypt and Navigator Key Trustee to secure all data, metadata, and log files.
Seamless Integration with Existing Governance Solutions. Cloudera Navigator integrates with the leading enterprise metadata, lineage, and SIEM applications that organizations already rely on, including those from IBM, Imperva, Informatica, RSA, and Splunk.
Cloudera is leading the way for data governance in Hadoop, and we will continue to add to and improve Cloudera Navigator’s already robust capabilities. Be on the lookout for exciting enhancements stemming from the acquisition of Xplain.io, focused on managing more complex data and workloads faster.