SAP, Cloudera & Spark: Streaming Analytics for Business Data and Big Data

What if you could more accurately predict how much an investment is likely to lose over a given period of time? Or gain better precision in determining risk exposure based on credit valuation adjustment (CVA) calculations?

Financial institutions are constantly evaluating risk, and in ever-increasing numbers, organizations are running these assessments on Apache Spark. We’ve written a lot about Spark over the past couple of years, both from a business standpoint and from the view of a developer who’s running Spark in production. We believe Spark is now unquestionably the primary general-purpose data processing engine for Hadoop, ideal for running fast batch and stream processing as well as machine learning – the type of compute jobs that enable financial organizations to quickly crunch massive datasets and make smarter decisions from the data.
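To make the "how much could an investment lose" question concrete, here is a minimal sketch of a historical-simulation value-at-risk (VaR) estimate. It is plain Python rather than Spark, and it is only one simple way to frame the calculation; the function name `historical_var`, the nearest-rank quantile choice, and the sample returns are all assumptions made for illustration, not anything taken from a Cloudera or SAP product.

```python
import math


def historical_var(returns, confidence=0.95):
    """Estimate value at risk from a list of past period returns.

    Hypothetical helper for illustration: returns the loss (as a positive
    fraction) that was not exceeded in `confidence` of the observed periods,
    using the nearest-rank quantile method.
    """
    if not returns:
        raise ValueError("returns must not be empty")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")

    # Convert returns to losses (a -3% return is a 3% loss), sorted ascending.
    losses = sorted(-r for r in returns)

    # Nearest-rank quantile: the smallest loss that covers `confidence`
    # of the observations.
    rank = math.ceil(confidence * len(losses))
    return losses[rank - 1]


# Made-up daily returns, purely for demonstration.
sample_returns = [-0.05, -0.02, 0.01, 0.03, -0.01, 0.02, 0.00, -0.03, 0.04, 0.01]
var_90 = historical_var(sample_returns, confidence=0.9)
print(f"90% one-period VaR: {var_90:.2%}")
```

On a real portfolio the list of returns would be far larger, which is where a distributed engine becomes useful: the sort-and-quantile step is the part that would be handed off to the cluster.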

But Spark isn’t just for finance. Consider the following:

Energy – Companies drilling for oil or manufacturing turbines are processing streaming sensor data from machines in the field in near real time and using that information to better predict when a part will need maintenance.

Telco – CSPs can look at network and phone data to determine the root cause of an outage and reduce mean time to resolution.

Healthcare – Insurance agencies can more quickly process claims to look for signs of fraud.
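As a rough illustration of the sensor-data case, the sketch below flags readings that stray far from a rolling window of recent values. It is a plain-Python stand-in for what a streaming job would do continuously, not Spark code; the function name `flag_anomalies`, the window size, the three-standard-deviation threshold, and the readings themselves are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling mean.

    Hypothetical helper for illustration: a reading is flagged when it lies
    more than `threshold` sample standard deviations from the mean of the
    previous `window` readings.
    """
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        # Only compare once a full window of history is available.
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        recent.append(value)


# Made-up turbine temperature readings with one obvious spike.
sensor_readings = [10.0, 10.1, 9.9, 10.0, 10.2, 10.1, 15.0, 10.0]
flagged = list(flag_anomalies(sensor_readings))
print(flagged)
```

A production pipeline would likely replace the fixed threshold with a trained model, but the windowed comparison captures the basic idea of reacting to each reading as it arrives.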

As the first Hadoop distribution to ship and support Spark, Cloudera saw the value of this tool early on, and we have seen wide adoption among our customers, with hundreds of them running Spark on Cloudera’s platform across a variety of use cases and industries. Cloudera has also worked closely with our broad ecosystem of partners to ensure that the latest innovative applications being built on Spark are available to our customers. We are excited about the opportunity SAP HANA Vora brings to our customers, and we look forward to working with SAP to deliver Spark support to SAP HANA Vora customers.

As Hadoop adoption takes hold in large organizations and “data lakes” transform into enterprise data hubs, we are starting to see more customers run Cloudera’s Hadoop distribution alongside SAP HANA. In many cases, Hadoop acts as a landing zone, processing engine, and data discovery platform, feeding data into SAP HANA for analytics. Cloudera can also connect directly to SAP BI tools like SAP Predictive, SAP Lumira, and SAP BusinessObjects via Impala.

With SAP HANA Vora, customers can now combine their business data with big data in Hadoop, process that data via fast batch or near-real-time streaming, and seamlessly connect to their SAP in-memory database to drive unique insights and improve decision making.

Cloudera’s enterprise data hub offering – with the leading, complete Spark integration and management – makes this easy. Also of note, many of the organizations that will likely be early adopters of SAP HANA Vora are in industries that enforce tight regulations on who can see and do what with their data.

When considering a Spark platform for SAP HANA Vora, we suggest looking for one with a shared data management and governance model that closely tracks how data is brought into Hadoop, accessed, and transformed, as well as where it goes outside of Hadoop. The data should also be encrypted and secure anywhere and everywhere it lands in Hadoop. Only Cloudera offers this comprehensive security and governance.

We look forward to working with SAP HANA and SAP HANA Vora customers.

Want to learn more about Spark? Check out our Spark Developer Training Portal.

For more information on SAP and Cloudera, visit:

The post SAP, Cloudera & Spark: Streaming Analytics for Business Data and Big Data appeared first on Cloudera VISION.
