Building a Scalable Process Using NiFi, Kafka and HBase on CDP

Navistar is a leading global manufacturer of commercial trucks. Across its fleet of 350,000 vehicles, unscheduled maintenance and vehicle breakdowns caused ongoing disruption to the business. Navistar required a diagnostics platform that would predict when a vehicle needed maintenance, in order to minimize downtime. This platform needed to collect, analyze and serve data from over 70 telematics and sensor data feeds from each vehicle in the fleet, including data measuring engine performance, coolant temperature, truck speed and brake wear. Navistar turned to Cloudera to help build an IoT-enabled remote diagnostics platform, called OnCommand® Connection, to monitor the health of its vehicles and to increase vehicle uptime.

This blog demonstrates the use of similar technologies to address a problem much smaller in scope, but with parallels to the one Navistar faced. Data was pulled from a highly modified, high-performance Corvette (see Fig 1) to show the steps of loading data from an external source, formatting it using Apache NiFi, publishing it as a stream through Apache Kafka, and storing it in Apache HBase for additional analysis.

Fig 1. 2008 Corvette with Modified 6.8L Engine

In this example, all of the Corvette's original factory engine components have been replaced with higher-performance parts. The engine was torn down to its shell, the cylinders were bored, the crankshaft and camshaft were replaced, and new pistons and connecting rods were installed, all in pursuit of ~600 horsepower (see Fig 2). For the new engine configuration to work properly, the engine’s software underwent a complete overhaul. While pressing the throttle became significantly more dramatic, an unintended consequence was that the car’s original diagnostics and error systems were no longer accurate and therefore had to be disabled.

Fig 2. Engine mid-rebuild with all new shiny internals

To capture and analyze the Corvette’s sensor data, a path was needed for the data to flow from the car into an alternative analytics and diagnostics platform. The first step was to connect a laptop to the Corvette’s diagnostics port (see Fig 3) and upload the sensor data to a cloud-based storage location. S3 was used for this project.

Fig 3. Laptop connected to diagnostics port via USB

The next step was to use Cloudera Data Platform (CDP), Cloudera’s multi-function, multi-analytics platform, to access the services needed to move the data to its final storage destination for additional analysis. Using CDP Public Cloud, three data hubs were set up, each hosting a set of pre-packaged, open source services (see Fig 4):

  • The first was NiFi, a service built to automate and manage the flow of data. NiFi was used to import, format and move the Corvette’s data from its source to its final storage point.
  • The second was Kafka, a real-time streaming service that makes high volumes of data available as a stream. Kafka enables stream processing of the data and also lets other consumers subscribe to the data streams. This example has no subscribers; however, the concept is important enough to warrant a demonstration of how to set it up.
  • The third was HBase, a highly scalable, column-oriented operational database that provides real-time read/write access. Once the data is loaded into HBase, Phoenix is used to query and retrieve it.

Fig 4. Corvette data flow diagram from source to query.
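To make the formatting step more concrete, here is a minimal Python sketch of the kind of transformation the flow performs before data reaches Kafka and HBase: turning CSV sensor rows into JSON messages, each paired with a row key. In the actual build this work is done inside NiFi rather than in hand-written code, so treat this purely as an illustration. The column names, the sample readings, the vehicle identifier and the `to_messages` helper are all assumptions made for the example, not details taken from the Corvette's real export.

```python
import csv
import io
import json

# Hypothetical sample of exported sensor data; the real column names
# and values from the diagnostics port may differ.
SAMPLE_CSV = """timestamp,rpm,coolant_temp_c,speed_mph
2020-06-01T14:00:00,850,88.5,0
2020-06-01T14:00:01,2400,89.0,31
"""

VEHICLE_ID = "corvette-2008"  # hypothetical identifier for this car


def to_messages(csv_text, vehicle_id):
    """Turn CSV sensor rows into (row_key, json_payload) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    messages = []
    for row in reader:
        # HBase sorts rows lexicographically by key, so prefixing the
        # timestamp with the vehicle id keeps one car's readings
        # together and in time order.
        row_key = f"{vehicle_id}#{row['timestamp']}"
        record = {
            "timestamp": row["timestamp"],
            "rpm": int(row["rpm"]),
            "coolant_temp_c": float(row["coolant_temp_c"]),
            "speed_mph": int(row["speed_mph"]),
        }
        messages.append((row_key, json.dumps(record, sort_keys=True)))
    return messages


messages = to_messages(SAMPLE_CSV, VEHICLE_ID)
for key, payload in messages:
    print(key, payload)
```

Each pair maps naturally onto the rest of the pipeline: the JSON payload is what would be published to a Kafka topic, and the key is what would identify the row once the record lands in HBase. The composite key shown here is only one possible design; a different key layout may suit other query patterns better.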

Building the diagnostics platform on CDP to monitor the health and performance of the Corvette was a successful exercise. With NiFi and Kafka formatting and streaming the sensor data into HBase, advanced data engineering and processing can now be performed on the data, using services designed to scale as the data set grows.

Next Steps

To see all of this in action, the resources below showcase the process that was created.

  • Video – If you’d like to see and hear how this was built, take a look at a quick 5-minute video showing real-time navigation of CDP running NiFi, Kafka and HBase.
  • Tutorials – If you’d like to do this at your own pace, see a detailed walkthrough with screenshots and line by line instructions of how to set this up.
  • MeetUps – If you want to talk directly with experts from Cloudera and even the owner of this Corvette, please join a virtual meetup to see his live presentation. There will be time for direct Q&A at the end.
  • CDP Users Page – To learn about other CDP resources built for users, including additional video, tutorials, blogs and events, click on the link.

The post Building a Scalable Process Using NiFi, Kafka and HBase on CDP appeared first on Cloudera Blog.
