Latest Posts

Big Data & Brews: Anil Chakravarthy Diagrams the Big Data Ecosystem

Our last installment of Big Data & Brews with Anil touches on a cool topic. Of course, I like that we get to use the chalkboard, but we also had a chance to break down how Informatica sees the ecosystem (hint: the data intelligence layer is the most promising). We also talked about what he sees happening in the next 10 years that will really accelerate change in the industry.

The full conversation is just a click away – tune in!


Stefan: What would be interesting to see is, in this ecosystem really of data technologies, right, where you guys are sitting and then where you see Hadoops, Teradatas, MicroStrategies, Datameers. I kind of see you as the fabric that brings it all together. Is there a central brain of that fabric?

Anil: Right. You know, we believe so. Let me just take a stab at how we think of the world. This is obviously a logical view and it has to be translated based on … We see the world as, start with this, think of this as data persistence. This world is obviously changing very rapidly. It was basically the databases of the world. Could be anything from mainframe database to relational database, etc. Now it's Hadoop and NoSQL, and this world could be either on the framework or in the cloud or a combination.

Then we see the world of what we think of as data infrastructure. So this is the world which we have traditionally played in, and this world is also changing rapidly because, obviously, when this changes, this has to change here. You have things like data ingestion, which is changing very rapidly. Somebody once joked to me that whatever IBM worked on in the 1970s always will be useful at some point, so it’s like that. Concepts like change data capture. Concepts like real time, streaming, etc., so all of those are coming back, right?

You have ingestion. You have data integration. Obviously that’s where you put it together, the aggregation, etc. I think you have a lot of work around data quality, which is increasingly, “How do you do quality, especially on unstructured data?” and things like that. That becomes a lot of work to …read more

Hadoop Manufacturing Innovation & IoT

Grant Bodley

The advent of connected manufacturing has ushered in an era where low-cost machine sensors take thousands of measurements per second at many points across the manufacturing process. This stream of sensor data enables manufacturers to quickly detect emerging anomalies and solve issues before they impact yield and quality.

Big Data insights enable predictive analytics for those rapid, proactive process adjustments. Manufacturers can capitalize on this opportunity by following an approach that combines the power of Teradata with Hortonworks Data Platform’s storage and compute efficiencies at extreme scale. Working together, our technologies enable big data insights that can dramatically improve existing manufacturing processes.

Register for the Teradata Partners Event

On Wednesday, October 21st, from 12:00 to 12:45 PM, I will be presenting a webinar along with Dale Glover, Teradata VP of Industry Consulting. Join us for 45 minutes to learn more about how manufacturing companies are utilizing Hadoop to:

Establish a Single View of data on products throughout their entire lifecycles
Build a 360° view of lifetime customer value
Optimize manufacturing quality and yield
Proactively maintain equipment to minimize the risk of downtime

Event Details

Presentation Title: Hadoop for Manufacturing Innovation & IoT
Session Number: 3719
Date & Time: Wednesday, October 21st, from 12:00 to 12:45 PM PST
Location: 202 AB

About the Speakers

Grant Bodley: Hortonworks GM for Global Manufacturing Solutions

As General Manager of Global Manufacturing Industry Solutions at Hortonworks, Grant Bodley brings over 25 years of manufacturing experience in working with leading Automotive, Industrial, High Tech, and Aerospace Manufacturers in leveraging Big Data Insights and high impact use-cases to transform their businesses. Prior to Hortonworks, Grant was Vice President of Manufacturing Industry Solutions at SAP for more than 10 years.

Dale Glover: Vice President of Industry Consulting for Teradata

Dale Glover is a Vice President of Industry Consulting for Teradata. His Industry Consulting team is responsible for helping clients successfully implement Business Intelligence and Analytics to drive business process impact and value. He is leading the transformation of this organization to support an analytic consulting focus across a broad ecosystem of platforms and tools. His advanced Applied Analytic Team is helping organizations move from Big Data insights into the realization of value from advanced analytics in day to day operations.

The post Hadoop Manufacturing Innovation & IoT appeared first on Hortonworks.

From Mechanical Engineer to Oil & Gas Data Scientist

I recently had the pleasure of visiting with Arvind Battula, Sr. Data Scientist at Schlumberger. The following is a transcript of our conversation, in which we discussed his background as a chemical and mechanical engineer, his move onto the Data and Analytics team as a data scientist, his focus areas for data science in oil and gas, and the technologies that he believes will help transform the industry.

Kohlleffel: Arvind, you entered the data science world recently on the Schlumberger Data and Analytics team and have a very interesting background coming from both chemical engineering and mechanical engineering disciplines. Tell me about your experience and engineering background.

Battula: Certainly, my background is diverse. I started my formal training as a chemical engineer. After my bachelor’s, I applied for graduate school in mechanical engineering to deal mostly with computational fluid dynamics. I wanted to pursue a Ph.D. in the same area, but my doctoral work changed direction to focus on nanophotonics, which is the interaction of nanometer-scale objects with light.

Kohlleffel: That makes for quite a compelling base of experience for your data science work. Now that you’ve moved to the Data and Analytics team, where have you focused so far?

Battula: My mechanical engineering background has been very helpful at Schlumberger since we are dealing with designing products that are used in the harshest conditions imaginable on the planet. In everything we do, we must consider very minute design details to ensure the most robust end product. Before we design and build parts and assemblies, we are very thorough in our calculations and modeling to quantify our engineering and physics assumptions. This is where we leverage data and analytics to bring a new rigor to the process and move beyond some standard linear assumptions, which can be obstacles to efficiently modeling complex phenomena across all variables.

For example, factors like high temperature, high pressure, stress, vibration, corrosion, and aging all act in parallel on the mechanical systems. We can look deeply into that data to better understand the combinations of these variables that are causing mechanical failures, and then we can bring together the data streams for both physics and engineering.

This non-linear root cause analysis shows us the real world we deal with on a daily basis. It is ideally suited to leveraging big data and analytics and it benefits multiple groups within our company including engineering, manufacturing, sustaining and maintenance.

In …read more

Big Data Expo comes to Utrecht, Netherlands


There’s excitement in the air as one of Benelux’s largest Big Data conferences, “Big Data Expo”, comes to Utrecht in the Netherlands.

We’re sponsoring and you’ll find our experts Chris Harris and Jhon Masschelein presenting such topics as “5 Steps for Effective use of Apache Spark in Hortonworks Data Platform 2.3” and “Lessons Learned: 5 Common Hadoop Use Cases”. You can register here.

As Hortonworks continues to extend its footprint in Europe, we’re seeing some exciting use cases and increasing momentum in enterprise adoption of Hadoop. The Hadoop Summit that we organized in Brussels early this year showcased some of the great European use cases. Here’s a short overview of one of my favorites:

ING Bank: Destroying Data Silos for Creating a Predictive Bank

Hellmar Becker, a Utrecht resident, discusses breaking down data silos and creating a centralized data lake at ING. He also discusses the modernization of their data centers, migrating away from legacy systems within their governance and security framework.

Bart Buler, Hellmar’s co-presenter, discusses the bank’s steps toward becoming a truly predictive bank. Bart also provides some do’s, don’ts and difficulties in this journey and talks about the future for the bank, including “integrating analytics as part of data flows”, “showing interactive results to individuals without access to the cluster” and many more.

You can find more videos listed here

To conclude, Big Data Expo will showcase an array of new technologies, exciting case studies and organizations making the most out of data. Come visit us at Stand 21, where my colleague Alfie Murray-Dudgeon (pictured below) awaits.

The post Big Data Expo comes to Utrecht, Netherlands appeared first on Hortonworks.

Big Data & Brews: Anil Chakravarthy & How Consumer Tech Will Influence Enterprise Tech

If the sky was the limit and we had unlimited storage and compute, what would the future of the data world look like? In part 4 of my interview, Informatica acting CEO Anil Chakravarthy says we’re already seeing a preview of it in the consumer world. What does he mean? Watch below to find out more:


Stefan: Let me switch gears here a little bit. Where do you see the future really in the data world? If sky’s the limit, and we have unlimited storage and compute and, you know, Ray Kurzweil is right and we have chips that are faster than our brains in something like five years. Where is this going?

Anil: Yes, to me actually I think we already see a preview of the future. I’m talking about enterprise data right now. I think we see a preview of that future already in the consumer world. I mean think of the Apple App Store for example – what are there, over a million apps right now at this point? But the apps are already separated from the data. Your data that the apps operate on is kind of under your control; you may have a separate repository that you use for it, either your own or iCloud, etc, and the apps are extremely modular and the apps come and go very quickly, the data lives a lot longer.

If you contrast that with the enterprise world, the enterprise world has been one where the data has been very closely tied to the apps. You know you have ERP apps or CRM apps or other kinds of apps, or custom apps where the data models have been very closely tied. You still have some separation, that’s why you can reuse the data, but the data and the apps have been very closely tied together. To me, that world is going to go the same way as the consumer world already has gone. So if you ask me what’s the future, it’s like, the data models, the understanding of what different data types are, whether it’s schema-on-read or pre-defined schema and things like that, the data will be designed for durability and will be designed essentially to be used by a variety of apps, maybe cloud-based apps, maybe on-premise apps, etc, etc. The apps will become a lot more modular and the apps will come and go, and maybe apps may be …read more