Talent Gap

A big issue with Hadoop, especially early on, has been a shortage of Java developers who know how to write MapReduce code. A number of tools have been created to help with this by allowing people with different skill sets to interact with Hadoop.

Some popular tools include:

  • Spark is a fast and general engine for large-scale data processing. To use it, you will need to know something about Scala, Python, or Java. It has been getting a lot of attention in the past year or so.
  • Hive is friendlier for database people, with its SQL-like interface that generates MapReduce code without any Java programming.
  • Pig provides a “Python-ish” scripting interface (Pig Latin) that also generates MapReduce code.
  • Datameer is a proprietary commercial product that sits on top of Hadoop and provides an Excel-like interface for its core features, which opens Hadoop up to people with a variety of backgrounds, including business analysts.
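
To give a feel for what these tools hide, here is a rough sketch of the map, shuffle, and reduce steps behind a word count, written in plain Python rather than against the Hadoop API. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample data are made up for illustration. A real job would spread each step across a cluster, which this sketch does not attempt.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key (the framework normally does this)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases together on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored in Hadoop.
    sample = ["Hadoop stores data", "Hive queries data", "Pig scripts data"]
    print(word_count(sample))
```

With a SQL-like tool such as Hive, the same result could be expressed as a single GROUP BY query with a COUNT over a table of words (the table itself being an assumption here), which is why database people tend to find that route more approachable.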

