Talent Gap
A big issue with Hadoop, especially early on, has been a shortage of Java developers who know how to write MapReduce code. A number of tools have been created to help with this by allowing people with different skill sets to interact with Hadoop.
Some popular tools include:
- Spark is a fast and general engine for large-scale data processing. You will need to know something about one of these languages: Scala, Python, or Java. This one has been getting a lot of attention in the past year or so.
- Hive is friendlier for database people, with its SQL-like interface that generates MapReduce code without Java programming.
- Pig provides a “Python-ish” scripting interface that also generates MapReduce code.
- Datameer is a proprietary commercial product that sits on top of Hadoop and provides an Excel-like interface for its core features, which opens Hadoop up to people with a variety of backgrounds, including business analysts.
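
To make the gap concrete, here is a minimal sketch of the map, shuffle, and reduce steps that a hand-written job has to spell out and that tools such as Hive and Pig generate for you. It is plain Python running in a single process, not Hadoop's Java API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: collapse each key's list of values into one total."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Run the three steps in order on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    # Made-up sample input, purely for demonstration.
    sample = ["Hadoop runs MapReduce jobs", "Hive and Pig generate MapReduce jobs"]
    print(word_count(sample))
```

In a SQL-like tool such as Hive, the same word count would typically be a single GROUP BY query, which is why database people find it more approachable.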