Staffing your big data team

Building the right team is as important as assembling the right IT infrastructure – and the needs differ just as dramatically. A traditional BI and analytics organization consists of three main groups:


  1. Analysts who develop reports, often using sample data
  2. The data management team – modelers who take requests, find data, and develop models to answer the questions
  3. The infrastructure team responsible for the technical components

In a big data world, we often see three new roles emerge and work more closely together: data engineers, data scientists and architects.

The data engineering team is a strategic necessity because data itself has become more agile. You can think of them as the data workhorses. Data engineers collect data in its native format and may transform it many times to meet the needs of current and future use cases. They document, secure, and audit the data, then create simple schemas and search indexes for each data set. This function scales with the size of the data asset, so start small. One dedicated individual or a small team can do a lot to set you up for early success.

The data science team is often the consumer of the data assets created by the data engineering team. I really want to emphasize the word “team” here, because data science is rarely the job of a single person. You need a subject matter expert from the business (someone with decades of industry knowledge), a statistician, and one or more “hackers” who can use different tools and programming languages to work with the data. Many organizations start with a central group that is “loaned out” to business units.

Where these skills live, over time, can vary dramatically from company to company. For example, one company could have the statistical and SME skills reside within the business units, with a centralized “hacking-on-demand” team. Meanwhile, another company might have the business teams build models, and the centralized data science experts rationalize and optimize those models, bringing them into production.

Finally, an architect is critical as data technology ecosystems rapidly evolve. While the technical components are often physically operated by a different group, a centralized architect function needs to collaborate with the various data, IT, and business teams and maintain responsibility for exploring new technologies. One of the best examples of a big data architect is Phil Radley, Chief Data Architect at British Telecom.

So while roles and functions may change depending on where you are along your big data journey, one personality trait remains constant: curiosity. As you assemble your big data team, make sure that each member has an insatiable curiosity. We find that those who are unafraid to look at their data differently and ask interesting questions are often the ones who deliver the greatest business impact.

The post Staffing your big data team appeared first on Cloudera VISION.
