Improving Multi-tenancy with Virtual Private Clusters
Noisy Neighbors in Large, Multi-Tenant Clusters
The typical Cloudera Enterprise Data Hub Cluster starts with a few dozen nodes in the customer’s datacenter hosting a variety of distributed services. Once configured and secured, the cluster administrator (admin) gives access to a few individuals to onboard their workloads. Over time, workloads start processing more data, tenants start onboarding more workloads, and administrators (admins) start onboarding more tenants. To address the increased demand, the admin adds more hardware capacity to the shared cluster.
While adding nodes addresses some scalability challenges, it quickly becomes a cost-inefficient way to deal with demand. There are new challenges that cannot be sufficiently addressed by simply adding more hardware. One of these challenges is the noisy neighbor problem, i.e., the undesirable effect of one user’s workload impacting the performance and stability of another user’s workload.
As the user and workload count in the shared cluster increases, cluster admins tend to become increasingly sophisticated in leveraging various resource management & multi-tenancy features that we offer within a cluster to maintain order. These include setting up YARN resource pools, managing priorities, configuring pre-emption, and dedicating nodes to specific services, or scheduling certain workloads on specialized hardware (e.g. GPUs).
Despite of all these levers, once clusters hit hundreds or thousands of users & workloads, admins may not be able to effectively control the ensuing chaos. A less experienced user might submit a poorly written Impala query that attempts to join a hundred different tables with no predicates that limit the size of the scans. Another user on a completely different team suddenly receives a message that a query, which had been running daily for months without problems, now suddenly times out. This seemingly inexplicable failure causes frustration. If the impacted user is not changing anything in the environment from his/her perspective, then predictability is expected in both the performance and the stability of workloads. When a mix of batch, interactive, and data serving workloads are added to the mix, the problem becomes nearly intractable. In other words, the extraordinarily difficult task of automatically and efficiently scheduling a large number of both long & short-running big data workloads running in various compute frameworks limits the level of multi-tenancy that can reasonably be achieved in a single cluster as usage scales up.
At a certain point, the admin may decide to split up the large, multi-tenant cluster into several smaller clusters with fewer tenants each to create a greater degree of isolation from noisy neighbors and thus ensure workload SLAs are met. While this approach provides isolation, it creates another significant challenge: duplication of data, metadata, and security policies, or ‘split-brain’ data lake. Now the admins need to synchronize multiple copies of the data and metadata and ensure that users across the many clusters are not viewing stale information. Furthermore, the admins would need to replicate all the security policies and ensure that changes are applied to all clusters. This duplication can become such a management nightmare that they may prefer to return to the original model and simply live with noisy neighbors.
A Better Approach: Virtual Private Clusters
Cloudera Manager (CM) 6.2 introduces “Virtual Private Clusters” (VPCs) to improve isolation without the downsides of data duplication and the ‘split-brain’ problem with data lakes. With CM 6.2, a cluster admin can create multiple, isolated, compute-only clusters that each point to one data repository, one data catalog, and one set of security policies.
To achieve this, Cloudera VPCs rely on the logical separation of compute services from base services. More specifically, distributed MapReduce, Hive, Impala, and Spark workloads run on the compute clusters, while HDFS, the Hive Metastore, and the Sentry service run on the base cluster. Multiple compute clusters can access the services in a base cluster through a new concept called the Data Context. A Data Context is simply a grouping of pointers to the base cluster services, with an easily recognizable name given by the cluster admin.
Beginning with CM 6.2, when an admin uses the cluster creation wizard, CM presents two options: (1) create a traditional cluster, or (2) create a compute cluster. For a new compute cluster, CM will ask for the Data Context in a drop-down menu. This Data Context will provide all the necessary configuration to the compute cluster to access the shared data, metadata, and data authorization policies resident in the base cluster. This also allows administrators of compute clusters to operate independently, unaware of the details of the base cluster, while keeping track of any relevant changes made to the base cluster that might indicate the need to restart the compute cluster. Similarly, through the Data Context, base-cluster administrators can see which compute clusters are dependent on any Data Context and thus might be impacted by a maintenance window or configuration change.
Virtual Private Clusters
With VPCs, cluster admins now can provide stronger isolation guarantees between tenants and workloads. A query running in Compute Cluster 1 does not share any compute resources with a query, ETL job or streaming job running in Compute Cluster 2. Furthermore, both Compute Cluster 1 and Compute Cluster 2 can operate on the same copy of the data with a single set of security policies. This model increases predictability without creating undesirable data silos.
Multi-tenancy Strategies with Virtual Private Clusters
Prior to using Virtual Private Clusters, cluster admins need to decide on an isolation strategy for the compute-only clusters. Here are three possible strategies they may consider:
1) By user, team, or business unit
Perhaps the most obvious means of dividing up clusters relies on assigning individuals, teams, or business units to their own dedicated clusters. In other words, tenants map to human beings. This strategy works well for managing internal chargebacks, limiting the impact of less sophisticated users on more experienced users, and overall encouraging individuals to think about and optimize their jobs and queries now that they have a smaller (but dedicated) cluster.
2) By workload type
Splitting compute clusters by type of workload is another good strategy. This approach seeks to optimize resource utilization or infrastructure efficiency. One cluster may run on memory-optimized VMs for maximum performance of Impala SQL queries, while another cluster may run on CPU-optimized VMs for compute-intensive Spark jobs. In this case, an individual with different types of workloads may have access to multiple clusters; the admin trades increased risk of a noisy neighbor for better infrastructure efficiency.
3) By workload priority
A third strategy splits clusters based on the overall priority of the workloads running on those clusters. We sometimes refer to this as splitting “dev/test” from “production” workloads, but we can generalize the approach by referring to the overall priority of the workload for the business. At one end of the spectrum, the admin may designate a few clusters as “mission critical”; these clusters are typically over-provisioned and provide strong SLA guarantees. At the other end of the spectrum, the admin may instantiate a number of low-priority dev clusters – these clusters may often run at capacity, not require performance guarantees, but also provide more agility and flexibility for experimentation.
While we present the above as independent strategies, admins can combine multiple strategies to achieve the optimal balance for their specific environments. For example, an admin may instantiate a compute-only cluster optimized for SQL analytics that is dedicated to dev/test queries coming from the finance team; this approach incorporates all 3 strategies.
In general, the more granular the admin approaches isolating users and workloads in dedicated compute-only clusters, the less likely users will impact each other. However, greater isolation diminishes the possibilities for running workloads with spare capacity, so infrastructure utilization tends to suffer. Virtual Private Clusters effectively gives administrators a knob to select the tradeoff that works best for them.
In a traditional cluster, Cloudera Manager co-locates compute and storage on the same physical infrastructure. This model allows the resource manager to schedule compute tasks on the same physical nodes that hold the data those tasks will process. In the VPC model, compute nodes are isolated from other compute nodes to provide strong isolation. Likewise, since we have one copy of the data and metadata to avoid silos, compute-to-compute isolation also necessitates separation of (at least some) compute from storage.
Therefore, in the VPC model, every read and write operation from the compute-only cluster traverses the network. This loss of data locality means IT and cluster administrators should pay close attention to network capacity to avoid significant performance degradations. As the number of nodes on the compute side increases, so does the overall traffic across the network.
Designing a proper network for VPCs extends beyond the scope of this blog post. Nonetheless, we recommend that customers adopt flat spine-leaf topologies with zero-to-low oversubscription ratios across any point in the network. While 100gbs NICs are always preferable, oversubscription ratios and cross-sectional bandwidth are much more important metrics to focus on when it comes to highly distributed, data-intensive workloads going across the network. Cloudera Manager 6.2 also includes tools for measuring network latency and cross-sectional bandwidth between compute clusters and base clusters to ensure you have an appropriate networking environment for using VPC.
We realize most customers will not want to take a large multi-tenant cluster with data locality and pull all the compute workloads into remote clusters overnight (leaving the compute capacity in existing nodes unused). For this reason, Virtual Private Clusters allows admins to run both models (local and remote storage) in the same environment. We achieve this by allowing the base cluster that holds the data, metadata, and security policies to also run compute workloads on the same nodes. In other words, the base cluster acts just like any other traditional cluster with data locality. This allows the admin to keep I/O intensive and performance sensitive workloads running on the base cluster, while other workloads that require greater isolation guarantees and may have less sensitivity to I/O performance can be deployed in compute-only clusters.
Conclusion and future work
Virtual Private Clusters can significantly improve multi-tenancy management and isolation guarantees in large, shared environments, allowing users of big data infrastructure to onboard more users and workloads faster, with less risk. Rather than becoming experts on intra-cluster multi-tenancy & resource management, admins can rely on infrastructure-level isolation to separate tenants. This approach is also more in line with cloud-based deployments where tenants are provided with their own dedicated compute environment (servers, virtual machines, or containers) to run their jobs and queries on shared data in a centralized storage service.
While VPCs brings much needed isolation capabilities to our customers with minimal disruption, the new Cloudera Data Platform will take this concept to a whole new level. Many of the seemingly unavoidable tradeoffs mentioned above will disappear as we adopt new infrastructure technologies providing greater efficiency and elasticity, and greatly improve how users provision and interact with Cloudera software, with purpose-built environments targeted to a range of workload types from the data collection at the edge, to AI in the data center, private cloud and public cloud. We look forward to sharing more information about these new capabilities in the near future.
Learn more about Cloudera’s platform here.
About the Authors:
Tom Deane is Director of Product Management for Compute and Storage at Cloudera. He manages product priorities for HDFS, YARN, Kubernetes, and Private Cloud technologies.
Lakshmi Randall is Director of Product Marketing at Cloudera, the enterprise data cloud company. Previously, she was a Research Director at Gartner (News – Alert) covering Data Warehousing, Data Integration, Big Data, Information Management, and Analytics practices.
The post Improving Multi-tenancy with Virtual Private Clusters appeared first on Cloudera Blog.