Scheduling in Hadoop clusters: Novel algorithms, state space collapse and delay minimization in heavy traffic
Friday, December 04, 2015
Abstract: A fundamental problem in all data-parallel applications is data locality. There are multiple levels of data locality, as a data block can be accessed in local memory or on local disk, within a rack or a data center, or across data centers. Scheduling with data locality is an affinity scheduling problem with an explosive number of task types and unknown task arrival rates. As a result, existing algorithms do not apply, and the recently proposed JSQ-MaxWeight algorithm (Wang et al. 2014) for two-level locality is delay-optimal only in a special heavy traffic scenario.
We found that for two-level locality, a simple priority algorithm is delay-optimal in all heavy traffic scenarios. For multi-level locality, we propose a novel algorithm. The key insight is to co-design routing and scheduling rather than focus on scheduling alone, and this insight is potentially applicable to affinity scheduling in general.
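To illustrate the flavor of priority scheduling under two-level locality, here is a minimal sketch (not the talk's actual algorithm; the per-server queues and the tie-breaking rule are illustrative assumptions): a free server first serves a task whose data is local to it, and only when it has no local work does it take a remote task.

```python
from collections import deque

def assign_task(server_id, queues):
    """Illustrative two-level locality priority rule (a sketch, not the
    algorithm from the talk): a free server first serves a task from its
    own (local) queue; if that queue is empty, it pulls a remote task
    from the most backlogged other queue."""
    local = queues[server_id]
    if local:
        return local.popleft()
    # No local work: help the remote queue with the largest backlog.
    remote_id = max((s for s in queues if s != server_id),
                    key=lambda s: len(queues[s]), default=None)
    if remote_id is not None and queues[remote_id]:
        return queues[remote_id].popleft()
    return None  # cluster-wide idle
```

The point of the priority rule is that local service is strictly preferred, so remote (slower) service is used only to drain hot-spots, which is what drives the heavy-traffic delay analysis.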
For both algorithms, we provide rigorous proofs of heavy-traffic optimality that use an ideal load decomposition and an interesting state space collapse. We implemented our algorithm in Hadoop clusters and showed that it accelerates the data-processing phase of jobs by 11 times with hot-spots and by 2.4 times without hot-spots.