CERN Computing Seminar

Cluster Management via Multi-Level Scheduling with Apache Mesos

by Benjamin Hindman (Mesosphere)

31/3-004 - IT Amphitheatre (CERN)

Description

Existing research has shown the benefits of running multi-level schedulers, both for single-node parallel computation and for multi-node distributed computation. However, some important practical considerations must be addressed before these multi-level scheduling architectures can be used in multi-user production environments. In this presentation we'll discuss these considerations through lessons learned deploying Apache Mesos, a two-level distributed scheduling system that has been used in organizations such as Twitter, PayPal, and Apple. We'll first highlight the multi-level scheduling systems that influenced Mesos and describe the two-level Mesos architecture in detail. We'll then focus on the first-level scheduler of Mesos and the efficient multi-resource fair-sharing algorithm that it employs. Finally, we'll discuss the extensions that have been added over the years (or are being added today) in response to practical needs: weights, reservations, quotas, optimistic allocations, and deallocation.

About the speaker

Benjamin Hindman is a founder and Chief Architect at Mesosphere, where he leads a team building out core services for the Mesosphere Datacenter Operating System (DCOS). Ben co-created Apache Mesos as a PhD student at UC Berkeley before bringing it to Twitter, where it now runs on tens of thousands of machines powering Twitter's datacenters. An academic at heart, he has published research in programming languages and distributed systems at leading academic conferences.


Organised by: Jakob Blomer, PH Department, and Miguel Angel Marquina, Computing Seminars / IT Department
