Quincy: Fair scheduling for Distributed Computing Clusters

For the next paper in the datacenter scheduling series, we'll cover Quincy, a paper published by Microsoft in 2009. This is also the paper that Firmament (http://firmament.io/) is primarily based on. http://www.sigops.org/sosp/sosp09/papers/isard-sosp09.pdf There is a growing number of data-intensive jobs (and applications), and jobs can take anywhere from minutes to days. Therefore, it … Continue reading Quincy: Fair scheduling for Distributed Computing Clusters

Improving MapReduce Performance in Heterogeneous Environments

'Improving MapReduce Performance in Heterogeneous Environments' is the first paper in the collection of scheduling papers I'd like to cover. If you'd like to learn the motivation behind this series, you can find it here. As mentioned, I will also add my personal comments about this paper from a Mesos perspective. This is the very first … Continue reading Improving MapReduce Performance in Heterogeneous Environments

A survey of datacenter scheduling papers

Scheduling workloads in a cluster is not a new topic and has years of research behind it. But scheduling has recently become popular thanks to the rise of containers (Docker) and of scheduling frameworks that can run more than just MapReduce and OpenMP jobs (Mesos and Kubernetes). Having the opportunity to work on Mesos at Mesosphere and becoming a PMC in this … Continue reading A survey of datacenter scheduling papers

How to build Apache Mesos on Mac

Today I tried to build Apache Mesos on my MacBook Pro. Although it's fairly simple, there are a few gotchas, so I decided to put the steps here: 1. Clone the code (git clone http://github.com/apache/mesos.git) 2. Install Homebrew (http://brew.sh/) 3. Add Homebrew taps: % brew tap homebrew/versions % brew tap homebrew/science % brew tap homebrew/apache … Continue reading How to build Apache Mesos on Mac

Drill on AWS EMR

I'm happy to announce that Drill can now be launched on Amazon EMR! I worked with the Amazon EMR team to develop the bootstrap action script that installs and configures Drill on EMR. How to Run: From the Elastic MapReduce CLI (which you can install using these instructions: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-cli-install.html): ./elastic-mapreduce --create --alive --name "Drill.deploy" --instance-type m1.large --ami-version 3.0.1 --hbase … Continue reading Drill on AWS EMR

Introduction to Apache Drill talk

After working on Drill since October of last year, I presented for the very first time last night at the Big Data Bellevue meetup, giving an Introduction to Apache Drill talk. The talk sparked many interesting discussions, especially around data cataloging, use cases, nested data, and how to handle multi-tenancy and scheduling. The slides are here: https://speakerdeck.com/tnachen/introduction-to-apache-drill-big-data-bellevue-20131023 Next talk … Continue reading Introduction to Apache Drill talk