Quincy: Fair scheduling for Distributed Computing Clusters

For the next paper in the datacenter scheduling series, we'll cover Quincy, a paper published by Microsoft at SOSP in 2009. It is also the paper that Firmament (http://firmament.io/) is primarily based on. http://www.sigops.org/sosp/sosp09/papers/isard-sosp09.pdf There is a growing number of data-intensive jobs (and applications), and jobs can take anywhere from minutes to days. Therefore, it … Continue reading Quincy: Fair scheduling for Distributed Computing Clusters →

Improving MapReduce Performance in Heterogeneous Environments

'Improving MapReduce Performance in Heterogeneous Environments' is the first paper in the collection of scheduling papers I'd like to cover. If you'd like to learn the motivation behind this series, you can find it here. As mentioned, I will also add my personal comments about this paper from a Mesos perspective. This is the very first … Continue reading Improving MapReduce Performance in Heterogeneous Environments →

A survey of datacenter scheduling papers

Scheduling workloads in a cluster is not a new topic and has years of research behind it. But scheduling has recently drawn renewed attention because of the popularity of containers (Docker) and the rise of scheduling frameworks that can run more than just MapReduce and OpenMP jobs (Mesos and Kubernetes). Having the opportunity to work on Mesos at Mesosphere and becoming a PMC in this … Continue reading A survey of datacenter scheduling papers →

2014 in review

The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog. Here's an excerpt: A New York City subway train holds 1,200 people. This blog was viewed about 5,900 times in 2014. If it were a NYC subway train, it would take about 5 trips to carry that many people. Click here to … Continue reading 2014 in review →

Kafka common addons

The blog post I wrote for Packt is now published on their blog. It covers some common tools people use with Kafka: https://www.packtpub.com/books/content/common-kafka-addons Tim

Docker on Mesos 0.20

In our recent MesosCon survey of existing Mesos users, one of the biggest feature asks was Docker integration in Mesos. Although users can already launch Docker images with Mesos thanks to the external containerizer work with Deimos, that approach still requires an external component to be installed on each slave and also we … Continue reading Docker on Mesos 0.20 →

How to build Apache Mesos on Mac

Today I tried to build Apache Mesos on my MacBook Pro. Although it's fairly simple, there are a few gotchas, so I decided to put the steps here: 1. Clone the code (git clone http://github.com/apache/mesos.git) 2. Install Homebrew (http://brew.sh/) 3. Add Homebrew taps: % brew tap homebrew/versions % brew tap homebrew/science % brew tap homebrew/apache … Continue reading How to build Apache Mesos on Mac →

Drill on AWS EMR

I'm happy to announce that Drill is now able to be launched on Amazon EMR! I worked with the Amazon EMR team to develop the Bootstrap action script that installs and configures Drill on EMR. How to Run: From the Elastic MapReduce CLI (which you can install using these instructions: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-cli-install.html) ./elastic-mapreduce --create --alive --name "Drill.deploy" --instance-type m1.large --ami-version 3.0.1 --hbase … Continue reading Drill on AWS EMR →

Lifetime of a Query in Drill Alpha Release

Ellen invited me to give a talk at the Bay Area Apache Drill group, and working on the Limit operator end to end gave me the idea to illustrate how Drill handles a query and what happens in each stage, from the time it receives the query to the point when the result … Continue reading Lifetime of a Query in Drill Alpha Release →

Introduction to Apache Drill talk

After working on Drill since October of last year, I presented for the very first time last night at the Big Data Bellevue meetup, with a talk titled Introduction to Apache Drill. The talk sparked many interesting discussions, especially around data cataloging, use cases, nested data, how to handle multi-tenancy and scheduling, etc. The slides are here: https://speakerdeck.com/tnachen/introduction-to-apache-drill-big-data-bellevue-20131023 Next talk … Continue reading Introduction to Apache Drill talk →