# Evaluation of Deep Learning Toolkits

In this study, I evaluate some popular deep learning toolkits. The candidates are listed in alphabetical order: [TensorFlow](https://github.com/tensorflow/tensorflow), [Theano](https://github.com/Theano/Theano), and [Torch](https://github.com/torch/torch7) [0]. This is a dynamic document, and the evaluation, to the best of my knowledge, is based on the current state of their code.

I also provide ratings in each area because, for a lot of people, ratings are useful. However, keep in mind that ratings are inherently subjective [1].

If you find something wrong or incomplete, please help improve this document by creating an issue.

## Modeling Capability
In this section, we evaluate each toolkit's ability to train common and state-of-the-art networks <u>without writing too much code</u>. Some of these networks are:
- ConvNets: AlexNet, OxfordNet, GoogLeNet
- RecurrentNets: plain RNN, LSTM/GRU, bidirectional RNN
- Sequential modeling with attention

In addition, we also evaluate the flexibility to create new types of models.

#### TensorFlow: 4
**For state-of-the-art models**
- The RNN API and implementation are suboptimal. The TF team has also commented on this [here](https://github.com/tensorflow/tensorflow/issues/7) and [here](https://groups.google.com/a/tensorflow.org/forum/?utm_medium=email&utm_source=footer#!msg/discuss/B8HyI0tVtPY/aR43OIuUAwAJ).
- Bidirectional RNN is [not available yet](https://groups.google.com/a/tensorflow.org/forum/?utm_medium=email&utm_source=footer#!msg/discuss/lwgaL7WEuW4/UXaL4bYkAgAJ).
- No 3D convolution, which is useful for video recognition.

**For new models**
In TF, as in Theano, a network is specified as a symbolic graph of vector operations, such as matrix add/multiply or convolution, and a layer is just a composition of those operations. The fine granularity of the building blocks (operations) lets users invent new complex networks without worrying about backpropagation, which is derived automatically from the graph.

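To make the granularity concrete, here is a minimal sketch of a fully connected ReLU layer built from primitive TF ops; the sizes and names are hypothetical, chosen only for illustration, and the gradients come from the graph automatically:

```python
import tensorflow as tf

# Hypothetical sizes, chosen only for illustration.
x = tf.placeholder(tf.float32, [None, 784])   # input batch
W = tf.Variable(tf.zeros([784, 256]))         # layer weights
b = tf.Variable(tf.zeros([256]))              # layer bias

# A "layer" is just a composition of primitive ops: matmul, add, relu.
y = tf.nn.relu(tf.matmul(x, W) + b)

# Backpropagation is derived from the graph; no hand-written backward pass.
loss = tf.reduce_mean(tf.square(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
```
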
The public release of TF doesn't yet support loop and condition controls in the graph definition. This makes RNN implementations less than ideal, because they have to be expressed with Python loops and no graph-compiler optimization can be applied.

26 | 26 |
|
27 | | -* There are several higher-level APIs that are built on top of Theano such as [Blocks](https://github.com/mila-udem/blocks), [Keras](https://github.com/fchollet/keras), etc. to make Theano easier to use for certain class of users (e.g. ones who are more familiar with the layerwise design of Caffe and Torch). |
28 | | -* Cross-platform: it works on Windows while TensorFlow and Torch do not (or very hacky to install) |
| 27 | +Google claimed to have this in their [white paper](http://download.tensorflow.org/paper/whitepaper2015.pdf) and [details are still being worked out](https://github.com/tensorflow/tensorflow/issues/208). |
29 | 28 |
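Until then, a recurrence has to be unrolled with an ordinary Python loop, so the graph grows with the number of time steps. A hypothetical sketch (sizes and names invented for illustration):

```python
import tensorflow as tf

batch_size, n_steps, n_in, n_hid = 128, 10, 32, 64   # hypothetical sizes

# One placeholder per time step: the loop lives in Python, not in the graph.
inputs = [tf.placeholder(tf.float32, [batch_size, n_in]) for _ in range(n_steps)]
W_in = tf.Variable(tf.zeros([n_in, n_hid]))
W_rec = tf.Variable(tf.zeros([n_hid, n_hid]))
b = tf.Variable(tf.zeros([n_hid]))

h = tf.zeros([batch_size, n_hid])
for x_t in inputs:                 # unrolled: graph size grows with n_steps
    h = tf.tanh(tf.matmul(x_t, W_in) + tf.matmul(h, W_rec) + b)
```
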
#### Theano: 5
**State-of-the-art models**
Theano has implementations of almost all state-of-the-art networks, either in a higher-level framework (e.g. [Blocks](https://github.com/mila-udem/blocks), [Keras](https://github.com/fchollet/keras), etc.) or in pure Theano. In fact, many recent research ideas (e.g. attentional models) started here.

**New models**
Theano pioneered the trend of using a symbolic graph to program a network. Theano's symbolic API supports looping control, the so-called [scan](http://deeplearning.net/software/theano/tutorial/loop.html), which makes implementing RNNs easy and efficient. Users don't always have to define a new model at the level of tensor operations; the higher-level frameworks mentioned above make model definition and training simpler.

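For example, a simple recurrence can be written with `scan` so that the loop itself becomes part of the compiled graph. The sketch below is illustrative only; the dimensions and variable names are made up:

```python
import numpy as np
import theano
import theano.tensor as T

n_in, n_hid = 8, 16                      # hypothetical sizes
floatX = theano.config.floatX

X = T.matrix('X')                        # input sequence, shape (time, n_in)
W_in = theano.shared(np.random.randn(n_in, n_hid).astype(floatX))
W_rec = theano.shared(np.random.randn(n_hid, n_hid).astype(floatX))
h0 = T.zeros((n_hid,), dtype=floatX)

def step(x_t, h_prev):
    # one RNN step; scan applies it over the time dimension inside the graph
    return T.tanh(T.dot(x_t, W_in) + T.dot(h_prev, W_rec))

hs, _ = theano.scan(step, sequences=X, outputs_info=h0)
last_hidden = theano.function([X], hs[-1])   # the loop is compiled, not interpreted
```
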
#### Torch: 4.5
**State-of-the-art models**
- Excellent for conv nets. It's worth noting that temporal convolution can be done in TensorFlow/Theano via `conv2d`, but that's a trick; the native interface for temporal convolution in Torch makes it slightly more intuitive to use.
- A rich set of RNNs is available through a [non-official extension](https://github.com/Element-Research/rnn) [2].

**New models**
In Torch, there are multiple ways (a stack of layers or a graph of layers) to define a network, but essentially a network is defined as a graph of layers. Because of this coarser granularity, Torch is considered less flexible: for new layer types, users have to implement the full forward, backward, and gradient-update functions.

For those familiar with Caffe, this layerwise design will look similar. However, defining a new layer in Torch is much easier because you don't have to program it in C++. Plus, in Torch the difference between defining a new layer and defining a network is minimal, whereas in Caffe layers are defined in C++ and networks are defined via `Protobuf`.

<center>
<img src="http://i.snag.gy/0loNv.jpg" height="450"> <img src="https://camo.githubusercontent.com/49ac7d0f42e99d979c80a10d0ffd125f4b3df0ea/68747470733a2f2f7261772e6769746875622e636f6d2f6b6f7261796b762f746f7263682d6e6e67726170682f6d61737465722f646f632f6d6c70335f666f72776172642e706e67" height="450"><br>
<i>Left: the graph model of CNTK/Theano/TensorFlow; Right: the graph model of Caffe/Torch</i>
</center>

## Interfaces

#### TensorFlow: 4.9
TF supports two interfaces: Python and C++. This means that you can run experiments in a rich, high-level environment and deploy your model in an environment that requires native code or low latency.

It would be perfect if TF supported `F#` or `TypeScript`; the lack of static typing in Python is just ... painful :).

#### Theano: 4.5
Python only.

#### Torch: 4
Torch runs on LuaJIT, which is amazingly fast (comparable to industrial languages such as C++/C#/Java). So when using Torch, developers don't have to think in terms of symbolic programming, which can be limiting; they can just write all kinds of computations without worrying about a performance penalty.

However, let's face it: Lua is not yet a mainstream language.

## Model Deployment
How easy is it to deploy a new model?

#### TensorFlow: 4.5
TF supports a C++ interface, and the library can be compiled/optimized on ARM architectures because it uses [Eigen](http://eigen.tuxfamily.org) (instead of a BLAS library). This means you can deploy your trained models on a variety of devices (servers or mobile devices) without having to implement a separate model decoder or load a Python/LuaJIT interpreter [3].

TF doesn't work on Windows yet, though, so TF models can't be deployed on Windows devices.

#### Theano: 3
The lack of a low-level interface and the inefficiency of the Python interpreter make Theano less attractive for industrial users. For a large model the Python overhead isn't too bad, but the limitation is still there.

The cross-platform nature (mentioned below) does let a Theano model be deployed in a Windows environment, which helps it gain some points.

#### Torch: 3
Torch requires LuaJIT to run models, which makes it less attractive than the bare-bones C++ support of TF. The issue is not just the performance overhead, which is minimal; the bigger problem is integration, at the API level, with a larger production pipeline.

## Performance
### Single-GPU
All of these toolkits call cuDNN, so as long as there are no major computations or memory allocations at the outer level, they should perform similarly.

Soumith@FB has done some [benchmarking for ConvNets](https://github.com/soumith/convnet-benchmarks). Deep learning is not just about feedforward convnets, not just about ImageNet, and certainly not just about a few passes over the network. However, Soumith's benchmark is the only notable one as of today, so the single-GPU performance rating below is based on it.

#### TensorFlow: 3
TF uses only cuDNN v2, and even so, its performance is ~1.5x slower than Torch with cuDNN v2. It also runs out of memory when training GoogLeNet with a batch size of 128. More details [here](https://github.com/soumith/convnet-benchmarks/issues/66).

A few issues have been identified in that thread: excessive memory allocation, a tensor layout different from cuDNN's, no in-place ops, etc.

#### Theano: 3
On big networks, Theano's performance is on par with Torch7, according to [this benchmark](http://arxiv.org/pdf/1211.5590v1.pdf). Theano's main issue is its startup time, which is terrible because Theano has to compile C/CUDA code to binary. We don't always train big models; in fact, DL researchers often spend more time debugging than training big models. TensorFlow doesn't have this problem: it simply maps the symbolic tensor operations to corresponding function calls that are already compiled into the library.

Even `import theano` takes time, because this `import` apparently does a lot of work. Also, after `import theano`, you are stuck with a pre-configured device (e.g. `GPU0`).

#### Torch: 5
Simply awesome, without the \*bugs\* that TensorFlow and Theano have.

### Multi-GPU
I haven't yet trained models with these toolkits on multiple GPUs, so this evaluation is about the apparent ease of multi-GPU and/or distributed training.

#### TensorFlow: 4
The programming model for using multiple GPUs in a single box is fairly straightforward: memory transfer between GPU and CPU and aggregating results from multiple GPUs are fairly seamless. TF provides an [example here](https://github.com/tensorflow/tensorflow/blob/1d76583411038767f673a0c96174c80eaf9ff42f/tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py).

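As a rough illustration (a hypothetical toy sketch, not code from the linked example), placing work on different GPUs and gathering the results looks roughly like this:

```python
import tensorflow as tf

# Hypothetical toy sketch: one matmul "tower" per GPU, combined on the CPU.
x = tf.placeholder(tf.float32, [None, 1024])
towers = []
for i in range(2):
    with tf.device('/gpu:%d' % i):
        w = tf.Variable(tf.zeros([1024, 256]))
        towers.append(tf.matmul(x, w))

with tf.device('/cpu:0'):
    y = tf.add_n(towers)             # results are aggregated on the CPU

# allow_soft_placement falls back to CPU for ops without a GPU kernel
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
sess.run(tf.initialize_all_variables())
```
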
Distributed training isn't publicly available yet, but [there's a plan](https://github.com/tensorflow/tensorflow/issues/23) and it's high-priority.

#### Theano: 2
Theano doesn't support multi-GPU training natively. There's a lot of low-level work that programmers need to do themselves, e.g. spawning processes and aggregating results with the `multiprocessing` library. A tutorial is provided [here](https://github.com/Theano/Theano/wiki/Using-Multiple-GPUs).

#### Torch: 3
Torch's multi-GPU training (available through the [fbcunn](https://github.com/facebook/fbcunn) package) looks less seamless than TensorFlow's but more so than Theano's. [Here's an illustrative example](https://github.com/soumith/imagenet-multiGPU.torch/blob/master/train.lua).

There's no known plan for distributed training. For most organizations, though, distributed training is mostly hype: even if a company or a team can afford a cluster of GPUs, running jobs in parallel (for hyper-parameter sweeps and/or multiple users) often gives better utilization of the cluster.

## Model Debugging
TF has a visualization companion called TensorBoard. Using TensorBoard, you can track any variable (weight changes, accuracy, etc.) over time, which is useful for debugging and for analyzing the learning curves of models. The [logging mechanism](http://tensorflow.org/how_tos/summaries_and_tensorboard/index.html#serializing-the-data) is fairly simple.

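For instance, logging a scalar for TensorBoard is roughly this much code. This is a hedged sketch against my understanding of the current summary API, with a made-up toy model:

```python
import numpy as np
import tensorflow as tf

# Made-up toy model whose "loss" we want to track over training steps.
x = tf.placeholder(tf.float32, [None, 10])
w = tf.Variable(tf.zeros([10, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))

tf.scalar_summary('loss', loss)                  # attach a summary op
merged = tf.merge_all_summaries()

sess = tf.Session()
sess.run(tf.initialize_all_variables())
writer = tf.train.SummaryWriter('/tmp/tf_logs', sess.graph_def)

for step in range(100):
    batch = np.random.randn(32, 10).astype('float32')
    summary, _ = sess.run([merged, loss], feed_dict={x: batch})
    writer.add_summary(summary, step)            # view with: tensorboard --logdir=/tmp/tf_logs
```
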
## Architecture
How clean and hackable is the codebase for developers who want to read it or contribute to it?

#### TensorFlow: 5
TF has a clean, modular architecture with multiple frontends and execution platforms. Details are in the [white paper](http://download.tensorflow.org/paper/whitepaper2015.pdf).

<img src="http://i.snag.gy/sJlZe.jpg" width="500">

#### Theano: 3
The architecture is fairly hacky: the whole codebase is Python, with C/CUDA code packaged as Python strings ([example here](https://github.com/Theano/Theano/blob/master/theano/tensor/nnet/conv.py#L1615)). This makes it hard to navigate, debug, refactor, and hence hard to contribute to as a developer.

#### Torch: 5
The Torch7 and nn libraries are also well designed, with clean, modular interfaces.

## Ecosystem
#### TensorFlow: 5
Python and C++

#### Theano: 4
Python

#### Torch: 3
Lua is not a mainstream language, so libraries built for it are not as rich as those built for Python.

Plus, there's an increasingly popular JIT project for Python called [PyPy](http://pypy.org/). The advantage of LuaJIT would become minimal if/when PyPy becomes as fast as LuaJIT and fully compatible with CPython.

## Cross-platform
While Theano works on all OSes, TF and Torch do not work on Windows, and there's no known plan to port them from either camp.

<br>
___

**End Notes**

[0] There are other popular toolkits that I haven't included in this review yet, for a variety of reasons: [Caffe](https://github.com/BVLC/caffe), [CNTK](http://cntk.codeplex.com), [MXNet](https://github.com/dmlc/mxnet).

[1] Note that I don't aggregate the ratings, because different users/developers have different priorities.

[2] Disclaimer: I haven't analyzed this extension carefully.

[3] See my [blog post](http://www.kentran.net/2014/12/challenges-in-machine-learning-practice.html) for why this is desirable.