Author: Han Xiao https://hanxiao.github.io
A collection of frequently used deep learning blocks that I have implemented in TensorFlow. It covers core NLP tasks such as embedding, encoding, matching, and pooling. All implementations follow a modular design pattern that I call "block design". More details can be found in my blog post.
- Python >= 3.6
- TensorFlow >= 1.6
A collection of sequence encoding blocks. The input is a sequence of shape [B, L, D] and the output is another sequence of shape [B, L, D'], where B is the batch size, L is the sequence length, and D and D' are the input and output feature dimensions.
| Name | Dependencies | Description | Reference | 
|---|---|---|---|
| LSTM_encode | | a fast multi-layer bidirectional LSTM implementation based on CudnnLSTM. It is expected to be 5 to 10x faster than the standard TensorFlow `LSTMCell`, but it only runs on GPU. | TensorFlow doc on CudnnLSTM |
| TCN_encode | Res_DualCNN_encode | a temporal convolutional network as described in the paper: essentially a multi-layer dilated CNN with special padding to ensure causality | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling |
| Res_DualCNN_encode | CNN_encode | a sub-block used by TCN_encode: a two-layer CNN with spatial dropout in between, followed by a residual connection and layer normalization | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling |
| CNN_encode | | a standard `conv1d` implementation on the L axis, with support for different padding schemes | Convolutional Neural Networks for Sentence Classification |
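The "special padding to ensure causality" in TCN_encode can be illustrated with a small NumPy sketch of a causal dilated 1-D convolution. This is not the repository's TensorFlow code; the function name `causal_dilated_conv1d` and the filter layout `[K, D, D']` are assumptions made for illustration. The key idea is to pad only on the left by `(K - 1) * dilation` steps, so the output at step t depends only on inputs at t, t - d, ..., t - (K - 1)d.

```python
import numpy as np


def causal_dilated_conv1d(x, w, dilation=1):
    """Causal dilated 1-D convolution along the time axis (illustrative sketch).

    x: [B, L, D] input sequence
    w: [K, D, D_out] filter, K = kernel width (hypothetical layout)
    returns: [B, L, D_out], same length L as the input
    """
    B, L, D = x.shape
    K, _, D_out = w.shape
    pad = (K - 1) * dilation
    # Left-pad only, so no output position can see a future input.
    x_pad = np.pad(x, ((0, 0), (pad, 0), (0, 0)))
    y = np.zeros((B, L, D_out))
    for k in range(K):
        # Tap k looks back (K - 1 - k) * dilation steps from the current position.
        start = k * dilation
        y += x_pad[:, start:start + L, :] @ w[k]
    return y


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 10, 4))   # [B, L, D]
w = rng.normal(size=(3, 4, 5))    # [K, D, D']
y = causal_dilated_conv1d(x, w, dilation=2)
print(y.shape)  # (2, 10, 5)
```

Stacking such layers with dilations 1, 2, 4, ... grows the receptive field exponentially with depth while keeping the output length equal to L. The actual TCN_encode block additionally wraps each pair of convolutions in Res_DualCNN_encode (spatial dropout, residual connection, layer normalization).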
A collection of sequence matching blocks, a.k.a. attention. The input consists of two sequences: a context of shape [B, L_c, D] and a query of shape [B, L_q, D]. The output is a sequence with the same length as the context, i.e. of shape [B, L_c, D]. Each position in the output encodes the relevance of the corresponding context position to the entire query.
| Name | Dependencies | Description | Reference | 
|---|---|---|---|
| Attentive_match | | basic attention mechanism with different scoring functions; also supports future blinding | additive: Neural machine translation by jointly learning to align and translate; scaled: Attention is all you need |
| Transformer_match | | a multi-head attention block from "Attention is all you need" | Attention is all you need |
| AttentiveCNN_match | Attentive_match | the light version of attentive convolution, with optional future blinding to ensure causality | Attentive Convolution |
| BiDaf_match | | the attention flow layer used in the BiDAF model | Bidirectional Attention Flow for Machine Comprehension |
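The shape contract above can be made concrete with a NumPy sketch of scaled dot-product scoring with optional future blinding. This is a hedged illustration of one of the scoring functions Attentive_match mentions, not its actual code; the name `scaled_attentive_match` and the `causal` flag are hypothetical. Each context position scores every query position, the scores are softmax-normalized over L_q, and the output is the weighted sum of query vectors, giving shape [B, L_c, D].

```python
import numpy as np


def scaled_attentive_match(context, query, causal=False):
    """Scaled dot-product attention (illustrative sketch).

    context: [B, L_c, D], query: [B, L_q, D]
    returns: (output [B, L_c, D], weights [B, L_c, L_q])
    """
    d = context.shape[-1]
    scores = context @ np.swapaxes(query, 1, 2) / np.sqrt(d)  # [B, L_c, L_q]
    if causal:
        # Future blinding: position i may only attend to query positions j <= i.
        L_c, L_q = scores.shape[1:]
        future = np.triu(np.ones((L_c, L_q), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over the query axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ query, weights


rng = np.random.default_rng(0)
c = rng.normal(size=(2, 5, 8))  # context [B, L_c, D]
q = rng.normal(size=(2, 7, 8))  # query   [B, L_q, D]
out, w = scaled_attentive_match(c, q)
print(out.shape, w.shape)  # (2, 5, 8) (2, 5, 7)
```

Future blinding is mainly meaningful when context and query are the same (or position-aligned) sequence, as in self-attention for autoregressive models.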
A collection of pooling blocks that fuse (reduce) the time axis L. The input is a sequence of shape [B, L, D] and the output has shape [B, D].
| Name | Dependencies | Description | Reference | 
|---|---|---|---|
| SWEM_pool | | pooling over the input sequence; supports max pooling, average pooling, and hierarchical average-max pooling | Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms |
There are also some convolution-based pooling blocks built on SWEM_pool, but they are experimental, so I do not list them here.
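The three SWEM pooling variants can be sketched in a few lines of NumPy. This is an illustration of the pooling operations from the SWEM paper, not the repository's SWEM_pool; the function name `swem_pool`, the `mode` strings, and the `window` parameter are assumptions. Hierarchical pooling first averages over each local window of n consecutive steps, then takes the max over those window averages.

```python
import numpy as np


def swem_pool(x, mode="max", window=3):
    """Reduce a [B, L, D] sequence to [B, D] (illustrative sketch)."""
    if mode == "max":
        return x.max(axis=1)
    if mode == "avg":
        return x.mean(axis=1)
    if mode == "hier":
        # Average over each local window of n consecutive steps,
        # then max-pool over all window averages.
        L = x.shape[1]
        n = min(window, L)
        local = np.stack(
            [x[:, i:i + n].mean(axis=1) for i in range(L - n + 1)], axis=1
        )  # [B, L - n + 1, D]
        return local.max(axis=1)
    raise ValueError(f"unknown pooling mode: {mode}")


x = np.array([[[1.0], [2.0], [3.0], [10.0]]])  # [B=1, L=4, D=1]
print(swem_pool(x, "max"))              # [[10.]]
print(swem_pool(x, "avg"))              # [[4.]]
print(swem_pool(x, "hier", window=2))   # [[6.5]]
```

Unlike plain max pooling, hierarchical pooling preserves some local word-order information, since each window average mixes only neighboring positions.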
A collection of positional encoding blocks for sequences.
| Name | Dependencies | Description | Reference | 
|---|---|---|---|
| SinusPositional_embed | | generates a sinusoidal signal with the same length as the input sequence | Attention is all you need |
| Positional_embed | | parameterizes (learns embeddings for) the absolute positions of the tokens in the input sequence | A Convolutional Encoder Model for Neural Machine Translation |
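The sinusoidal signal from "Attention is all you need" uses PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)). Below is a NumPy sketch of that formula, not the repository's SinusPositional_embed; the function name is hypothetical, and some implementations concatenate the sin and cos halves instead of interleaving them as done here.

```python
import numpy as np


def sinusoid_positional_embed(length, dim, base=10000.0):
    """Return a [length, dim] sinusoidal signal (illustrative sketch)."""
    pos = np.arange(length)[:, None]          # [L, 1]
    i = np.arange(0, dim, 2)[None, :]         # even dimension indices (2i)
    angle = pos / np.power(base, i / dim)     # [L, ceil(dim / 2)]
    pe = np.zeros((length, dim))
    pe[:, 0::2] = np.sin(angle)               # even dimensions
    pe[:, 1::2] = np.cos(angle[:, : dim // 2])  # odd dimensions
    return pe


x = np.zeros((2, 50, 16))                     # [B, L, D] input sequence
pe = sinusoid_positional_embed(50, 16)
y = x + pe[None]                              # broadcast over the batch axis
print(y.shape)  # (2, 50, 16)
```

Because the signal is a fixed function of position, it needs no trainable parameters, in contrast to Positional_embed, which learns one vector per absolute position.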
A collection of multi-task learning blocks. So far only the "cross-stitch block" is available.
| Name | Dependencies | Description | Reference | 
|---|---|---|---|
| CrossStitch | | a cross-stitch block, modeling the correlation and self-correlation of two tasks | Cross-stitch Networks for Multi-task Learning |
| Stack_CrossStitch | CrossStitch | stacking multiple cross-stitch blocks together with shared/separated input | Cross-stitch Networks for Multi-task Learning | 
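A cross-stitch unit linearly mixes the activations of two task-specific towers with a 2x2 matrix: the diagonal entries weight each task's own features (self-correlation) and the off-diagonal entries weight the other task's features (correlation). The NumPy sketch below shows a single unit with one scalar per entry; it is not the repository's CrossStitch, the function name is hypothetical, and in a real model `alpha` would be a trainable variable. The example initialization favoring self-connections is an illustrative choice.

```python
import numpy as np


def cross_stitch(x_a, x_b, alpha):
    """Mix two same-shaped task activations (illustrative sketch).

    x_a, x_b: activations of task A and task B, e.g. [B, D]
    alpha: [2, 2] mixing matrix; alpha[0, 0] and alpha[1, 1] weight each
           task's own features, the off-diagonal entries the other task's.
    """
    out_a = alpha[0, 0] * x_a + alpha[0, 1] * x_b
    out_b = alpha[1, 0] * x_a + alpha[1, 1] * x_b
    return out_a, out_b


x_a = np.ones((2, 3))   # task A features
x_b = np.zeros((2, 3))  # task B features
alpha = np.array([[0.9, 0.1],
                  [0.1, 0.9]])  # example init leaning toward self-connections
out_a, out_b = cross_stitch(x_a, x_b, alpha)
print(out_a[0], out_b[0])  # [0.9 0.9 0.9] [0.1 0.1 0.1]
```

Setting `alpha` to the identity recovers two independent networks, while equal entries force fully shared features; learning `alpha` lets the model choose between these extremes.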
A collection of auxiliary functions, e.g. masking, normalizing, slicing.
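Masking is what lets pooling and attention ignore padding in variable-length batches. A minimal NumPy sketch, assuming each example's true length is known: `sequence_mask` here mirrors the boolean mask that `tf.sequence_mask` returns, and `masked_mean` is a hypothetical helper that averages only over valid positions. Neither is the repository's own function.

```python
import numpy as np


def sequence_mask(lengths, max_len):
    """[B] lengths -> [B, L] boolean mask, True for real (non-padding) steps."""
    return np.arange(max_len)[None, :] < np.asarray(lengths)[:, None]


def masked_mean(x, mask):
    """Average a [B, L, D] sequence over valid positions only -> [B, D]."""
    m = mask[:, :, None].astype(x.dtype)
    # Guard against division by zero for empty sequences.
    return (x * m).sum(axis=1) / np.maximum(m.sum(axis=1), 1.0)


x = np.array([[[1.0], [3.0], [100.0]],   # length 2: the 100 is padding
              [[1.0], [2.0], [3.0]]])    # length 3
mask = sequence_mask([2, 3], max_len=3)
print(mask)                  # [[ True  True False] [ True  True  True]]
print(masked_mean(x, mask))  # [[2.] [2.]]
```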
Run app.py for a simple test on toy data.