
Commit 329bfef

Merge pull request bytedance#1 from lileicc/patch-1
update readme to make it clear
2 parents 014c4ff + a021fbb commit 329bfef

File tree

1 file changed (+10, -7 lines)


README.md

Lines changed: 10 additions & 7 deletions
@@ -1,8 +1,11 @@
-# Byseqlib
-Byseqlib is a high performance inference library for SOTA NLU/NLG models. It's built on
-CUDA official library([cuBLAS](https://docs.nvidia.com/cuda/cublas/index.html),
-[Thrust](https://docs.nvidia.com/cuda/thrust/index.html), [CUB](http://nvlabs.github.io/cub/)) and custom kernel functions which are specially fused and
-optimized for these widely used models. In addition to model components, we also provide codes
+# Byseqlib: A High Performance Inference Library for Sequence Processing and Generation
+
+Byseqlib is a high-performance inference library for sequence processing and generation, implemented in CUDA.
+It enables highly efficient computation of modern NLP models such as **BERT**, **GPT2**, and **Transformer**.
+It is therefore best suited for *Machine Translation*, *Text Generation*, *Dialog*, *Language Modelling*, and other related tasks using these models.
+
+The library is built on top of official CUDA libraries ([cuBLAS](https://docs.nvidia.com/cuda/cublas/index.html),
+[Thrust](https://docs.nvidia.com/cuda/thrust/index.html), [CUB](http://nvlabs.github.io/cub/)) and custom kernel functions which are specially fused and optimized for these widely used models. In addition to model components, we also provide code to
 manage model weights trained from deep learning frameworks and serve as a custom backend for
 [TensorRT Inference
 Server](https://docs.nvidia.com/deeplearning/sdk/inference-server-archived/tensorrt_inference_server_120/tensorrt-inference-server-guide/docs/quickstart.html) (referred
@@ -11,8 +14,8 @@ your own model architectures just with a little code modification.
 
 
 ## Features
-- Currently supports Transformer(with beam search) and GPT-2 language model.
-- Out-of-the-box end-to-end model server based on trtis.
+- Currently supports BERT, Transformer (with beam search) and the GPT-2 language model.
+- Out-of-the-box end-to-end model server based on TRTIS.
 - In addition to FP32, FP16 inference is also supported with no loss of accuracy even when the model weight is in FP32.
 - High inference performance compared with TensorFlow (8x+ speedup on Transformer with beam search,
 4x+ speedup on GPT-2 LM).
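
Below is a minimal sketch of the kind of kernel fusion the README describes: a bias add and a ReLU activation folded into a single CUDA kernel, so the intermediate activations never make an extra round trip through global memory. It is illustrative only; the kernel and names are hypothetical and not taken from the Byseqlib sources.

```cuda
// Hypothetical fusion sketch (not from Byseqlib): bias add + ReLU in one pass.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fused_bias_relu(const float* in, const float* bias,
                                float* out, int rows, int cols) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < rows * cols) {
    float v = in[idx] + bias[idx % cols];  // bias add
    out[idx] = v > 0.f ? v : 0.f;          // ReLU, fused in the same kernel
  }
}

int main() {
  const int rows = 4, cols = 8, n = rows * cols;
  float *in, *bias, *out;
  cudaMallocManaged(&in, n * sizeof(float));
  cudaMallocManaged(&bias, cols * sizeof(float));
  cudaMallocManaged(&out, n * sizeof(float));
  for (int i = 0; i < n; ++i) in[i] = i - 16.f;
  for (int j = 0; j < cols; ++j) bias[j] = 0.5f;
  fused_bias_relu<<<(n + 255) / 256, 256>>>(in, bias, out, rows, cols);
  cudaDeviceSynchronize();
  printf("out[0]=%f out[31]=%f\n", out[0], out[31]);  // expect 0.0 and 15.5
  cudaFree(in); cudaFree(bias); cudaFree(out);
  return 0;
}
```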
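The FP16 feature can be read as: weights trained in FP32 are converted to half precision once at load time, after which inference math runs in FP16. A minimal sketch under that assumption (hypothetical names, not Byseqlib's actual code path):

```cuda
// Hypothetical FP16-serving sketch (not from Byseqlib).
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// One-time down-conversion of FP32-trained weights to FP16 at load time.
__global__ void fp32_to_fp16(const float* w32, __half* w16, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) w16[i] = __float2half(w32[i]);
}

// Toy "inference" op on the converted weights. Math is done via float
// conversions for portability across compute capabilities; real kernels
// would use native half intrinsics or Tensor Cores.
__global__ void scale_fp16(const __half* w16, float s, __half* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = __float2half(__half2float(w16[i]) * s);
}

int main() {
  const int n = 1024;
  float* w32; __half *w16, *out;
  cudaMallocManaged(&w32, n * sizeof(float));
  cudaMallocManaged(&w16, n * sizeof(__half));
  cudaMallocManaged(&out, n * sizeof(__half));
  for (int i = 0; i < n; ++i) w32[i] = 0.001f * i;  // stand-in for trained weights
  fp32_to_fp16<<<(n + 255) / 256, 256>>>(w32, w16, n);
  scale_fp16<<<(n + 255) / 256, 256>>>(w16, 2.0f, out, n);
  cudaDeviceSynchronize();
  printf("out[500] = %f\n", __half2float(out[500]));  // expect ~1.0
  cudaFree(w32); cudaFree(w16); cudaFree(out);
  return 0;
}
```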
