
Commit b8f27e3

Update README to new API
1 parent ae094c8 commit b8f27e3


1 file changed (+13, -9 lines)


README.md

@@ -39,24 +39,28 @@ pip install bert-pytorch
 ## Quickstart
 
 **NOTICE : Your corpus should be prepared with two sentences in one line with tab(\t) separator**
+
+### 0. Prepare your corpus
 ```
-Welcome to the \t the jungle \n
-I can stay \t here all night \n
+Welcome to the \t the jungle\n
+I can stay \t here all night\n
 ```
 
-### 1. Building vocab based on your corpus
-```shell
-bert-vocab -c data/corpus.small -o data/corpus.small.vocab
+or tokenized corpus (tokenization is not in package)
 ```
+Wel_ _come _to _the \t _the _jungle\n
+_I _can _stay \t _here _all _night\n
+```
+
 
-### 2. Building BERT train dataset with your corpus
+### 1. Building vocab based on your corpus
 ```shell
-bert-dataset -d data/corpus.small -v data/corpus.small.vocab -o data/dataset.small
+bert-vocab -c data/corpus.small -o data/vocab.small
 ```
 
-### 3. Train your own BERT model
+### 2. Train your own BERT model
 ```shell
-bert -d data/dataset.small -v data/corpus.small.vocab -o output/bert.model
+bert -c data/dataset.small -v data/vocab.small -o output/bert.model
 ```
 
 ## Language Model Pre-training
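
The notice and step 0 require one sentence pair per line, with the two sentences separated by a tab. The `\t` and `\n` in the README examples presumably stand for a real tab character and a real newline, not literal backslash sequences. Below is a minimal shell sketch that writes the two example pairs and checks that every line contains exactly one tab. The function names `make_corpus` and `check_corpus` are hypothetical. Dropping the spaces shown around `\t` assumes they are only there for readability.

```shell
#!/bin/sh
set -eu

# Write the two example pairs from step 0 to the file given as $1.
# printf reuses its format for each pair of arguments, emitting one
# "left<TAB>right<NEWLINE>" line per sentence pair.
make_corpus() {
  mkdir -p "$(dirname "$1")"
  printf '%s\t%s\n' \
    'Welcome to the' 'the jungle' \
    'I can stay' 'here all night' > "$1"
}

# Report every line of $1 that does not split into exactly two
# tab-separated fields (i.e. does not contain exactly one tab).
# Returns non-zero if any such line is found; prints nothing otherwise.
check_corpus() {
  awk -F '\t' 'NF != 2 { print FILENAME ":" NR ": expected one tab"; bad = 1 }
               END { exit bad }' "$1"
}
```

For example, `make_corpus data/corpus.small && check_corpus data/corpus.small` produces a file at the path used in step 1 and prints nothing if the file is valid.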

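The new "tokenized corpus" variant keeps the same one-tab-per-line layout. The difference is that each side is already split into space-separated tokens, and the README notes that the package does not do this tokenization for you. The toy filter below (the name `toy_tokenize` is hypothetical) only prefixes each whitespace-separated word with `_`. It does not split words into subwords the way `Wel_ _come` is split in the example. Treat it as a placeholder that shows the expected shape, not as the tokenizer used to produce that example.

```shell
#!/bin/sh
set -eu

# Toy stand-in for a real tokenizer (hypothetical, not part of bert-pytorch).
# Keeps the single tab between the two sentences, splits each sentence on
# whitespace, and prefixes every word with "_" as a word-start marker.
# It does NOT split words into subwords.
toy_tokenize() {
  awk -F '\t' -v OFS='\t' '{
    for (f = 1; f <= NF; f++) {
      n = split($f, words, " ")
      out = ""; sep = ""
      for (i = 1; i <= n; i++) { out = out sep "_" words[i]; sep = " " }
      $f = out   # reassigning a field rebuilds the line with OFS (a tab)
    }
    print
  }' "$@"
}
```

For example, `toy_tokenize data/corpus.small > data/corpus.small.tok` writes a tokenized copy. The `.tok` name is arbitrary.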
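
With `bert-dataset` removed, the updated Quickstart is down to two commands. `bert-vocab` writes `data/vocab.small`, and `bert` trains with `-c` in place of the old `-d`. The sketch below runs them in order, with the flags copied from the `+` lines of this diff. The names `run` and `quickstart` and the `DRY_RUN` switch are hypothetical helpers of this sketch, not part of bert-pytorch.

The new training line still reads `data/dataset.small`, which no remaining step produces. The third argument therefore lets you pass a different file, such as your corpus. This rests on the assumption that `bert -c` now takes the prepared corpus directly.

```shell
#!/bin/sh
set -eu

# Run a command, or only print it when DRY_RUN=1.
# DRY_RUN is a switch of this sketch, not a bert-pytorch option.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Updated Quickstart: build the vocab, then train.
# Arguments (all optional, defaults taken from the README diff):
#   $1  corpus file for bert-vocab      (data/corpus.small)
#   $2  vocab file to write and read    (data/vocab.small)
#   $3  file passed to `bert -c`        (data/dataset.small)
quickstart() {
  corpus=${1:-data/corpus.small}
  vocab=${2:-data/vocab.small}
  train_input=${3:-data/dataset.small}

  # Step 1: the vocab output is now data/vocab.small
  # (previously data/corpus.small.vocab).
  run bert-vocab -c "$corpus" -o "$vocab"

  # Step 2: -c replaces the old -d; the separate bert-dataset step is gone.
  run bert -c "$train_input" -v "$vocab" -o output/bert.model
}
```

`DRY_RUN=1 quickstart` prints the two commands without running them. Without `DRY_RUN`, the commands actually run, which requires the `bert-vocab` and `bert` entry points from `pip install bert-pytorch` on your `PATH`.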