1 file changed (+13 -9 lines)
@@ -39,24 +39,28 @@ pip install bert-pytorch
## Quickstart

** NOTICE : Your corpus should be prepared with two sentences in one line with tab(\t) separator**
+
+ ### 0. Prepare your corpus
```
- Welcome to the \t the jungle \n
- I can stay \t here all night \n
+ Welcome to the \t the jungle\n
+ I can stay \t here all night\n
```

- ### 1. Building vocab based on your corpus
- ``` shell
- bert-vocab -c data/corpus.small -o data/corpus.small.vocab
+ or tokenized corpus (tokenization is not in package)
```
+ Wel_ _come _to _the \t _the _jungle\n
+ _I _can _stay \t _here _all _night\n
+ ```
+

- ### 2. Building BERT train dataset with your corpus
+ ### 1. Building vocab based on your corpus
``` shell
- bert-dataset -d data/corpus.small -v data/corpus.small.vocab -o data/dataset.small
+ bert-vocab -c data/corpus.small -o data/vocab.small
```

- ### 3. Train your own BERT model
+ ### 2. Train your own BERT model
``` shell
- bert -d data/dataset.small -v data/corpus.small.vocab -o output/bert.model
+ bert -c data/dataset.small -v data/vocab.small -o output/bert.model
```

## Language Model Pre-training
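The corpus NOTICE in this diff asks for one sentence pair per line, with the two sentences separated by a tab. Below is a minimal shell sketch for writing and sanity-checking such a file before running `bert-vocab`. It assumes a POSIX shell with `awk`, `tr`, `sort` and `uniq`. The validation rule (exactly two non-empty tab-separated fields per line) is an interpretation of the NOTICE, not a check the package is known to perform, and the word-frequency listing is only a preview, not the format of the vocab file `bert-vocab` writes.

```shell
#!/usr/bin/env sh
# Sketch only: build a tiny corpus in the "sentence A<TAB>sentence B" layout
# and check it. The path data/corpus.small matches the README; the checks
# are illustrative assumptions, not part of bert-pytorch.
set -eu

mkdir -p data

# printf turns \t into a real tab and \n into a newline, so each line
# holds two sentences separated by exactly one tab.
printf 'Welcome to the\tthe jungle\nI can stay\there all night\n' > data/corpus.small

# Fail (non-zero exit) if any line does not split into two non-empty
# fields on the tab character.
awk -F '\t' 'NF != 2 || $1 == "" || $2 == "" {
  printf "line %d: expected 2 tab-separated sentences, got %d field(s)\n", NR, NF
  bad = 1
} END { exit bad }' data/corpus.small

# Rough word-frequency preview across both sentences of every line
# (whitespace tokens, most frequent first).
tr '\t' ' ' < data/corpus.small | tr -s ' ' '\n' | sort | uniq -c | sort -rn
```

Once the check passes, the vocab step from the diff can be run unchanged: `bert-vocab -c data/corpus.small -o data/vocab.small`.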