Describe the bug
For most tokenizers I have tested (e.g. the RoBERTa tokenizer), the data preprocessing cache is not fully reused in the first few runs, even though the corresponding .arrow cache files are already in the cache directory.
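As a quick way to see what is (or is not) being cached, the .arrow files written by map() can be listed directly; they are written next to the dataset's own arrow files under the HF cache directory. This is only an illustrative sketch, assuming the default cache location:

import os
from datasets import load_dataset

raw_datasets = load_dataset("wikitext", "wikitext-2-raw-v1")

# map() writes its cache as cache-<fingerprint>.arrow next to the dataset's
# own arrow files, so listing that directory shows which results are cached.
dataset_dir = os.path.dirname(raw_datasets["train"].cache_files[0]["filename"])
print(sorted(f for f in os.listdir(dataset_dir) if f.endswith(".arrow")))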
Steps to reproduce the bug
Here is a reproducer. In this example, the GPT-2 tokenizer works perfectly with caching, but the RoBERTa tokenizer does not.
from datasets import load_dataset
from transformers import AutoTokenizer
raw_datasets = load_dataset("wikitext", "wikitext-2-raw-v1")
# tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
text_column_name = "text"
column_names = raw_datasets["train"].column_names
def tokenize_function(examples):
    return tokenizer(examples[text_column_name], return_special_tokens_mask=True)

tokenized_datasets = raw_datasets.map(
    tokenize_function,
    batched=True,
    remove_columns=column_names,
    load_from_cache_file=True,
    desc="Running tokenizer on every text in dataset",
)
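As an extra check (not part of the original snippet), the cache file actually backing the mapped dataset can be printed right after the map call; a stable cache hit should point at the same cache-<fingerprint>.arrow path on every run:

# Continuation of the reproducer above: print the cache files backing each split.
# If a new cache-<fingerprint>.arrow path appears on every run, the fingerprint
# computed for tokenize_function is changing between runs.
for split, dset in tokenized_datasets.items():
    print(split, [f["filename"] for f in dset.cache_files])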
Expected results
No tokenization should be required after the 1st run; everything should be loaded from the cache.
Actual results
Tokenization of some subsets is repeated on the 2nd and 3rd runs. Starting from the 4th run, everything is loaded from the cache.
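One hedged guess at a diagnostic: datasets decides whether to reuse a cache file by hashing the map function (which pickles the enclosed tokenizer), so if the tokenizer's internal state changes after its first call, the hash, and therefore the cache fingerprint, changes with it. A minimal sketch to check this, assuming datasets.fingerprint.Hasher and roberta-base as above:

from datasets.fingerprint import Hasher
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def tokenize_function(examples):
    return tokenizer(examples["text"], return_special_tokens_mask=True)

# If these two hashes differ, calling the tokenizer mutated its state, and any
# subsequent map() call will compute a different cache fingerprint.
print("hash before first call:", Hasher.hash(tokenize_function))
tokenizer("warm-up text")
print("hash after first call: ", Hasher.hash(tokenize_function))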
Environment info
- datasets version: 1.18.3
- Platform: Ubuntu 18.04.6 LTS
- Python version: 3.6.9
- PyArrow version: 6.0.1