[Tokenizers] Port BERTTokenizers #6991

Open
@ericstj

Description

Porting BERTTokenizers would enable several text-embedding generation models. This depends on #6988.

https://github.com/huggingface/text-embeddings-inference?tab=readme-ov-file#text-embeddings
https://github.com/huggingface/transformers/blob/v4.37.0/src/transformers/models/bert/tokenization_bert.py#L137
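
For context, here is a rough sketch of the core behavior a BERT tokenizer port would need to reproduce: greedy longest-match-first WordPiece splitting, wrapped in `[CLS]` / `[SEP]` markers. This is not the linked implementation or the proposed port. The function names (`wordpiece`, `bert_tokenize`) and the toy vocabulary are made up for illustration, and the sketch assumes simple whitespace splitting and lowercasing. The real tokenizer also handles punctuation splitting, accent stripping, and CJK characters.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into sub-tokens using greedy longest-match-first."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matched, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def bert_tokenize(text, vocab):
    """Hypothetical helper: lowercase, split on whitespace, apply WordPiece."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    return tokens


# Toy vocabulary, for illustration only.
TOY_VOCAB = {"[CLS]", "[SEP]", "[UNK]", "port", "token", "##izer", "##s"}

print(bert_tokenize("Port tokenizers", TOY_VOCAB))
# ['[CLS]', 'port', 'token', '##izer', '##s', '[SEP]']
```

Any existing implementation could be checked against this behavior on a real vocabulary file before deciding whether a separate port is needed.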

cc @luisquintanilla

We already have a BERT implementation, which may be sufficient.

Metadata

Assignees

No one assigned

Labels

P1 (Priority of the issue for triage purpose: Needs to be fixed soon.)
enhancement (New feature or request)
