This repository was archived by the owner on Dec 18, 2021. It is now read-only.
apply filter on text before ngram detection #18
Closed
Description
I ran into an issue that could be solved by running custom filters on the text before detection. I do not mean Lucene filters, but simple predefined ones, e.g. lowercase, uppercase, ...
I get the following French tweet, written entirely in uppercase:
COMMENT DES GENS PEUVENT TROUVER DES CÉLÉBRITÉS DANS LES MAGASINS JE PEUX MÊME PAS TROUVER MA MÈRE
which is detected as English:
{
  "language": "en",
  "probability": 0.9999937971825049
}
But when I submit the exact same text lowercased,
comment des gens peuvent trouver des célébrités dans les magasins je peux même pas trouver ma mère
{
  "language": "fr",
  "probability": 0.9999970343219597
}
French is now detected.
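To make the request concrete, here is a minimal sketch of the kind of predefined filter chain I have in mind. It is not the plugin's API: the `FILTERS` registry, `apply_filters`, and `char_ngrams` are hypothetical names, and the n-gram extraction is a plain case-sensitive one, which is only an assumption about why the two inputs behave differently.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical registry of predefined filters (names are illustrative only,
# not options of any real plugin).
FILTERS: Dict[str, Callable[[str], str]] = {
    "lowercase": str.lower,
    "uppercase": str.upper,
    "trim": str.strip,
}


def apply_filters(text: str, names: Iterable[str]) -> str:
    """Run the named filters over the text, in the order given."""
    for name in names:
        if name not in FILTERS:
            raise ValueError(f"unknown filter: {name}")
        text = FILTERS[name](text)
    return text


def char_ngrams(text: str, n: int) -> List[str]:
    """Plain case-sensitive character n-grams (an assumed, simplified extraction)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


upper = ("COMMENT DES GENS PEUVENT TROUVER DES CÉLÉBRITÉS DANS LES MAGASINS "
         "JE PEUX MÊME PAS TROUVER MA MÈRE")
lower = ("comment des gens peuvent trouver des célébrités dans les magasins "
         "je peux même pas trouver ma mère")

# Filtering first makes both inputs identical before any n-gram work happens.
normalized = apply_filters(upper, ["lowercase"])
same_after_filter = normalized == lower

# Without the filter, the two texts share no trigram at all in this extraction.
shared_trigrams = set(char_ngrams(upper, 3)) & set(char_ngrams(lower, 3))
```

With something like this applied before n-gram detection, the uppercase tweet and its lowercase form would go through the detector as the same string.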