This repository was archived by the owner on Dec 18, 2021. It is now read-only.
I ran into an issue that could be solved by applying some custom filters before detection (I do not mean Lucene filters, but predefined filters such as lowercase, uppercase, ...).
I get the following French tweet, written entirely in uppercase:
COMMENT DES GENS PEUVENT TROUVER DES CÉLÉBRITÉS DANS LES MAGASINS JE PEUX MÊME PAS TROUVER MA MÈRE
which is detected as English.
But when I submit the exact same text lowercased,
comment des gens peuvent trouver des célébrités dans les magasins je peux même pas trouver ma mère
French is detected.
The lang detect module is case sensitive. It is possible that, for French, n-gram frequencies for uppercase text have not been recorded. This is due to the upstream lang detect module, not this plugin.
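As a rough illustration of the "predefined filters" idea, here is a minimal Python sketch that normalizes the text before handing it to a detector. It is not the plugin's API. The `FILTERS` registry, `apply_filters`, and `detect_with_filters` are hypothetical names, and `detect` stands in for whatever language detection call is actually used.

```python
from typing import Callable, Dict, Iterable

# Hypothetical registry of predefined filters; names are illustrative only
# and do not correspond to any option of the plugin.
FILTERS: Dict[str, Callable[[str], str]] = {
    "lowercase": str.lower,
    "uppercase": str.upper,
}


def apply_filters(text: str, names: Iterable[str]) -> str:
    """Apply the named filters to the text, in the order given."""
    for name in names:
        text = FILTERS[name](text)  # raises KeyError for an unknown filter
    return text


def detect_with_filters(
    text: str,
    detect: Callable[[str], str],
    names: Iterable[str] = ("lowercase",),
) -> str:
    """Normalize the text, then pass it to the caller-supplied detector.

    `detect` is an assumption: any callable that takes a string and
    returns a language code.
    """
    return detect(apply_filters(text, names))


tweet = (
    "COMMENT DES GENS PEUVENT TROUVER DES CÉLÉBRITÉS DANS LES MAGASINS "
    "JE PEUX MÊME PAS TROUVER MA MÈRE"
)
normalized = apply_filters(tweet, ["lowercase"])
print(normalized)
```

Python's `str.lower` handles the accented characters, so the result is the lowercased form of the tweet quoted above. Whether lowercasing is the right default for every language profile is an open question; the sketch only shows where such a filter would sit.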