This repository was archived by the owner on Dec 18, 2021. It is now read-only.
Yes, the input data cannot be reliably processed if the text is either short (single words) or short and mixed. That makes sense to me: the first text contains the words facebook and posts, while the second contains no English word at all.
This restriction comes from the underlying langdetect module; this plugin cannot change it.
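Why short input is hard can be illustrated with character n-grams. Detectors in the langdetect family score a text by its character n-grams (to my understanding, n-grams of length 1 to 3), so the available evidence grows with text length. The helper below, `char_ngrams`, is hypothetical and is not the library's own feature extractor; it only counts features of that kind.

```python
from collections import Counter


def char_ngrams(text: str, n_max: int = 3) -> Counter:
    """Count character n-grams of length 1..n_max per space-padded word.

    Illustration only: loosely mimics the features an n-gram language
    detector scores; it is not langdetect's actual extractor.
    """
    counts: Counter = Counter()
    for word in text.lower().split():
        padded = f" {word} "  # pad so word boundaries become features
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                gram = padded[i:i + n]
                if gram.strip():  # skip grams consisting only of padding
                    counts[gram] += 1
    return counts


print(sum(char_ngrams("facebook posts").values()))
print(sum(char_ngrams("hi").values()))
```

`"facebook posts"` yields 41 n-gram occurrences, while a single short word like `"hi"` yields only 7, which is very little evidence for a probability estimate.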
I see the point that a URL is not text. But a lot of data is not text, so I think URL/URI is only one example.
For this plugin, I think the most viable approach is to feed language detection only input that has been preprocessed so that it consists of recognizable language.
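A minimal sketch of such preprocessing, assuming plain regular expressions are good enough: the hypothetical `strip_non_text` removes URLs, e-mail addresses and @mentions before the text is handed to language detection. The patterns are illustrative assumptions and are not part of this plugin.

```python
import re

# Hypothetical patterns for tokens that carry no language signal.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
MENTION_RE = re.compile(r"@\w+")
WS_RE = re.compile(r"\s+")


def strip_non_text(text: str) -> str:
    """Remove URLs, e-mail addresses and @mentions, then collapse whitespace."""
    # E-mail runs before mentions so "a@b.com" is not half-eaten as "@b".
    for pattern in (URL_RE, EMAIL_RE, MENTION_RE):
        text = pattern.sub(" ", text)
    return WS_RE.sub(" ", text).strip()


print(strip_non_text("Check this https://t.co/abc123 now @bob"))
```

Hashtags are deliberately kept here, since in tweets they often contain real words in the tweet's language; whether to drop them is a judgment call.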
The most general approach would be part-of-speech (POS) tagging, as used in natural language processing and text mining. It would be a good idea to combine a POS tagger with language detection of the kind this plugin provides.
On short text containing a URL, English is almost always detected.
Example:
An Arabic tweet with a URL:
Produces:
English is detected with a higher probability...
Without any URL:
Produces:
English is not even detected!
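A rough way to see why a URL can tip a short Arabic tweet toward English is to count how many letters come from each script. The sample text below is hypothetical (the tweet from the report is not reproduced here), and `script_share` is an illustrative helper: the detector does not literally count scripts, but the sketch shows that a single URL can contribute more Latin letters than the tweet has Arabic ones.

```python
import unicodedata


def script_share(text: str) -> dict[str, float]:
    """Return the fraction of alphabetic characters per script.

    The script is approximated by the first word of each character's
    Unicode name (e.g. 'LATIN', 'ARABIC'); a heuristic, not a full lookup.
    """
    counts: dict[str, int] = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split()[0]
            counts[script] = counts.get(script, 0) + 1
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()} if total else {}


# Hypothetical short Arabic text ("hello world") followed by a shortened URL.
print(script_share("مرحبا بالعالم https://t.co/AbCdEfGhIj"))
print(script_share("مرحبا بالعالم"))
```

With the URL, 18 of the 30 letters are Latin; without it, every letter is Arabic.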
I can submit a pull request; I've already made the changes myself.