Poor accuracy with non-US regional settings #4910

Closed
ddobric opened this issue Mar 3, 2020 · 6 comments · Fixed by #5145
Labels: bug (Something isn't working), image (Bugs related image datatype tasks), P1 (Priority of the issue for triage purpose: Needs to be fixed soon.)

Comments

@ddobric

ddobric commented Mar 3, 2020

System information

  • OS: Windows 10
  • .NET Core: 2.2

Issue

I'm getting poor accuracy when running the training code on a system with non-US regional settings. The issue is the number format. After changing the decimal symbol from ',' to '.', everything works fine.

Accuracy with ',' decimal symbol:

image

Accuracy with '.' decimal symbol:

image

Is there some way to take control of localization in ML.NET?

Thanks
Damir

@mstfbl
Contributor

mstfbl commented Mar 3, 2020

Hi Damir,

You bring up a very good edge case we shouldn't forget: in many regions outside the US, the decimal separator is a comma rather than a period. For example, 3.5 is written as 3.5 in the US and 3,5 in much of Europe, and 3.5 * 10^5 is written as 350,000 in the US and 350.000 in much of Europe. We will look into this, thank you!

Edit: Another good edge case is the negative (-) sign. In some cultures (e.g. Faroese in Denmark), -1 is denoted as −1. Here the negative sign is the Unicode minus sign, code point 8722 (U+2212), which is outside the ASCII table, whereas the default negative sign is the hyphen-minus, ASCII value 45. In addition, some cultures have text written from right to left (e.g. Arabic, Hebrew), so with these the negative sign will be to the right of the value (e.g. 3-, 4.05-).
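
To make the separator mix-up concrete, here is a minimal sketch in plain .NET (no ML.NET involved). The class name `SeparatorDemo` and the hand-built `CommaDecimal` format are assumptions made for illustration; real cultures such as de-DE take their separators from OS or ICU data. The point is that a mismatched format does not throw: the "wrong" separator is accepted as a group separator and silently changes the value.

```csharp
using System;
using System.Globalization;

public static class SeparatorDemo
{
    // Hypothetical format mimicking conventions where ',' is the decimal
    // separator and '.' groups thousands (built by hand so the example does
    // not depend on which culture data is installed).
    public static readonly NumberFormatInfo CommaDecimal = new NumberFormatInfo
    {
        NumberDecimalSeparator = ",",
        NumberGroupSeparator = "."
    };

    public static void Main()
    {
        NumberFormatInfo invariant = NumberFormatInfo.InvariantInfo;

        foreach (string text in new[] { "3.5", "3,5", "350.000" })
        {
            // float.Parse allows group separators by default, so neither
            // call throws; the two formats simply disagree on the value.
            float asInvariant = float.Parse(text, invariant);
            float asComma = float.Parse(text, CommaDecimal);
            Console.WriteLine(
                $"{text}: invariant={asInvariant.ToString(invariant)}, " +
                $"comma-decimal={asComma.ToString(invariant)}");
        }

        // The two "minus" characters mentioned above are different code points.
        int unicodeMinus = '\u2212'; // MINUS SIGN
        int hyphenMinus = '-';       // HYPHEN-MINUS
        Console.WriteLine($"U+2212 = {unicodeMinus}, U+002D = {hyphenMinus}");
    }
}
```

Under these assumptions, "3.5" should come back as 3.5 with the invariant format but 35 with the comma-decimal one, which is the kind of silent corruption that could degrade training data without raising an error.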

mstfbl added the bug and P1 labels Mar 3, 2020
@justinormont
Contributor

We may want to set up a CI leg which tests another culture. Perhaps "de-DE" or "ru-RU".
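
As a rough idea of what such a test could exercise, the sketch below runs a piece of code under a chosen culture and restores the previous one afterwards. `CultureScope` and `RunWithCulture` are hypothetical names invented here, not part of ML.NET or its test suite, and the round-trip check is only a stand-in for a real training test.

```csharp
using System;
using System.Globalization;

public static class CultureScope
{
    // Runs `body` with the current thread's culture set to `cultureName`,
    // then restores whatever culture was active before.
    // Note: new CultureInfo("de-DE") needs culture data to be available;
    // it may throw in invariant-globalization deployments.
    public static T RunWithCulture<T>(string cultureName, Func<T> body)
    {
        CultureInfo previous = CultureInfo.CurrentCulture;
        try
        {
            CultureInfo.CurrentCulture = new CultureInfo(cultureName);
            return body();
        }
        finally
        {
            CultureInfo.CurrentCulture = previous;
        }
    }

    public static void Main()
    {
        // Stand-in check: text written and read with an explicit invariant
        // format should round-trip no matter which culture is current.
        float roundTripped = RunWithCulture("de-DE", () =>
        {
            string text = 3.5f.ToString(CultureInfo.InvariantCulture);
            return float.Parse(text, CultureInfo.InvariantCulture);
        });
        Console.WriteLine(roundTripped == 3.5f ? "round-trip ok" : "round-trip FAILED");
    }
}
```

A CI leg could wrap existing accuracy tests in a helper like this, or set the culture for the whole test process, so that code relying on the implicit current culture shows up as a failure.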

@ddobric
Author

ddobric commented Mar 5, 2020

How about this or similar?

MLContext ctx = new MLContext(.., culture: CultureInfo.InvariantCulture)
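
The `culture:` parameter above is a proposal, not an existing `MLContext` option. Until something like it exists, one process-level workaround in plain .NET is to pin the culture to invariant at startup, sketched below. The class name `InvariantCultureWorkaround` is hypothetical, and whether this covers every code path inside ML.NET is an assumption that this thread does not confirm.

```csharp
using System;
using System.Globalization;

public static class InvariantCultureWorkaround
{
    // Call once at startup, before any data is loaded or parsed.
    public static void Apply()
    {
        // Default for threads that have not had a culture set explicitly.
        CultureInfo.DefaultThreadCurrentCulture = CultureInfo.InvariantCulture;
        CultureInfo.DefaultThreadCurrentUICulture = CultureInfo.InvariantCulture;

        // The thread that is running right now.
        CultureInfo.CurrentCulture = CultureInfo.InvariantCulture;
        CultureInfo.CurrentUICulture = CultureInfo.InvariantCulture;
    }

    public static void Main()
    {
        Apply();

        // Culture-sensitive overloads now behave as if invariant was passed.
        float parsed = float.Parse("3.5");
        Console.WriteLine(parsed.ToString());

        // ... create the MLContext and run training here ...
    }
}
```

This trades per-call control for simplicity: everything in the process, including any UI formatting, switches to invariant conventions, which may not be acceptable for applications that also display localized numbers.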

@mstfbl
Contributor

mstfbl commented Mar 5, 2020

Here is a .NET Fiddle that demonstrates some of these cultural differences in .NET Core 3.1 (major thanks to @justinormont for the idea and the initial implementation): https://dotnetfiddle.net/LtAtoi

@mstfbl mstfbl self-assigned this Apr 13, 2020
@mstfbl
Contributor

mstfbl commented Apr 13, 2020

Hey @ddobric, can you provide the sample code and data you used to obtain the difference in your results above? It'll be helpful in testing the validity of a PR to fix this issue. Thanks!

harishsk added the image label Apr 29, 2020
@mstfbl
Contributor

mstfbl commented May 4, 2020

Hi @ddobric, I'm reaching out again to ask if you can provide sample code and data with which you've encountered this issue. Thanks.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022
4 participants