[MRG+1] Make csr row norms support fused types #6785

yenchenlin · 2016-05-16T06:39:07Z

Since csr_row_norms is called by the row_norms function defined in sklearn/utils/extmath.py, and row_norms is used widely in k_means_.py,
it would be useful for csr_row_norms to support Cython fused types as well.

However, making this change degrades the precision of the function for float32 input.
To pass the tests locally, I had to relax the check by lowering the decimal argument of assert_array_almost_equal from 5 to 4.

Could @MechCoder and @jnothman give me some advice on this trade-off?

TomDLT · 2016-05-16T14:41:44Z

Tests are failing; you have to reduce the precision for the sqrt check as well.
It's normal to lose precision with float32: we have around 6 significant digits, and the row norms are around 100, so a precision of 1e-4 seems legit to me.
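
To make the float32 estimate above concrete, here is a small sketch using only NumPy. The value 100.0 is taken from the comment as a representative row norm; it is an assumption, not a number measured on the test data.

```python
import numpy as np

# Machine epsilon: relative spacing of representable values near 1.0.
eps32 = np.finfo(np.float32).eps  # 2**-23
eps64 = np.finfo(np.float64).eps  # 2**-52

# Absolute gap between adjacent float32 values near a norm of about 100.
ulp_at_100 = np.spacing(np.float32(100.0))  # 2**-17

print(f"float32 eps: {eps32:.3e}, float64 eps: {eps64:.3e}")
print(f"float32 spacing near 100: {ulp_at_100:.3e}")
```

A single rounding step at this magnitude is already close to 1e-5, and a sum of squares accumulates many such steps, so agreement to roughly 1e-4 looks plausible for float32.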

jnothman · 2016-05-17T00:02:38Z

sklearn/utils/sparsefuncs_fast.pyx

-        np.ndarray[int, ndim=1, mode="c"] indptr = X.indptr
+        unsigned int n_samples = shape[0]
+        unsigned int n_features = shape[1]
+        np.ndarray[double, ndim=1, mode="c"] norms


Why is this no longer DOUBLE?

You mean your comment or the dtype?

Sorry, it's my mistake.

BTW, since the type double comes from import numpy as np and the type np.float64_t comes from cimport numpy as np,
is there a big performance difference between them?

I don't see how double comes from import numpy as np. double is a C type whose size is officially unspecified, but to be honest I'm not sure if that's the only reason the more precise numpy type is preferred over the C type.
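
The size question can be checked from plain Python. This is a minimal sketch, assuming ctypes reports the same double width that the local C compiler uses; it says nothing about how Cython resolves np.float64_t internally.

```python
import ctypes

import numpy as np

# np.float64 is 64 bits wide by definition, on every platform.
numpy_size = np.dtype(np.float64).itemsize

# The C standard does not fix the width of double; ctypes reports
# the width used by the local platform ABI.
c_double_size = ctypes.sizeof(ctypes.c_double)

print(numpy_size, c_double_size)
```

Where both sizes are equal, the two types describe the same memory layout, so I would not expect a performance difference between them. That is an expectation, not something measured in this thread.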

yenchenlin · 2016-05-19T04:26:50Z

@jnothman Sorry, my earlier timing test was not correct.

After debugging and running it again, the results show that the running time increases from 9.9s to 13s if we explicitly cast every entry as we multiply it.

So it does indeed cause a big runtime hit ...
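
For readers following the trade-off, the two accumulation strategies can be sketched in plain Python. This is not the Cython code from the PR: csr_row_norms_sketch is a hypothetical helper, the CSR arrays are built by hand, and the loop only illustrates the precision side of the trade-off, not the timing.

```python
import numpy as np


def csr_row_norms_sketch(data, indptr, acc_dtype):
    """Squared L2 norm of each CSR row, accumulated in acc_dtype."""
    n_rows = len(indptr) - 1
    norms = np.zeros(n_rows, dtype=acc_dtype)
    for i in range(n_rows):
        acc = acc_dtype(0)
        for j in range(indptr[i], indptr[i + 1]):
            # Cast each stored entry before multiplying it.
            v = acc_dtype(data[j])
            acc += v * v
        norms[i] = acc
    return norms


rng = np.random.RandomState(0)
data = rng.rand(3000).astype(np.float32)  # stored values of a 3-row matrix
indptr = np.array([0, 1000, 2000, 3000])  # 1000 stored entries per row

exact = csr_row_norms_sketch(data, indptr, np.float64)  # cast every entry
fast = csr_row_norms_sketch(data, indptr, np.float32)   # stay in float32
rel_err = np.abs(fast - exact) / exact
print(rel_err.max())
```

Accumulating in float64 removes most of the rounding error but requires a conversion per entry, which is the cast that the timing above measures.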

yenchenlin · 2016-05-19T04:30:20Z

However, the test passes if I change its precision from 1e-6 to 1e-4, as @TomDLT suggested.
Does 1e-4 seem legit to you? @jnothman @MechCoder

jnothman · 2016-05-23T03:04:15Z

Oh. I thought I'd replied to this. But perhaps I didn't, for the same reason that I'm still not sure what to say. I did briefly try to look for a BLAS or similar function (I'm not very familiar with what's available) that might support a fast sum of squares, potentially with a result in higher precision than the input. I agree that 30% seems a substantial runtime hit. I suppose we can accept the loss in precision. :/

MechCoder · 2016-05-23T15:58:15Z

sklearn/utils/tests/test_extmath.py


-    assert_array_almost_equal(sq_norm, row_norms(X, squared=True), 5)
-    assert_array_almost_equal(np.sqrt(sq_norm), row_norms(X))
+        assert_array_almost_equal(sq_norm, row_norms(X, squared=True), 4)


It might be a good idea to test for the float64 and float32 dtype separately, with a higher precision for the float64 dtype. WDYT?
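
A per-dtype test along these lines could look like the sketch below. It is an illustration, not the test that was committed: dense_row_norms is a hypothetical stand-in for row_norms on dense input, and the decimal values 4 and 5 are the ones discussed in this thread.

```python
import numpy as np
from numpy.testing import assert_array_almost_equal


def dense_row_norms(X, squared=False):
    # Hypothetical stand-in for row_norms, dense input only.
    norms = np.einsum("ij,ij->i", X, X)
    return norms if squared else np.sqrt(norms)


rng = np.random.RandomState(42)
X64 = rng.randn(100, 100)

# Looser tolerance for float32, stricter for float64.
for dtype, decimal in ((np.float32, 4), (np.float64, 5)):
    X = X64.astype(dtype)
    sq_norm = (X ** 2).sum(axis=1)
    assert_array_almost_equal(
        sq_norm, dense_row_norms(X, squared=True), decimal)
    assert_array_almost_equal(
        np.sqrt(sq_norm), dense_row_norms(X), decimal)
```

Keeping the tolerance next to the dtype makes it obvious that only the float32 path was relaxed.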

MechCoder · 2016-05-23T16:22:05Z

Just a minor comment, +1 otherwise

MechCoder · 2016-05-25T14:51:48Z

@jnothman merge?

jnothman · 2016-05-25T20:53:03Z

Let's do it.

jnothman reviewed May 17, 2016

Make csr_row_norms support fused types

65e91f7

yenchenlin force-pushed the make-csr_row_norms-support-fused-types branch from 8930eb6 to ad211a1 Compare May 19, 2016 03:03

yenchenlin changed the title ~~Make csr row norms support fused types~~ [MRG] Make csr row norms support fused types May 21, 2016

yenchenlin mentioned this pull request May 22, 2016

[MRG+2] MAINT: Remove add_row_csr #6676

Merged

MechCoder reviewed May 23, 2016

MechCoder changed the title ~~[MRG] Make csr row norms support fused types~~ [MRG+1] Make csr row norms support fused types May 23, 2016

Test row_norms for float32 data

c7d6f9f

yenchenlin force-pushed the make-csr_row_norms-support-fused-types branch from ad211a1 to c7d6f9f Compare May 25, 2016 12:29

jnothman merged commit a6a6ff6 into scikit-learn:master May 25, 2016

rth mentioned this pull request Aug 31, 2017

TfidfVectorizer insists on np.float64 #6468

Closed
