
significant speedup for "to_scalar_or_list" #2938

Open
sdementen opened this issue Nov 27, 2020 · 3 comments
Labels
P3 backlog performance something is slow

Comments

@sdementen

Hello,

While investigating a slowdown in plotly, I came across the to_scalar_or_list function, which was taking a lot of time.
After some tinkering, I came up with the following two changes, which vastly improve performance:

  • Move lines 38/39, the get_module calls (e.g. np = get_module("numpy", should_load=False)), out of the function. get_module is slow and runs each time the function is called (10k calls when handling a list of 10k elements). Can this be done once in plotly instead of dynamically in each function? (I see get_module is also used in many other places in the package.)

  • Move the simplest case (v is a basic type) first. Otherwise, for an iterable of size N, the function first runs many complex tests on the iterable itself and then repeats all of those tests N times, once for each item.
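
The first point can be illustrated with a minimal sketch. Here `lookup_module` is a hypothetical stand-in for get_module (it is not plotly's implementation, it only consults sys.modules), and the two `convert_*` functions are simplified illustrations rather than the real to_scalar_or_list. A counter shows how many lookups each variant performs for a list of 10,000 floats.

```python
import sys

# Counts how often the (hypothetical) module lookup is executed.
LOOKUPS = {"count": 0}


def lookup_module(name):
    """Hypothetical stand-in for get_module(name, should_load=False):
    return the module only if it has already been imported."""
    LOOKUPS["count"] += 1
    return sys.modules.get(name)


def convert_lookup_per_call(v):
    # The lookup runs on every call, so once per element of a list.
    np = lookup_module("numpy")
    if isinstance(v, (list, tuple)):
        return [convert_lookup_per_call(e) for e in v]
    if np is not None and np.isscalar(v) and hasattr(v, "item"):
        return v.item()
    return v


# The lookup runs once, when this module is loaded.
_NP = lookup_module("numpy")


def convert_lookup_hoisted(v):
    if isinstance(v, (list, tuple)):
        return [convert_lookup_hoisted(e) for e in v]
    if _NP is not None and _NP.isscalar(v) and hasattr(v, "item"):
        return v.item()
    return v


data = [float(i) for i in range(10_000)]

LOOKUPS["count"] = 0
per_call_result = convert_lookup_per_call(data)
per_call_lookups = LOOKUPS["count"]  # one for the list plus one per element

LOOKUPS["count"] = 0
hoisted_result = convert_lookup_hoisted(data)
hoisted_lookups = LOOKUPS["count"]  # no lookups during the call
```

One trade-off to keep in mind with this sketch: a lookup done once at load time will not notice a module that is imported later, which may be why a dynamic lookup was chosen in the first place.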

In the end, it looks like this:

np = get_module("numpy", should_load=False)
pd = get_module("pandas", should_load=False)

# Utility functions
# -----------------
def to_scalar_or_list(v):
    # Check the simple case first: native scalars need no conversion.
    if isinstance(v, (int, float, str)):
        return v

    # Handle the case where 'v' is a non-native scalar-like type,
    # such as numpy.float32. Without this case, the object might be
    # considered numpy-convertible and therefore promoted to a
    # 0-dimensional array, but we instead want it converted to a
    # Python native scalar type ('float' in the example above).
    # We explicitly check if it has the 'item' method, which conventionally
    # converts these types to native scalars.
    if np and np.isscalar(v) and hasattr(v, "item"):
        return v.item()
    if isinstance(v, (list, tuple)):
        return [to_scalar_or_list(e) for e in v]
    elif np and isinstance(v, np.ndarray):
        if v.ndim == 0:
            return v.item()
        return [to_scalar_or_list(e) for e in v]
    elif pd and isinstance(v, (pd.Series, pd.Index)):
        return [to_scalar_or_list(e) for e in v]
    elif is_numpy_convertable(v):
        return to_scalar_or_list(np.array(v))
    else:
        return v
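
One caveat about the fast path, offered as an observation and not something raised in this thread: isinstance also matches subclasses, and numpy.float64 is a subclass of Python's float, so such a value would be returned unchanged instead of being converted with .item(). The sketch below uses a hypothetical pure-Python subclass, `FancyFloat`, to contrast isinstance with an exact type check; both helper functions are illustrative names only.

```python
class FancyFloat(float):
    """Hypothetical float subclass, standing in for a type such as numpy.float64."""

    def item(self):
        # Convert to a plain Python float, as numpy scalars do with .item().
        return float(self)


def fast_path_isinstance(v):
    # Matches native scalars AND any subclass of them.
    return isinstance(v, (int, float, str))


def fast_path_exact(v):
    # Matches only the exact native types, so subclasses fall through
    # to the later, more thorough checks.
    return type(v) in (int, float, str)


value = FancyFloat(1.5)
matches_isinstance = fast_path_isinstance(value)  # subclass is accepted
matches_exact = fast_path_exact(value)            # subclass is rejected
```

Whether returning a float subclass unchanged matters depends on how the result is used downstream, so this is a point to verify, not a confirmed bug.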
@nicolaskruchten
Contributor

This is pretty interesting, thanks! Can you provide a simple benchmark that shows these improvements? For the given figure you're looking at, is the speedup on the scale of 50% or...?

@sdementen
Author

I was experimenting with https://community.plotly.com/t/speeding-up-plotting-large-timeseries-x5/47638 and was sending lists for x and y. There, the difference was very large: the time taken by the assignment trace["x"] = x.tolist() was divided by 10.
I have since removed the tolist(), so I no longer experience the issue.
However, both optimizations, (a) moving get_module out of the function and (b) handling a single element first and iterables afterwards, are nevertheless worthwhile.

@sdementen
Author

For benchmarking, when passing a list of 10000 floats as the argument to the function, I get a speedup of 90%.
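
A benchmark of this kind might be sketched as follows. The two functions are simplified, hypothetical stand-ins for the "before" and "after" versions (they are not plotly's code, and the module lookup is approximated with sys.modules), so the timings they produce depend on the machine and do not reproduce the figure above.

```python
import sys
import timeit


def before_variant(v):
    # Module lookup on every call, complex checks before the simple case.
    np = sys.modules.get("numpy")
    if np is not None and np.isscalar(v) and hasattr(v, "item"):
        return v.item()
    if isinstance(v, (list, tuple)):
        return [before_variant(e) for e in v]
    return v


# Module lookup done once, outside the function.
_NP = sys.modules.get("numpy")


def after_variant(v):
    # Simple case first, then containers, then numpy scalars.
    if isinstance(v, (int, float, str)):
        return v
    if isinstance(v, (list, tuple)):
        return [after_variant(e) for e in v]
    if _NP is not None and _NP.isscalar(v) and hasattr(v, "item"):
        return v.item()
    return v


data = [float(i) for i in range(10_000)]


def benchmark(func, repeat=5, number=10):
    """Return the best observed time, in seconds, for one call of func(data)."""
    timings = timeit.repeat(lambda: func(data), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    t_before = benchmark(before_variant)
    t_after = benchmark(after_variant)
    print(f"before: {t_before * 1e3:.3f} ms per call")
    print(f"after:  {t_after * 1e3:.3f} ms per call")
```

Taking the minimum over several repeats reduces noise from other processes; to measure the real function, the stand-ins would be replaced with the two actual versions of to_scalar_or_list.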

@gvwilson gvwilson self-assigned this Jun 18, 2024
@gvwilson gvwilson removed their assignment Aug 2, 2024
@gvwilson gvwilson added P3 backlog performance something is slow labels Aug 12, 2024