significant speedup for "to_scalar_or_list" #2938

sdementen · 2020-11-27T14:04:44Z

hello,

While investigating slowness in plotly, I stumbled upon the to_scalar_or_list function (plotly.py/packages/python/plotly/_plotly_utils/basevalidators.py, line 30 in abd8609: def to_scalar_or_list(v):), which was taking a lot of time.
After some tinkering, I came up with the following two changes, which vastly improve performance:

a) Move lines 38/39 (plotly.py/packages/python/plotly/_plotly_utils/basevalidators.py, line 38 in abd8609: np = get_module("numpy", should_load=False)), which call get_module, out of the function. get_module is slow and runs every time the function is called (10k calls when handling a list of 10k elements). Can this be done once in plotly instead of dynamically in each function? (I see that get_module is also used in many other places in the package.)
b) Handle the simplest case (v is a basic type) first. Otherwise, for an iterable of size N, the function first runs many complex tests on the iterable itself and then runs all of those tests again on each of the N items.
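The first change (doing the module lookup once instead of on every call) can be sketched in isolation. This is a minimal illustration, not plotly's code: `lookup_module` is a hypothetical stand-in that assumes the lookup only consults already-imported modules, as the `should_load=False` argument suggests, and both converter functions are simplified.

```python
import sys


def lookup_module(name):
    """Hypothetical helper: return the module only if it is already imported."""
    return sys.modules.get(name)


def convert_per_call(v):
    # The lookup runs on every call, including once per element via recursion.
    np = lookup_module("numpy")
    if isinstance(v, (list, tuple)):
        return [convert_per_call(e) for e in v]
    if np is not None and np.isscalar(v) and hasattr(v, "item"):
        return v.item()
    return v


# Hoisted variant: the lookup runs once, when this module is imported.
_NP = lookup_module("numpy")


def convert_hoisted(v):
    if isinstance(v, (list, tuple)):
        return [convert_hoisted(e) for e in v]
    if _NP is not None and _NP.isscalar(v) and hasattr(v, "item"):
        return v.item()
    return v
```

One trade-off worth checking before hoisting: under the assumption above, a module-level lookup captures whatever is loaded at import time, so if numpy is imported only afterwards, `_NP` stays `None` and numpy scalars are no longer converted. The per-call version does not have that problem, which may be why the lookup was placed inside the function.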

So at the end, it looks like

np = get_module("numpy", should_load=False)
pd = get_module("pandas", should_load=False)
# Utility functions
# -----------------
def to_scalar_or_list(v):
    # Check the simple case first: native scalars are returned unchanged.
    if isinstance(v, (int, float, str)):
        return v

    # Handle the case where 'v' is a non-native scalar-like type,
    # such as numpy.float32. Without this case, the object might be
    # considered numpy-convertable and therefore promoted to a
    # 0-dimensional array, but we instead want it converted to a
    # Python native scalar type ('float' in the example above).
    # We explicitly check if it has the 'item' method, which conventionally
    # converts these types to native scalars.
    if np and np.isscalar(v) and hasattr(v, "item"):
        return v.item()
    if isinstance(v, (list, tuple)):
        return [to_scalar_or_list(e) for e in v]
    elif np and isinstance(v, np.ndarray):
        if v.ndim == 0:
            return v.item()
        return [to_scalar_or_list(e) for e in v]
    elif pd and isinstance(v, (pd.Series, pd.Index)):
        return [to_scalar_or_list(e) for e in v]
    elif is_numpy_convertable(v):
        return to_scalar_or_list(np.array(v))
    else:
        return v


nicolaskruchten · 2020-11-27T14:52:25Z

This is pretty interesting, thanks! Can you provide a simple benchmark showing these improvements? For the given figure you're looking at, is the speedup on the scale of 50% or...?

sdementen · 2020-11-27T15:13:13Z

I was experimenting with https://community.plotly.com/t/speeding-up-plotting-large-timeseries-x5/47638 and was passing lists for x and y. There, the difference was very large: the time taken by the assignment trace["x"] = x.tolist() was divided by 10.
I have since removed the tolist() call, so I no longer hit the issue.
However, both optimizations (a) moving get_module out of the function and b) handling a single element first and iterables after) are still worthwhile.

sdementen · 2020-11-27T15:19:11Z

For benchmarking, when passing a list of 10000 floats to the function, I get a speedup of 90%.
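A benchmark along these lines could be set up with `timeit`. The sketch below is an assumption-laden illustration, not the measurement reported above: both converters are simplified pure-Python stand-ins, and `_looks_array_like` is a hypothetical placeholder for the more expensive checks in the real function. It only isolates the effect of putting the native-scalar check first.

```python
import timeit


def _looks_array_like(v):
    # Hypothetical stand-in for the more expensive duck-typing checks.
    return hasattr(v, "__array__") or hasattr(v, "__array_interface__")


def convert_checks_first(v):
    # Container and array-like checks run before the scalar fallthrough.
    if isinstance(v, (list, tuple)):
        return [convert_checks_first(e) for e in v]
    if _looks_array_like(v):
        return [convert_checks_first(e) for e in v]
    return v


def convert_fast_path(v):
    # Native scalars return immediately, skipping every other check.
    if isinstance(v, (int, float, str)):
        return v
    if isinstance(v, (list, tuple)):
        return [convert_fast_path(e) for e in v]
    if _looks_array_like(v):
        return [convert_fast_path(e) for e in v]
    return v


def best_time(func, data, number=20, repeat=5):
    """Return the best observed time per call, in seconds."""
    timer = timeit.Timer(lambda: func(data))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    data = [float(i) for i in range(10_000)]
    for func in (convert_checks_first, convert_fast_path):
        print(f"{func.__name__}: {best_time(func, data) * 1e3:.3f} ms per call")
```

Taking the minimum over several repeats reduces noise from other processes. The absolute numbers depend on the machine and Python version, so they are only meaningful relative to each other.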

gvwilson self-assigned this Jun 18, 2024

gvwilson removed their assignment Aug 2, 2024

gvwilson added P3 backlog performance something is slow labels Aug 12, 2024
