Misleading error message when an invalid y column name is provided to px.line #4040

Open
joshHug opened this issue Jan 26, 2023 · 4 comments
Labels
bug something broken P3 backlog

Comments

@joshHug

joshHug commented Jan 26, 2023

When attempting to plot multiple y values using px.line (or similar), an erroneous column name leads to a misleading error message stating that the lengths of the arguments don't match, when in fact the problem is that a column name is wrong.

Simple to reproduce:

import plotly.express as px
import pandas as pd
import numpy as np
x = np.linspace(0, 10, 100)
data1 = x**2
data2 = x**3
df = pd.DataFrame({"x": x, "data1": data1, "data2": data2})
px.line(df, x = "x", y = ["data1", "data2"]) # works fine
px.line(df, x = "x", y = ["data1", "Data2"]) 

# this last line gives a misleading error message: 
# ValueError: All arguments should have the same length. The length of argument `y` is 2, 
# whereas the length of  previously-processed arguments ['x'] is 100
@nicolaskruchten
Contributor

Yes, we should definitely find a way to give you a better hint that there's a typo in the contents of y. It's a bit tricky to do and we'd love some help figuring out how to do it! The relevant code is near here: https://github.com/plotly/plotly.py/blob/master/packages/python/plotly/plotly/express/_core.py#L1247

@nicolaskruchten
Contributor

The reason it's tricky is that we land deep in that if block because of a determination made much earlier, here: https://github.com/plotly/plotly.py/blob/master/packages/python/plotly/plotly/express/_core.py#L1315. Basically, PX decides up front whether it's in "wide mode" based on whether or not the contents of y are all column names. Much later, if they're not, it checks the length of the data y points to against the length of the data x points to.
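
A much-simplified sketch of that two-stage flow follows. It is not the actual Plotly Express code: `looks_like_column_list` and `resolve_y` are hypothetical names, and a plain dict of lists stands in for the data frame.

```python
# Simplified illustration only; the real logic lives in plotly/express/_core.py.

def looks_like_column_list(y, columns):
    """Stage 1 (up front): treat y as wide-form only if every entry is a known column."""
    return isinstance(y, list) and len(y) > 0 and all(
        isinstance(name, str) and name in columns for name in y
    )


def resolve_y(data, x, y):
    """Stage 2 (much later): outside wide mode, y is treated as raw data and length-checked."""
    if looks_like_column_list(y, data.keys()):
        return "wide"
    expected = len(data[x])
    if len(y) != expected:
        raise ValueError(
            "All arguments should have the same length. "
            f"The length of argument `y` is {len(y)}, whereas the length of "
            f"previously-processed arguments ['{x}'] is {expected}"
        )
    return "long"


# Hypothetical stand-in for the data frame in the report above.
data = {"x": [0, 1, 2], "data1": [0, 1, 4], "data2": [0, 1, 8]}

print(resolve_y(data, "x", ["data1", "data2"]))  # every name is a column

try:
    resolve_y(data, "x", ["data1", "Data2"])  # one typo, so y falls through to the length check
except ValueError as err:
    print(err)
```

By the time the length check runs, the information that `y` was almost a valid list of column names has already been discarded, which is why the message talks about lengths rather than names.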

@nicolaskruchten
Contributor

See also #3359 and #3474 and #2586

@abarpan3

abarpan3 commented Jul 14, 2023

Hi, I was exploring this issue a bit. The build_dataframe function calls another function, is_col_list, to check whether the argument supplied for x/y is a list of column names (and hence wide-form): https://github.com/plotly/plotly.py/blob/master/packages/python/plotly/plotly/express/_core.py#L1339

The is_col_list function contains a check for whether each column is present in the dataframe; if one is not, wide_mode comes back as false.
https://github.com/plotly/plotly.py/blob/master/packages/python/plotly/plotly/express/_core.py#L1036

Here, in the is_col_list function, we can catch the erroneous column name and give the user a more helpful error message. So I tried making two changes:

  1. Line 1035: check whether the y value is a pd.Series and, if so, return wide_mode as false. This handles arrays passed directly as x/y arguments, as in the example below.
    px.scatter(df1,x = df1.sales, y = df2.market)

  2. Lines 1042-1048: additionally check whether each col_name supplied in the list is present in df, and return a more helpful error message if it is not. The attached image shows both changes.
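
The second change might look roughly like the sketch below. It is an illustration under stated assumptions, not the patch from the screenshot: `check_column_names` is a hypothetical helper, `columns` stands in for `df.columns`, and the `difflib` suggestion is an extra that is not part of the proposal in this thread.

```python
import difflib


def check_column_names(arg_name, names, columns):
    """Raise a descriptive error if a list of strings contains unknown column names."""
    columns = list(columns)
    if not (isinstance(names, list) and all(isinstance(n, str) for n in names)):
        return  # not a list of names (e.g. raw array data); leave it to the existing checks
    missing = [n for n in names if n not in columns]
    if not missing:
        return  # every name is a real column
    hints = []
    for name in missing:
        # Suggest the closest existing column name, if any is similar enough.
        close = difflib.get_close_matches(name, columns, n=1)
        hints.append(f"'{name}'" + (f" (did you mean '{close[0]}'?)" if close else ""))
    raise ValueError(
        f"Value of `{arg_name}` contains names that are not columns of the data frame: "
        + ", ".join(hints)
    )


# Hypothetical usage with the column names from the original report.
try:
    check_column_names("y", ["data1", "Data2"], ["x", "data1", "data2"])
except ValueError as err:
    print(err)
```

One caveat: a list of strings can also be legitimate data rather than column names, so a real fix would need to decide when this check is allowed to raise instead of falling through to the existing length check.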

[image: screenshot of the two proposed changes]

Could anyone kindly provide any feedback if this seems like the right approach?

@gvwilson gvwilson self-assigned this Jul 11, 2024
@gvwilson gvwilson removed their assignment Aug 2, 2024
@gvwilson gvwilson added the P3 backlog label Aug 12, 2024
5 participants