
@codeflash-ai codeflash-ai bot commented May 24, 2025

📄 14% (0.14x) speedup for _facet_grid_color_numerical in plotly/figure_factory/_facet_grid.py

⏱️ Runtime: 1.29 seconds → 1.13 seconds (best of 8 runs)
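
The headline percentage can be reproduced from the two runtimes. The sketch below assumes the speedup is defined as (original - optimized) / optimized; the report does not state its formula, and `relative_speedup` is a hypothetical helper name, but this definition matches the reported 14% (0.14x).

```python
def relative_speedup(original_s: float, optimized_s: float) -> float:
    """Fractional speedup of the optimized runtime relative to the original.

    Assumed definition: (original - optimized) / optimized.
    """
    if optimized_s <= 0:
        raise ValueError("optimized runtime must be positive")
    return (original_s - optimized_s) / optimized_s


# Runtimes taken from the report above, in seconds.
speedup = relative_speedup(1.29, 1.13)
print(f"{speedup:.0%} ({speedup:.2f}x)")  # prints "14% (0.14x)"
```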

📝 Explanation and details

Here’s an optimized version of your provided code, focused on:

  • Avoiding repeated DataFrame lookups and groupbys.
  • More efficient label lookups.
  • Reducing dict constructions per call (especially for marker, trace dicts, and annotation dicts).
  • Avoiding unnecessary .tolist() conversions and DataFrame creation.
  • Inlining fast paths and minimizing branching and overheads in hot loops.
  • Preallocating where possible.

Details:

  • Cache df[color_name] once instead of recomputing it inside loops.
  • Bind frequently accessed attributes to local variables.
  • Use .get() for dict-style label lookups so that a missing key falls back safely instead of raising a KeyError.
  • make_subplots isn’t optimized (it forwards to plotly); it is kept as-is for compatibility.
  • For groupby-heavy code, use df.groupby(fields, sort=False) to avoid unnecessary sorting, and cache df[color_name].values.
  • Move dict construction (such as marker_dict) out of hot paths and reuse the resulting objects.
  • Inline variable paths and reduce branching in _annotation_dict.
  • Create the empty placeholder DataFrame once and reuse it rather than rebuilding it.
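
As a rough illustration of three of the points above (caching column arrays, `groupby(..., sort=False)`, and `.get()` label lookups), here is a minimal sketch. It is not the PR's code: the column names and the `facet_row_labels` mapping are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [5, 6, 7, 8],
    "color": [10, 20, 30, 40],
    "row": ["B", "B", "A", "A"],
})

# Hypothetical label mapping with the "B" key deliberately missing.
facet_row_labels = {"A": "Alpha"}

panels = []
# sort=False skips the sort step and yields groups in order of first appearance.
for key, group in df.groupby("row", sort=False):
    # Extract each column once as a NumPy array instead of re-indexing
    # the DataFrame every time the values are needed.
    xs = group["x"].values
    colors = group["color"].values
    # dict.get falls back to the raw key instead of raising KeyError.
    label = facet_row_labels.get(key, key)
    panels.append((label, xs.tolist(), colors.tolist()))

print(panels)
```

Note that `sort=False` changes the order in which groups are yielded ("B" before "A" here), so any code that relies on sorted facet order would need to sort the keys explicitly.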

Below is the rewritten, faster version.


Summary of key changes:

  • Label lookup is O(1) and safely falls back if missing.
  • Only one empty DataFrame for missing facets is ever constructed.
  • df[<col>].values is used for all color and data columns for efficient NumPy array access.
  • Marker and trace dicts are built once per loop iteration using cached color arrays.
  • Removed .tolist()-based empty checks; replaced with isnull().all(axis=None) on the cached empty DataFrame.
  • Minimized repeated dict constructions by reusing colorbar_dict, marker dict template, etc.
  • Branching in _annotation_dict is flattened for speed.
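
The placeholder and template reuse described above could look roughly like the following sketch. The shape of `EMPTY_GROUP`, the contents of `colorbar_dict`, and the trace dict layout are assumptions made for illustration, not the actual implementation in the PR.

```python
import pandas as pd

x, y, color_name = "x", "y", "color"
df = pd.DataFrame({
    x: [1, 2], y: [3, 4], color_name: [5, 6],
    "row": ["A", "B"], "col": ["C", "C"],
})

# Built once and reused for every (row, col) cell that has no data.
EMPTY_GROUP = pd.DataFrame([[None, None, None]], columns=[x, y, color_name])

# Shared pieces of the marker, built once (illustrative content only).
colorbar_dict = {"title": color_name}
marker_template = {"colorscale": "Viridis", "showscale": True,
                   "colorbar": colorbar_dict}

# One pass over the data; keys are (row, col) tuples.
groups = {key: g for key, g in df.groupby(["row", "col"], sort=False)}

traces = []
for row_val in ["A", "B"]:
    for col_val in ["C", "D"]:
        # O(1) lookup that falls back to the cached placeholder.
        group = groups.get((row_val, col_val), EMPTY_GROUP)
        # True only when every cell is null, i.e. for the placeholder.
        is_placeholder = bool(group.isnull().all(axis=None))
        # Shallow copy of the template plus the per-trace color array.
        marker = dict(marker_template, color=group[color_name].values)
        traces.append({"type": "scatter", "x": group[x].values,
                       "y": group[y].values, "marker": marker,
                       "empty": is_placeholder})

print([t["empty"] for t in traces])
```

Because `dict(marker_template, ...)` is a shallow copy, every trace shares the same `colorbar_dict` object. That is what saves the allocations, but it also means the shared dict must not be mutated per trace.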

If your DataFrames are large, these changes reduce memory allocation and improve runtime, especially on the hot path (_facet_grid_color_numerical).

Let me know if you would like further micro-optimization or vectorization based on the nature of your DataFrames or inputs!

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 34 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests Details
import pandas as pd
# imports
import pytest  # used for our unit tests
from plotly.figure_factory._facet_grid import _facet_grid_color_numerical

# function to test and helpers are defined above


# Basic Test Cases
# ----------------

def test_no_faceting_basic_scatter():
    """Test with no facet_row or facet_col, basic scatter, color numerical."""
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [4, 5, 6],
        "color": [10, 20, 30]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    # The trace should have correct x, y, and marker color
    trace = fig.data[0]

def test_facet_row_scatter():
    """Test with faceting on rows only."""
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [5, 6, 7, 8],
        "color": [10, 20, 30, 40],
        "row": ["A", "A", "B", "B"]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.1,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    # Each trace should have correct x values
    xs = [list(trace.x) for trace in fig.data]
    # Annotation text should include "A" and "B"
    texts = [ann["text"] for ann in annotations]

def test_facet_col_scatter():
    """Test with faceting on columns only."""
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [5, 6, 7, 8],
        "color": [10, 20, 30, 40],
        "col": ["C", "D", "C", "D"]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.1,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    xs = [list(trace.x) for trace in fig.data]
    texts = [ann["text"] for ann in annotations]

def test_facet_row_col_scatter():
    """Test with faceting on both rows and columns."""
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "color": [100, 200, 300, 400],
        "row": ["R1", "R1", "R2", "R2"],
        "col": ["C1", "C2", "C1", "C2"]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    texts = [ann["text"] for ann in annotations]

def test_trace_type_scattergl():
    """Test with trace_type scattergl instead of scatter."""
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [3, 4],
        "color": [5, 6],
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scattergl",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )

def test_custom_facet_labels():
    """Test with custom facet labels as dict."""
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [3, 4],
        "color": [5, 6],
        "row": ["foo", "bar"]
    })
    facet_row_labels = {"foo": "Alpha", "bar": "Beta"}
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=1,
        facet_row_labels=facet_row_labels,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    # There should be two annotations, with custom label text
    texts = [ann["text"] for ann in annotations]

# Edge Test Cases
# ---------------

def test_empty_dataframe():
    """Test with empty dataframe."""
    df = pd.DataFrame(columns=["x", "y", "color"])
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    trace = fig.data[0]

def test_missing_facet_group():
    """Test with missing facet group (row/col combination does not exist)."""
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [3, 4],
        "color": [5, 6],
        "row": ["A", "B"],
        "col": ["C", "C"]
    })
    # Only (A,C) and (B,C) exist, but grid is 2x2
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    # Two traces should have data, two should be empty
    nonempty = [t for t in fig.data if len(t.x) > 0]
    empty = [t for t in fig.data if len(t.x) == 0]

def test_nan_in_color_column():
    """Test with NaN values in the color column."""
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [4, 5, 6],
        "color": [10, None, 30]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    # The color array should contain a None/NaN value
    colors = list(fig.data[0].marker.color)

def test_no_x_or_y_column():
    """Test with x=None or y=None."""
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [4, 5, 6],
        "color": [10, 20, 30]
    })
    # No x
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x=None,
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    # No y
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x="x",
        y=None,
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )

def test_flipped_rows_and_cols():
    """Test with flipped_rows and flipped_cols True."""
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [3, 4],
        "color": [5, 6],
        "row": ["foo", "bar"],
        "col": ["baz", "qux"]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )

def test_custom_kwargs_marker_and_trace():
    """Test with custom kwargs_marker and kwargs_trace."""
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [3, 4],
        "color": [5, 6]
    })
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={"name": "TestTrace"},
        kwargs_marker={"size": 10}
    )

# Large Scale Test Cases
# ---------------------

def test_large_number_of_rows():
    """Test with a large number of rows (facets)."""
    n = 50
    df = pd.DataFrame({
        "x": list(range(n)),
        "y": list(range(n)),
        "color": list(range(n)),
        "row": [f"R{i}" for i in range(n)]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=n,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )

def test_large_number_of_cols():
    """Test with a large number of columns (facets)."""
    n = 40
    df = pd.DataFrame({
        "x": list(range(n)),
        "y": list(range(n)),
        "color": list(range(n)),
        "col": [f"C{i}" for i in range(n)]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=n,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )

def test_large_grid():
    """Test with a large grid of row and col facets."""
    n = 10
    df = pd.DataFrame({
        "x": list(range(n*n)),
        "y": list(range(n*n)),
        "color": list(range(n*n)),
        "row": [f"R{i}" for i in range(n) for _ in range(n)],
        "col": [f"C{j}" for _ in range(n) for j in range(n)]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=n,
        num_of_cols=n,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.005,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )

def test_large_data_per_facet():
    """Test with large data per facet (1000 points, 4 facets)."""
    n = 1000
    df = pd.DataFrame({
        "x": list(range(n))*4,
        "y": list(range(n))*4,
        "color": list(range(n))*4,
        "row": ["A"]*n + ["B"]*n + ["A"]*n + ["B"]*n,
        "col": ["C"]*n + ["C"]*n + ["D"]*n + ["D"]*n
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.02,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    for trace in fig.data:
        pass

def test_large_unique_color_values():
    """Test with many unique color values."""
    n = 500
    df = pd.DataFrame({
        "x": list(range(n)),
        "y": list(range(n)),
        "color": list(range(n)),
    })
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import random

# We'll use pandas for DataFrame construction in tests
import pandas as pd
import plotly.graph_objs as go
# imports
import pytest
from plotly.figure_factory._facet_grid import _facet_grid_color_numerical

# function to test and its dependencies are assumed to be defined above

# ------------------------
# Basic Test Cases
# ------------------------

def test_single_trace_no_faceting():
    # Test with no faceting, just a colored scatter
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [4, 5, 6],
        "color": [10, 20, 30]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    # The trace should have correct x, y, and marker color
    trace = fig.data[0]

def test_facet_by_row():
    # Test faceting by row
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "color": [100, 200, 300, 400],
        "row": ["A", "A", "B", "B"]
    })
    facet_row_labels = {"A": "Alpha", "B": "Beta"}
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=1,
        facet_row_labels=facet_row_labels,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    # Check that each trace's x, y correspond to the correct group
    xs = [list(trace.x) for trace in fig.data]
    ys = [list(trace.y) for trace in fig.data]
    texts = [a["text"] for a in annotations]

def test_facet_by_col():
    # Test faceting by column
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "color": [100, 200, 300, 400],
        "col": ["C1", "C2", "C1", "C2"]
    })
    facet_col_labels = {"C1": "Col1", "C2": "Col2"}
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=facet_col_labels,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    xs = [list(trace.x) for trace in fig.data]
    texts = [a["text"] for a in annotations]

def test_facet_by_row_and_col():
    # Test faceting by both row and column
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "color": [100, 200, 300, 400],
        "row": ["R1", "R1", "R2", "R2"],
        "col": ["C1", "C2", "C1", "C2"]
    })
    facet_row_labels = {"R1": "Row1", "R2": "Row2"}
    facet_col_labels = {"C1": "Col1", "C2": "Col2"}
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=facet_row_labels,
        facet_col_labels=facet_col_labels,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    # Each trace should have one point
    for trace in fig.data:
        pass
    texts = [a["text"] for a in annotations]

def test_marker_and_trace_kwargs_are_applied():
    # Test that marker and trace kwargs are passed through
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [3, 4],
        "color": [5, 6]
    })
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={"name": "mytrace"},
        kwargs_marker={"size": 10, "opacity": 0.5},
    )
    trace = fig.data[0]

# ------------------------
# Edge Test Cases
# ------------------------

def test_empty_dataframe():
    # Test with an empty DataFrame
    df = pd.DataFrame(columns=["x", "y", "color"])
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    trace = fig.data[0]

def test_missing_facet_value():
    # Test with missing facet combinations
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [10, 20, 30],
        "color": [100, 200, 300],
        "row": ["A", "A", "B"],
        "col": ["C1", "C2", "C1"]
    })
    facet_row_labels = {"A": "Alpha", "B": "Beta"}
    facet_col_labels = {"C1": "Col1", "C2": "Col2"}
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=facet_row_labels,
        facet_col_labels=facet_col_labels,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    # The missing cell should have an empty trace
    # Find the trace with empty x/y
    empty_traces = [t for t in fig.data if len(t.x) == 0 and len(t.y) == 0]

def test_nan_in_color_column():
    # Test with NaN values in color column
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [4, 5, 6],
        "color": [10, float('nan'), 30]
    })
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    trace = fig.data[0]

def test_non_numeric_color_column():
    # Test with non-numeric color column (should raise or handle gracefully)
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [4, 5, 6],
        "color": ["red", "green", "blue"]
    })
    # Should not raise, but marker.color will be non-numeric
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    trace = fig.data[0]

def test_single_row_single_col_facet():
    # Test with only one unique value in facet_row and facet_col
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [3, 4],
        "color": [5, 6],
        "row": ["A", "A"],
        "col": ["B", "B"]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels={"A": "Alpha"},
        facet_col_labels={"B": "Bravo"},
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    texts = [a["text"] for a in annotations]

def test_no_x_or_y_column():
    # Test with None for x or y
    df = pd.DataFrame({
        "a": [1, 2, 3],
        "b": [4, 5, 6],
        "color": [7, 8, 9]
    })
    # No x
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x=None,
        y="b",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    trace = fig.data[0]
    # No y
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x="a",
        y=None,
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    trace = fig.data[0]

# ------------------------
# Large Scale Test Cases
# ------------------------

def test_large_number_of_points():
    # Test with a large number of data points
    N = 900
    df = pd.DataFrame({
        "x": list(range(N)),
        "y": list(range(N)),
        "color": list(range(N))
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    trace = fig.data[0]

def test_large_number_of_facets():
    # Test with a large number of facets (rows and cols)
    nrows = 10
    ncols = 10
    N = nrows * ncols
    df = pd.DataFrame({
        "x": list(range(N)),
        "y": list(range(N)),
        "color": list(range(N)),
        "row": [f"R{i}" for i in range(nrows) for _ in range(ncols)],
        "col": [f"C{j}" for _ in range(nrows) for j in range(ncols)],
    })
    facet_row_labels = {f"R{i}": f"Row{i}" for i in range(nrows)}
    facet_col_labels = {f"C{j}": f"Col{j}" for j in range(ncols)}
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=nrows,
        num_of_cols=ncols,
        facet_row_labels=facet_row_labels,
        facet_col_labels=facet_col_labels,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    # Check that all annotation texts are present
    texts = [a["text"] for a in annotations]
    for i in range(nrows):
        pass
    for j in range(ncols):
        pass

def test_large_facet_with_missing_cells():
    # Test with large number of facets but with some missing cells
    nrows = 8
    ncols = 8
    N = nrows * ncols
    # Remove some cells, e.g. only fill diagonal
    data = []
    for i in range(nrows):
        for j in range(ncols):
            if i == j:
                data.append({"x": i, "y": j, "color": i+j, "row": f"R{i}", "col": f"C{j}"})
    df = pd.DataFrame(data)
    facet_row_labels = {f"R{i}": f"Row{i}" for i in range(nrows)}
    facet_col_labels = {f"C{j}": f"Col{j}" for j in range(ncols)}
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=nrows,
        num_of_cols=ncols,
        facet_row_labels=facet_row_labels,
        facet_col_labels=facet_col_labels,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    nonempty = [t for t in fig.data if len(t.x) > 0]
    empty = [t for t in fig.data if len(t.x) == 0]

def test_performance_large_dataframe(monkeypatch):
    # This test ensures the function does not take too long for large data
    N = 999
    df = pd.DataFrame({
        "x": list(range(N)),
        "y": list(range(N)),
        "color": list(range(N)),
        "row": [f"R{i%10}" for i in range(N)],
        "col": [f"C{j%10}" for j in range(N)],
    })
    facet_row_labels = {f"R{i}": f"Row{i}" for i in range(10)}
    facet_col_labels = {f"C{j}": f"Col{j}" for j in range(10)}

    import time
    t0 = time.time()
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=10,
        num_of_cols=10,
        facet_row_labels=facet_row_labels,
        facet_col_labels=facet_col_labels,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    t1 = time.time()
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
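
The equivalence check mentioned in the comment above can be pictured with a small sketch: run a baseline and an optimized implementation on the same inputs and compare their outputs. This is not Codeflash's actual harness; `labels_original` and `labels_optimized` are hypothetical stand-ins for an original function and its rewrite.

```python
import pandas as pd


def labels_original(df, column, labels):
    """Hypothetical baseline: iterate sorted groups and branch on key presence."""
    out = []
    for key, _ in df.groupby(column):
        if labels is not None and key in labels:
            out.append(labels[key])
        else:
            out.append(key)
    return out


def labels_optimized(df, column, labels):
    """Hypothetical rewrite: no groupby, one dict.get per unique key."""
    lookup = labels or {}
    keys = sorted(df[column].unique())
    return [lookup.get(k, k) for k in keys]


df = pd.DataFrame({"row": ["B", "A", "B", "C"]})
cases = [None, {}, {"A": "Alpha"}, {"A": "Alpha", "B": "Beta", "C": "Gamma"}]

# The rewrite is only acceptable if both versions agree on every input.
for labels in cases:
    assert labels_original(df, "row", labels) == labels_optimized(df, "row", labels)
```

A check like this only covers the inputs it is given, so the edge cases exercised by the tests above (empty frames, missing facet cells, NaN colors) would need to be included for it to be meaningful.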

To edit these changes, git checkout codeflash/optimize-_facet_grid_color_numerical-mb2f1y3j and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label May 24, 2025
@codeflash-ai codeflash-ai bot requested a review from misrasaurabh1 May 24, 2025 16:00