fix: issue of running automl in SVM and Decision Tree #429


Open · wants to merge 1 commit into base: main

Conversation

HaibinLai · Contributor

Running AutoML with SVM and with Decision Tree fails for two different reasons.

@HaibinLai · Contributor Author


Decision Tree Wrong Case

When running AutoML for Decision Tree, geochempi raised an error on self.criterion: `ValueError: Some value(s) of y are negative which is not allowed for Poisson regression.`

Running logs:
[running-log screenshots]

The observation log shows that the error occurs when a negative value appears in the target y.

The existing test data in Regression.xlsx contains no negative values, so we use feature engineering to construct a negative-valued feature for testing:
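A minimal sketch of that construction (the column name and values below are synthetic stand-ins, not the real Regression.xlsx contents):

```python
import pandas as pd

# Synthetic stand-in for Regression.xlsx (hypothetical column name)
df = pd.DataFrame({"SIO2(WT%)": [55.2, 48.7, 61.3, 47.0]})

# Feature engineering: a derived column guaranteed to contain negatives,
# built by shifting the original column below zero
df["NEG_TARGET"] = df["SIO2(WT%)"] - df["SIO2(WT%)"].max() - 1.0
print((df["NEG_TARGET"] < 0).all())  # → True
```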

The Decision Tree error occurs because there are negative values in the y_train we are trying to predict. When the "poisson" criterion is selected for the decision tree algorithm, Poisson regression does not allow negative values in y, which causes the error. We can either pre-check the values of y in the FLAML options or remove "poisson" from the parameter choices to resolve this bug.


SVM Failure case


Initial guess: the data may be too large for SVM.

[Pasted image 20250216190651.png]
We first conduct an experiment to test whether the given data is too large for SVM.

samples: 110
features: 40

However, this issue still occurs with smaller datasets.

-> Inference: The problem is not related to the size of the data.

-> Speculation:

  1. A bug in AutoML (low probability).
  2. The SVM algorithm, when certain specific parameters are selected, may cause the computation/optimization problem to become unsolvable.

To test speculation 2, we run an experiment:

Manually tune the parameters to identify a situation where:

  • The SVM algorithm struggles to solve the optimization problem.
  • Specifically, check for parameter settings that might lead to computational difficulties or unsolvable conditions, such as very small or large values for certain hyperparameters.
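The manual sweep above can be sketched as a timing loop over kernels (a sketch on random data of the same 110 × 40 shape as the test set; random data may not reproduce the slowdown seen on the real features, so the ordering here is illustrative only):

```python
import time
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X, y = rng.rand(110, 40), rng.rand(110)  # same shape as the failing dataset

timings = {}
for kernel in ("linear", "rbf", "poly", "sigmoid"):
    # C/shrinking/degree match the configs below; gamma is ignored by "linear"
    model = SVR(kernel=kernel, C=1.0, shrinking=True, degree=3, gamma=0.1)
    t0 = time.perf_counter()
    model.fit(X, y)
    timings[kernel] = time.perf_counter() - t0

print(sorted(timings, key=timings.get))  # kernels from fastest to slowest
```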


Using Parameters:

{
    "kernel": "linear",
    "C": 1.0,
    "shrinking": true,
    "degree": 3,
    "gamma": "scale"
}

If gamma='scale' (the default) is passed, scikit-learn uses 1 / (n_features * X.var()) as the value of gamma.
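A quick check of that formula on data shaped like the test set (the matrix here is random, not the real features):

```python
import numpy as np

X = np.random.RandomState(0).rand(110, 40)

# gamma="scale" in scikit-learn resolves to 1 / (n_features * X.var())
gamma = 1.0 / (X.shape[1] * X.var())
print(gamma)
```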

It takes about half an hour to produce a result.


  1. Linear Kernel

    $$K(x, y) = x^T y$$

The rbf kernel completes the search in about 1 second:

{
    "kernel": "rbf",
    "C": 1.0,
    "shrinking": true,
    "degree": 3,
    "gamma": 0.1
}
  2. Gaussian RBF Kernel:

$$ K(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)$$

The poly kernel is slow:

{
    "kernel": "poly",
    "C": 1.0,
    "shrinking": true,
    "degree": 3,
    "gamma": 0.1
}
  3. Polynomial Kernel

$$ K(x, y) = (x^T y + c)^d$$

The sigmoid kernel is fast:

{
    "kernel": "sigmoid",
    "C": 1.0,
    "shrinking": true,
    "degree": 3,
    "gamma": 0.1
}
  4. Sigmoid Kernel

$$ K(x, y) = \tanh(\alpha x^T y + c)$$
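The four kernel formulas above can be cross-checked against scikit-learn's implementations. Note that scikit-learn parameterizes the RBF kernel as exp(-γ‖x − y‖²), i.e. γ = 1/(2σ²), and it multiplies the polynomial and sigmoid inner products by a gamma factor (c is coef0, d is degree, α is gamma):

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels

rng = np.random.RandomState(0)
X = rng.rand(5, 3)
gamma, coef0, degree = 0.1, 0.0, 3

# Hand-computed kernel matrices, in scikit-learn's parameterization
lin = X @ X.T
diff = X[:, None, :] - X[None, :, :]
rbf = np.exp(-gamma * np.sum(diff ** 2, axis=-1))
poly = (gamma * (X @ X.T) + coef0) ** degree
sig = np.tanh(gamma * (X @ X.T) + coef0)

assert np.allclose(lin, pairwise_kernels(X, metric="linear"))
assert np.allclose(rbf, pairwise_kernels(X, metric="rbf", gamma=gamma))
assert np.allclose(poly, pairwise_kernels(X, metric="polynomial",
                                          gamma=gamma, coef0=coef0, degree=degree))
assert np.allclose(sig, pairwise_kernels(X, metric="sigmoid",
                                         gamma=gamma, coef0=coef0))
print("all kernels match")  # → all kernels match
```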
