fix: issue of running automl in SVM and Decision Tree #429


Open · wants to merge 1 commit into base: main

Conversation

HaibinLai · Contributor

Running AutoML with SVM and with Decision Tree fails for two different reasons.

@HaibinLai · Contributor Author


Decision Tree Wrong Case

When running AutoML for Decision Tree, geochempi raised an error on self.criterion: `ValueError: Some value(s) of y are negative which is not allowed for Poisson regression.`

Running logs:
[running-log screenshots]

The observation log shows that the error occurs when a negative value appears in the target y.

The existing test data in Regression.xlsx contains no negative values, so we use feature engineering to construct a negative-valued feature for testing:
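A minimal sketch of that construction (the column name and values below are synthetic stand-ins, not the real Regression.xlsx contents):

```python
import pandas as pd

# Synthetic stand-in for Regression.xlsx (hypothetical column name)
df = pd.DataFrame({"SIO2(WT%)": [55.2, 48.7, 61.3, 47.0]})

# Feature engineering: a derived column guaranteed to contain negatives,
# built by shifting the original column below zero
df["NEG_TARGET"] = df["SIO2(WT%)"] - df["SIO2(WT%)"].max() - 1.0
print((df["NEG_TARGET"] < 0).all())  # → True
```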

The Decision Tree error occurs because there are negative values in the y_train we are trying to predict. When the "poisson" criterion is selected for the decision tree algorithm, Poisson regression does not allow negative values in y, which causes the error. We can either pre-check the values of y in the FLAML options or remove "poisson" from the parameter choices to resolve this bug.


SVM Failure case


Initial guess: the data may be too large for SVM.

[Pasted image 20250216190651.png]
We first conduct an experiment to test whether the given data is too large for SVM.

samples: 110
features: 40

However, this issue still occurs with smaller datasets.

-> Inference: The problem is not related to the size of the data.

-> Speculation:

  1. A bug in AutoML (low probability).
  2. The SVM algorithm, when certain specific parameters are selected, may cause the computation/optimization problem to become unsolvable.

To test speculation 2, we run an experiment:

Manually tune the parameters to identify a situation where:

  • The SVM algorithm struggles to solve the optimization problem.
  • Specifically, check for parameter settings that might lead to computational difficulties or unsolvable conditions, such as very small or large values for certain hyperparameters.
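The manual sweep above can be sketched as a timing loop over kernels (a sketch on random data of the same 110 × 40 shape as the test set; random data may not reproduce the slowdown seen on the real features, so the ordering here is illustrative only):

```python
import time
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X, y = rng.rand(110, 40), rng.rand(110)  # same shape as the failing dataset

timings = {}
for kernel in ("linear", "rbf", "poly", "sigmoid"):
    # C/shrinking/degree match the configs below; gamma is ignored by "linear"
    model = SVR(kernel=kernel, C=1.0, shrinking=True, degree=3, gamma=0.1)
    t0 = time.perf_counter()
    model.fit(X, y)
    timings[kernel] = time.perf_counter() - t0

print(sorted(timings, key=timings.get))  # kernels from fastest to slowest
```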


Using Parameters:

{
    "kernel": "linear",
    "C": 1.0,
    "shrinking": true,
    "degree": 3,
    "gamma": "scale"
}

If gamma='scale' (the default) is passed, scikit-learn uses 1 / (n_features * X.var()) as the value of gamma.
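A quick check of that formula on data shaped like the test set (the matrix here is random, not the real features):

```python
import numpy as np

X = np.random.RandomState(0).rand(110, 40)

# gamma="scale" in scikit-learn resolves to 1 / (n_features * X.var())
gamma = 1.0 / (X.shape[1] * X.var())
print(gamma)
```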

It takes about half an hour to produce a result.


  1. Linear Kernel

    $$K(x, y) = x^T y$$

The rbf kernel completes the search in about 1 second:

{
    "kernel": "rbf",
    "C": 1.0,
    "shrinking": true,
    "degree": 3,
    "gamma": 0.1
}
  2. Gaussian RBF Kernel:

$$ K(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)$$

The poly kernel is slow:

{
    "kernel": "poly",
    "C": 1.0,
    "shrinking": true,
    "degree": 3,
    "gamma": 0.1
}
  3. Polynomial Kernel

$$ K(x, y) = (x^T y + c)^d$$

The sigmoid kernel is fast:

{
    "kernel": "sigmoid",
    "C": 1.0,
    "shrinking": true,
    "degree": 3,
    "gamma": 0.1
}
  4. Sigmoid Kernel

$$ K(x, y) = \tanh(\alpha x^T y + c)$$
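The four kernel formulas above can be cross-checked against scikit-learn's implementations. Note that scikit-learn parameterizes the RBF kernel as exp(-γ‖x − y‖²), i.e. γ = 1/(2σ²), and it multiplies the polynomial and sigmoid inner products by a gamma factor (c is coef0, d is degree, α is gamma):

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels

rng = np.random.RandomState(0)
X = rng.rand(5, 3)
gamma, coef0, degree = 0.1, 0.0, 3

# Hand-computed kernel matrices, in scikit-learn's parameterization
lin = X @ X.T
diff = X[:, None, :] - X[None, :, :]
rbf = np.exp(-gamma * np.sum(diff ** 2, axis=-1))
poly = (gamma * (X @ X.T) + coef0) ** degree
sig = np.tanh(gamma * (X @ X.T) + coef0)

assert np.allclose(lin, pairwise_kernels(X, metric="linear"))
assert np.allclose(rbf, pairwise_kernels(X, metric="rbf", gamma=gamma))
assert np.allclose(poly, pairwise_kernels(X, metric="polynomial",
                                          gamma=gamma, coef0=coef0, degree=degree))
assert np.allclose(sig, pairwise_kernels(X, metric="sigmoid",
                                         gamma=gamma, coef0=coef0))
print("all kernels match")  # → all kernels match
```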
