-
-
Notifications
You must be signed in to change notification settings - Fork 144
[wip] reworking serialization #710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Warning Rate limit exceeded@PGijsbers has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 12 minutes and 24 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughThis update removes the Changes
Sequence Diagram(s)sequenceDiagram
participant Framework as Framework Script
participant Callee as callee.result()
participant Config as Config Namespace
participant File as Result File
Framework->>Callee: result(predictions, ... )
Callee->>Config: Access config.output_predictions_file
Callee->>Callee: Add "output_file" key to result dict if missing
Callee->>File: Dump result dict as JSON
Callee-->>Framework: Return result dict
sequenceDiagram
participant serialize_data
participant np as numpy
participant json as json_dump
serialize_data->>np: np.save(path, data, allow_pickle=False)
np-->>serialize_data: Success or ValueError (if object dtype)
alt object dtype array
serialize_data->>serialize_data: squeeze array
serialize_data->>json: json_dump(data.tolist(), path)
else non-object dtype array
serialize_data->>np: np.save(path, data, allow_pickle=False)
end
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #710 +/- ##
===========================================
- Coverage 69.46% 51.10% -18.36%
===========================================
Files 57 36 -21
Lines 6821 5262 -1559
===========================================
- Hits 4738 2689 -2049
- Misses 2083 2573 +490 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
amlb/utils/serialization.py (1)
219-235:⚠️ Potential issueReference to removed
config.numpy_picklebreaks.npzloading
numpy_allow_picklewas removed, yetdeserialize_data()still uses it:with np.load(path, allow_pickle=config.numpy_pickle) as loaded:This raises
AttributeError.
Given the new policy (“no pickle”), simply drop the arg:-with np.load(path, allow_pickle=config.numpy_pickle) as loaded: - return loaded +with np.load(path, allow_pickle=False) as loaded: + return loadedOptionally expose an explicit
allow_pickleflag inser_configif future flexibility is required.🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 219-269: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 225-234: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
amlb/utils/serialization.py(3 hunks)examples/custom/extensions/GradientBoosting/exec.py(0 hunks)examples/custom/extensions/Stacking/exec.py(0 hunks)frameworks/AutoGluon/exec.py(0 hunks)frameworks/AutoGluon/exec_ts.py(0 hunks)frameworks/FEDOT/exec.py(0 hunks)frameworks/FEDOT/exec_ts.py(0 hunks)frameworks/GAMA/exec.py(0 hunks)frameworks/H2OAutoML/exec.py(0 hunks)frameworks/MLPlan/exec.py(0 hunks)frameworks/NaiveAutoML/exec.py(0 hunks)frameworks/RandomForest/exec.py(0 hunks)frameworks/SapientML/exec.py(0 hunks)frameworks/TPOT/exec.py(0 hunks)frameworks/TunedRandomForest/exec.py(0 hunks)frameworks/autosklearn/exec.py(0 hunks)frameworks/flaml/exec.py(0 hunks)frameworks/hyperoptsklearn/exec.py(0 hunks)frameworks/lightautoml/exec.py(0 hunks)frameworks/mljarsupervised/exec.py(0 hunks)frameworks/oboe/exec.py(0 hunks)frameworks/shared/callee.py(1 hunks)
💤 Files with no reviewable changes (20)
- examples/custom/extensions/Stacking/exec.py
- frameworks/GAMA/exec.py
- frameworks/lightautoml/exec.py
- frameworks/oboe/exec.py
- frameworks/RandomForest/exec.py
- frameworks/mljarsupervised/exec.py
- frameworks/TunedRandomForest/exec.py
- frameworks/SapientML/exec.py
- frameworks/MLPlan/exec.py
- frameworks/TPOT/exec.py
- frameworks/hyperoptsklearn/exec.py
- frameworks/autosklearn/exec.py
- frameworks/flaml/exec.py
- frameworks/FEDOT/exec_ts.py
- frameworks/AutoGluon/exec_ts.py
- frameworks/H2OAutoML/exec.py
- frameworks/AutoGluon/exec.py
- examples/custom/extensions/GradientBoosting/exec.py
- frameworks/NaiveAutoML/exec.py
- frameworks/FEDOT/exec.py
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: randomforest/iris
- GitHub Check: randomforest/cholesterol
- GitHub Check: randomforest/kc2
- GitHub Check: pytest
- GitHub Check: run_benchmark
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/run_all_frameworks.yml (1)
167-172: Remove temporary "Check Things" debug step.This step listing venv directories is useful only during troubleshooting and clutters CI logs. Consider removing it or gating it behind a
debugflag once venv setup issues are resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
.github/workflows/run_all_frameworks.yml (1)
167-173: Eliminate Uselessechoand Quote Command Substitution
In the diagnostic step,echo $(pwd)is redundant—pwdalone suffices—and the unquoted command substitution could trigger word splitting. Replace:echo $(pwd)with:
pwdThis removes the unnecessary
echoand avoids potential splitting issues.🧰 Tools
🪛 actionlint (1.7.7)
168-168: shellcheck reported issue in this script: SC2046:warning:4:6: Quote this to prevent word splitting
(shellcheck)
168-168: shellcheck reported issue in this script: SC2005:style:4:6: Useless echo? Instead of 'echo $(cmd)', just use 'cmd'
(shellcheck)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.github/workflows/run_all_frameworks.yml(1 hunks)frameworks/shared/setup.sh(1 hunks)
🧰 Additional context used
🪛 actionlint (1.7.7)
.github/workflows/run_all_frameworks.yml
168-168: shellcheck reported issue in this script: SC2046:warning:4:6: Quote this to prevent word splitting
(shellcheck)
168-168: shellcheck reported issue in this script: SC2005:style:4:6: Useless echo? Instead of 'echo $(cmd)', just use 'cmd'
(shellcheck)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: randomforest/kc2
- GitHub Check: randomforest/cholesterol
- GitHub Check: randomforest/iris
frameworks/shared/setup.sh
Outdated
| REQ_FILE="$SHARED_DIR/requirements.txt" | ||
|
|
||
| for line in $(grep -vE '^\s*#' "$REQ_FILE" | grep -vE '^\s*$'); do | ||
| pkg=$(echo "$line" | sed -E 's/[=><~!].*$//') | ||
| # In a line like "numpy==1.12.0" then pkg=numpy and line is the whole line | ||
|
|
||
| if ! PY -c "import $pkg" &> /dev/null; then | ||
| echo "$pkg not found. Installing from requirements.txt..." | ||
| PIP install --no-cache-dir "$line" | ||
| else | ||
| echo "$pkg is already installed by the framework, using that instead." | ||
| fi | ||
| done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid word-splitting in the requirements loop
The for line in $(grep ...) construct will split on whitespace and may mis-handle requirements with spaces or special characters. Switch to a while IFS= read -r line pattern to preserve each line intact and eliminate the extra subshell:
- for line in $(grep -vE '^\s*#' "$REQ_FILE" | grep -vE '^\s*$'); do
+ while IFS= read -r line; do
pkg=${line%%[=><~!]*}
if ! PY -c "import $pkg" &> /dev/null; then
echo "$pkg not found. Installing from requirements.txt..."
PIP install --no-cache-dir "$line"
else
echo "$pkg is already installed by the framework, using that instead."
fi
- done
+ done < "$REQ_FILE"This approach avoids splitting issues, reduces forking (sed can be replaced with parameter expansion), and is more robust.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In frameworks/shared/setup.sh around lines 50 to 62, the loop uses `for line in
$(grep ...)` which causes word-splitting and can mishandle requirements with
spaces or special characters. Replace this loop with a `while IFS= read -r line`
construct to read each line intact without splitting. Also, remove the external
`sed` command by using shell parameter expansion to extract the package name
from each line. This will make the script more robust and efficient.
Ideal scenario there are only two formats left in the end:
but might need to support sparse data through scipy.
In any case I hope to avoid situations where pickle is used which leaks dependencies between libraries used by the benchmark software and the integration scripts.
Summary by CodeRabbit
Refactor
Chores
Tests
CI