Description
1. It produces a first unchanged exp
UPDATE: Addressed in #5600
$ git clone git@github.com:iterative/example-get-started.git
$ cd example-get-started
$ dvc pull
$ dvc exp run
This works even when there are no changes to the committed project version (HEAD). Below we can see there are no differences in metrics or params:
$ dvc exp show --no-pager
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━
┃ Experiment ┃ Created ┃ avg_prec ┃ roc_auc ┃ prepare.split ┃ prepare.seed ┃ featurize.max_features ┃ featurize.ngrams ┃ train.seed ┃ train.n_est ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━
│ workspace │ - │ 0.60405 │ 0.9608 │ 0.2 │ 20170428 │ 3000 │ 2 │ 20170428 │ 100 │
│ master │ Mar 01, 2021 │ 0.60405 │ 0.9608 │ 0.2 │ 20170428 │ 3000 │ 2 │ 20170428 │ 100 │
│ └── exp-44136 │ 02:22 AM │ 0.60405 │ 0.9608 │ 0.2 │ 20170428 │ 3000 │ 2 │ 20170428 │ 100 │
└───────────────┴──────────────┴──────────┴─────────┴───────────────┴──────────────┴────────────────────────┴──────────────────┴────────────┴─────────────┴─
exp diff doesn't print anything.
Is there a use case for this? Otherwise I'd vote to block it.
2. It checks out data dependencies (destructive)
Continuing the previous CLI block
...
$ truncate --size=20M data/data.xml # This is a stage dep
$ dvc exp run
33% Checkout|███████████ ... # I can see this flash momentarily
...
ERROR: Reproduced experiment conflicts with existing experiment 'exp-44136'. To overwrite the existing experiment run:
Even though I changed a dependency, which could be the basis for my experiment, all the pipeline data was checked out again (undoing my manual changes), so this exp is the same as the previous one (1).
BTW, if this had been my first exp run, it would be easy to miss that the data I changed was silently restored in the process. I'd just see that the exp results are the same as in HEAD, which would be misleading.
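A rough way to catch this kind of silent restore is to checksum the file before and after the command. This is only a sketch: file_changed_by is a hypothetical helper name (not part of DVC), and it assumes sha256sum from GNU coreutils is available.

```shell
# Hypothetical helper (not part of DVC): report whether running a command
# left the content of a file changed.
# Usage: file_changed_by FILE COMMAND [ARGS...]
file_changed_by() (
  file="$1"
  shift
  # Checksum of the file before the command runs.
  before="$(sha256sum "$file" | cut -d' ' -f1)"
  # Run the command; send its output to stderr so stdout only carries
  # the verdict, and ignore its exit status.
  "$@" >&2 || true
  # Checksum of the file after the command has finished.
  after="$(sha256sum "$file" | cut -d' ' -f1)"
  if [ "$before" = "$after" ]; then
    echo "unchanged"
  else
    echo "changed"
  fi
)
```

For example, after the truncate step above, file_changed_by data/data.xml dvc exp run would print "changed" if the run overwrote the edited file.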
2.2 Does it really behave exactly like repro?
https://dvc.org/doc/command-reference/exp/run reads:
dvc exp run is equivalent to dvc repro for experiments. It has the same behavior when it comes to targets and stage execution
We also say:
Before using this command, you'll probably want to make modifications such as data and code updates...
But when I change an added data file (one tracked with dvc add) and try exp run, it undoes those changes and reports that no stage has changed. repro re-adds it instead, and then runs any stage downstream.
3. Not all changes to "code" can be queued
Extracted to #5801
In the docs we say "Before running an experiment, you'll probably want to make modifications such as data and code updates, or hyperparameter tuning." Is this the intended behavior?
Because, if we see dvc.yaml as code (and I think we do, as one of DVC's principles is pipeline codification), then this statement isn't completely true when it comes to queueing experiments. It works with regular experiments though, which use the workspace files (not a tmp copy), which makes me think this may be unintended (as we want all kinds of experiments to be consistent in behavior AFAIK).
Specifically, if you create (or modify) dvc.yaml between queued runs and then try to --run-all, you get errors. Example:
$ git init; dvc init
$ git add --all; git commit -m "`dvc -V`"
$ dvc stage add -n hi -o hi "echo hey > hi"
Creating 'dvc.yaml'
Adding stage 'hi' in 'dvc.yaml'
...
$ dvc exp run
... # works
$ dvc exp run --queue
Queued experiment '16c7340' for future execution.
$ dvc stage add -fn hi -o hello "echo hi > hello"
Modifying stage 'hi' in 'dvc.yaml'
...
$ dvc exp run --queue
Queued experiment '41791e2' for future execution.
$ dvc exp run --run-all
!ERROR: 'dvc.yaml' does not exist
ERROR: 'dvc.yaml' does not exist
ERROR: Failed to reproduce experiment '41791e2'
ERROR: Failed to reproduce experiment '16c7340'
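Note that in the example above dvc.yaml is created after the only commit, so Git apparently does not track it when the experiments are queued. Whether that is what triggers the error is an assumption on my part, not something confirmed here. A quick check before queueing could look like the sketch below, where is_git_tracked is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if the given path is known to Git
# (staged or committed). Relies on `git ls-files --error-unmatch`, which
# exits non-zero when the path does not match any tracked file.
is_git_tracked() {
  git ls-files --error-unmatch -- "$1" >/dev/null 2>&1
}

# Example use inside a repository (assumption: an untracked dvc.yaml is
# relevant to the failure shown above):
#   is_git_tracked dvc.yaml || echo "dvc.yaml is not tracked by Git"
```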