exp run: checks out data dependencies (destructive) [QA]

### ~~1. It produces a first unchanged exp~~
UPDATE: Addressed in #5600

```console
$ git clone git@github.com:iterative/example-get-started.git
$ cd example-get-started
$ dvc pull

$ dvc exp run
```

This works even when there are no changes to the committed project version (`HEAD`). Below we can see there are no differences in metrics or params:

```console
$ dvc exp show --no-pager
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━
┃ Experiment    ┃ Created      ┃ avg_prec ┃ roc_auc ┃ prepare.split ┃ prepare.seed ┃ featurize.max_features ┃ featurize.ngrams ┃ train.seed ┃ train.n_est ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━
│ workspace     │ -            │  0.60405 │  0.9608 │ 0.2           │ 20170428     │ 3000                   │ 2                │ 20170428   │ 100         │
│ master        │ Mar 01, 2021 │  0.60405 │  0.9608 │ 0.2           │ 20170428     │ 3000                   │ 2                │ 20170428   │ 100         │
│ └── exp-44136 │ 02:22 AM     │  0.60405 │  0.9608 │ 0.2           │ 20170428     │ 3000                   │ 2                │ 20170428   │ 100         │
└───────────────┴──────────────┴──────────┴─────────┴───────────────┴──────────────┴────────────────────────┴──────────────────┴────────────┴─────────────┴─
```
> `exp diff` doesn't print anything.

Is there a use case for this? Otherwise I'd vote to block it.

### 2. It checks out data dependencies (destructive)

> Continuing the previous CLI block

```console
...
$ truncate --size=20M data/data.xml  # This is a stage dep
$ dvc exp run
 33% Checkout|███████████ ...  # I can see this flash momentarily
...
ERROR: Reproduced experiment conflicts with existing experiment 'exp-44136'. To overwrite the existing experiment run:
```

Even though I changed a dependency, which could be the basis for my experiment, all the pipeline data was checked out again, undoing my manual changes. So this `exp` is the same as the previous one (see 1), which explains the conflict error.

BTW, if this had been my first `exp run`, it would have been easy to miss that the data I changed was silently restored in the process. I'd just see that the exp results are the same as in `HEAD`, which would be misleading.
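
One way to catch this kind of silent restore is to compare a checksum of the dependency before and after the command runs. The sketch below is only an illustration: `changed_by` is a hypothetical helper (not part of DVC), and it assumes a POSIX shell with `cksum` available.

```shell
# Hypothetical helper (not a DVC feature): run a command and report whether
# it modified the given file, by comparing checksums before and after.
changed_by() {
    file=$1
    shift
    before=$(cksum < "$file")
    # Run the command; its output and exit status are ignored here because
    # we only care about the side effect on the file.
    "$@" >/dev/null 2>&1 || true
    after=$(cksum < "$file")
    if [ "$before" = "$after" ]; then
        echo "unchanged: $file"
    else
        echo "MODIFIED: $file"
    fi
}

# Intended use in the scenario above (after the truncate step):
#   changed_by data/data.xml dvc exp run
```

If the report says the file was modified even though no stage writes to it, the checkout step presumably restored the cached version.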

#### 2.2 Does it really behave exactly like `repro`?

https://dvc.org/doc/command-reference/exp/run reads:

> dvc exp run is equivalent to dvc repro for experiments. It has the same behavior when it comes to targets and stage execution

We also say:

> Before using this command, you'll probably want to make modifications such as data and code updates...

But when I change an `add`ed data file and try `exp run`, it undoes those changes and reports that no stage has changed. `repro` re-adds it instead, and then runs any stage downstream.

### ~~3. Not all changes to "code" can be queued~~

> Extracted to #5801

<details>

In [the docs](https://dvc.org/doc/command-reference/exp/run) we say "Before running an experiment, you'll probably want to make modifications such as data and code updates, or hyperparameter tuning." Is this the intended behavior?

Because, if we see dvc.yaml as code (and I think we do, as one of DVC's principles is _pipeline codification_), then this statement isn't completely true when it comes to queueing experiments. It works with regular experiments though, which use the workspace files (not a tmp copy), which makes me think this may be unintended (as we want all kinds of experiments to be consistent in behavior AFAIK).

Specifically, if you create (or modify) dvc.yaml between queued runs and then try to `--run-all`, you get errors. Example:

```console
$ git init; dvc init
$ git add --all; git commit -m "`dvc -V`"
$ dvc stage add -n hi -o hi "echo hey > hi"
Creating 'dvc.yaml'
Adding stage 'hi' in 'dvc.yaml'
...
$ dvc exp run
... # works

$ dvc exp run --queue
Queued experiment '16c7340' for future execution.
$ dvc stage add -fn hi -o hello "echo hi > hello"
Modifying stage 'hi' in 'dvc.yaml'
...
$ dvc exp run --queue
Queued experiment '41791e2' for future execution.

$ dvc exp run --run-all
!ERROR: 'dvc.yaml' does not exist
ERROR: 'dvc.yaml' does not exist
ERROR: Failed to reproduce experiment '41791e2'
ERROR: Failed to reproduce experiment '16c7340'
```

</details>

