Inspecting past pipeline runs
Inspecting a finished pipeline run and its outputs.
Ever trained a model yesterday and forgotten where its artifacts are stored? This tutorial shows you how to:
List pipelines and discover their runs in Python or via the CLI
Drill down into an individual run to inspect steps, settings and metadata
Load output artifacts such as models or datasets straight back into your code
We'll work our way down the ZenML object hierarchy—from pipelines → runs → steps → artifacts—giving you a complete guide to accessing your past work.
Before starting this tutorial, make sure you have:
ZenML installed and configured
A pipeline that has been run at least once
A basic understanding of ZenML pipelines, steps, and artifacts
The hierarchy of pipelines, runs, steps, and artifacts is as follows:
As you can see from the diagram, there are many layers of 1-to-N relationships.
Let's investigate how to traverse this hierarchy level by level:
If you're not sure which pipeline you need to fetch, you can find a list of all registered pipelines in the ZenML dashboard, or list them programmatically either via the Client or the CLI.
Each pipeline can be executed many times, resulting in several Runs. Let's explore how to access them.
You can get a list of all runs of a pipeline using the `runs` property of the pipeline:
The result will be a list of the most recent runs of this pipeline, ordered from newest to oldest.
To access the most recent run of a pipeline, you can either use the `last_run` property or access it through the `runs` list:
Calling a pipeline executes it and then returns the response of the freshly executed run:
The run that you get back is the model stored in the ZenML database at the point of the method call. This means the pipeline run is likely still initializing and no steps have run yet. To get the latest state, you can fetch a refreshed version from the client:
You can check the status of a pipeline run via its `status` attribute. There are five possible states: initialized, failed, completed, running, and cached.
Depending on the stack components you use, you might have additional component-specific metadata associated with your run, such as the URL to the UI of a remote orchestrator. You can access this component-specific metadata via the `run_metadata` attribute:
Within a given pipeline run you can further zoom in on individual steps using the `steps` attribute:
Similar to the run, you can use the `step` object to access a variety of useful information:
The parameters used to run the step via `step.config.parameters`
The step-level settings via `step.config.settings`
Component-specific step metadata, such as the URL of an experiment tracker or model deployer, via `step.run_metadata`
Each step of a pipeline run can have multiple output and input artifacts that we can inspect via the `outputs` and `inputs` properties.
To inspect the output artifacts of a step, you can use the `outputs` attribute, which is a dictionary that can be indexed using the name of an output. Alternatively, if your step only has a single output, you can use the `output` property as a shortcut:
Similarly, you can use the `inputs` and `input` properties to get the input artifacts of a step:
Note that the output of a step corresponds to a specific artifact version.
If you'd like to fetch an artifact or an artifact version directly, it is easy to do so with the `Client`:
Regardless of how you fetch it, each artifact contains a lot of general information as well as datatype-specific metadata and visualizations.
All output artifacts saved through ZenML will automatically have certain datatype-specific metadata saved with them. NumPy arrays, for instance, always have their storage size, `shape`, `dtype`, and some statistical properties saved with them. You can access such metadata via the `run_metadata` attribute of an output:
ZenML automatically saves visualizations for many common data types. Using the `visualize()` method you can programmatically show these visualizations in Jupyter notebooks:
While most of this tutorial has focused on fetching objects after a pipeline run has been completed, the same logic can also be used within the context of a running pipeline.
This is often desirable when a pipeline runs continuously over time and decisions have to be made based on older runs.
For example, this is how we can fetch the last pipeline run of the same pipeline from within a ZenML step:
Putting it all together, here's a complete example that demonstrates how to load the model trained by the `svc_trainer` step of an example pipeline:
Here are solutions for common issues you might encounter when working with pipeline runs and artifacts:
If you get an error indicating a run was not found:
If you're not sure what the output name of a step is:
Now that you know how to inspect and retrieve information from past pipeline runs, you can:
Build pipelines that make decisions based on previous runs
Create comparison reports between different experiment configurations
Load trained models for evaluation or deployment
Extract and analyze metrics across multiple runs
After you have run a pipeline at least once, you can fetch it via the `Client.get_pipeline()` method:
Check out the SDK docs for more information on the `Client` class and its purpose.
You can use the `Client.list_pipelines()` method to get a list of all pipelines registered in ZenML:
Alternatively, you can also use the `pipeline_model.get_runs()` method, which allows you to specify detailed parameters for filtering or pagination. See the SDK docs for more information.
If you already know the exact run that you want to fetch (e.g., from looking at the dashboard), you can use the `Client.get_pipeline_run()` method to fetch the run directly without having to query the pipeline first:
Similar to pipelines, you can query runs by either ID, name, or name prefix, and you can also discover runs through the Client or CLI via the `Client.list_pipeline_runs()` method or the `zenml pipeline runs list` command.
Each run has a collection of useful information which can help you reproduce your runs. In the following, you can find a list of some of the most useful pipeline run information, but there is much more available. See the `PipelineRunResponse` definition for a comprehensive list.
The `pipeline_configuration` is an object that contains all configurations of the pipeline and pipeline run, including the pipeline-level settings:
If you're only calling each step once inside your pipeline, the invocation ID will be the same as the name of your step. For more complex pipelines, check out the documentation on custom step invocation IDs to learn more.
If you are using the ZenML VS Code extension, you can easily view your pipeline runs by opening the sidebar (click on the ZenML icon). You can then click on any particular pipeline run to see its status and some other metadata. If you want to delete a run, you can also do so from the same sidebar view.
See the `StepRunResponse` definition for a comprehensive list of available information.
Check out the documentation on step output naming to see what the output names of your steps are and how to customize them.
You can read more about metadata in the metadata documentation.
If you're not in a Jupyter notebook, you can simply view the visualizations in the ZenML dashboard by running `zenml login --local` and clicking on the respective artifact in the pipeline run DAG instead. Check out the artifact visualization docs to learn more about how to build and view artifact visualizations in ZenML!
As shown in the example, we can get additional information about the current run using the `StepContext`, which is explained in more detail in the step context documentation.
Combine run inspection with experiment tracking to compare model variants
Explore artifact management for more advanced data handling
Learn about orchestrators for scaling your pipelines