This repo hosts an example of streamlining boilerplate code in Amazon SageMaker training entrypoint scripts. The acronym smepu stands for SageMaker entry point utilities.
Main features:
- Configure the logger to consistently send logs to Amazon CloudWatch log streams.
- Pass CLI args through to a wrapped estimator, so entrypoint authors do not have to write the boilerplate code that "parses those 10+ CLI args, and calls another estimator with those args."
  - Implementation note: this is made possible thanks to the gluonts.core.serde.decode() function.
- Automatically disable fancy outputs (e.g., tqdm progress bars) when running as Amazon SageMaker training jobs.
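A minimal sketch of the logging feature, using only the standard logging module. The helper name setup_logger is hypothetical and is not smepu's API. It relies on SageMaker forwarding a training container's stdout and stderr to CloudWatch Logs, so a single stdout handler with one fixed format keeps every log line consistent there.

```python
import logging
import sys


def setup_logger(level: int = logging.INFO) -> logging.Logger:
    """Hypothetical helper: route all log records to stdout with one format.

    SageMaker forwards a training container's stdout/stderr to CloudWatch
    Logs, so a single stdout handler is enough for logs to show up there.
    """
    root = logging.getLogger()
    # Remove handlers other libraries may have installed, to avoid duplicates.
    for handler in list(root.handlers):
        root.removeHandler(handler)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s")
    )
    root.addHandler(handler)
    root.setLevel(level)
    return root


if __name__ == "__main__":
    setup_logger()
    logging.getLogger("entrypoint").info("Training started")
```

Configuring the root logger (rather than a named one) means log records from third-party libraries also end up in the same stream with the same format.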
Installation:
pip install \
'git+https://github.com/aws-samples/amazon-sagemaker-entrypoint-utilities@master#egg=smepu'
or:
git clone \
https://github.com/aws-samples/amazon-sagemaker-entrypoint-utilities.git
cd amazon-sagemaker-entrypoint-utilities
pip install -e .
Prerequisite: familiarity with writing an Amazon SageMaker training entrypoint.
A working hello-world example is provided under examples/. There are two versions:
- entrypoint.py uses argparse to parse hyperparameters.
- entrypoint-click.py uses click to parse hyperparameters.
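To illustrate the CLI-args passthrough with argparse, here is a minimal sketch under stated assumptions: SageMaker passes each hyperparameter to the entrypoint as a `--name value` pair, the entrypoint declares only the options it needs itself, and argparse's parse_known_args() collects everything else. The helpers to_kwargs() and _coerce() and the class DummyEstimator below are hypothetical illustrations; they are neither smepu's API nor the DummyEstimator in examples/00-hello-world/dummyest.py (smepu itself relies on gluonts.core.serde.decode() for the conversion).

```python
import argparse
from typing import Any, Dict, List, Optional


def _coerce(value: str) -> Any:
    """Best-effort conversion of a CLI string to int, float, bool, or str."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    return value


def to_kwargs(unknown: List[str]) -> Dict[str, Any]:
    """Turn ['--epochs', '3', '--lr', '0.1'] into {'epochs': 3, 'lr': 0.1}."""
    if len(unknown) % 2 != 0:
        raise ValueError(f"Expected --key value pairs, got: {unknown}")
    kwargs: Dict[str, Any] = {}
    for key, value in zip(unknown[::2], unknown[1::2]):
        if not key.startswith("--"):
            raise ValueError(f"Expected an option name, got: {key}")
        kwargs[key[2:].replace("-", "_")] = _coerce(value)
    return kwargs


class DummyEstimator:
    """Hypothetical stand-in for the wrapped estimator."""

    def __init__(self, epochs: int = 1, lr: float = 0.01, **kwargs: Any) -> None:
        self.epochs = epochs
        self.lr = lr
        self.extra = kwargs


def main(argv: Optional[List[str]] = None) -> DummyEstimator:
    # allow_abbrev=False stops argparse from matching prefixes of our own options.
    parser = argparse.ArgumentParser(allow_abbrev=False)
    # Options consumed by the entrypoint itself; everything else passes through.
    parser.add_argument("--model-dir", default="/opt/ml/model")
    args, unknown = parser.parse_known_args(argv)
    estimator = DummyEstimator(**to_kwargs(unknown))
    print(f"model_dir={args.model_dir} estimator={vars(estimator)}")
    return estimator


if __name__ == "__main__":
    main()
```

With this shape, adding a new estimator hyperparameter only requires passing one more `--name value` pair; the entrypoint's own parser does not change.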
Use examples/00-hello-world/entrypoint.sh to quickly observe the behavior of those train entrypoints when they run directly on the Python environment of your machine.
[NOTE: not to be confused with "Amazon SageMaker local mode", which runs the script inside a SageMaker container on the local machine, e.g., a SageMaker notebook instance.]
Running the train script directly on your Python environment is a useful trick to speed up your "dev + functional-test" cycle. Typically this stage uses a tiny synthetic dataset, and you heavily leverage your favorite dev tools (e.g., unit-test frameworks, code debuggers, etc.).
After this, you can perform a "compatibility test" by running your train script on an Amazon SageMaker training container (whether in "Amazon SageMaker local mode" or on a training instance) to iron out compatibility issues.
Once your scripts are fully tested, you can start your actual, large-scale model training and experimentation on Amazon SageMaker training instances.
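The local "dev + functional-test" stage can be sketched as a small harness. It assumes the entrypoint reads its input and model directories from the SM_CHANNEL_TRAIN and SM_MODEL_DIR environment variables (which real SageMaker training containers set for a channel named train) and receives hyperparameters as `--key value` arguments. run_locally(), write_synthetic_csv(), hp_to_argv(), and toy_main() are hypothetical names. The harness does not set SM_HOSTS, so fancy outputs stay on.

```python
import csv
import os
import random
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def write_synthetic_csv(path: Path, n_rows: int = 20, seed: int = 0) -> None:
    """Write a tiny synthetic regression dataset (y = 2x + small noise)."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y"])
        for _ in range(n_rows):
            x = rng.uniform(0.0, 1.0)
            writer.writerow([x, 2.0 * x + rng.gauss(0.0, 0.01)])


def hp_to_argv(hyperparameters: Dict[str, object]) -> List[str]:
    """Render hyperparameters as --key value pairs, like SageMaker does."""
    argv: List[str] = []
    for key, value in hyperparameters.items():
        argv += [f"--{key}", str(value)]
    return argv


def run_locally(main: Callable[[List[str]], None], hyperparameters: Dict[str, object]) -> Path:
    """Call main() with SageMaker-style env vars pointing at temp directories."""
    root = Path(tempfile.mkdtemp())
    train_dir, model_dir = root / "train", root / "model"
    train_dir.mkdir()
    model_dir.mkdir()
    write_synthetic_csv(train_dir / "data.csv")
    overrides = {"SM_CHANNEL_TRAIN": str(train_dir), "SM_MODEL_DIR": str(model_dir)}
    saved = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        main(hp_to_argv(hyperparameters))
    finally:
        # Restore the caller's environment, whether or not main() raised.
        for key, old in saved.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old
    return model_dir


def toy_main(argv: List[str]) -> None:
    """Hypothetical entrypoint: counts training rows and saves a text 'model'."""
    data_file = Path(os.environ["SM_CHANNEL_TRAIN"]) / "data.csv"
    with open(data_file, newline="") as f:
        n_rows = sum(1 for _ in csv.reader(f)) - 1  # exclude the header row
    model_file = Path(os.environ["SM_MODEL_DIR"]) / "model.txt"
    model_file.write_text(f"rows={n_rows} argv={argv}")


if __name__ == "__main__":
    out_dir = run_locally(toy_main, {"epochs": 1, "lr": 0.1})
    print((out_dir / "model.txt").read_text())
```

Because toy_main() is a plain function, the same call works inside a unit-test framework or under a debugger, which is the point of this stage.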
Sample runs:
# Run entrypoint script outside of SageMaker.
examples/00-hello-world/entrypoint.sh
# Mimic running on Amazon SageMaker: tqdm is automatically turned off.
SM_HOSTS=abcd examples/00-hello-world/entrypoint.sh
# Run the click version of the entrypoint.
examples/00-hello-world/entrypoint.sh -click
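The sample runs above show that merely defining SM_HOSTS is enough to switch tqdm off. A minimal sketch of that kind of check follows, assuming the presence of SM_HOSTS is the signal; the function names are hypothetical, and smepu may consult other signals as well. tqdm's `disable` parameter is real and suppresses the progress bar entirely.

```python
import os
from typing import Any, Dict, Mapping, Optional


def running_on_sagemaker(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Heuristic: treat the presence of SM_HOSTS as 'inside a training job'."""
    env = os.environ if environ is None else environ
    return "SM_HOSTS" in env


def tqdm_kwargs(environ: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Keyword arguments for tqdm: disable=True suppresses the progress bar."""
    return {"disable": running_on_sagemaker(environ)}


# Usage inside a training loop (requires `pip install tqdm`):
#   from tqdm import tqdm
#   for batch in tqdm(batches, **tqdm_kwargs()):
#       ...
print(tqdm_kwargs({"SM_HOSTS": "abcd"}))  # {'disable': True}
print(tqdm_kwargs({}))  # {'disable': False}
```

Progress bars redraw with carriage returns, which CloudWatch stores as many separate, noisy lines; disabling them on SageMaker keeps the log streams readable.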
To experiment with different hyperparameters, see DummyEstimator in examples/00-hello-world/dummyest.py and, if necessary, modify complex_args in examples/00-hello-world/entrypoint.sh accordingly.
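The passthrough feature relies on gluonts.core.serde.decode() to rebuild Python objects from a serialized description. The sketch below is a simplified, standard-library-only stand-in for that idea (not gluonts' implementation), showing how a nested JSON value passed on the command line could become constructor arguments. The `{"__kind__": "instance", "class": ..., "kwargs": ...}` layout is modeled loosely on gluonts' serde format; whether complex_args in entrypoint.sh uses exactly this shape is an assumption.

```python
import importlib
import json
from typing import Any


def decode(obj: Any) -> Any:
    """Recursively rebuild objects from a JSON-compatible description.

    Dicts of the form {"__kind__": "instance", "class": "pkg.mod.Cls",
    "kwargs": {...}} become Cls(**kwargs); other containers are walked.
    """
    if isinstance(obj, dict):
        if obj.get("__kind__") == "instance":
            module_name, _, class_name = obj["class"].rpartition(".")
            cls = getattr(importlib.import_module(module_name), class_name)
            return cls(**decode(obj.get("kwargs", {})))
        return {key: decode(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [decode(item) for item in obj]
    return obj


# A nested CLI value, e.g. passed as --freq '<json>' (hypothetical argument name).
cli_value = '{"__kind__": "instance", "class": "datetime.timedelta", "kwargs": {"days": 1, "hours": 2}}'
print(decode(json.loads(cli_value)))  # 1 day, 2:00:00
```

Since decode() imports whatever class the input names, it should only ever be fed trusted input such as your own entrypoint.sh.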
Feel free to further explore other sample scripts under examples/.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.