Sample utilities to shorten or simplify Amazon SageMaker's training entrypoint: logging handlers, silenced tqdm, hyperparameter parsings, etc.

yapweiyih/amazon-sagemaker-entrypoint-utilities

Utilities for Amazon SageMaker Training Entrypoint

1. Overview

This repo hosts an example of how to streamline the boilerplate code in a SageMaker training entrypoint script.

The acronym smepu stands for SageMaker entry point utilities.

Main features:

  1. Configure the logger to consistently send logs to Amazon CloudWatch log streams.

  2. Pass CLI args through to a wrapped estimator, so that entrypoint authors do not have to write the boilerplate code that "parses those 10+ CLI args, and calls another estimator with those args."

    • Implementation note: this is made possible thanks to the gluonts.core.serde.decode() function.
  3. Automatically disable fancy output when running as an Amazon SageMaker training job.

    • Silence tqdm when training on Amazon SageMaker, to reduce the noise in your Amazon CloudWatch logs.

    • Plain output (i.e., no colors or fancy formatting) for wasabi and the spaCy CLI (e.g., train or convert).
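
The first feature can be approximated with the standard library alone. The sketch below is not smepu's implementation. It assumes that SageMaker forwards whatever a training container writes to stdout to CloudWatch, and `setup_logger` is a hypothetical name chosen for illustration.

```python
import logging
import sys


def setup_logger(name: str = "train", level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes plain, single-line records to stdout.

    Hypothetical helper for illustration; not part of smepu.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Do not hand records to the root logger, to avoid duplicated lines.
    logger.propagate = False

    # Attach the handler only once, so that repeated calls are harmless.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    setup_logger().info("training started")
```

Writing to a single stream with a fixed format keeps each log record on one line, which tends to be easier to search in a log stream.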

2. Installation

pip install \
    'git+https://github.com/aws-samples/amazon-sagemaker-entrypoint-utilities@master#egg=smepu'

or:

git clone \
    https://github.com/aws-samples/amazon-sagemaker-entrypoint-utilities.git

cd amazon-sagemaker-entrypoint-utilities
pip install -e .

3. Usage

Prerequisite: you know how to write an Amazon SageMaker training entrypoint.

A working hello-world example is provided under examples/. There are two versions provided:

  1. entrypoint.py uses argparse to parse hyperparameters.
  2. entrypoint-click.py uses click to parse hyperparameters.
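
To illustrate the pass-through idea with argparse, here is a minimal sketch. It is not the code in entrypoint.py: it swaps in `ast.literal_eval` where smepu relies on gluonts.core.serde.decode(), and the names `parse_passthrough` and `literal`, the `--model-dir` flag, and its default path are assumptions made for this example only.

```python
import argparse
import ast
from typing import Any, Dict, List, Optional, Tuple


def literal(text: str) -> Any:
    """Turn "10", "0.1" or "[1, 2]" into Python values; keep other text as str."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def parse_passthrough(
    argv: Optional[List[str]] = None,
) -> Tuple[argparse.Namespace, Dict[str, Any]]:
    """Parse the entrypoint's own args and collect the rest as estimator kwargs."""
    parser = argparse.ArgumentParser()
    # Declare only what the entrypoint itself needs (illustrative flag).
    parser.add_argument("--model-dir", default="/opt/ml/model")
    args, extras = parser.parse_known_args(argv)

    # Assumption: every undeclared argument arrives as a "--name value" pair.
    if len(extras) % 2 != 0:
        raise ValueError(f"expected '--name value' pairs, got: {extras}")

    kwargs: Dict[str, Any] = {}
    for flag, value in zip(extras[::2], extras[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected an option name, got: {flag}")
        # "--num-layers" becomes the keyword "num_layers".
        kwargs[flag[2:].replace("-", "_")] = literal(value)
    return args, kwargs
```

The returned dictionary could then be handed to an estimator constructor as keyword arguments, so the entrypoint never has to declare each hyperparameter itself.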

Use examples/entrypoint.sh to quickly observe the behavior of those train entrypoints when they run directly in the Python environment on your machine.

[NOTE: this is not to be confused with "Amazon SageMaker local mode", which refers to running the script in a SageMaker container on a SageMaker notebook instance.]

Running the train script directly in your Python environment is a useful trick to speed up your "dev + functional-test" cycle. Typically, this stage uses a tiny synthetic dataset and leans heavily on your favorite dev tools (e.g., unit-test frameworks and code debuggers).

After this, you can perform a "compatibility test" by running your train script in an Amazon SageMaker training container (whether in "Amazon SageMaker local mode" or on a training instance) to iron out compatibility issues.

Once your scripts have been fully tested, you can start your actual, large-scale model training and experimentation on Amazon SageMaker training instances.

Sample runs:

# Run entrypoint script outside of SageMaker.
examples/00-hello-world/entrypoint.sh

# Mimic running on Amazon SageMaker: tqdm is turned off automatically.
SM_HOSTS=abcd examples/00-hello-world/entrypoint.sh

# Run the click version of the entrypoint.
examples/00-hello-world/entrypoint.sh -click
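
The second sample run suggests that the presence of the SM_HOSTS environment variable is what switches tqdm off. A minimal sketch of that idea follows. It is an inference from the sample run rather than smepu's actual code, and `on_sagemaker` and `tqdm_kwargs` are hypothetical helper names. The `disable` keyword is a regular tqdm constructor parameter.

```python
import os
from typing import Any, Dict, Mapping, Optional


def on_sagemaker(env: Optional[Mapping[str, str]] = None) -> bool:
    """Guess whether the script runs as an Amazon SageMaker training job.

    Assumption: SM_HOSTS is set inside training containers, which is what
    the `SM_HOSTS=abcd` sample run mimics.
    """
    env = os.environ if env is None else env
    return "SM_HOSTS" in env


def tqdm_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Keyword arguments to forward to tqdm(...); hypothetical helper."""
    return {"disable": on_sagemaker(env)}


# Usage (requires tqdm to be installed):
#   from tqdm import tqdm
#   for batch in tqdm(range(100), **tqdm_kwargs()):
#       ...
```

Accepting the environment as an argument keeps the check easy to exercise in unit tests without touching `os.environ`.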

To experiment with different hyperparameters, see DummyEstimator in examples/00-hello-world/dummyest.py and, if necessary, modify complex_args in examples/00-hello-world/entrypoint.sh accordingly.

Feel free to further explore other sample scripts under examples/.

4. Security

See CONTRIBUTING for more information.

5. License

This library is licensed under the MIT-0 License. See the LICENSE file.