From a10083217d453968ab2428485b1afb0d9b5f2d26 Mon Sep 17 00:00:00 2001 From: Amy Unruh Date: Tue, 16 Jan 2018 07:42:34 -0800 Subject: [PATCH 1/2] added a notebook header --- ...sing_tf.estimator.train_and_evaluate.ipynb | 28 ++++++++++++++----- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/ml/census_train_and_eval/using_tf.estimator.train_and_evaluate.ipynb b/ml/census_train_and_eval/using_tf.estimator.train_and_evaluate.ipynb index c0b3634..5d71b8e 100644 --- a/ml/census_train_and_eval/using_tf.estimator.train_and_evaluate.ipynb +++ b/ml/census_train_and_eval/using_tf.estimator.train_and_evaluate.ipynb @@ -106,7 +106,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [ "from __future__ import division\n", @@ -675,7 +677,9 @@ "If you take a look in the `trainer` subdirectory of this directory, you'll see that it contains essentially the same code that's in this notebook, just packaged for deployment. `trainer.task` is the entry point, and when that file is run, it calls `tf.estimator.train_and_evaluate`. \n", "(You can read more about how to package your code [here](https://cloud.google.com/ml-engine/docs/packaging-trainer)). \n", "\n", - "We'll test training via `gcloud` locally first, to make sure that we have everything packaged up correctly." + "We'll test training via `gcloud` locally first, to make sure that we have everything packaged up correctly.\n", + "\n", + "### Test training locally via `gcloud`" ] }, { @@ -878,7 +882,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [ "!cat config_custom_gpus.yaml" @@ -898,7 +904,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [ "job_name = \"census_job_%s\" % (int(time.time()))\n", @@ -914,7 +922,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [ "!gcloud ml-engine jobs submit training $JOB_NAME --scale-tier $SCALE_TIER \\\n", @@ -958,7 +968,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [ "# We'll use the `hptuning_config.yaml` file for this run.\n", @@ -968,7 +980,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [ "!gcloud ml-engine jobs submit training $JOB_NAME --scale-tier $SCALE_TIER \\\n", From e94a879cfcfc3dd1112e8cbbc68293f1fec4fcc3 Mon Sep 17 00:00:00 2001 From: Amy Unruh Date: Tue, 16 Jan 2018 11:52:16 -0800 Subject: [PATCH 2/2] use nbviewer for some of the notebook links in the readme --- ml/census_train_and_eval/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ml/census_train_and_eval/README.md b/ml/census_train_and_eval/README.md index 4e37532..8954952 100644 --- a/ml/census_train_and_eval/README.md +++ b/ml/census_train_and_eval/README.md @@ -35,7 +35,7 @@ You can read more about this model and its use [here](https://research.googleblo We're using Estimators because they give us built-in support for distributed training and evaluation (along with other nice features). You should nearly always use Estimators to create your TensorFlow models. You can build a Custom Estimator if none of the pre-made Estimators suit your purpose. 
-See the accompanying [notebook](using_tf.estimator.train_and_evaluate.ipynb) for the details of defining our Estimator, including specifying the expected format of the input data. +See the accompanying [notebook](https://nbviewer.jupyter.org/github/amygdala/code-snippets/blob/master/ml/census_train_and_eval/using_tf.estimator.train_and_evaluate.ipynb#First-step:-create-an-Estimator) for the details of defining our Estimator, including specifying the expected format of the input data. The data is in csv format, and looks like this: ``` @@ -83,7 +83,7 @@ The `Dataset` API is much more performant than using `feed_dict` or the queue-ba In this simple example, our datasets are too small for the use of the Datasets API to make a large difference, but with larger datasets it becomes much more important. -The `input_fn` definition is the following. It uses a couple of helper functions that are defined in the [notebook](using_tf.estimator.train_and_evaluate.ipynb). +The `input_fn` definition is the following. It uses a couple of helper functions that are defined in the [notebook](https://nbviewer.jupyter.org/github/amygdala/code-snippets/blob/master/ml/census_train_and_eval/using_tf.estimator.train_and_evaluate.ipynb#Define-input-functions-(using-Datasets)). `parse_label_column` is used to convert the label strings (in our case, ' <=50K' and ' >50K') into [one-hot](https://en.wikipedia.org/wiki/One-hot) encodings.
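As background for the `tf.estimator.train_and_evaluate` call that the notebook text above says `trainer.task` makes when it runs, here is a minimal, self-contained sketch of how that API is typically wired up with a `TrainSpec` and an `EvalSpec`. The estimator type, feature column, toy in-memory data, step counts, and `model_dir` below are illustrative assumptions for a runnable example, not the actual code in the notebook or trainer package (which builds a wide-and-deep census model fed from the CSV files).

```python
import tensorflow as tf  # TF 1.4+ era Estimator APIs, matching this sample's timeframe

# Illustrative feature column; the real model uses the full set of census columns.
feature_columns = [tf.feature_column.numeric_column('age')]

def train_input_fn():
    # Placeholder input_fn returning (features, labels) batches via tf.data;
    # the real input functions parse the census CSV data.
    ds = tf.data.Dataset.from_tensor_slices(({'age': [25.0, 40.0]}, [0, 1]))
    return ds.repeat().batch(2)

def eval_input_fn():
    ds = tf.data.Dataset.from_tensor_slices(({'age': [30.0, 55.0]}, [0, 1]))
    return ds.batch(2)

estimator = tf.estimator.LinearClassifier(
    feature_columns=feature_columns, model_dir='/tmp/census_sketch')

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=100)
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn, steps=10)

# train_and_evaluate runs the train/eval loop, and picks up distributed
# cluster configuration from TF_CONFIG when launched on Cloud ML Engine.
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
```

The same call works unchanged whether the job runs locally or on a distributed scale tier, which is why the `gcloud ml-engine jobs submit training` cells shown in the notebook diff only vary the scale tier and config, not the trainer code.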