Nono.MA

DECEMBER 2, 2021

SageMaker can be quite confusing. Here are some notes I took while learning how the model and output parameters work.

  • model_dir is provided as an Estimator function parameter.
  • output_dir and output_data_dir are provided as Estimator hyperparameters.

(See how to provide these arguments in code below.)

After a successful run, whatever is saved to each of these directories will be uploaded to a specific S3 location within your job's folder and bucket.

  • model.tar.gz will contain files saved to /opt/ml/model
  • output.tar.gz will contain files saved to /opt/ml/output and (inside of the data subfolder) the files saved to /opt/ml/output/data

Here's the sample directory tree with a train.py entry point that saves a text file to each of these locations.

# Files saved to /opt/ml/model/
model.tar.gz
    model.txt

# Files saved to /opt/ml/output/
output.tar.gz
    output.txt
    success
    # Files saved to /opt/ml/output/data/
    data/
        output_data.txt

# Files in the Estimator's source_dir
source/
    sourcedir.tar.gz
        # All files in your source_dir

Here's how you'd override these locations in your Estimator.

# Create a TensorFlow Estimator
estimator = sagemaker.tensorflow.estimator.TensorFlow(
    ...
    model_dir='/opt/ml/model',
    hyperparameters={
        'output_data_dir': '/opt/ml/output/data/',
        'output_dir': '/opt/ml/output/',
    },
    ...
)

And here's how you'd read their values inside of your entry point, e.g., train.py. Note that, even if you don't pass these three variables to your Estimator and its hyperparameters, you can capture them in your entry point script by defaulting to the SageMaker environment variables, namely, SM_MODEL_DIR, SM_OUTPUT_DIR, and SM_OUTPUT_DATA_DIR, which default to /opt/ml/model, /opt/ml/output, and /opt/ml/output/data.

import argparse
import os

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", type=str,
                                    default=os.environ.get('SM_MODEL_DIR'),
                                       help="Directory to save model files.")
    parser.add_argument("--output_dir", type=str,
                                    default=os.environ.get('SM_OUTPUT_DIR'),
                                       help="Directory to save output artifacts.")
    parser.add_argument("--output_data_dir", type=str,
                                    default=os.environ.get('SM_OUTPUT_DATA_DIR'),
                                       help="Directory to save output data artifacts.")

    opt = parser.parse_args()
    print(f'model_dir › {opt.model_dir}')
    print(f'output_dir › {opt.output_dir}')
    print(f'output_data_dir › {opt.output_data_dir}')

Testing this functionality, I saved a text file to each of these locations to see what the SageMaker SDK was uploading to S3. (The resulting directory structure can be seen above.)

# Save a text file to model_dir
f = open(os.path.join(opt.model_dir, 'model.txt'), 'w')
f.write('Contents of model.txt!')
f.close()

# Save a text file to output_dir
f = open(os.path.join(opt.output_dir, 'output.txt'), 'w')
f.write('Contents of output.txt!')
f.close()

# Save a text file to output_data_dir
f = open(os.path.join(opt.output_data_dir, 'output_data.txt'), 'w')
f.write('Contents of output_data.txt!')
f.close()

What's the difference between output_dir and output_data_dir?

SageMaker provides two different folders, the parent output folder and the output data subfolder. According to official AWS GitHub samples, output_dir is the directory where training success/failure indications will be written—which is an empty file named either success or failure—, output_data_dir is reserved to save non-model artifacts, such as diagrams, TensorBoard logs, or any other artifacts you want to generate during the training process.


I hope the read was helpful!

JULY 15, 2021

Here are a few helper functions to list Lambda functions and layers (and to count them) using the AWS Command Line Interface (AWS CLI) to inspect the serverless resources of your Amazon Web Services (AWS) account.

Listing Lambda Layers of a Function

aws lambda get-function --function-name {name|arn} | \
jq .Configuration.Layers
[
  {
    "Arn": "arn:aws:lambda:us-west-2:00000000:layer:layer-name:1",
    "CodeSize": 1231231
  }
]

Counting Lambda Layers of a Function

aws lambda get-function --function-name {name|arn} | \
jq '.Configuration.Layers | length'
# Returns 1 (or number of layers attached to function)

Counting Lambda Layers in an AWS account

aws lambda list-layers | \
jq '.Layers | length'
# Returns 4 (or number of layers in your account)

Listing All Layers in an AWS account

aws lambda list-layers
{
    "Layers": [
        {
            "LayerName": "layer-name",
            "LayerArn": "arn:aws:lambda:us-west-2:0123456789:layer:layer-name",
            "LatestMatchingVersion": {
                "LayerVersionArn": "arn:aws:lambda:us-west-2:0123456789:layer:layer-name:1",
                "Version": 1,
                "Description": "Layer Description",
                "CreatedDate": "2021-07-14T14:00:27.370+0000",
                "CompatibleRuntimes": [
                    "python3.7"
                ],
                "LicenseInfo": "MIT"
            }
        },
        {
            "LayerName": "another-layer-name",
            "LayerArn": "arn:aws:lambda:us-west-2:0123456789:layer:another-layer-name",
            "LatestMatchingVersion": {
                "LayerVersionArn": "arn:aws:lambda:us-west-2:0123456789:layer:another-layer-name:4",
                "Version": 4,
                "Description": "Layer Description",
                "CreatedDate": "2021-07-14T11:41:45.520+0000",
                "CompatibleRuntimes": [
                    "python3.6"
                ],
                "LicenseInfo": "MIT"
            }
        }
    ]
}

Listing Lambda Functions in an AWS account

aws lambda list-functions
{
    "Functions": [
        {
            "FunctionName": "function-name",
            "FunctionArn": "arn:aws:lambda:us-west-2:0123456789:function:function-name",
            "Runtime": "python3.7",
            "Role": "arn:aws:iam::0123456789:role/role-name",
            "Handler": "lambda_function.lambda_handler",
            "CodeSize": 1234,
            "Description": "Function description.",
            "Timeout": 30,
            "MemorySize": 128,
            "LastModified": "2021-07-14T16:48:19.052+0000",
            "CodeSha256": "28ua8s0aw0820492r=",
            "Version": "$LATEST",
            "Environment": {
                "Variables": {
                }
            },
            "TracingConfig": {
                "Mode": "PassThrough"
            },
            "RevisionId": "1b0be4c3-4eb6-4254-9061-050702646940",
            "Layers": [
                {
                    "Arn": "arn:aws:lambda:us-west-2:0123456789:layer:layer-name:1",
                    "CodeSize": 1563937
                }
            ],
            "PackageType": "Zip"
        }
    ]
}

JANUARY 19, 2021

Here's how to execute a deployed AWS Lambda function with the AWS command-line interface.

Create a payload.json file that contains a JSON payload.

{
  "foo": "bar"
}

Then convert the payload to base64.

base64 payload.json
# returns ewogICJmb28iOiAiYmFyIgp9Cg==

And replace the contents of payload.json with that base64 string.

ewogICJmb28iOiAiYmFyIgp9Cg==

Invoke your Lambda function using that payload.

aws lambda invoke \
--function-name My-Lambda-Function-Name \
--payload file://payload.json \
output.json

The request's response will be printed in the console and the output will be saved in output.json.

If you're developing locally, you can use the aws lambda update-function-code function to synchronize your local code with your Lambda funciton.

NOVEMBER 4, 2020

The Amazon Web Services (AWS) command-line interface — the AWS Cli — lets you update the code of a Lambda function right from the cli. Here's how.

aws lambda update-function-code \
--function-name my-function-name \
--region us-west-2 \
--zip-file fileb://lambda.zip

Let's understand what you need to run this command.

  • aws lambda update-function-code - to execute this command you need the awscli installed on your machine and your authentication information has to be configured to your account
  • --function-name - this is the name of an existing Lambda function in your AWS account
  • --region - the region in which your Lambda lives (in this case, it's Oregon, whose code is us-west-2, you can see a list of regions and their codes here)
  • --zip-file - this is the path to your zipped Lambda code with the fileb:// prefix, in the example, there's a lambda.zip file in the current directory, alternatively you can use the --s3-bucket and --s3-key to use a zip file from an S3 bucket)

After your function code has been updated, you can invoke the Lambda function to verify everything is working as expected.

If you want to learn more about this command, here's the AWS CLI command reference guide, and here's the free Kindle version. Among other things, it lets you create Lambda Layer versions, invoke functions, and much more.

LAST UPDATED JUNE 3, 2020

2020.06.03

I've found that if the creation or starting of a notebook takes longer than 5 minutes the notebook will fail, plus re-creating the conda environment every time you start an existing notebook makes the wait really long. Another solution which I'm preferring now is to use these persistent-conda-ebs scripts—on-create.sh and on-start.sh—provided by Amazon Sagemaker as examples. To keep it short, they download Miniconda and create an environment on-create with whatever Python version you choose, you can customize your environment (say, installing Python packages with pip or conda inside of it), and then that environment is persistent across sessions and future starts that will run the on-start script and have your notebook running in 1–2 minutes. Hope that helps! That's the way I'm using lifecycle configurations now.


2020.03.24

Here's something I learned about Amazon SageMaker today at work.

You can create notebook instances with different instance types (say, ml.t2.medium or ml.p3.2xlarge) and use a set of kernels that have been setup for you. These are conda (Anaconda) environments exposed as Jupyter notebook kernels that execute the commands you write on the Python notebook.

What I learned today that I didn't know is that you can create your own conda environment and expose them as kernels so you're not limited to run with the kernels offered by Amazon AWS.

This is the sample environment I setup today. These commands should be run on a Terminal window in a SageMaker notebook but they most likely can run on any environment with conda installed.

# Create new conda environment named env_tf210_p36
$ conda create --name env_tf210_p36 python=3.6 tensorflow-gpu=2.1.0 ipykernel tensorflow-datasets matplotlib pillow keras

# Enable conda on bash
$ echo ". /home/ec2-user/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc

# Enter bash (if you're not already running in bash)
$ bash

# Activate your freshly created environment
$ conda activate env_tf210_p36

# Install GitHub dependencies
$ pip install git+https://github.com/tensorflow/examples.git

# Now you have your environment setup - Party!
# ..

# When you're ready to leave
$ conda deactivate

How do we expose our new conda environment as a SageMaker kernel?

# Activate the conda environment (as it has ipykernel installed)
$ conda activate env_tf210_p36

# Expose your conda environment with ipykernel
$ python -m ipykernel install --user --name env_tf210_p36 --display-name "My Env (tf_2.1.0 py_3.6)"

After reloading your notebook instance you should see your custom environment appear in the launcher and in the notebook kernel selector.

What if you don't want to repeat this process over and over and over?

You can create a lifecycle configuration on SageMaker that will run this initial environment creation setup every time you create a new notebook instance. (You create a new Lifecycle Configuration and paste the following code inside of the Create Notebook tab.)


#!/bin/bash

set -e

# OVERVIEW
# This script creates and configures the env_tf210_p36 environment.

sudo -u ec2-user -i <<EOF

echo ". /home/ec2-user/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc

# Create custom conda environment
conda create --name env_tf210_p36 python=3.6 tensorflow-gpu=2.1.0 ipykernel tensorflow-datasets matplotlib pillow keras -y

# Activate our freshly created environment
source /home/ec2-user/anaconda3/bin/activate env_tf210_p36

# Install git-repository dependencies
pip install -q git+https://github.com/tensorflow/examples.git

# Expose environment as kernel
python -m ipykernel install --user --name env_tf210_p36 --display-name My_Env_tf_2.1.0_py_3.6

# Deactivate environment
source /home/ec2-user/anaconda3/bin/deactivate

EOF

That way you won't have to setup each new notebook instance you create. You'll just have to pick the lifecycle you just created. Take a look at Amazon SageMaker notebook instance Lifecycle Configuration samples.

Want to see older publications? Visit the archive.

Listen to Getting Simple .