Nono.MA

How SageMaker's model_dir, output_dir, and output_data_dir Parameters Work

DECEMBER 2, 2021

SageMaker can be quite confusing. Here are some notes I took while learning how the model and output parameters work.

  • model_dir is provided as an Estimator function parameter.
  • output_dir and output_data_dir are provided as Estimator hyperparameters.

(See how to provide these arguments in code below.)

After a successful run, whatever is saved to each of these directories will be uploaded to a specific S3 location within your job's folder and bucket.

  • model.tar.gz will contain files saved to /opt/ml/model
  • output.tar.gz will contain files saved to /opt/ml/output and (inside of the data subfolder) the files saved to /opt/ml/output/data

Here's the sample directory tree with a train.py entry point that saves a text file to each of these locations.

# Files saved to /opt/ml/model/
model.tar.gz
    model.txt

# Files saved to /opt/ml/output/
output.tar.gz
    output.txt
    success
    # Files saved to /opt/ml/output/data/
    data/
        output_data.txt

# Files in the Estimator's source_dir
source/
    sourcedir.tar.gz
        # All files in your source_dir

Here's how you'd override these locations in your Estimator.

# Create a TensorFlow Estimator
estimator = sagemaker.tensorflow.estimator.TensorFlow(
    ...
    model_dir='/opt/ml/model',
    hyperparameters={
        'output_data_dir': '/opt/ml/output/data/',
        'output_dir': '/opt/ml/output/',
    },
    ...
)

And here's how you'd read their values inside of your entry point, e.g., train.py. Note that, even if you don't pass these three variables to your Estimator and its hyperparameters, you can capture them in your entry point script by defaulting to the SageMaker environment variables, namely, SM_MODEL_DIR, SM_OUTPUT_DIR, and SM_OUTPUT_DATA_DIR, which default to /opt/ml/model, /opt/ml/output, and /opt/ml/output/data.

import argparse
import os

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", type=str,
                                    default=os.environ.get('SM_MODEL_DIR'),
                                       help="Directory to save model files.")
    parser.add_argument("--output_dir", type=str,
                                    default=os.environ.get('SM_OUTPUT_DIR'),
                                       help="Directory to save output artifacts.")
    parser.add_argument("--output_data_dir", type=str,
                                    default=os.environ.get('SM_OUTPUT_DATA_DIR'),
                                       help="Directory to save output data artifacts.")

    opt = parser.parse_args()
    print(f'model_dir › {opt.model_dir}')
    print(f'output_dir › {opt.output_dir}')
    print(f'output_data_dir › {opt.output_data_dir}')

Testing this functionality, I saved a text file to each of these locations to see what the SageMaker SDK was uploading to S3. (The resulting directory structure can be seen above.)

# Save a text file to model_dir
f = open(os.path.join(opt.model_dir, 'model.txt'), 'w')
f.write('Contents of model.txt!')
f.close()

# Save a text file to output_dir
f = open(os.path.join(opt.output_dir, 'output.txt'), 'w')
f.write('Contents of output.txt!')
f.close()

# Save a text file to output_data_dir
f = open(os.path.join(opt.output_data_dir, 'output_data.txt'), 'w')
f.write('Contents of output_data.txt!')
f.close()

What's the difference between output_dir and output_data_dir?

SageMaker provides two different folders, the parent output folder and the output data subfolder. According to official AWS GitHub samples, output_dir is the directory where training success/failure indications will be written—which is an empty file named either success or failure—, output_data_dir is reserved to save non-model artifacts, such as diagrams, TensorBoard logs, or any other artifacts you want to generate during the training process.


I hope the read was helpful!

AwsSagemakerCode