SageMaker can be quite confusing. Here are some notes I took while learning how the model and output parameters work.
model_dir is provided as an Estimator constructor parameter. output_dir and output_data_dir are provided as Estimator hyperparameters. (See how to provide these arguments in code below.)
After a successful run, whatever is saved to each of these directories will be uploaded to a specific S3 location within your job's folder and bucket.
model.tar.gz will contain the files saved to /opt/ml/model. output.tar.gz will contain the files saved to /opt/ml/output and (inside its data subfolder) the files saved to /opt/ml/output/data.
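For concreteness, here's a sketch of the S3 URIs these uploads typically land at. The bucket and job names are hypothetical placeholders, and the exact layout can vary with your Estimator's output_path and base_job_name settings:

```python
# Hypothetical names for illustration -- substitute your own bucket and job name.
bucket = "my-bucket"
job_name = "my-training-job"

# Typical S3 layout for a completed training job.
model_artifact = f"s3://{bucket}/{job_name}/output/model.tar.gz"
output_artifact = f"s3://{bucket}/{job_name}/output/output.tar.gz"
source_archive = f"s3://{bucket}/{job_name}/source/sourcedir.tar.gz"

print(model_artifact)
```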
Here's a sample directory tree produced by a train.py entry point that saves a text file to each of these locations.
model.tar.gz              # Files saved to /opt/ml/model/
└── model.txt
output.tar.gz             # Files saved to /opt/ml/output/
├── output.txt
├── success
└── data/                 # Files saved to /opt/ml/output/data/
    └── output_data.txt
source/                   # Files in the Estimator's source_dir
└── sourcedir.tar.gz      # All files in your source_dir
Here's how you'd override these locations in your Estimator.
# Create a TensorFlow Estimator
estimator = sagemaker.tensorflow.estimator.TensorFlow(
    ...
    model_dir='/opt/ml/model',
    hyperparameters={
        'output_data_dir': '/opt/ml/output/data/',
        'output_dir': '/opt/ml/output/',
    },
    ...
)
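Under the hood, SageMaker forwards each hyperparameter to your entry point as a --key value command-line flag. The sketch below is my own simplification of that translation, not SageMaker code:

```python
def hyperparameters_to_args(hyperparameters):
    """Simplified sketch of how SageMaker turns the hyperparameters dict
    into command-line flags for the entry point script."""
    args = []
    for key, value in hyperparameters.items():
        args.extend([f"--{key}", str(value)])
    return args

argv = hyperparameters_to_args({
    "output_data_dir": "/opt/ml/output/data/",
    "output_dir": "/opt/ml/output/",
})
print(argv)
```

This is why the entry point can pick the values up with argparse, as shown next.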
And here's how you'd read their values inside your entry point, e.g., train.py. Note that even if you don't pass these three variables to your Estimator and its hyperparameters, you can still capture them in your entry point script by defaulting to the SageMaker environment variables, namely SM_MODEL_DIR, SM_OUTPUT_DIR, and SM_OUTPUT_DATA_DIR, which default to /opt/ml/model, /opt/ml/output, and /opt/ml/output/data, respectively.
import argparse
import os

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", type=str,
                        default=os.environ.get('SM_MODEL_DIR'),
                        help="Directory to save model files.")
    parser.add_argument("--output_dir", type=str,
                        default=os.environ.get('SM_OUTPUT_DIR'),
                        help="Directory to save output artifacts.")
    parser.add_argument("--output_data_dir", type=str,
                        default=os.environ.get('SM_OUTPUT_DATA_DIR'),
                        help="Directory to save output data artifacts.")
    opt = parser.parse_args()

    print(f'model_dir › {opt.model_dir}')
    print(f'output_dir › {opt.output_dir}')
    print(f'output_data_dir › {opt.output_data_dir}')
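You can exercise this fallback logic locally without SageMaker. The standalone sketch below duplicates the parser in a function, sets one of the environment variables by hand, and checks both paths:

```python
import argparse
import os

def parse_args(argv=None):
    # Duplicate of the train.py parser, factored into a function for local testing.
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--output_dir", type=str,
                        default=os.environ.get("SM_OUTPUT_DIR"))
    parser.add_argument("--output_data_dir", type=str,
                        default=os.environ.get("SM_OUTPUT_DATA_DIR"))
    return parser.parse_args(argv)

# Simulate the environment variable SageMaker sets inside the container.
os.environ["SM_MODEL_DIR"] = "/opt/ml/model"

# No CLI flags: the parser falls back to the environment variable.
from_env = parse_args([])
print(from_env.model_dir)   # /opt/ml/model

# An explicit flag (as a hyperparameter would produce) wins over the default.
from_flag = parse_args(["--model_dir", "/tmp/model"])
print(from_flag.model_dir)  # /tmp/model
```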
Testing this functionality, I saved a text file to each of these locations to see what the SageMaker SDK was uploading to S3. (The resulting directory structure can be seen above.)
# Save a text file to model_dir
with open(os.path.join(opt.model_dir, 'model.txt'), 'w') as f:
    f.write('Contents of model.txt!')

# Save a text file to output_dir
with open(os.path.join(opt.output_dir, 'output.txt'), 'w') as f:
    f.write('Contents of output.txt!')

# Save a text file to output_data_dir
with open(os.path.join(opt.output_data_dir, 'output_data.txt'), 'w') as f:
    f.write('Contents of output_data.txt!')
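To sanity-check the packaging step without running a job, here's a local sketch that mimics SageMaker compressing /opt/ml/model into model.tar.gz and then lists the archive's contents. It uses only the standard library, nothing SageMaker-specific:

```python
import os
import tarfile
import tempfile

# Mimic SageMaker packaging the model directory into model.tar.gz,
# then extract the member list to confirm model.txt survives the round trip.
with tempfile.TemporaryDirectory() as workdir:
    model_dir = os.path.join(workdir, "model")
    os.makedirs(model_dir)
    with open(os.path.join(model_dir, "model.txt"), "w") as f:
        f.write("Contents of model.txt!")

    archive = os.path.join(workdir, "model.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        # arcname="." keeps paths relative, matching the extracted layout above
        tar.add(model_dir, arcname=".")

    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()

print(names)
```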
SageMaker provides two different folders: the parent output folder and the output data subfolder. According to official AWS GitHub samples, output_dir is the directory where the training success/failure indication is written (an empty file named either success or failure), while output_data_dir is reserved for non-model artifacts, such as diagrams, TensorBoard logs, or any other artifacts you want to generate during training.
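Based on that convention, a small helper like the following (my own sketch, not part of the SDK) could infer the job outcome from the indicator file:

```python
import os
import tempfile

def training_status(output_dir):
    """Infer the training outcome from the empty indicator file
    ('success' or 'failure') that SageMaker writes to output_dir."""
    for status in ("failure", "success"):
        if os.path.exists(os.path.join(output_dir, status)):
            return status
    return "unknown"

# Quick local check with a stand-in for /opt/ml/output.
with tempfile.TemporaryDirectory() as output_dir:
    open(os.path.join(output_dir, "success"), "w").close()
    print(training_status(output_dir))  # success
```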
I hope the read was helpful!