Update: I've found that if the creation or starting of a notebook takes longer than 5 minutes the notebook will fail, plus re-creating the conda environment every time you start an existing notebook makes the wait really long. The solution I now prefer is to use the persistent-conda-ebs scripts (on-create.sh and on-start.sh) provided by Amazon SageMaker as examples. To keep it short: on creation they download Miniconda and create an environment with whatever Python version you choose, you can customize that environment (say, installing Python packages with conda inside of it), and the environment persists across sessions, so future starts only run the on-start script and have your notebook running in 1–2 minutes. Hope that helps! That's the way I'm using lifecycle configurations now.
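To give a feel for how the work is split between the two scripts, here is a condensed, illustrative sketch — not the exact AWS sample; the paths, environment name, and Miniconda URL are my assumptions:

```shell
# Illustrative sketch of the two lifecycle scripts. on-create.sh runs once,
# installing Miniconda onto the persistent EBS volume (/home/ec2-user/SageMaker)
# and building the environment; on-start.sh runs on every start and only
# re-registers the already-built environment, which is why startup drops
# to a minute or two.

cat > on-create.sh <<'EOF'
#!/bin/bash
set -e
WORKING_DIR=/home/ec2-user/SageMaker/custom-miniconda
mkdir -p "$WORKING_DIR"
# Install Miniconda on the EBS volume so it survives stop/start cycles
wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    -O "$WORKING_DIR/miniconda.sh"
bash "$WORKING_DIR/miniconda.sh" -b -u -p "$WORKING_DIR/miniconda"
source "$WORKING_DIR/miniconda/bin/activate"
# Build the environment once; customize the package list here
conda create -y --name custom_py36 python=3.6 ipykernel
EOF

cat > on-start.sh <<'EOF'
#!/bin/bash
set -e
# The environment already exists on the EBS volume; just expose it as a kernel
source /home/ec2-user/SageMaker/custom-miniconda/miniconda/bin/activate custom_py36
python -m ipykernel install --user --name custom_py36 \
    --display-name "Custom (py_3.6)"
EOF

# Sanity-check that both scripts parse
bash -n on-create.sh && bash -n on-start.sh && echo "scripts OK"
```

The real persistent-conda-ebs sample is longer (it checks whether Miniconda already exists before downloading, for instance), but the shape is the same.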
Here's something I learned about Amazon SageMaker today at work.
You can create notebook instances with different instance types (say,
ml.p3.2xlarge) and use a set of kernels that have been set up for you. These are
conda (Anaconda) environments exposed as Jupyter notebook kernels that execute the commands you write in the notebook.
What I learned today that I didn't know is that you can create your own
conda environments and expose them as kernels, so you're not limited to the kernels offered by AWS.
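Under the hood, a Jupyter kernel is described by a small kernelspec file (kernel.json) that points at the interpreter of a particular environment — that's how a conda environment becomes selectable in the notebook UI. A typical kernelspec looks roughly like this (the path and environment name here are illustrative):

```json
{
  "argv": [
    "/home/ec2-user/anaconda3/envs/tensorflow_p36/bin/python",
    "-m", "ipykernel_launcher",
    "-f", "{connection_file}"
  ],
  "display_name": "conda_tensorflow_p36",
  "language": "python"
}
```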
This is the sample environment I set up today. These commands should be run in a Terminal window on a SageMaker notebook instance, but they will most likely run in any environment with conda installed:
```shell
# Create new conda environment named env_tf210_p36
$ conda create --name env_tf210_p36 python=3.6 tensorflow-gpu=2.1.0 ipykernel tensorflow-datasets matplotlib pillow keras

# Enable conda on bash
$ echo ". /home/ec2-user/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc

# Enter bash (if you're not already running in bash)
$ bash

# Activate your freshly created environment
$ conda activate env_tf210_p36

# Install GitHub dependencies
$ pip install git+https://github.com/tensorflow/examples.git

# Now you have your environment setup - Party!
# ..

# When you're ready to leave
$ conda deactivate
```
How do we expose our new
conda environment as a SageMaker kernel?
```shell
# Activate the conda environment (as it has ipykernel installed)
$ conda activate env_tf210_p36

# Expose your conda environment with ipykernel
$ python -m ipykernel install --user --name env_tf210_p36 --display-name "My Env (tf_2.1.0 py_3.6)"
```
After reloading your notebook instance you should see your custom environment appear in the launcher and in the notebook kernel selector.
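You can also confirm the registration from the terminal without reloading, since the command above simply wrote a kernelspec that Jupyter discovers on startup (this assumes jupyter is on your PATH):

```shell
# List every kernelspec Jupyter can see; the new environment should show up
# as env_tf210_p36 under ~/.local/share/jupyter/kernels/
jupyter kernelspec list
```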
What if you don't want to repeat this process over and over and over?
You can create a lifecycle configuration on SageMaker that runs this initial environment setup every time you create a new notebook instance: create a new Lifecycle Configuration and paste the following script into its Create notebook tab.
```shell
#!/bin/bash
set -e

# OVERVIEW
# This script creates and configures the env_tf210_p36 environment.

sudo -u ec2-user -i <<EOF

echo ". /home/ec2-user/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc

# Create custom conda environment
conda create --name env_tf210_p36 python=3.6 tensorflow-gpu=2.1.0 ipykernel tensorflow-datasets matplotlib pillow keras -y

# Activate our freshly created environment
source /home/ec2-user/anaconda3/bin/activate env_tf210_p36

# Install git-repository dependencies
pip install -q git+https://github.com/tensorflow/examples.git

# Expose environment as kernel
python -m ipykernel install --user --name env_tf210_p36 --display-name My_Env_tf_2.1.0_py_3.6

# Deactivate environment
source /home/ec2-user/anaconda3/bin/deactivate

EOF
```
That way you won't have to set up each new notebook instance you create; you'll just have to pick the lifecycle configuration you created. Take a look at the Amazon SageMaker notebook instance Lifecycle Configuration samples.