Deploying Pytorch models for free with Docker, AWS ECR and AWS Lambda

Achraf AIT SIDI HAMMOU · Published in Analytics Vidhya · 5 min read · Jul 5, 2021

I’ve always dreamed of building applications running deep learning models in production. But deep learning frameworks and models are usually quite big, and therefore exceed the limits of every free hosting solution (Heroku, AWS EC2, etc.). And as a student, I can’t spend money on servers ($250/month for Heroku 😱!)

I finally found a solution!

Disclaimer: the solution presented here lets you experiment for free. However, keeping the application alive for a whole month can push you past the AWS free tier. In the last step (step 5) of this tutorial, I will show you how to set up an automatic cleanup to avoid exceeding the free tier threshold.

The solution we will build looks like this: a user sends a request to an API Gateway endpoint, which triggers a Lambda function; the Lambda function runs our Docker image (hosted in AWS ECR), which executes the model and returns the prediction.

You can find all the code in the GitHub repo.

Step 0: requirements

For this tutorial you will need:

  • an AWS account
  • the AWS CLI, configured with your credentials
  • the AWS SAM CLI
  • Docker

How Docker works (from: https://www.saagie.com/blog/your-first-steps-into-docker)

Step 1: initialise the project with SAM CLI

Once you have everything set up, you can start a new project from a template:

sam init

Select:

  1. AWS quick-start templates
  2. Image (artifact is an image uploaded to an ECR image repository)
  3. amazon/python3.8-base
  4. PyTorch Machine Learning Inference API

Then cd into the project you just created.
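The exact contents depend on the template version, but you should end up with a layout roughly like this (treat it as a sketch, not a guarantee):

project-name/
├── app/
│   ├── app.py            # the Lambda handler
│   ├── model.py          # model definition and helpers
│   ├── Dockerfile        # builds the image the Lambda function will run
│   └── requirements.txt  # Python dependencies (torch, etc.)
├── events/               # sample request payloads for local testing
└── template.yaml         # SAM template describing the whole stack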

Step 2: build the model

For illustration purposes, I have created and trained a model for machine translation (French to English). You can find the code in the GitHub repo under /app/model.py. I won’t go into the details, as that is not the purpose of this post.

Next, we need to create the handler: the function that will process incoming requests. It needs to do the following (a minimal sketch follows the list):

  1. pre-process the data (e.g. one-hot encoding, converting to tensors, etc.)
  2. load the trained model
  3. make the prediction (model.forward())
  4. post-process the output (e.g. one-hot tensor back to text)
  5. return the output as a JSON object
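Here is a minimal sketch of such a handler. Seq2Seq, encode_sentence, decode_tensor and model.pt are hypothetical stand-ins for whatever your model.py actually defines:

import json

import torch

# Hypothetical imports: replace with whatever your model.py provides
from model import Seq2Seq, encode_sentence, decode_tensor

# Load the model once, at import time, so warm invocations reuse it
model = Seq2Seq()
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()


def lambda_handler(event, context):
    body = json.loads(event["body"])

    # 1. pre-process: turn the raw sentence into a tensor
    input_tensor = encode_sentence(body["sentence"])

    # 2-3. run the prediction without tracking gradients
    with torch.no_grad():
        output_tensor = model(input_tensor)

    # 4. post-process: decode the output tensor back to text
    translation = decode_tensor(output_tensor)

    # 5. return the output as a JSON object
    return {
        "statusCode": 200,
        "body": json.dumps({"translation": translation}),
    }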

Step 3: upload the Docker image and deploy the Lambda function

The Docker image will be hosted in a repository in AWS ECR. When the Lambda function gets triggered (i.e. when a user sends a request), it will run the Docker image, which will execute the handler with the data sent in the request.
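The template generates the Dockerfile for you; for reference, a minimal Dockerfile for a Lambda container image looks roughly like this (file names match the handler sketch above):

FROM public.ecr.aws/lambda/python:3.8

# Install the Python dependencies into the image
COPY requirements.txt ./
RUN python -m pip install -r requirements.txt

# Copy the handler, the model code and the trained weights
COPY app.py model.py model.pt ./

# Tell the Lambda runtime which function handles incoming requests
CMD ["app.lambda_handler"]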

To create an ECR repository run:

aws ecr create-repository --repository-name <REPOSITORY_NAME> --image-scanning-configuration scanOnPush=true --region <REGION>

Don’t forget to set:

  • <REPOSITORY_NAME> to any name you’d like (for me it’s lambda-pytorch)
  • <REGION> to an AWS region (e.g. us-east-2)

⚠️ all of your resources need to be in the same region

The command prints a JSON description of the repository you just created; copy the repositoryUri field from it. This is the address of the repository, and you will need it in the next step to tell SAM where to upload the Docker image.
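Trimmed, the output looks something like this (the account ID and region will be your own):

{
    "repository": {
        "repositoryArn": "arn:aws:ecr:us-east-2:123456789012:repository/lambda-pytorch",
        "repositoryUri": "123456789012.dkr.ecr.us-east-2.amazonaws.com/lambda-pytorch",
        ...
    }
}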

Now that the repository is created, you can build the Docker image:

sam build

This will create a build.toml file in the .aws-sam folder.

⚠️ make sure to re-run the build command whenever you change your Dockerfile or the application code. Otherwise, you will deploy the older version.

Once the build is done, you can run the deploy command, which will launch the whole stack:

  1. upload the Docker image to your ECR image repository,
  2. create the Lambda function,
  3. create the API Gateway that will receive the requests from the web and trigger the Lambda function to process them.

sam deploy --guided --stack-name <STACK_NAME>

The cool thing with stacks is that once deployed, you can run the deploy command again (sam deploy --guided) and it will check whether the resources already exist. If they do, it will update them with the new versions. So everything is managed and optimized for you!

Step 4: test your Lambda function

To test your Lambda function, you first need to find the API URL (where to send the request).

Go to the AWS Management Console, open Lambda, and click on your function. Scroll down to API endpoint. It should look something like this: https://abcdefg.execute-api.aws-region.amazonaws.com/Prod

To send a request to the API, run the following command in your terminal:

curl --header "Content-Type: application/json" --request POST --data '{"sentence": "Les deux chiens marchent dans le parc."}' <API_URL>
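If everything is wired up correctly, you should get a JSON response back. Assuming the handler sketch from step 2, it would look like this (the exact shape depends on what your handler returns):

{"translation": "The two dogs are walking in the park."}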

Step 5: avoid the bill with lifecycle policy rules

The ECR free tier only allows 500 MB-months of storage, i.e. if you host a Docker image bigger than 500 MB for a whole month, you will exceed the free tier threshold.

There is a good chance that your image exceeds this 500 MB threshold (PyTorch alone weighs more than that). It is not a problem as long as you don’t leave your image up for a whole month.

To avoid deleting your image manually (and messing up the whole stack automation) or, even worse, forgetting to delete it 😱, I will show you how to set up a lifecycle policy rule.

Lifecycle policy rules allow you to manage images in ECR by defining actions that should be applied automatically (e.g. cleaning up images based on an expiration period).

To set up a lifecycle policy rule:

  1. Go to AWS management console
  2. Go to Elastic Container Registry
  3. Click on your repository (the one that you created in step 3)
  4. On the left side menu, click Lifecycle Policy
  5. Click create rule
  6. Under Match criteria, select Since image pushed and enter the expiration period you want (e.g. 1 will automatically delete every image 24 hours after it has been pushed)
  7. Then click Save
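If you prefer the CLI, the same rule can be applied with aws ecr put-lifecycle-policy. A sketch, assuming you save the policy below as policy.json:

{
    "rules": [
        {
            "rulePriority": 1,
            "description": "Expire images one day after they are pushed",
            "selection": {
                "tagStatus": "any",
                "countType": "sinceImagePushed",
                "countUnit": "days",
                "countNumber": 1
            },
            "action": { "type": "expire" }
        }
    ]
}

aws ecr put-lifecycle-policy --repository-name <REPOSITORY_NAME> --lifecycle-policy-text file://policy.json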

Voilà! Your images will be automatically deleted once the expiration period is over. This way you won’t get any nasty surprises.

🔑 If you are still worried, you can check your usage under Billing in the AWS Management Console.

Conclusion

I hope you liked this tutorial. If you have any questions please leave them in the comments.

Let me know if you would be interested in going further (unit testing, using GitHub Actions to automatically test/build/deploy on push, building a fully-fledged web app running deep learning models).
