Serving and deploying PyTorch models for free with FastAPI and AWS Lambda — Part 1 [FAILED]

Disclaimer: this tutorial ended up failing because of the size of the PyTorch library, but there is still a lot to learn along the way (GitHub workflows, FastAPI, AWS S3 and Lambda). In the second part, I will try to find another free solution to serve PyTorch models in production.

Link to the GitHub repo:

The solution is as follows:

  • the model is built with PyTorch (machine translation)

Set up Git and a GitHub repository

You know the drill:

  1. Create a new repository

Set up a virtual environment

Virtual environments ensure a consistent development environment, so that you run the same packages (and versions) locally as on your remote instance (and as your colleagues do if you're on a team). We will use virtualenv:

pip install virtualenv

Create the virtual environment (cd inside the app folder):

virtualenv venv
# if that fails with "command not found", try this instead:
python -m virtualenv venv

Activate the virtual environment:

source venv/bin/activate

Install the required dependencies:

Create a requirements.txt file at the root of the app folder and paste this:

# requirements.txt

You can now install these dependencies by running the command:

pip install -r requirements.txt

Build and test your REST API with FastAPI and Pytest

You can find a great FastAPI tutorial on the official website:

Here is the folder structure:

└── app
    ├── tests
    │   ├── …
    │   └── …
    └── api
        └── v1

Write the tests for your API and run them with the following command (you can find the tests on GitHub):

python -m pytest

The tests should fail, since we haven't built the app yet. This is called TDD — Test-Driven Development! 💡 We write the tests before implementing the functionality.

Let’s build the app until the tests go green 🚦

(find the code in Github)

Reminder: don't forget to commit your changes regularly.

Create automation with Github Actions

mkdir -p .github/workflows
touch .github/workflows/main.yml

The two jobs that we need are:

  • Continuous Integration (CI): run the tests, package our FastAPI app into a Lambda function, and upload it to GitHub as an artifact.

  • Continuous Deployment (CD): download the Lambda artifact uploaded during the previous job and deploy it to AWS Lambda by linking it with AWS S3. This job only runs if the previous one succeeded.

So it looks like this (find the whole script in the Github repo):

name: CI/CD Pipeline
on:
  push:
    branches: [ main ]
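As a hedged sketch of the two jobs (step and job names, `YOUR_BUCKET`, and `YOUR_FUNCTION` are placeholders of mine; the real script is in the GitHub repo), the workflow might be wired like this:

```yaml
jobs:
  continuous-integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.9"
      - run: pip install -r requirements.txt
      - run: python -m pytest
      # Zip the app and its dependencies into a Lambda-ready artifact
      - run: zip -r lambda_artifact.zip . -x "venv/*"
      - uses: actions/upload-artifact@v3
        with:
          name: lambda-artifact
          path: lambda_artifact.zip

  continuous-deployment:
    runs-on: ubuntu-latest
    needs: [continuous-integration]  # only runs if CI succeeded
    steps:
      - uses: actions/download-artifact@v3
        with:
          name: lambda-artifact
      # Upload to S3, then point the Lambda function at the new zip
      - run: |
          aws s3 cp lambda_artifact.zip s3://YOUR_BUCKET/lambda_artifact.zip
          aws lambda update-function-code \
            --function-name YOUR_FUNCTION \
            --s3-bucket YOUR_BUCKET \
            --s3-key lambda_artifact.zip
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_DEFAULT_REGION: ${{ secrets.AWS_DEFAULT_REGION }}
```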


Now push everything to GitHub and go to the Actions tab. You should see the logs of the GitHub Action. If you have any errors, read the logs carefully and make the necessary changes (let me know in the comments below if you struggle to fix them).

Configure AWS Lambda and set up CI/CD

Here are the steps to set up everything:

  1. Sign up to AWS and create a Lambda function

GitHub Secrets are used to store confidential information. In our case, we need to store our AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION.

Add your secrets to your GitHub repo: go to Settings, then Secrets.
To find the values:

  1. AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY: go to AWS Management Console, on the top right click on your account and My Security Credentials, Access Keys, Create New Access Key

Hopefully, everything should work 🤞. Push your changes to Github and go to Actions to look at the logs.

Create an API Gateway

Now we need to create a REST API Gateway so the world can send requests to our Lambda function.

  1. Go to API Gateway (in AWS Console)

PyTorch Deep Learning Model

Install PyTorch and update requirements.txt:

pip install torch torchtext spacy
pip freeze > requirements.txt

Build, train, and save the model

You can find all the code to build, train, and save a deep learning model for machine translation in the GitHub repo.
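As a minimal hedged sketch of the save/load pattern the serving step relies on (the tiny model below is a stand-in, not the actual translation model):

```python
import torch
import torch.nn as nn

# Stand-in model: the real repo trains a machine-translation network
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

# Save only the weights (state_dict), the recommended PyTorch pattern
torch.save(model.state_dict(), "model.pt")

# At serving time, rebuild the same architecture and load the weights
serving_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
serving_model.load_state_dict(torch.load("model.pt"))
serving_model.eval()  # disable dropout/batch-norm training behaviour
```

Saving the `state_dict` rather than the whole model object keeps the checkpoint portable across code refactors.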

Serve the model

  1. create a new POST endpoint

Finally, push everything and let your CI/CD do the rest for you 😎

… and it failed … 🤦

Your Lambda deployment package must be under 250 MB unzipped (50 MB zipped), but ours is around 2 GB… The reason is that PyTorch alone takes more than 1 GB, and our model has around one million parameters!

I thought it would be okay because AWS Lambda offers several gigabytes of storage. But that storage is meant to be used while processing requests, not to store the Lambda function itself.
So we could, for example, install PyTorch while processing the API request, but that would take a lot of time (and thus money!).

To be continued…

In Part 2 we will try another approach using AWS ECR and Docker. But let me figure it out first…



French ML undergrad — writing about GraphQL security @ — freelance developer
