Serving and deploying PyTorch models for free with FastAPI and AWS Lambda — Part 1 [FAILED]

May 30, 2021

Disclaimer: this tutorial ended up failing because of the size of the PyTorch library, but there are still a lot of great things to learn along the way (GitHub workflows, FastAPI, AWS S3 and Lambda). In the second part, I will try to find another free solution to serve PyTorch models in production.

Link to the Github repo:

The solution is as follows:

  • the model is built with PyTorch (machine translation)
  • the model is served through a REST API built with FastAPI
  • the application is then compressed and stored in an AWS S3 bucket
  • finally, the API is deployed as a lambda function on AWS Lambda

Set up Git and the GitHub repository

You know the drill:

  1. Create a new repository
  2. Add .gitignore (Python template)
  3. Add
  4. Clone on your local machine
  5. Don’t forget to commit changes after every step! ⚠️

Set up a virtual environment

Virtual environments ensure a consistent development environment, so that you run the same packages (and versions) locally as on your remote instance (and as your colleagues, if you're in a team). We will use virtualenv:

pip install virtualenv

Create the virtual environment (cd inside the app folder):

# virtualenv venv
# if you have the following error: "command not found", try this:
python -m virtualenv venv

Activate the virtual environment:

source venv/bin/activate

Install the required dependencies:

Create a requirements.txt file at the root of the app folder and paste this:

# requirements.txt

You can now install these dependencies by running the command:

pip install -r requirements.txt

Build and test your REST API with FastAPI and Pytest

You can find a great FastAPI tutorial on the official website:

Here is the folder structure:

└── app
    ├── tests
    |   ├──
    |   └──
    └── api
        └── v1

Write the tests for your API in the tests folder and run them with the command below (you can find the tests on GitHub):

python -m pytest

The tests should fail because we haven't built the app yet. This is called TDD (Test-Driven Development) 💡: we write the tests before implementing the functionality.

Let’s build the app until the tests go green 🚦

(find the code in Github)

Reminder: don't forget to commit your changes regularly.

Create automation with GitHub Actions

mkdir -p .github/workflows
touch .github/workflows/main.yml

The 2 jobs that we need are:

  • Continuous Integration (CI)

to run the tests, package our FastAPI app into a Lambda artifact and upload it to GitHub.

  • Continuous Deployment (CD)

to download the Lambda artifact uploaded by the previous job and deploy it to AWS Lambda through AWS S3. This job only runs if the previous one succeeded.

So it looks like this (find the whole script in the Github repo):

name: CI/CD Pipeline
on:
  push:
    branches: [ main ]


Now push everything to GitHub and go to the Actions tab. You should see the log of the GitHub Action. If you get an error, read the logs carefully and make the necessary changes (let me know in the comments below if you struggle to fix it).
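To make the packaging step of the CI job more concrete, here is a minimal Python sketch of what "packaging the app" boils down to: zipping the app folder so that main.py ends up at the root of the archive. The function name package_app and the file name artifact.zip are hypothetical, and the actual workflow in the repo may well use the zip command instead.

```python
import zipfile
from pathlib import Path


def package_app(source_dir: str, zip_path: str) -> int:
    """Zip every file under source_dir into zip_path and return the file count."""
    source = Path(source_dir)
    target = Path(zip_path).resolve()
    count = 0
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            # Skip directories, and the archive itself if it lives inside source_dir.
            if path.is_file() and path.resolve() != target:
                # Store paths relative to source_dir so main.py sits at the zip root.
                archive.write(path, path.relative_to(source).as_posix())
                count += 1
    return count
```

Calling package_app("app", "artifact.zip") would produce the archive to upload. Keep in mind that a real Lambda artifact also needs the installed dependencies next to the code, which is exactly what becomes a problem later in this post.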

Configure AWS Lambda and set up CI/CD

Here are the steps to set up everything:

  1. Sign up for AWS and create a Lambda function
  2. Select: author from scratch
  3. Select: runtime: python 3.7
  4. Select: choose or create an execution role
  5. Select: create function
  6. Create an S3 bucket in the AWS Management Console. ⚠️ It should be in the same region as your Lambda function
  7. In Runtime settings, change the handler from the default lambda_function.lambda_handler to main.handler
  8. Update your GitHub Actions config file (main.yml) to deploy your function to Lambda (don’t forget to replace YOUR_LAMBDA_FUNCTION_NAME and YOUR_S3_BUCKET with your own names)
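For reference, here is roughly what the deployment in step 8 does, written as a small Python sketch instead of a workflow step. It assumes the zip artifact has already been uploaded to your bucket. The helper name deploy_from_s3 and the key artifact.zip are hypothetical; update_function_code with FunctionName, S3Bucket and S3Key is the boto3 Lambda client call for this.

```python
def deploy_from_s3(lambda_client, function_name: str, bucket: str, key: str) -> dict:
    """Point an existing Lambda function at a zip artifact stored in S3."""
    return lambda_client.update_function_code(
        FunctionName=function_name,
        S3Bucket=bucket,
        S3Key=key,
    )


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   deploy_from_s3(boto3.client("lambda"),
#                  "YOUR_LAMBDA_FUNCTION_NAME", "YOUR_S3_BUCKET", "artifact.zip")
```

Passing the client in as an argument keeps the function easy to try out with a fake client before touching a real AWS account.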

GitHub Secrets are used to store confidential information. In our case, we need to store our AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION.

Add your secrets to your Github repo. Go to Settings, Secrets.
To find the values:

  1. AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY: in the AWS Management Console, click on your account name (top right), then My Security Credentials > Access Keys > Create New Access Key
  2. AWS_DEFAULT_REGION is the AWS Region of your lambda function (and S3 bucket as they need to be the same). Just put the lower case part, e.g. eu-west-3 or us-west-2.

Hopefully, everything should work 🤞. Push your changes to Github and go to Actions to look at the logs.

Create an API Gateway

Now we need to create a REST API Gateway to allow the world to send requests to our lambda function.

  1. Go to API Gateway (in AWS Console)
  2. Click on Build on the card that says REST API but doesn’t have Private on it
  3. Click New API and enter a name for this API, then Create API
  4. Actions > Create Method > ANY > Use Lambda Proxy integration > Save
  5. Add a proxy: Actions > Create Resource > Configure as proxy > Create Resource
  6. Actions > Deploy API > [New Stage] > “dev” (or “staging” or whatever you want)
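Once the API is deployed, API Gateway gives you an invoke URL built from the API id, the region and the stage name. The sketch below builds that URL and sends a GET request using only the standard library. The API id abc123 and the helper names are made up for illustration; use the URL shown in your own console.

```python
import json
import urllib.request


def invoke_url(api_id: str, region: str, stage: str, path: str = "/") -> str:
    """Build the public URL of a deployed REST API stage."""
    return f"https://{api_id}.execute-api.{region}.amazonaws.com/{stage}/{path.lstrip('/')}"


def get_json(url: str, timeout: float = 10.0):
    """Send a GET request and parse the JSON body of the response."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example with a hypothetical API id:
#   get_json(invoke_url("abc123", "eu-west-3", "dev"))
```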

PyTorch Deep Learning Model

Install PyTorch and update requirements.txt:

pip install torch torchtext spacy
pip freeze > requirements.txt

Build, train, and save the model

You can find all the code in the Github repo to build, train and save a deep learning model for machine translation.

Serve the model

  1. create a new POST endpoint
  2. create utilities: process data (string to encoded tensor), load pre-trained model, feedforward, process output (encoded tensor to string)
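To give an idea of what the utilities in step 2 look like, here is a simplified sketch of the pre- and post-processing around the model. It is not the code from the repo: plain Python lists stand in for tensors, the tokenizer is a naive whitespace split, and the vocabularies, special tokens and function names are all hypothetical. In the real app, the ids would be wrapped in a torch tensor and model would be the loaded PyTorch model.

```python
from typing import Callable, Dict, List

UNK, SOS, EOS = "<unk>", "<sos>", "<eos>"


def encode(sentence: str, vocab: Dict[str, int]) -> List[int]:
    """Lower-case, split on whitespace and map tokens to ids (unknown words become <unk>)."""
    tokens = [SOS] + sentence.lower().split() + [EOS]
    return [vocab.get(token, vocab[UNK]) for token in tokens]


def decode(ids: List[int], vocab: Dict[str, int]) -> str:
    """Map ids back to tokens and drop the start/end markers."""
    id_to_token = {index: token for token, index in vocab.items()}
    words = [id_to_token[i] for i in ids]
    return " ".join(word for word in words if word not in (SOS, EOS))


def translate(
    sentence: str,
    model: Callable[[List[int]], List[int]],
    src_vocab: Dict[str, int],
    trg_vocab: Dict[str, int],
) -> str:
    """String in, string out: encode, run the model, decode."""
    return decode(model(encode(sentence, src_vocab)), trg_vocab)
```

The POST endpoint then only has to call translate with the request text and return the result.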

Finally, push everything and let your CI/CD do the rest for you 😎

… and it failed … 🤦

The unzipped deployment package of a Lambda function must stay under 250 MB, but ours is around 2GB… The reason is that PyTorch itself takes more than 1GB and our model has around one million parameters!

I thought it would be okay because AWS Lambda offers a lot of GB. But this storage is meant to be used while processing requests, not to store the Lambda function itself.
So we could, for example, install PyTorch while processing the API request, but it would take a lot of time (and thus money!).
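In hindsight, I could have caught this before pushing anything. Here is a small sketch that measures the size of a folder (for example your installed dependencies) and compares it to Lambda's limit for an unzipped deployment package, which AWS documents as 250 MB. The function names are hypothetical.

```python
import os

# AWS documents 250 MB as the limit for an unzipped deployment package.
UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_in_lambda(root: str, limit: int = UNZIPPED_LIMIT_BYTES) -> bool:
    """Return True if the folder is small enough to be deployed as a zip package."""
    return directory_size_bytes(root) <= limit
```

Running fits_in_lambda on the site-packages folder of the virtual environment (something like venv/lib/python3.7/site-packages, depending on your setup) after installing torch would have returned False right away.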

To be continued…

In part 2 we will try another way using AWS ECR and Docker. But let me figure it out first…




French ML undergrad — writing about GraphQL security @ — freelance developer