I don’t often write articles about “building a [insert your thing here]” because there are usually already a heap of articles around. For URL shorteners, there certainly are - I would guess almost as many as there are “Getting started with [insert programming language here]” articles, which will invariably print “Hello World” to your terminal.
So why would anyone read my take on “Building a URL Shortener”? I have pondered this question and wanted to list out the things I feel you will learn and take away from this article:
- Introduction to a clean project structure for SAM templates with Python lambdas with centrally managed dependencies. No more modifying buried requirements.txt files all over the place!
- How to create an API using Powertools for AWS Lambda (Python) - if you use Python for Lambda, you need Powertools in your life!
- How to create a re-usable SAM template with configurable parameters so you can install it multiple times
- Some key things I learned along the way in automating 99.99999999995% of the project with SAM CLI (not everything works the first time!).
How the URL Shortener Works
The URL shortening process is very straightforward. The process goes like this:
- User clicks on a short link or types it into the browser address bar
- Browser resolves DNS for the URL shortener domain
- Browser makes a GET request to the API Gateway, passing the shortened URL path to the API.
- API Gateway invokes the Python Lambda, passing it a standard HTTP Lambda proxy message detailing the request.
- Lambda processes the proxy message and looks up the short path in DynamoDB using a simple key/value style lookup
- If a destination URL record is found, the API will return an HTTP 301 status code and a response header containing the real Location to be opened.
- The browser will silently open the new location and record in its cache that the original shortened URL should always redirect to the Location provided by the API 301 response.
- If a destination URL is not found, the API will return a temporary 302 redirect response to a defined URL rather than display an error message. In this way, every URL will appear valid, and the browser will not record this as a cached permanent redirect.
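The redirect decision in the steps above can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual handler: a plain dictionary stands in for the DynamoDB table, and the names `resolve_short_url`, `Redirect` and `DEFAULT_REDIRECT_URL` are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical fallback target; the article only says "a defined URL".
DEFAULT_REDIRECT_URL = "https://example.com/"


@dataclass(frozen=True)
class Redirect:
    status_code: int
    location: str

    def to_proxy_response(self) -> dict:
        """Shape the redirect as a Lambda proxy style response dictionary."""
        return {
            "statusCode": self.status_code,
            "headers": {"Location": self.location},
            "body": "",
        }


def resolve_short_url(
    path: str,
    table: Mapping[str, str],
    default_url: str = DEFAULT_REDIRECT_URL,
) -> Redirect:
    """Return a permanent 301 for a known short path, else a temporary 302."""
    short_id = path.strip("/")
    # The mapping stands in for the DynamoDB key/value lookup.
    destination: Optional[str] = table.get(short_id)
    if destination:
        return Redirect(301, destination)
    # Unknown links get a temporary redirect so browsers do not cache them.
    return Redirect(302, default_url)
```

Calling `resolve_short_url("/short-url", {"short-url": "https://example.org/page"})` yields a 301 to the stored destination, while any unknown path yields a 302 to the fallback URL.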
There are many different ways of putting a URL shortener together on AWS. I have used a combination of AWS Lambda, API Gateway, and DynamoDB.
The code for my URL Shortener project can be found on my GitHub page using the following short link: slsdna.com/short-url
The Clean Project Structure
When I set up Lambda projects with Python, I like to have dependencies centrally installed so that VSCode can pick them up from the local Python virtual environment and do all the right things with code hinting. One of the expectations of the SAM CLI is that each folder containing Lambda code also contains a requirements.txt file listing its code dependencies.
This “requirement” (pardon the pun) is because the CodeUri folder is copied into the .aws-sam/build/[service-name] folder as part of the build process, and pip is used to install all the dependencies locally into this folder. Hence, the Lambda package is only as big as it needs to be based on the required dependencies for each service. Many people keep things simpler and use a central requirements.txt file and apply the same dependencies to ALL of their Lambda code - this is not necessarily a great idea since the size of the ZIP file does have an impact on Lambda cold start performance. For a lot of use cases, this is likely an extremely small difference - but it does depend on what dependencies are being used. They can get out of control quickly with many developers working on a project!
My project structure uses Poetry for Python dependency management. I like Poetry. It keeps all your Python dependencies locally within the project folder when you turn on the virtualenvs.in-project configuration setting, which I strongly recommend turning on - it makes your dev environment very easy to understand as everything is co-located in your project folder. Apart from Poetry, there are also several opinionated dev tools set up for Python to assist with applying best practices to your code:
- Flake8 for code linting. It will raise errors when linting fails to force code to be fixed.
- isort for sorting import statements alphabetically and grouping them into sections by type.
- black - the Uncompromising Code Formatter, which will take over your code formatting and allow you to focus on the real problem at hand
- pre-commit is installed and configured to run formatting and linting tools as a pre-commit action to ensure all code is formatted and styled. Any modified files will also cause commits to be aborted since all changes should be checked in!
These small, development-quality additions will ensure you can focus more time and energy on writing code and not formatting and styling your code.
Now, the magic of centrally managed dependencies, without making all your Lambda ZIP packages bloated with unused dependencies. My project structure has all your Lambda code contained in sub-folders of the services directory. Each service folder is matched to a Poetry dependency group by name, so if you have a service named “user-api”, its dependencies are installed into a Poetry group of “user-api”, for example by running poetry add --group user-api followed by the package name.
In this way, all the specific dependencies for each service are separately grouped, which makes setup later for sam build simpler whilst still allowing ALL the dependencies to be installed into your VSCode virtual environment so code hinting and all the IDE goodness is there to be used. The secret to enabling separate requirements.txt files is using Poetry’s export mechanism. There is a scripts folder containing a make-deps.sh script, which iterates over the sub-folders in services and generates a service-specific requirements.txt file in each using poetry export.
This is the key for the opinionated SAM CLI project folder setup I use, and it is available online as a template on my GitHub page. The local .gitignore file has been configured to ignore "requirements.txt" files, so you won't be checking these into source control.
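The project's make-deps.sh is a shell script and is not reproduced here. As a rough Python sketch of the same idea, assuming each sub-folder of services shares its name with a Poetry dependency group, the export step could look like the following. The function names and the exact flag selection are illustrative assumptions, not the author's script.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Optional, Sequence


def export_command(service_dir: Path) -> List[str]:
    """Build the poetry export command for one service folder.

    Assumes the Poetry dependency group is named after the folder.
    """
    return [
        "poetry", "export",
        "--only", service_dir.name,
        "--without-hashes",
        "--format", "requirements.txt",
        "--output", str(service_dir / "requirements.txt"),
    ]


def make_deps(
    services_root: Path,
    run: Optional[Callable[[Sequence[str]], object]] = None,
) -> List[List[str]]:
    """Export a requirements.txt into every sub-folder of services_root."""
    if run is None:
        # Default runner shells out to Poetry and fails loudly on errors.
        def run(cmd: Sequence[str]) -> object:
            return subprocess.run(cmd, check=True)

    commands: List[List[str]] = []
    for service_dir in sorted(p for p in services_root.iterdir() if p.is_dir()):
        cmd = export_command(service_dir)
        run(cmd)
        commands.append(cmd)
    return commands
```

Calling `make_deps(Path("services"))` from the project root would then run one export per service. The `run` parameter exists only so the command construction can be exercised without Poetry installed.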
Using Powertools for AWS Lambda (Python) to create a REST API
The architecture of my URL shortener is simple. The following diagram shows the architecture, which is made up of an HTTP API (API Gateway V2) with a custom domain, a Python Lambda function and a DynamoDB table.
My implementation covers the GET requests for retrieving a shortened URL and does not expose a POST method for saving a new URL or updating an existing one. Adding a new URL is done by adding a record through the DynamoDB console - the process is simple to manage at a small scale, and I have not bothered to take this further.
I like Powertools for AWS Lambda (Python) and, in particular, love the APIGateway[InsertType]Resolver classes it provides. In this project, I have used the APIGatewayHttpResolver, which understands the Lambda proxy message structure that API Gateway passes to it via the HTTP API (API Gateway V2) resource set up in my SAM template. The following code example is the minimum requirement for creating a simple API:
The Powertools for AWS Lambda (Python) API resolver classes look and feel like creating an API with Flask, a very common library for APIs in the Python world. This makes for an easy transition into the AWS Lambda world for programmers with that experience. The central resolver, which allows multiple routes to be defined in one Lambda function, can feel like it grates against the one API, one Lambda setup most practitioners write about as Lambda best practice. However, I feel we are starting to accept the view that small monolithic APIs are okay if the methods are related and share code. This model can also speed up migration to Lambda, with a monolithic API as a starting point for gaining some cost advantages ahead of moving to more granular Lambda functions as time passes.
As you can see from the example, getting up and running is fairly straightforward. If you want to migrate your Lambda to a different API type, like a Function URL or an API Gateway REST API (V1), you can achieve this by swapping in a different resolver class with no other code changes! That’s pretty magical and enables faster pivots when you need to change your service infrastructure after hitting a pain point or limit of a particular service. Check out the full documentation for REST API - Powertools for AWS Lambda (Python) for all the features of each API resolver class available today.
Reusing Your SAM Template to Deploy Multiple Services
I need to run multiple URL shorteners for multiple domains - one I use for personal URL shortening (walmsl.es) and one I am setting up to support short URLs for Serverless DNA (slsdna.com). Before re-using your SAM template to deploy many copies of the same stack, follow the best practice of not defining names for your resources and instead leaving name generation to CloudFormation. CloudFormation will generate a unique name for you, so the stack can be deployed successfully no matter how many times you deploy it. The generated name combines the stack name, the logical resource ID and a short random string. This is a key best practice.
The SAM project automates 99.99999999995% of the deployment - the only manual step is registering a certificate for the API Gateway custom domain in AWS Certificate Manager. I added several CloudFormation parameters to my SAM template file to set up these external dependencies.
- CertificateARN - to define the actual certificate to register with the custom domain
- DomainName - defines the actual domain name to use for the API custom domain
- HostedZoneId - the Route 53 ID of the DNS hosted zone.
When you first deploy a SAM CLI project, you run sam deploy --guided, which runs the deploy wizard and lets you save a new config environment in your samconfig.toml file. Each saved config environment lets you install another stack instance of your SAM project. Once the new config has been saved, you can choose which configuration to deploy using the --config-env command line switch.
These simple techniques enabled me to deploy the same project several times into a single account, giving me multiple URL shorteners across multiple domains.
Key Takeaways from Building this URL Shortener
- Don't bother trying to name things or create a naming convention or system - leave that to the cloud tooling and ensure you never have deployment problems.
- Consider external dependencies and create template parameters to enable these to have configurable values, such as domain name, certificate ARN, and redirect URL.
- When setting up a custom domain for HTTP API, the certificate domain must match the custom domain name. For example, with a domain of "dev.slsdna.com", you need a certificate to match "dev.slsdna.com" exactly. You cannot pair a wildcard certificate of *.slsdna.com with an arbitrary domain unless you set up a wildcard API endpoint (see next point).
- You can use wildcard endpoints with HTTP API custom domains. This is something new I learned doing this! For wildcard API domains, you must define the custom domain as a wildcard *.slsdna.com and ensure it matches the certificate with a domain covering the wildcard exactly. Once you have this set up, you can access your HTTP API using any hostname for the wildcard domain.
- Turning on Delete Protection for data resources - I don't do this often but will certainly start. Whilst playing with deploying the second stack, I was deleting a failed deployment and ended up deleting my actual live stack. No real damage was done - it wasn't used a lot - but we should all be doing this to protect precious production data.
- I had trouble with the Route 53 configuration for the domain during deployment. The configuration documentation indicated I could use either the HostedZoneId or the HostedZoneName.
What I didn't realise in point six (6) was that when you define the HostedZoneName you need to ensure it is a fully qualified DNS hosted zone name with a trailing ".", e.g. "dev.slsdna.com." not "dev.slsdna.com". The error message does not make it clear that this is the failure: the resource creation returns an error of "No hosted zones named dev.slsdna.com found". This is very confusing, given that when I open the hosted zone in the Route 53 console, the Hosted Zone Details display the hosted zone name as "dev.slsdna.com" (not "dev.slsdna.com.").
I get that this is a DNS zone file reference to a fully qualified domain name. However, it wasn't until I was writing this up and reading the documentation closely that I saw the trailing "." was required.
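One way to avoid this trap is to normalise the zone name before it is passed to the template as a parameter. The helper below is a small sketch of that idea. The function name `to_fqdn` is hypothetical and not part of the project.

```python
def to_fqdn(zone_name: str) -> str:
    """Return the hosted zone name with exactly one trailing dot."""
    name = zone_name.strip()
    if not name:
        raise ValueError("hosted zone name must not be empty")
    # Drop any existing trailing dots, then add the single one DNS expects.
    return name.rstrip(".") + "."
```

With this in place, both "dev.slsdna.com" and "dev.slsdna.com." end up as the fully qualified "dev.slsdna.com." form.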
I hope you enjoyed reading this article and found helpful tips on building with SAM CLI, Powertools for AWS Lambda (Python) and AWS Serverless services! We covered some key points:
- Serverless best practices for deploying managed services
- How to automate everything for this project (sans certificate registration)
- How to build an API quickly and easily with Powertools for AWS Lambda (Python).
- Things I learned while building and automating this project
- Reference to an opinionated SAM template for Python projects with some key quality-of-life additions.