One of the AWS Leadership Principles is “Learn and Be Curious”, and this encapsulates the Observability Mindset perfectly!
Observability is a deep and wide field, with many tricks, tips and tools that promise the nirvana of the “Perfect Observable System”. Observability is not new, and it is not limited to Serverless, so why all the fuss about Observability and the proliferation of tools on the market promising perfect Serverless Observability?
The answer is simple: the industry was sold Serverless technology as a way to achieve Business Value faster. So the whole world went crazy and started building with AWS Lambda to do just that. We built fast and lean, and stopped thinking about the core principles of good development. After all, it was just a cloud function, it wasn’t complex, and the code was simple: how hard could it be?
We forgot many things in the rush to business value, and I have written about this in an earlier blog post, What I forgot at the beginning. That post focuses on Serverless Testing, but one of my main observations across the Serverless community at the time was the lack of focus on Software Architecture, which is the underlying theme of my whole testing series. In the rush to Serverless and faster business value, we also forgot about good logging and about a level-headed approach to understanding what our distributed Serverless system is doing.
We have forgotten about Observability thinking!
What is the Observability Mindset?
Observability is the ability to infer knowledge about the internal states of a system based on its external outputs. The Observability mindset is about thinking up-front about how you will ask questions of your system to work out what it is doing and whether it is behaving normally. This means we need to think about how we will emit data and telemetry about our system's functions so that the people responsible for operating the system can be confident it is working correctly. Observability is a system design and development task that needs to be considered before we start building, because it is always harder to retrofit once a Serverless project is being built quickly, in pieces, by a large team.
The mindset is also holistic and not focused only on technical telemetry data: actual business KPIs must be designed and built in so you can understand how customers use your system and whether you are building the right thing for them. So it’s not just about whether we have bugs or errors, but also:
- Are customers using the new feature?
- Are our core processes running slower or faster after an update?
- How long do customers spend using a new feature (a lot, or not much), and is that what you expected and wanted them to do?
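Building a KPI in can be as small as emitting one structured log line per business event. As a minimal sketch, assuming your functions log to CloudWatch, the stdlib-only Python below builds a record in CloudWatch Embedded Metric Format (EMF), which CloudWatch can turn into a metric such as "feature used" while keeping extra properties searchable in the logs. The namespace `MyApp/Business`, the `orders` service, the `FeatureUsed` metric, the `bulk-export` feature and the `customer_id` property are all hypothetical; the Powertools Metrics utility can generate EMF for you instead of hand-building it.

```python
import json
import time


def emf_business_metric(namespace, service, metric_name, value, unit="Count", **properties):
    """Build one CloudWatch Embedded Metric Format (EMF) log line.

    Printing this JSON from a Lambda function lets CloudWatch extract
    `metric_name` as a metric under `namespace`, with `service` as a dimension.
    Extra keyword properties stay in the log event for later querying.
    """
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [["service"]],  # dimension keys, values live at the root
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        "service": service,   # dimension value
        metric_name: value,   # metric value
    }
    record.update(properties)  # high-cardinality context, e.g. a customer id
    return json.dumps(record)


if __name__ == "__main__":
    # Hypothetical KPI: a customer used the hypothetical "bulk-export" feature.
    print(emf_business_metric("MyApp/Business", "orders", "FeatureUsed", 1,
                              feature="bulk-export", customer_id="cust-123"))
```

Putting identifiers like `customer_id` in properties rather than dimensions keeps metric cardinality low while still letting you answer per-customer questions from the logs.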
Why is Observability Important?
Now that I read that heading back, a part of me is sad inside. I had hoped that it was obvious why Observability is important, but unfortunately, it’s not that obvious after all.
Observability is important so we are able to:
- understand how people use our software
- understand whether our SaaS system is correctly priced so we are profitable
- understand whether our customers are using our product the way we thought they would
- reason about why an error occurred
These are some of the main reasons I came up with for why Observability is important. The last one is also a key reason for good logging, metrics and tracing. With a Serverless system running at scale, by the time a monitoring alert goes off or an error is noticed, the error is likely minutes old and drowned in a sea of transactions. So you must be able to interrogate your logs to work out what happened to a transaction, where it was processed and why it failed. All of this is impossible without good, structured logging with key fields that identify events passing through your system. This is where frameworks like AWS Lambda Powertools become important: it provides a structured logging utility in multiple languages, including Java, Python, TypeScript and .NET.
The logging utility can be configured to extract key fields from your event data and inject them into every log message, which lets you interrogate your logs using CloudWatch Logs Insights queries.
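To make the "key fields in every log message" idea concrete, here is a minimal, stdlib-only Python sketch of the pattern. It is not Powertools itself: `persistent_keys` plays roughly the role that `append_keys` plays on the Powertools Python `Logger`. The event fields `order_id`, `customer_id` and `amount`, the `orders` service name, and the helpers `make_logger`, `process_order` and `find_events` are all hypothetical.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def __init__(self):
        super().__init__()
        self.persistent_keys = {}  # key fields merged into every log line

    def format(self, record):
        entry = {
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        entry.update(self.persistent_keys)
        return json.dumps(entry)


def make_logger(stream, service):
    """Create a logger that writes structured JSON lines to `stream`."""
    formatter = JsonFormatter()
    formatter.persistent_keys["service"] = service
    handler = logging.StreamHandler(stream)
    handler.setFormatter(formatter)
    logger = logging.getLogger(service)
    logger.handlers = [handler]  # replace any earlier handlers
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger, formatter


def process_order(event, logger, formatter):
    """Hypothetical handler: copy identifying fields from the event into every log line."""
    formatter.persistent_keys["order_id"] = event["order_id"]
    formatter.persistent_keys["customer_id"] = event["customer_id"]
    logger.info("Order received")
    if event.get("amount", 0) <= 0:
        logger.error("Invalid order amount")
        return {"status": "rejected"}
    logger.info("Order accepted")
    return {"status": "accepted"}


def find_events(log_text, key, value):
    """Return parsed log entries whose `key` equals `value` (a stand-in for a query filter)."""
    results = []
    for line in log_text.splitlines():
        entry = json.loads(line)
        if entry.get(key) == value:
            results.append(entry)
    return results


if __name__ == "__main__":
    buf = io.StringIO()
    log, fmt = make_logger(buf, "orders")
    for evt in [{"order_id": "o-1", "customer_id": "c-9", "amount": 25},
                {"order_id": "o-2", "customer_id": "c-7", "amount": 0}]:
        process_order(evt, log, fmt)
    # Later, reconstruct everything that happened to order o-2 from its key field.
    for entry in find_events(buf.getvalue(), "order_id", "o-2"):
        print(entry)
```

Because every line carries `order_id` and `customer_id`, a failed transaction can be pulled out of a busy log stream by its identifier. In CloudWatch, which discovers fields in JSON log events, the equivalent would be a Logs Insights query along the lines of `fields @timestamp, message | filter order_id = "o-2"`.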
Next time you start a new project or pick up an old one, ask yourself these three simple questions:
- Do I understand how my users use my system?
- When something goes wrong (at scale), will I be able to find enough data to identify exactly what went wrong?
- Do I know that I am charging my customers correctly, and can I calculate my margin per customer?
If you can answer these three simple questions, you are on your way to having an Observability Mindset. It doesn’t stop there - Observability changes and evolves the same way your software does - so don’t stand still, keep evolving, querying, observing and understanding.