Lambda migration story

I recently worked on migrating Lambda functions to a Rails backend.

Some readers may be confused at this point and asking: isn’t it usually the other way around? Is it a typo? 🤔

Well, not in this case. Let me give some context to the story. The Lambda functions are part of a legacy project; some of their use cases are outdated and only add extra complexity to the system.

I also considered refactoring them, but given their high complexity, the team agreed that moving this logic to the backend would be more beneficial.

Let’s start with the system architecture:

[Figure: Architecture]

We have different components in this diagram:

  • Rails backend: this is the application I want to move the lambda logic to. It communicates with the other components through events. For this example, I’m interested in one specific event; let’s call it user_event.
  • external service: this is a streaming platform that we give our users access to as a benefit. It provides an API for account creation.
  • kinesis: a medium for transferring data in real time; it’s used here to transfer the event.
  • user_handler_lambda: this lambda function listens for user_event and executes a workflow that ends with submitting an API request to the external service and storing the event data in DynamoDB for reference.
  • validation_lambda: the external service requires payload verification before it creates a user account on its side. The role of this function is to receive an encoded payload from the external service and return whether it’s valid. You can think of this part as a kind of 2FA required by the external service.
  • DynamoDB: it’s used to store the event payload for reference; after the migration it will simply be archived.
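
To make the user_handler_lambda workflow more concrete, here is a minimal Ruby sketch of what the same steps could look like once they live in the backend. It is not the actual project code: the class names, the `user_id` and `email` fields, and the `create_account` and `save` methods are all assumptions made for illustration, and the external service and the storage are replaced by in-memory stand-ins.

```ruby
require "json"

# Hypothetical handler mirroring the workflow described above:
# parse the event, call the external service, keep the payload for reference.
class UserEventHandler
  def initialize(account_client:, event_store:)
    @account_client = account_client # anything responding to #create_account(hash)
    @event_store = event_store       # anything responding to #save(id, hash)
  end

  def call(raw_event)
    event = JSON.parse(raw_event)
    user_id = event.fetch("user_id") # assumed field name; raises KeyError if missing
    @account_client.create_account("user_id" => user_id, "email" => event["email"])
    @event_store.save(user_id, event)
    :processed
  end
end

# In-memory stand-in for the external service's account creation API.
class FakeAccountClient
  attr_reader :requests

  def initialize
    @requests = []
  end

  def create_account(payload)
    @requests << payload
  end
end

# In-memory stand-in for the reference storage.
class InMemoryEventStore
  attr_reader :records

  def initialize
    @records = {}
  end

  def save(id, event)
    @records[id] = event
  end
end
```

Injecting the client and the store keeps the workflow testable without touching the real external service, which helps when the goal is to keep exactly the same behavior as the legacy code.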

That’s an overview of how things work. The goal is to move this use case to the backend and remove the lambda functions.

As the system is event-based, I had to find out whether other services were listening to the same event. This was the investigation part: I searched for the event name in different repositories and asked different teams about it.

In my case, I was lucky: no other service depended on this event. Otherwise, the migration would have been more complex, as I would have had to keep emitting the event during the migration.

What’s next? It’s time to clear my mind and gather all my patience before jumping into the legacy code. As usual, good music and a cup of coffee are the best companions during this journey. ☕

I always try to make the most of this step by understanding the use cases and the relevant API documentation in depth. As with any refactoring, I strove to keep exactly the same behavior, but I also kept notes on edge cases and areas that might need follow-up later.

Now it’s time to step back and consider some cases that could happen during the migration. As the validation endpoint will be migrated to the backend application, the following two cases could occur:

[Figure: Validation edge case 1]

[Figure: Validation edge case 2]

Those two cases have something in common: a token could be encoded by one system and decoded by a different one. This issue is easy to solve: I opted to make the validation endpoint permissive during the migration window (by returning valid for all incoming requests).

So let’s sum up the migration plan:
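
Here is a rough Ruby sketch of the permissive idea. It assumes the payload is verified with an HMAC signature and a per-system secret, which is my simplification for illustration and not necessarily how the external service encodes its payload; `PayloadValidator` and its `permissive` flag are hypothetical names.

```ruby
require "openssl"

# Hypothetical validator: strict mode checks an HMAC signature,
# permissive mode accepts everything during the migration window.
class PayloadValidator
  def initialize(secret:, permissive: false)
    @secret = secret
    @permissive = permissive
  end

  # Signature this system would produce for a payload (assumed scheme).
  def sign(payload)
    OpenSSL::HMAC.hexdigest("SHA256", @secret, payload)
  end

  def valid?(payload, signature)
    return true if @permissive

    secure_compare(sign(payload), signature.to_s)
  end

  private

  # Compares every byte regardless of where the first mismatch occurs.
  def secure_compare(a, b)
    return false unless a.bytesize == b.bytesize

    a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
  end
end

# A token issued on the lambda side but checked on the backend side.
lambda_side = PayloadValidator.new(secret: "lambda-secret")
strict_backend = PayloadValidator.new(secret: "backend-secret")
permissive_backend = PayloadValidator.new(secret: "backend-secret", permissive: true)

signature = lambda_side.sign("account-payload")
puts strict_backend.valid?("account-payload", signature)     # false
puts permissive_backend.valid?("account-payload", signature) # true
```

The strict backend rejects a token issued by the other system, which is the situation both edge cases describe; the permissive flag sidesteps it, at the cost of skipping verification until the flag is turned off again.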

  1. Make the verification endpoint permissive on both the lambda and the backend application sides.
  2. Deploy the backend changes and stop emitting the event. At this stage, I still have the lambda functions in production to consume any event that has already been propagated. Kinesis is a real-time data streaming service, but I prefer to be on the safe side here :)
  3. Monitor the system and the status of the endpoints.
  4. Keep the lambda functions for a couple of weeks in case they’re needed for a rollback.
  5. Clean up by re-enabling validation on the endpoint and archiving all old resources.

Conclusion

Working on a migration task is always challenging and provides a good learning opportunity. I wanted to share some learnings that could be valuable to anyone:

  • Estimation: it’s always hard to estimate these kinds of tasks, and it’s even harder when they require interaction with different teams and providers. So communicate clearly if you anticipate a delay.
  • Preparation: nothing fancy, just prepare a shared online document that covers details such as the reasons for the migration and a system diagram. Sharing this document when communicating with different teams/members gives them enough context and, of course, saves you from explaining the same thing over and over 😄

That’s it for today, happy coding 👋