Senior Platform Engineer
MoonPay • GB
What you’ll do
In the short term, we need to increase the resilience and reliability of our current PaaS solution by:
Improving the maintainability of our infrastructure as code
Building dashboards, monitoring & alerting mechanisms with Datadog
Load testing and performance tuning our production services
Managing the lifecycle and maintenance of our Kubernetes clusters
In the medium to long term you’ll get to:
Implement new and shiny technologies on top of Kubernetes as you see fit to ensure our tech can scale with the business.
Develop and integrate solutions with a bias for automation in order to improve and maintain reliability across the production estate and make recovery easier.
Design and track metrics for site uptime and performance, ensuring high levels of visibility are maintained.
Own the deployment pipelines and continuously improve our monitoring and alerting capabilities.
Collaborate closely with all other engineering functions to provide timely feedback from our environments.
Support Engineering on their journey to deliver better software, faster and more safely (think “It’s OK to deploy on Fridays” 😎).
About you
- You have strong systems administration skills, know the difference between a container and a virtual machine, and know your way around a Linux terminal
- You have platform engineering/SRE experience at leading startups or fast-growing tech companies
- You have either had experience with some of our tech stack or are confident you can cross-train and upskill quickly
- You have experience working in a regulated industry
- You are confident working with and guiding developers on monitoring and logging of complex systems at scale
- You have worked on complex projects
- You reflexively reach for AI agents to assist in researching and solving your problems
- You can work collaboratively with different teams, e.g. Security, Data, Engineering
- You want to forge and own MoonPay’s reliability & recovery processes
- You’ve got at least a basic understanding of complex reliability structures, theories, principles, and best practices
- You have worked with JavaScript codebases and frameworks, e.g. TypeScript, Node.js and React
Current Tech Stack
- TypeScript as our programming language of choice
- Node.js as our backend platform
- TypeORM, TypeDI, TypeGraphQL and routing-controllers as our backend libraries
- Google Cloud Platform to host our services
- Postgres as our core database
- Redis for caching
- Bull to manage background tasks
- Datadog for logging and monitoring
- ArgoCD for continuous deployment on Kubernetes
- GitHub to manage our source code
- Jest to run our tests ✅