This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior SRE DevOps Engineer. In this role, you will have the opportunity to contribute to the development of a satellite communication platform that enables vital voice calls and messaging when traditional connectivity fails. Your expertise will play a crucial part in ensuring reliability and enhancing operational efficiency across our cloud infrastructure. You'll collaborate with a dedicated team to implement innovative solutions that address the complexities of real-time applications and of bridging mobile devices with satellite hardware. This is a remote position that offers flexibility and a chance to make a significant impact in the field of cloud and DevOps engineering.
Accountabilities
- Implement SLI/SLO frameworks with error budgets to drive reliability decisions
- Design release strategies including blue/green deployments and version tracking
- Lead incident response and develop automated runbooks to reduce MTTR
- Develop tooling and automation frameworks in TypeScript/Python for enhanced productivity
- Write services focused on reliability, such as health checkers and auto-remediation controllers
- Maintain production AWS infrastructure using IaC with a focus on microservices orchestration
- Establish CI/CD pipelines for backend services and mobile apps
- Define and enforce security policies across the infrastructure
- Build observability features with OpenTelemetry and distributed tracing
- Manage database configurations including PostgreSQL and Redis
Requirements
- 7+ years of experience in SRE/DevOps/Platform Engineering with a strong software background
- Proficient in at least one backend language (TypeScript/Node.js, Python, or Go)
- Deep expertise in AWS technologies including ECS, EKS, and RDS
- Strong experience with IaC tools like Terraform or CloudFormation
- Proven track record in CI/CD pipeline design for both on-prem and cloud environments
- Experience in container orchestration with Docker and Kubernetes
- Solid understanding of network security and incident response
- Experience implementing SLI/SLO frameworks and reduction strategies
- Operations knowledge for PostgreSQL, Redis, and message queues
- Strong understanding of distributed systems patterns
Benefits
- Build critical communication infrastructure for remote areas
- A role merging engineering and operations with significant ownership
- Technically challenging environment across cloud, IoT, and satellite systems
- Full ownership of infrastructure with direct impact on reliability
- Competitive compensation and flexible remote work options