This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior SRE DevOps Engineer in Romania.
This is a high-impact role at the intersection of software engineering and cloud operations, focused on building and maintaining resilient, large-scale infrastructure for real-time communication systems. You will design, automate, and optimize cloud-native environments that support mission-critical connectivity under strict latency and reliability constraints. The position combines hands-on coding with deep operational ownership, empowering you to shape infrastructure strategy while improving developer productivity. Working in a remote-first, highly technical environment, you’ll collaborate across engineering teams to ensure scalability, security, and performance. If you thrive on solving distributed systems challenges and building production-grade reliability tooling, this role offers both ownership and influence.
Accountabilities:
- Designing and implementing SLI/SLO frameworks with error budgets to guide reliability and performance decisions.
- Building and maintaining AWS-based production infrastructure using Infrastructure as Code (Terraform, CloudFormation), including ECS, EKS/Kubernetes, and microservices orchestration.
- Developing internal tools, automation frameworks, and reliability services in TypeScript, Python, or similar languages to enhance operational efficiency.
- Leading incident response processes, conducting root cause analyses, and creating automated runbooks to reduce MTTR.
- Architecting and maintaining CI/CD pipelines for backend services, mobile applications, and IoT firmware across cloud and on-prem environments.
- Implementing comprehensive observability using OpenTelemetry, distributed tracing, metrics exporters, and alerting systems.
- Managing data services such as PostgreSQL (RDS), Redis/ElastiCache, SQS, and networking components (ALB/NLB, VPC, IAM).
- Enforcing strong security standards, including IAM policies, encryption, secrets management, vulnerability management, and compliance auditing.
Requirements:
The ideal candidate is both a strong software engineer and an experienced platform reliability expert. Key qualifications include:
- 7+ years of experience in SRE, DevOps, or Platform Engineering roles with daily hands-on coding responsibilities.
- Proficiency in at least one backend language (TypeScript/Node.js, Python, or Go) for developing automation tools, internal services, and reliability frameworks.
- Deep expertise in AWS services (ECS, EKS, RDS, ElastiCache, SQS, VPC, IAM, CloudWatch).
- Strong experience with Infrastructure as Code tools (Terraform, CloudFormation, or Pulumi), including modular design and state management.
- Proven experience designing and maintaining CI/CD pipelines in both cloud and on-prem environments.
- Solid understanding of container orchestration (Docker, Kubernetes, Helm) and distributed systems patterns such as circuit breakers, retries, and graceful degradation.
- Experience operating production databases (PostgreSQL, Redis) and message queues.
- Strong security knowledge covering network segmentation, encryption, secrets management, and incident response.
- Preferred experience with real-time communication infrastructure (SIP, RTP, WebRTC), telecom systems, IoT pipelines, or satellite/low-bandwidth optimization environments.
Benefits:
- Competitive compensation package
- Flexible remote work environment with autonomy and ownership
- Opportunity to build and scale critical communication infrastructure
- Exposure to cutting-edge technologies across cloud, IoT, telecom, and distributed systems
- High-impact role with direct influence on reliability and platform architecture
- Collaborative, technically advanced engineering culture