Jobgether

DevOps/Observability Engineer

Jobgether • US
Remote

This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a DevOps/Observability Engineer based in the United States.

This role sits at the core of modern cloud infrastructure reliability, focused on building and scaling a next-generation observability platform for complex, distributed systems. You will design and implement end-to-end monitoring, logging, and telemetry pipelines that provide deep visibility across large-scale cloud environments. The position requires strong expertise in cloud-native architectures, with a focus on AWS, Kubernetes, and open-source observability tooling. You will play a key role in unifying metrics, logs, and traces using technologies such as OpenTelemetry, Prometheus, Grafana, and Splunk. Operating in a fast-paced, engineering-driven environment, you will collaborate closely with platform and DevOps teams to improve system reliability, performance, and cost efficiency. This is a highly technical, hands-on role where your work directly strengthens the stability and scalability of mission-critical systems.

Accountabilities:

  • Design and implement end-to-end observability architectures using OpenTelemetry, Prometheus, Grafana, and related tools across cloud environments.
  • Build and maintain centralized observability pipelines across multi-account AWS environments, including CloudWatch, CloudTrail, and VPC Flow Logs.
  • Develop scalable log aggregation and routing strategies, including filtering, noise reduction, and integration with systems such as Splunk HEC.
  • Create advanced alerting frameworks and high-quality dashboards using Alertmanager, CloudWatch Alarms, and Grafana with PromQL.
  • Deploy and manage observability infrastructure using Infrastructure as Code tools such as Terraform.
  • Support Kubernetes and container-based observability across EKS and ECS environments.
  • Optimize observability systems for performance, cost efficiency, and scalability in large-scale production environments.
  • Collaborate with engineering teams to improve system reliability, monitoring standards, and incident response capabilities.

Requirements:

  • 8+ years of experience in DevOps, Site Reliability Engineering, or Observability Engineering roles.
  • Strong hands-on experience designing unified observability pipelines using OpenTelemetry, Prometheus, and Grafana.
  • Deep expertise in AWS observability services including CloudWatch, CloudTrail, and cross-account telemetry strategies.
  • Proven ability to build and manage large-scale log aggregation systems and optimize high-volume data pipelines.
  • Strong experience with Kubernetes (EKS) or containerized environments (ECS) in production settings.
  • Advanced proficiency with Terraform or other Infrastructure as Code tools for infrastructure and observability deployments.
  • Experience building alerting systems, dashboards, and monitoring frameworks for distributed systems.
  • Strong understanding of cost optimization strategies for observability platforms (log filtering, metric reduction, storage tiering).
  • Excellent problem-solving, debugging, and collaboration skills in complex cloud-native environments.

Benefits:

  • Competitive compensation aligned with experience and market benchmarks.
  • Remote work flexibility within the United States.
  • Opportunity to work on large-scale, AI-driven, cloud-native infrastructure systems.
  • Exposure to enterprise clients and high-impact digital transformation projects.
  • Hands-on experience with leading observability and cloud technologies in production environments.
  • Strong learning and upskilling culture in AI, cloud, and platform engineering.
  • Collaborative, high-performance engineering environment focused on innovation and reliability.
  • Opportunity to shape next-generation observability practices at scale.