Jobgether

Site Reliability Engineer

Jobgether • US
Python Remote

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer in the United States.

In this role, you will play a critical part in ensuring the reliability, scalability, and performance of modern, user-facing systems. You’ll work at the intersection of software engineering and operations, building robust infrastructure and driving automation to support high-quality service delivery. The position offers the opportunity to design resilient systems, improve operational efficiency, and proactively address risks before they impact users. You will collaborate closely with cross-functional teams to enhance system design and implement best practices in observability and incident response. This environment values continuous improvement, innovation, and data-driven decision-making. It’s an ideal role for someone who thrives in fast-paced environments and is passionate about building reliable, scalable platforms.

Accountabilities:

  • Ensure high availability, reliability, and scalability of production systems and services
  • Develop and maintain automation tools for deployments, configuration management, and operational workflows
  • Implement and manage monitoring and alerting systems to provide real-time visibility into system health
  • Respond to, troubleshoot, and resolve incidents while conducting post-mortems to prevent recurrence
  • Define and monitor Service Level Objectives (SLOs) and performance indicators
  • Perform capacity planning and resource forecasting to support system growth
  • Collaborate with engineering teams to identify operational risks and improve system architecture
  • Analyze system and application metrics to drive performance optimization initiatives

Requirements:

  • Minimum of 5 years of experience in IT, software engineering, or technology operations roles
  • At least 2 years of hands-on experience in Site Reliability Engineering, DevOps, or observability-focused roles
  • Strong expertise in cloud platforms such as AWS or Azure
  • Solid understanding of distributed systems, networking, storage, and operating systems
  • Experience with infrastructure as code tools (e.g., Terraform) and containerization technologies (e.g., Docker)
  • Proficiency with monitoring and observability tools such as Datadog, Prometheus, Grafana, or similar
  • Programming or scripting skills in languages such as Python, Ruby, or JavaScript
  • Strong problem-solving skills and the ability to work collaboratively across teams
  • Excellent communication skills with a proactive and detail-oriented mindset

Benefits:

  • Competitive salary with performance-based bonus opportunities
  • Comprehensive medical, dental, and vision insurance
  • Generous paid time off and company holidays
  • 401(k) plan with employer matching contributions
  • Paid parental leave and family support programs
  • Flexible and collaborative work environment
  • Opportunities for professional growth and skill development