Lead Site Reliability Engineer
Turion Space • Irvine, California, United States
At Turion Space, our Platform Engineering team is building the infrastructure backbone that powers the next generation of space exploration. As Lead Site Reliability Engineer, you'll build the operational foundations for our mission-critical systems by establishing SRE practices, defining reliability standards, and creating the monitoring, incident response, and automation capabilities that keep production systems running when it matters most. You'll ensure that our spacecraft control systems, autonomous satellite operations, and other critical applications maintain world-class reliability.
Our team’s mission is to enable Turion engineers to efficiently and reliably deliver products at scale, and support missions that can’t afford downtime when hardware is operating hundreds of miles above Earth. You’ll help create and scale an infrastructure platform that is as reliable and cutting-edge as the missions it supports.
Key Responsibilities:
Design and implement monitoring, alerting, and observability solutions across cloud and on-premises infrastructure
Define and maintain SLAs, SLIs, and SLOs for critical systems
Lead incident response, conduct postmortems, and drive systemic improvements to prevent recurrence
Own on-call rotation for production systems
Identify and eliminate repetitive manual operational tasks through automation and self-healing systems
Partner with development teams to embed reliability practices into the software development lifecycle and establish reliability standards
Contribute to architecture reviews with a focus on scalability, fault tolerance, disaster recovery, and security requirements
Minimum Qualifications:
5+ years of experience in DevOps or SRE roles, including 1+ years in a technical leadership role
Self-directed work style with ability to own projects from conception to production in fast-moving environments
Proficiency with AWS cloud services
Deep understanding of networking concepts
Development experience in at least one programming language (e.g. Python, Go, TypeScript)
Experience with Linux system administration
Experience with observability tools (Grafana, Prometheus, Loki, Alloy, ELK) in production environments
Strong experience with Kubernetes, Docker, and container orchestration in production environments
Hands-on experience with CI/CD tools and infrastructure as code (Terraform or Crossplane preferred)
Hands-on experience with DR planning, failure mode analysis, and building resilient systems with automated failover and recovery
Familiarity with HashiCorp Vault or similar identity/secrets management systems
Previous experience scaling infrastructure at high-growth companies (startup to 100+ employees)
Preferred Qualifications:
Relevant certifications such as AWS Certified Solutions Architect
Active SECRET or TOP SECRET clearance that can be maintained
Lead Site Reliability Engineer: $155,000-$231,000
ITAR Requirements:
This position may include access to technology and/or software source code that is subject to U.S. export controls. To conform to U.S. Government export regulations, applicants must be a (i) U.S. citizen or national, (ii) U.S. lawful permanent resident (i.e., green card holder), (iii) refugee under 8 U.S.C. § 1157, or (iv) asylee under 8 U.S.C. § 1158, or be eligible to obtain the required authorizations from the U.S. Department of State.
Benefits:
We offer a comprehensive compensation and benefits package designed to support the well-being and professional growth of our employees. In addition to a competitive base salary and company stock, determined by factors such as job-related knowledge, education, skills, experience, and market demand, full-time employees are eligible for:
Equity: Receive equity in Turion Space, letting you benefit from the company's success.
Health Insurance: Comprehensive medical, dental, and vision coverage for employees and their dependents.
Retirement Plans: Access to a 401(k) plan to help you plan for your future.
Paid Time Off: Generous vacation days, personal days, sick days, and holidays to ensure you have time to recharge.
Professional Development: Opportunities for ongoing training, workshops, and courses to advance your skills and career growth.
Team Building Activities: Regular social events, team outings, and company-sponsored activities to foster a positive work environment.
We are dedicated to providing a supportive and enriching environment for our team members, recognizing that our collective success is built upon the well-being and satisfaction of each individual.
Turion Space is an Equal Opportunity Employer; employment with Turion Space is governed on the basis of merit, competence and qualifications and will not be influenced in any manner by race, color, religion, gender, national origin/ethnicity, veteran status, disability status, age, sexual orientation, gender identity, marital status, mental or physical disability or any other legally protected status.