Coupang

Sr. Site Reliability Engineer - 10823

Coupang • IN
Java • Python • Hybrid
Coupa makes margins multiply through its community-generated AI and industry-leading total spend management platform for businesses large and small. Coupa AI is informed by trillions of dollars of direct and indirect spend data across a global network of 10M+ buyers and suppliers. We empower you with the ability to predict, prescribe, and automate smarter, more profitable business decisions to improve operating margins.

Why join Coupa?

🔹 Pioneering Technology: At Coupa, we're at the forefront of innovation, leveraging the latest technology to empower our customers with greater efficiency and visibility in their spend.
🔹 Collaborative Culture: We value collaboration and teamwork, and our culture is driven by transparency, openness, and a shared commitment to excellence.
🔹 Global Impact: Join a company where your work has a global, measurable impact on our clients, the business, and each other. 

Learn more on the Life at Coupa blog, where our employees share their experiences of working at Coupa.

What You'll Do:

  • Build and provision enterprise-grade data, messaging, and analytics platforms in the public cloud
  • Ensure that data, services, and infrastructures are reliable, fault-tolerant, efficiently scalable, and cost-effective
  • Administer Linux machines, web servers, application servers, and databases, and provide infrastructure support for products and businesses
  • Own end-to-end availability and performance of mission-critical services and build automation to prevent problem recurrence
  • Develop tools and automation using Ruby, Python, etc., to increase availability and performance
  • Collaborate with Product and Release Engineering for new product releases and maintenance
  • Coordinate change management
  • Participate in incident response and blameless postmortems
  • Participate in 24×7 on-call rotation for after-hours emergencies

What You Will Bring to Coupa:

  • Bachelor’s degree and 7+ years of professional experience
  • 3+ years of production support for Elasticsearch/Redis/Kafka (Elasticsearch experience is a must)
  • 3+ years of production system administration and web operations experience
  • 2+ years of programming experience in Ruby, Java, Perl, Python, or equivalent
  • 2+ years of experience with configuration management tools such as Chef, Puppet, Salt, or equivalent
  • Experience with AWS or a comparable cloud provider
  • Experience with Infrastructure-as-Code products like Terraform
  • Experience in massive-scale web operations
  • Expertise in problem-solving and analyzing globally distributed systems
  • Excellent written and verbal communication skills