ShopBack

Senior Site Reliability Engineer

ShopBack • CN
Go • Python
Our Journey
ShopBack began in 2014 as a late-night spark of inspiration between Henry and Joel — not just to build a Cashback platform, but to reimagine how brands and consumers connect. As former advertisers, they understood the limitations of traditional marketing, and saw an opportunity to deliver more value on both sides. That idea quickly turned into action, and the first prototype was built over a weekend with the other co-founders. Today, ShopBack serves over 50 million users across 13 markets, partners with 20,000+ merchants, and powers over half a million transactions daily. We're building The World’s Most Rewarding Way to Shop — and looking for bold, driven individuals to join us.

About the role

At ShopBack, our engineering teams build scalable platforms and use leading-edge technologies to deliver a world-class product. You will join a diverse and talented team of aspiring engineers with great ambitions to impact the eCommerce landscape. We are seeking team members who strive to solve hard problems, take pride in delivering world-class products, and are strong team players.

You are someone who wants to see the impact of your work making a difference every day. You find passion in the craft and are constantly seeking improvement and better ways to solve tough problems.

Your Adventure Ahead

  • Improve availability, reliability, scalability, and recoverability of the systems with security and cost optimization in mind
  • Develop the best possible continuous delivery pipelines, incorporating streamlined change and release management with zero-downtime deployments
  • Maintain tools for configuration management, build, continuous integration and deployment, reporting, monitoring, etc.
  • Participate in capacity planning and risk management
  • Manage and monitor a multi-datacenter regional environment
  • Collaborate with product engineers to enhance core platforms
  • Collaborate closely with application development teams to help them adopt methods that improve the scalability and reliability of services
  • Explore and adopt new and creative DevOps approaches to improve production reliability and availability
  • Quarterly on-call duty
  • Drive incident response and postmortems
  • Automate infrastructure provisioning and upgrades, and mitigate issues via agentic AI workflows

Essentials to Succeed

  • 7+ years of relevant DevOps or SRE experience
  • Proven experience working on public cloud platforms (AWS, GCP, etc.)
  • Proven experience with containerization using Docker and Kubernetes
  • Experience with Infrastructure as Code using Terraform is a plus
  • Experience with application development in Python, JavaScript, or Go is a plus
  • Day-to-day use of AI to improve individual and team workflows

Technologies We Use & Love

  • Cloud: AWS
  • Infra: Kubernetes
  • Programming languages: Node.js / TypeScript / Python / Go
  • Relational database: Postgres
  • Cache: Redis
  • Message queue: Kafka, SQS
  • Continuous Integration: GitLab, Flux CD
  • Monitoring: Datadog / Prometheus
  • Networking: Istio
  • Big Data: Trino, Spark, S3, etc.
  • Communication: Slack
  • Project Management: Jira / Confluence
  • Other technologies: Knative Eventing / Serving, Debezium + Kafka Connect, OpenSearch