Veepee

Lead SRE

Veepee • FR
Java • Python • Hybrid
Pioneer of online flash sales since 2001 and key player in European e-commerce, Veepee collaborates with over 7,000 brands to offer highly discounted products for a limited time. Operating across a range of sectors, including fashion, home, wine, travel and beauty, Veepee achieved a turnover of 3.3 billion euros incl. VAT in 2024 and employs 5,000 staff members across 10 countries.


📄 JOB DESCRIPTION

Today we're looking for a full-time site reliability engineer to join our data department, and more specifically the data platform team. You should expect to work in a distributed environment, with team members in France (main office in Paris) and in Belgium.

Veepee’s data organization was created in 2018 and brings together a strong team of 50 data professionals across several data domains (engineering, analytics, data science & ML, and governance).
You will be part of a multidisciplinary, multinational team that fosters collaboration, transparency, and respect.

Within the data platform team, you’ll drive the reliability and scalability of Veepee’s next-generation data platform, which powers data ingestion, analytics, and ML workloads across multiple European datacenters.
You’ll act as the SRE reference for the data platform, helping define operational excellence standards and mentoring engineers across teams.

🎯 TASKS

Infrastructure & Reliability
  • Maintain and monitor Kubernetes microservices.
  • Define observability standards (logging, metrics, alerting) using Grafana, Prometheus, etc.
  • Manage GCP services (BigQuery, Cloud Storage, Cloud SQL…) with Terraform and Atlantis.
  • Enhance GitOps deployments (Helm, ArgoCD).
  • Incident management & on-call rotation.
  • Performance and cost optimization.
  • Security and compliance alignment (especially across the multi-region GCP/on-prem setup).

Collaboration & Enablement
  • Partner with data engineers/scientists to build resilient ingestion pipelines.
  • Support data scientists in deploying and monitoring ML workflows.
  • Promote SRE best practices (SLOs, disaster recovery planning (DRP), postmortems, capacity planning).

👉 REQUIRED SKILLS & EXPERIENCE

Leadership & Collaboration
  • Proven ability to lead technical discussions and influence reliability culture across multiple teams.
  • Strong sense of ownership and accountability, with a collaborative mindset.
  • Excellent communication skills; able to explain complex topics clearly to both technical and non-technical audiences.
  • Fluent in English (spoken and written).

Experience
  • 5+ years of experience as an SRE, DevOps, or Platform Engineer in production environments.
  • Demonstrated experience deploying and operating applications on Kubernetes (Helm, GitOps, CI/CD).
  • Solid understanding of public cloud (preferably Google Cloud Platform) and private cloud ecosystems.
  • Hands-on experience implementing Infrastructure as Code with Terraform and GitLab pipelines.
  • Proven ability to build and maintain observability stacks (Grafana, Prometheus, Stackdriver, or equivalent).
  • Familiarity with GitOps workflows and modern deployment practices (e.g., ArgoCD).

Mindset
  • You thrive on helping others and enabling teams to become more autonomous.
  • You’re pragmatic, solutions-oriented, and willing to go the extra mile to keep systems reliable.
  • You enjoy working in an environment that values responsibility, trust, and continuous improvement.

👉 NICE-TO-HAVE SKILLS

  • Hands-on experience with Trino, Airflow, or data-intensive workloads.
  • Knowledge of Iceberg, ClickHouse, or data lake architectures.
  • Experience automating infrastructure using Python/Go.
  • Experience with a GitOps approach using ArgoCD.
  • Some programming skills in Python (for automation/enablement tools) and Java (to understand context).
  • Experience with machine learning applications.
  • Experience with ELT solutions.

✅ BENEFITS

  • Variable bonus
  • Dynamic and creative environment within international teams
  • A wide variety of self-paced courses on our e-learning platform
  • Participation in local and international meetups and conferences
  • Flexible office policy with up to 2 days working from home
  • Flexible compensation package (including medical insurance)