Zoox

Engineering Manager, HPC Storage

Zoox • US
Java • Python • Hybrid
Zoox is looking for an experienced Software Engineering Manager to lead our High Performance Computing Storage infrastructure team. Zoox HPC Storage provides abstraction layers for petabyte-scale data movement and management for critical, high-throughput use cases such as ML foundation model training and synthetic data generation. You will take on a breadth of end-to-end responsibilities, including distributed system design, optimization of storage-related GPU utilization bottlenecks, and cost-effective resource management.

The position comes with a high degree of independence and the opportunity to help define Zoox’s scaling strategy, both technically and organizationally. You will be responsible for hiring, for maintaining the health of your team, and for growing and coaching team members to support the continued success of their careers.

In this role, you will:

  • Work closely with AI teams and other software customers to holistically address pain points, find optimization opportunities, and ultimately charter system solutions for broad categories of storage use cases
  • Develop a multi-year vision and roadmap for storage at Zoox, including investment in new data movement and management paradigms to meet Zoox’s ever-growing computational and storage needs in a cost-effective manner
  • Own the hiring process end-to-end, from thoughtful role definition to interview loop design to successfully hiring bar raisers
  • Mentor, coach, and advocate for your direct reports

Qualifications:

  • Experience managing teams of 5-10
  • Demonstrated ability to prioritize development work and build cross-functional consensus across ML stakeholders
  • Experience with high-performance storage systems deployed on cloud providers, such as FSx for Lustre on Amazon Web Services (AWS)
  • Strong operational background with highly available systems
  • Bachelor's degree in computer science (or related field)

Bonus Qualifications:

  • Experience with ML-specific data formats such as Mosaic Streaming Datasets (MDS)
  • Experience with end-to-end hosted ML services such as AWS SageMaker HyperPod
  • Proficiency with Python, Java, or other managed languages