Zoox

Senior ML Storage Infrastructure Engineer

Zoox • US
Java • Python • Hybrid
Zoox is looking for a software engineer to work on our custom High-Performance Computing infrastructure and its supporting ecosystem of tools and services. This infrastructure is central to machine learning workflows across all Zoox software divisions, from data engineering to computer vision perception to simulation and more. You will take on a breadth of end-to-end responsibilities including distributed system design, algorithmic job scheduling, and adaptive cloud scaling in support of all of Zoox’s computational needs.

In this role, you will:

  • Design and implement improvements to Zoox’s in-house, cutting-edge HPC infrastructure
  • Design systems that optimize the storage technologies, both in the cloud and in our own datacenter(s), that power our diverse machine learning workloads, with a focus on performance, reliability, and efficiency
  • Investigate new distributed system paradigms and technologies to meet Zoox’s ever-growing computational and storage needs
  • Create production-grade web service APIs, SDKs, and other tools to provide a world-class developer experience for all of Zoox’s software teams

Qualifications:

  • Experience with high-performance object storage and filesystems
  • Experience with distributed systems
  • Proficiency with Python, Java, or other managed languages
  • Bachelor's degree in computer science (or related field)
  • Experience with cloud computing platforms such as AWS, GCP, or Azure

Bonus Qualifications:

  • Deep experience with AWS FSx for Lustre, the open-source Lustre filesystem, or another ML-optimized filesystem
  • Experience with workload management / job scheduling systems such as SLURM
  • Knowledge of machine learning / artificial intelligence systems