Zoox is looking for a software engineer to work on our custom High-Performance Computing infrastructure and its supporting ecosystem of tools and services. This infrastructure is central to machine learning workflows across all Zoox software divisions, from data engineering to computer vision perception to simulation and more. You will take on a breadth of end-to-end responsibilities including distributed system design, algorithmic job scheduling, and adaptive cloud scaling in support of all of Zoox’s computational needs.
In this role, you will:
Design and implement improvements to Zoox’s in-house, cutting-edge HPC infrastructure
Design systems that optimize the storage technologies in the cloud and in our own datacenter(s) that power our diverse machine learning workloads, with a focus on performance, reliability, and efficiency
Investigate new distributed system paradigms and technologies to meet Zoox’s ever-growing computational and storage needs
Create production-grade web service APIs, SDKs, and other tools to provide a world-class developer experience for all of Zoox’s software teams
Qualifications:
Experience with high-performance object storage and filesystems
Experience with distributed systems
Proficiency with Python, Java, or other managed languages
Bachelor's degree in computer science (or a related field)
Experience with cloud computing platforms such as AWS, GCP, or Azure
Bonus Qualifications:
Deep experience with AWS FSx for Lustre, the open-source Lustre filesystem, or another ML-optimized filesystem
Experience with workload management / job scheduling systems such as SLURM
Knowledge of machine learning / artificial intelligence systems