ML Infrastructure Engineer

Sygaldry Technologies • San Francisco, California, United States

Python • Hybrid

About Sygaldry

Sygaldry Technologies is building quantum-accelerated AI servers to exponentially speed up training and inference for AI. By integrating quantum and AI, we're accelerating the path to superintelligence and addressing the problem of rising compute costs and energy bottlenecks. Sygaldry AI servers combine multiple qubit types within a single, fault-tolerant architecture to deliver the combination of cost, scale, and speed necessary for advanced AI applications. We pioneer new domains in physics, engineering, and AI, tackling the hardest challenges with a grounded, optimistic, and rigorous culture. We're looking for individuals ready to define the intersection of quantum and AI and drive its profound global impact.

About the Role

Our AI & Algorithms team is growing fast: research scientists, applied mathematicians, and quantum algorithm researchers developing the algorithms that will accelerate and transform AI. They need compute infrastructure that stays out of their way: GPU access that's reliable, experiments that are reproducible, and workloads that scale without requiring each researcher to become a cloud expert. You'll build and manage the compute platform this team runs on. The workloads are diverse -- quantum circuit simulation, large-scale numerical optimization, model training, tensor network contractions, and high-throughput data generation -- and run across multiple cloud providers and on-prem GPU servers. You'll own the full stack, from cloud provider configuration to the Python APIs that researchers use to launch jobs.

What You’ll Work On

Research Computing & Developer Experience

Build compute abstractions that handle the team's diverse workloads: GPU-accelerated simulation, distributed training, high-throughput CPU jobs, and interactive analysis -- across PyTorch, JAX, and scientific computing frameworks
Stand up experiment tracking and reproducibility infrastructure
Create developer tooling that makes cloud compute feel local: environment setup, job submission, monitoring, and artifact management
Scale experiments from single-GPU prototyping to multi-node production runs

Multi-Cloud GPU Orchestration

Design multi-provider workload orchestration: route jobs based on cost, availability, and capability
Manage and optimize spend across cloud providers -- track credit balances, burn rates, and expiration dates
Configure hybrid local + cloud workflows as on-prem GPU infrastructure comes online
Coordinate with our infrastructure engineer on cloud administration and security

Pipeline Infrastructure

Build CI/CD pipelines for research workloads: automated testing, evaluation benchmarks, artifact management
Create data generation and preprocessing pipelines at the throughput the team's simulators demand
Set up monitoring, alerting, and cost dashboards that surface problems before researchers hit them

You May Be a Good Fit If You

Think in systems: you see how compute, storage, networking, and cost interact
Care about developer experience: you've felt the pain of bad research infrastructure
Are pragmatic about tooling: right tool for the job, no over-engineering
Take ownership: you want to own a critical function with autonomy
Write things down: you document decisions and create runbooks

Strong Candidates May Have

Deep AWS experience (EC2, S3, IAM, CloudFormation or Terraform)
GPU compute management (instance types, spot strategies, multi-GPU, distributed training)
Python-based ML and scientific computing tooling (PyTorch, JAX)
GCP and/or Modal experience
MLOps or research computing platforms (MLflow, W&B, Kubeflow, or HPC job schedulers)
CI/CD pipeline management (GitHub Actions, containers)
Hybrid cloud / on-prem GPU cluster management
Experience supporting research teams with heterogeneous computing needs

How We’re Different

At Sygaldry, curiosity and intellectual courage drive our work. We approach ambitious challenges with a grounded, optimistic, and rigorous culture and know that kind people build the strongest teams. We prioritize mission over ego and collaborate openly with a strong sense of shared purpose. We dream big, yet we execute with a love of detail. We’re looking for scientists, engineers, and operators to forge new paths with us at the intersection of quantum and AI.

Culture & Benefits

Visa Sponsorship - We know what it takes to make top talent thrive here. We’re open to supporting visas whenever possible.
Compensation - We value your contribution and invest in your future with a competitive salary and meaningful equity.
Benefits - Your well-being matters. We provide company-sponsored health coverage to give you and your family peace of mind.
Connection - Whether it’s a company offsite or casual crew socials, we make time to connect, recharge, and have fun together.
Time Off - We trust you to take the time you need. Unlimited PTO so you can rest, recharge, and come back ready to make an impact.

We encourage applications from candidates with diverse backgrounds. We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristics.

We encourage you to apply even if you do not believe you meet every single qualification. If you don’t think this role is right for you but believe you would have something meaningful to contribute to our mission, please reach out at letsbuild@sygaldry.com.
