Field AI

Senior Machine Learning Platform Engineer

Field AI • US
Field AI’s Irvine team is where embodied AI meets real robots, real sensors, and real field deployments. Based in the heart of Southern California’s robotics ecosystem, we build risk-aware, reliable, field-ready AI systems that solve the hardest problems in robotics and unlock the full potential of embodied intelligence. If you want your work to ship, get tested on hardware, and improve through real deployments, Irvine is the place. We go beyond typical data-driven approaches and transformer-only architectures, combining rigorous engineering with learning systems proven in globally deployed solutions that deliver results today and get better every time our robots run in the field.

Who Are We?

Field AI is transforming how robots interact with the real world. We are building risk-aware, reliable, and field-ready AI systems that address the most complex challenges in robotics, unlocking the full potential of embodied intelligence. We go beyond typical data-driven approaches and pure transformer-based architectures and are charting a new course: our solutions are already deployed globally, delivering real-world results and rapidly improving our models through field applications. Learn more at https://fieldai.com.

About the Job

Our Field Foundation Model (FFM) powers a global fleet of autonomous robots that capture massive streams of multimodal data across diverse, dynamic environments every day. On the Insight Team, our mission is to transform this raw, multimodal data into actionable insights that empower our customers and engineers to deliver value. The Field-insight Foundation Model (FiFM) is at the core of that work. As a Senior Machine Learning Platform Engineer, you will own the infrastructure that powers FiFM, from model hosting and distributed training pipelines to data systems, observability, and security.

This is a role at the intersection of systems engineering and machine learning. You’ll design and operate large-scale ML platforms, ensure FiFM transitions smoothly from research into production, and optimize for both performance and cost across cloud and edge. In addition to building core infrastructure, you’ll play a leadership role by mentoring junior engineers, setting technical direction, and raising the engineering bar across the team.

What You’ll Get To Do:

  • Design and manage scalable ML infrastructure with IaC tools (Terraform, CloudFormation).
  • Develop and optimize cloud-based pipelines for training, evaluation, and inference on multimodal datasets.
  • Build and operate data systems for large-scale video ingestion, indexing, and storage.
  • Maintain MLOps workflows for versioning, experiment tracking, reproducibility, and CI/CD.
  • Ensure reliability and observability with monitoring, logging, and alerting.
  • Collaborate with AI/ML Engineers to productionize workflows.
  • Optimize infrastructure for performance and cost across cloud and edge.
  • Enforce best practices in security, compliance, and maintainability.
  • Mentor and manage junior engineers, providing technical guidance and career development.

What You Have:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • 4+ years of industry experience in ML infrastructure or platform engineering.
  • Strong coding skills in Python/TypeScript and a strong foundation in software engineering best practices.
  • Proven experience with distributed systems, cloud platforms (AWS preferred), containerization and orchestration (Docker, Kubernetes/EKS, Ray), and serverless.
  • Hands-on experience building ML pipelines for distributed training and large-scale inference.
  • Strong knowledge of data management at scale, including preprocessing and retrieval of video/image datasets.
  • Proficiency with CI/CD pipelines, infrastructure-as-code (Terraform, CloudFormation), and automation.
  • Familiarity with MLOps tools (MLflow, Kubeflow, Airflow).
  • Experience with system monitoring and observability in production.

The Extras That Set You Apart:

  • Experience with vector databases (OpenSearch, Pinecone, Weaviate) for indexing and retrieval.
  • Familiarity with distributed training frameworks (Horovod, DDP/FSDP, DeepSpeed, Ray).
  • Hands-on experience with GPU orchestration and auto-scaling (Karpenter, SageMaker, EKS).
  • Experience with agentic AI deployment workflows, orchestration frameworks, and retrieval-augmented generation.
  • Strong knowledge of security and compliance in ML and cloud environments.