RoboForce

Senior / Staff AI Research Engineer, Data Infrastructure

RoboForce • Milpitas, CA
Python

Why RoboForce

RoboForce is an AI robotics company developing Physical AI–powered Robo-Labor for dull, dirty, and dangerous work. The company's robots are engineered for demanding industrial environments, with a focus on real-world deployment and scalability.
 
We are looking for a Senior / Staff AI Research Engineer, Data Infrastructure to build the data and learning engine behind RoboForce's Physical AI stack. In this role, you will own the full pipeline — from raw teleoperation and UMI device data collection through curation, annotation, and storage, to post-training infrastructure that scores demonstrations, identifies failure patterns, and closes the loop back into model retraining.
 
Responsibilities
  • Design and maintain end-to-end data collection pipelines ingesting multimodal demonstration data from teleoperation devices and UMI hardware, including synchronization, versioning, and distributed storage at scale.
  • Build annotation tooling and data curation workflows — quality filtering, deduplication, episode scoring, and domain reweighting — to produce high-quality training datasets for robot policy learning.
  • Develop post-SFT reinforcement learning infrastructure: implement reward scoring on demonstrations, mine and categorize failure patterns, and feed curated failure data back into the retraining loop.
  • Build evaluation and test infrastructure to log policy rollouts on-robot, capture structured results, and surface actionable diagnostics for the research team.
  • Collaborate with ML researchers to define data schemas, episode formats, and pipeline interfaces that support rapid iteration on VLA and manipulation policy training.
  • Architect scalable storage and retrieval systems for heterogeneous robot data (vision, proprioception, action, language) across both cloud and on-prem environments.
Requirements
  • Bachelor's or Master's degree in Computer Science, Robotics, or related field with 5+ years of experience.
  • Strong proficiency in Python and experience building production-grade data pipelines and ETL systems.
  • Hands-on experience with large-scale dataset management, including versioning, deduplication, quality filtering, and distributed storage (e.g., S3, GCS, HDF5, WebDataset, Zarr).
  • Experience building or working with post-training infrastructure — SFT pipelines, reward modeling, or RL training loops (e.g., PPO, DPO, rejection sampling).
  • Familiarity with deep learning frameworks (PyTorch, JAX) and ML training workflows sufficient to collaborate tightly with research teams.
  • Requires 5 days/week in-office collaboration with the teams.
Bonus Qualifications
  • Experience with robotics data collection hardware — teleoperation devices, UMI, GELLO, or similar — and the synchronization and preprocessing challenges they introduce.
  • Familiarity with robot learning pipelines: imitation learning, behavior cloning, or VLA/VLM fine-tuning workflows.
  • Experience building evaluation or experiment tracking infrastructure (e.g., Weights & Biases, MLflow, custom rollout loggers).
  • Proven ability to design annotation tooling or human-in-the-loop labeling systems for structured or multimodal data.
Benefits
  • Competitive stock options/equity programs.
  • Health, dental, and vision insurance, 401(k) plan.
  • Visa sponsorship and green card support for qualified candidates.
  • Lunches and dinners, a fully stocked kitchen, and regular team-building events.