The goal of AI Engineering's ML-Platform team is to deliver modern infrastructure and solutions that enhance Mobileye's algorithm development life cycle and shorten our delivery times. We are an independent group of excellent, experienced engineers with diverse skills in algorithms, software, and infrastructure. We strive to implement a DevOps culture that allows our engineers to collaborate easily on large-scale products. We develop cross-company products that enable the research and deployment of state-of-the-art algorithms.
What will your job look like?
- Build and maintain infrastructure for large‑scale AI and HPC workloads across on‑prem and cloud environments
- Operate and enhance our multi‑cloud, multi‑cluster scheduling platform
- Develop automation, tooling, and platform services using Bash
- Troubleshoot complex issues across the stack: compute, networking, storage, orchestration, and distributed systems
- Improve reliability of critical systems
- Collaborate with ML, data, and backend teams to support evolving platform needs
- Drive best practices in CI/CD, infrastructure‑as‑code, and system design
- Participate in on‑call rotations for critical infrastructure components
All you need is:
- 10+ years of hands‑on experience in DevOps, SRE, systems engineering, or similar roles
- Linux knowledge, including debugging, performance tuning, and system internals
- Proven experience working with HPC environments, large clusters, or high‑performance compute systems
- Solid experience with Kubernetes (EKS or similar managed K8s services)
- Knowledge of infrastructure‑as‑code tools (Terraform, Helm, etc.)
- Hands‑on experience with:
  - PostgreSQL or similar relational databases
  - Elasticsearch or similar search/indexing systems
  - Prometheus/Thanos/Grafana or similar observability stacks
  - RabbitMQ or similar messaging systems
- Strong proficiency in Bash, networking fundamentals, and debugging distributed systems
- Experience investigating complex issues across compute, storage, networking, and orchestration layers
Advantages:
- Experience with multi‑cloud architectures
- Experience with workflow orchestration tools such as Argo Workflows (or similar systems like Airflow, Prefect, Flyte)
- Familiarity with GPU scheduling, AI/ML pipelines, or data‑intensive workloads
- Background in large‑scale distributed systems or platform engineering
- Ability to write production‑quality Go (Golang) code
What We Offer:
- Impactful engineering that advances Mobileye’s AI capabilities and strengthens the safety of transportation systems globally
- The opportunity to work on cutting‑edge AI infrastructure at massive scale
- A highly technical environment with deep engineering challenges
- Collaboration with great ML, software, and systems engineers