Everseen: A leader in vision AI solutions for the world’s leading retailers.
The role:
We are seeking an experienced software engineering lead for the Everseen ML Operations department. You and your team will extend the capabilities of our scalable ML Ops infrastructure that empowers our data scientists and machine learning engineers to develop, train, benchmark, and monitor our machine learning models efficiently. You will be instrumental in enhancing our internal Machine Learning Platform and driving automation, reproducibility, and performance across the machine learning lifecycle.
What you'll do:
Leadership
- Manage and lead the function/specialism reporting to you from both an operational and a strategic perspective.
- Take overall responsibility for the successful operation of the function/specialism deliverables.
- Handle all capacity planning activities.
- Set strategy and make key decisions for the success of the function/specialism.
- Work with senior leadership to set strategic goals for your team and cascade them down.
- Monitor and support team members' career paths.
- Evaluate team members' performance and propose promotions.
- Assess and propose team headcount adjustments based on project roadmaps and team capacity.
Teaching and Sharing Culture
- Ensure the sharing of skills, knowledge, and expertise between members of the engineering team.
- Foster a culture of collaboration and continuous learning by organizing training sessions, workshops, and knowledge-sharing sessions.
Design and Development
- Coordinate and drive progress with cross-functional teams in designing and developing new features and functionalities.
- Ensure that the developed solutions meet project objectives and enhance user experience.
- Hold decision-making authority over the technology stack and internal technical improvements.
Coding
- Ensure design and implementation of reusable, testable, efficient, and elegant code based on requirements and a longer-term product and feature strategy.
- Ensure adherence to coding standards and best practices.
Continuous Integration and Continuous Deployment (CI/CD)
- Ensure adoption and implementation of CI/CD principles using industry standards and best practices.
Integration of Third-Party Solutions
- Ensure the evaluation, integration, and maintenance of third-party software solutions to optimize system performance.
- Ensure the expansion of product capabilities by integrating compatible third-party solutions.
- Be aware of and promote
Monitoring and Troubleshooting
- Ensure seamless operation and timely resolution of any anomalies to maintain system reliability.
Documentation
- Define documentation strategies, ensure their alignment with organizational goals, and oversee the consistency, quality, and accessibility of technical documentation across teams.
Profile and skills
- 5+ years of experience in ML infrastructure, MLOps, or Platform Engineering.
- 5+ years of experience in leadership roles.
- Inspirational leadership, strategic vision, and a culture-shaping approach.
- Strong programming skills (Python / Go).
- Hands-on experience with Kubernetes, Docker, and cloud services.
- Experience with CI/CD tools (e.g., GitLab, Jenkins).
- Understanding of ML training pipelines, data lifecycle, and model serving concepts.
- Excellent communication and collaboration skills.
- Familiarity with workflow orchestration tools (e.g., Airflow, Kubeflow, Ray, Vertex AI, Azure ML).
- Proven ability to monitor and optimize cloud costs.
- Good understanding of data privacy, RBAC, and model governance.
Additional Skills
- Experience with Microsoft Azure or Google Cloud Platform (GCP) infrastructure and machine learning tools.
- Experience with ML frameworks (e.g., TensorFlow, PyTorch).
- Experience with GPU orchestration (e.g., NVIDIA GPU Operator, MIG).
- Experience with Infrastructure as Code (e.g., Terraform).
- Knowledge of data engineering tools (e.g., Snowflake, Databricks, BigQuery, Airbyte, Kafka).
- Familiarity with feature stores and model registries.
- Exposure to large-scale distributed systems and performance optimization.