Everseen: A leader in vision AI solutions for the world’s leading retailers.
The Role
We are seeking a Full-Stack Engineer to be a key member of the Everseen ML Operations team. As part of that team, you will own the design and implementation of the front-end and back-end components of the Everseen internal ML platform, supporting the AI researchers' requirements for dataset management and video/image annotation tools. You will be instrumental in shaping our internal Machine Learning Platform and driving automation, reproducibility, and performance across the machine learning lifecycle.
What you’ll do (Main responsibilities)
Design and Development
Collaborate with cross-functional teams to design and develop new features and functionalities.
Ensure that the developed solutions meet project objectives and enhance user experience.
Coding
Design and implement reusable, testable, efficient, and elegant code based on requirements.
Ensure adherence to coding standards and best practices.
Testing
Create, maintain, and run unit tests for both new and existing applications and services.
Aim to deliver defect-free and well-tested solutions.
Data Analysis
Collect and analyze data from various sources such as log files, application stack traces, and thread dumps.
Use this analysis to identify trends, patterns, and potential areas for improvement.
Continuous Integration and Continuous Deployment (CI/CD)
Create and maintain CI/CD integration using various tools.
Automate the build, test, and deployment processes to ensure efficiency and reliability.
Integration of Third-Party Solutions
Evaluate and integrate third-party software solutions to optimize system performance.
Expand product capabilities by integrating compatible third-party solutions.
Track and update third-party solutions' compatibility with the Everseen stack according to internal development guidelines.
Monitoring and Troubleshooting
Monitor production logs to identify and troubleshoot issues promptly.
Ensure seamless operation and timely resolution of any anomalies to maintain system reliability.
Documentation
Create, maintain, and update technical documentation so that code, systems, and processes are clearly understood and easily accessible to team members and stakeholders.
Collaborating With
AI/ML Research team
Data Engineering team
Data Annotation team
Software Development Engineers
DevOps team
Product Managers
Security & Compliance Teams
Profile and Skills
3-4+ years of work experience in ML infrastructure, MLOps, or Platform Engineering.
Bachelor's degree or equivalent in computer science or a related field is preferred.
Excellent communication and collaboration skills.
Technical Skills:
Strong programming skills, with front-end development experience in React and Angular.
Understanding of the ML lifecycle, model versioning, and monitoring.
Experience with back-end frameworks built on Node.js (NestJS).
Hands-on experience with Kubernetes, Docker, and cloud services.
Experience with CI/CD tools (e.g., GitLab, Jenkins).
Experience with Infrastructure as Code (e.g., Terraform).
Experience with:
ML frameworks (e.g., TensorFlow, PyTorch)
GPU orchestration (e.g., NVIDIA GPU Operator, MIG)
Data engineering tools (e.g., Snowflake, Databricks, BigQuery, Airbyte, Kafka)
Familiarity with feature stores and model registries.
Exposure to large-scale distributed systems and performance optimization.
Ability to work with Linux systems, including troubleshooting skills such as log investigations, performance testing, and connectivity investigation.
Deep understanding of technical concepts and terminology relevant to Everseen's products and services.
Expert knowledge of modern software architectures such as microservices and distributed systems.
In-depth knowledge of Azure Kubernetes Service (AKS) for container orchestration, Azure Blob Storage for data storage, and Elasticsearch for search and analytics.
Ability to leverage cloud computing technologies and services for testing and validation purposes.
In-depth knowledge of cloud security, scalability, and performance optimization principles.
Excellent understanding of cloud computing technologies and services, including infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS).
Broad understanding of the software engineering and architecture space, including knowledge of various programming languages, frameworks, techniques, and industry trends in AI.