About The Role:
As an AI/ML Engineer on the SRE and Observability team at GoTo, you will build intelligent systems that detect anomalies, reduce incidents, and accelerate root cause analysis. Your work will directly improve reliability across platforms, helping teams resolve issues faster and keep services running smoothly. You will apply ML techniques to correlate metrics, logs, and traces, and design automation that prevents recurring issues. By embedding AI into daily operations, you will enable self-learning systems and free engineers to focus on higher-impact problems. If you are excited about applying AI to real-world reliability challenges, this role is for you.
What Will You Do
AI-Driven Incident Detection – Build and deploy ML models to identify anomalies across metrics, logs, and traces before they cause incidents.
Root Cause Analysis Automation – Correlate observability signals and analyze historical incidents to accelerate RCA and provide resolution recommendations.
Agentic AI Solutions – Develop AI agents that autonomously troubleshoot, recommend actions, and improve incident response.
Integration with Operations Tools – Connect AI insights with ticketing, alerting, and incident management platforms for seamless workflows.
Collaboration and Enablement – Partner with SRE, Monitoring, and Security teams to embed AI-driven practices into daily operations.
What Will You Need
2-4 years of experience in machine learning, deep learning, and data analysis, with a focus on anomaly detection and NLP.
At least 2 years of experience in Site Reliability Engineering, DevOps, or cloud infrastructure roles.
Tools: n8n, LangGraph/CrewAI, LangChain, etc.
Proficiency in scripting or programming (e.g., Python, Go, or Bash) for automation and tooling.
Understanding of observability tools like Prometheus, Grafana, ELK, or similar logging/monitoring stacks.
Strong problem-solving skills with a focus on performance tuning, reliability, and incident response.
Excellent communication and collaboration skills, with the ability to work effectively across cross-functional teams.