Help us use technology to make a big green dent in the universe!
Kraken powers some of the most innovative global developments in energy.
We’re a technology company focused on creating a smart, sustainable energy system. From optimising renewable generation and creating a more intelligent grid to enabling utilities to provide excellent customer experiences, our operating system for energy is transforming the industry around the world in a way that benefits everyone.
It’s a really exciting time in energy. Help us make a real impact on shaping a better, more sustainable future.
Kraken Customer
What we do: build the most AI-driven, innovative, forward-thinking platform for energy management. From optimising resources to delivering cost-effective, exceptional customer experiences through advanced Customer Information Systems (CIS), billing, meter data management, CRM, and AI-driven communications, Kraken is powering the next wave of innovation in the energy industry.
Why we do it: future energy will not look like energy as we know it today. We need to not just think about our future, but build for it. Now.
The Team
We have expanded our tentacles and are looking for someone (based in Melbourne, Australia) to join our Global Platform Engineering Reliability - Observability team.
Our Reliability group is responsible for architecting, developing, and maintaining the resilient and scalable infrastructure that powers and supports our platform.
We’re building a brand-new Observability team at Kraken and we’re looking for a Platform Engineer II to grow with it. Your core focus will be contributing to the availability, performance and scalability of products across Kraken - helping engineering teams get the visibility they need to build and operate reliable services with confidence. You’ll work alongside the team to improve monitoring and alerting across our product suite and support engineering teams as they transition to on-call over the next six months.
We operate with a high degree of autonomy and trust. That means you'll sometimes need to navigate unclear requirements, ask questions and move forward with incomplete information. You won't be handed a perfect spec for every task - but you will have a supportive team, clear goals and the space to develop your skills as the team evolves.
You’ll have the opportunity to shape how observability is done at Kraken and your contributions will have a direct, visible impact.
What you'll do:
- Support and implement monitoring and alerting strategy across Kraken’s customer business
- Define and uphold observability best practices across multiple products and platforms
- Partner with product teams to implement observability tooling and improve reliability across the organisation
- Help product teams build best-in-class dashboards for their requirements or bespoke use cases
- Work with product teams to define and implement meaningful Service Level Objectives (SLOs) and Service Level Indicators (SLIs), aligned to contractual Service Level Agreements (SLAs)
- Build, tune, and continuously improve alerts and monitors using golden signals (latency, traffic, errors, saturation) as a framework - reducing noise and increasing actionable signal
- Help product teams transition to on-call models by improving signals, alert quality, and operational readiness
- Improve tooling and self-service capabilities for alerting and monitoring across multiple product teams
- Analyse incident metrics to identify trends and improvement opportunities, communicating insights clearly back to product teams
- Manage the cost and usage of our observability tooling stack in collaboration with FinOps
- Contribute to broader platform reliability infrastructure improvements where needed
- Help solve interesting and difficult problems - there’s a significant opportunity for disruption in the global energy market
What you'll have:
- Solid hands-on experience across our core platform stack:
  - AWS (supporting and improving cloud infrastructure used by product teams)
  - Terraform (infrastructure as code; comfortable operating with Terraform day-to-day)
  - Kubernetes (container orchestration and deployment management; comfortable working with Kubernetes day-to-day)
- Experience using industry-standard observability tooling - we use Datadog, Grafana, Prometheus and Rootly (experience with other monitoring/alerting platforms is transferable)
- Strong collaboration and communication skills - able to work effectively with developers, product managers, and other stakeholders to design and deliver impactful observability “golden paths” and monitoring experiences
- Exposure to Python (or a comparable language such as TypeScript, Go or C#) - able to understand how applications behave in production to support observability and reliability improvements
- Previous experience working in small, highly autonomous teams
- A working style that fits how we operate:
  - Comfortable with ambiguity and able to create structure in unclear situations
  - Proactive learning mindset (experiment, iterate, and adapt as the team evolves approaches)
  - Strong asynchronous written communication (Slack/Notion/docs) and a habit of keeping others in the loop
  - Autonomy and accountability - making progress independently and owning outcomes
What will help:
- Previous experience working in a data-focused or Observability team
- Experience working on SaaS platforms, including engaging product teams to ensure upskilling and knowledge sharing
- Experience building observability tooling to support large-scale internet-facing services
- Experience instrumenting and diagnosing issues with very large relational databases
- Familiarity with PostgreSQL (or similar RDBMS), particularly Amazon RDS at scale
- Experience using SLOs to drive meaningful performance and reliability improvements