At NordLayer, we’re building cybersecurity that scales with the business.
Our toggle-ready platform helps modern teams thrive without the security headaches. Trusted by 11,000+ global companies, NordLayer plugs into any tech stack and protects users across borders.
Your impact? Helping businesses stay protected and moving forward with future-ready network security.
NordVPN runs a global edge infrastructure serving millions of users. Knowing what's happening across that infrastructure - in real time, at scale, without drowning in noise - is what this role exists to solve.
We're looking for a Senior Site Reliability Engineer focused on observability: designing monitoring systems, improving signal quality, reducing alert fatigue, and collaborating with data teams on anomaly detection. You'll own how we understand the health and behavior of our distributed systems.
Main Responsibilities
Design, build, and improve monitoring pipelines and observability tooling across globally distributed infrastructure
Define and implement service-level monitoring based on golden signals (latency, traffic, errors, saturation)
Reduce alert fatigue - build meaningful, actionable alerts that engineers trust
Develop and maintain custom exporters, scripts, and integrations for metrics and log collection
Collaborate with the data team on anomaly detection and data-driven operational insights
Understand service signals - know what to measure, why, and what the numbers actually mean
Core Requirements
Distributed systems observability - monitoring architecture, signal design, dashboarding
Golden signal thinking - you design monitoring around what matters, not what's easy to measure
Alert design - reducing noise, building actionable alerts, managing on-call sanity
Python - scripting, custom exporters, automation, data processing
Linux administration and debugging
Networking fundamentals
Bonus Points For
SaltStack
Advanced networking - traffic analysis, protocol-level debugging
Advanced data knowledge - aggregation strategies, downsampling, cardinality management, retention trade-offs
Proven track record of onboarding new systems/services into monitoring from scratch
Familiarity with agentic engineering - Claude Code, LLM integrations, MCP workflows
Tools You Will Use
Naemon (a Nagios fork) and Gearmand
Prometheus-based exporters
Telegraf
Fluent Bit
VictoriaMetrics ecosystem
OpenSearch
Grafana
Salary Range
Gross salary: 5800-7400 EUR/month