Site Reliability Developer (Python/Java) / SRE
WatchGuard Technologies • ES • Go, Java, Python • Hybrid/Remote
WatchGuard embraces a Flexible Work Philosophy. Most of our employees can choose to work from the office, at home, or any combination of the two. We’ve built a global workforce of outstanding team members and a flexible culture built on trust, collaboration, and belonging.
Who you are:
You are a customer-focused, data-driven developer who has a passion for delivering the best customer experience possible. You enjoy the thrill of coordinating and troubleshooting production issues and want to proactively find and fix issues.
You understand cloud technologies, automation, everything-as-code, networking, microservice architectures, object-oriented design, and SRE and DevOps cultures. You are proficient in Python, Java, or Go and eager to learn other languages.
You come with proven knowledge of software engineering best practices across the full software development lifecycle, including coding standards, code reviews, security, source control management, build processes, automated testing, deployment, monitoring, chaos engineering, and automated self-healing operations. You also know tools and technologies such as CloudFormation, Terraform, New Relic, Lambda, Serverless, Elasticsearch, Docker, Kubernetes, Spark, Flink, Jenkins, GitHub, Artifactory, and Jira.
You are able to lead production incident response and postmortems through your strong analytical and problem-solving abilities as well as verbal and written communication skills.
What to expect as a member of the SRE team at WatchGuard:
The WatchGuard SRE team owns the reliability and security of our production cloud environments alongside our application development teams to ensure we deliver the best possible experience to our customers. As you learn more about our systems, you will be:
• Ensuring smooth production operations with development teams and leading large-scale event response.
• Defining operational and security policies, standards, and processes for our development teams to follow.
• Guiding our development teams through the process of establishing, monitoring, and achieving their service level agreements through the definition of service level indicators and objectives.
A Typical Day in the Life of a Site Reliability Developer, SRE Team at WatchGuard:
As an SRE at WatchGuard, a “typical” day may have you:
• Working side-by-side with our application teams in production AWS, Azure, and hybrid cloud environments to ensure proper monitoring, security, reliability, automation, and support are in place.
• Driving an operational excellence culture throughout WatchGuard with the simplification, automation, analysis, and evolution of our activities and processes.
• Championing security and operational best practices, and becoming known as a cloud expert among our development teams across the globe.
• Striving to provide the best possible customer experience even when things go wrong by participating in our on-call rotation and then coordinating and leading the production troubleshooting efforts.
• Using your programming skills to develop automation or assist with debugging and fixing complex production issues.
• Being curious, learning new things, and then sharing your knowledge through documentation, presentations, and guidance to other teams.