Principal Platform Engineer I

Java | Python | Hybrid

Who are we?

Smarsh empowers its customers to manage risk and unleash intelligence in their digital communications. Our growing community of over 6,500 organizations in regulated industries counts on Smarsh every day to help them spot compliance, legal, or reputational risks in 80+ communication channels before those risks become regulatory fines or headlines. Relentless innovation has fueled our journey to consistent leadership recognition from analysts like Gartner and Forrester, and our sustained, aggressive growth has landed Smarsh on the annual Inc. 5000 list of fastest-growing American companies since 2008.

Role Overview

You will join the Datastore Team, a core component of the Smarsh Fabric platform that underpins our enterprise applications. The team enables self-service, next-generation data capabilities across the engineering organization, ensuring long-term scalability, reliability, and innovation.

This role serves as a technical authority for large-scale distributed data platforms. You will shape architectural direction, help define operating standards, and solve complex distributed systems challenges across petabyte-scale environments running hundreds of clusters. Working across engineering domains, you will influence how data infrastructure is designed, built, and operated company-wide.

Skills & Experience

8+ years of experience in platform engineering, SRE, or distributed systems-focused roles.

Demonstrated subject matter expertise in operating at least one of MongoDB, Kafka, or Elasticsearch at high scale, including deep day-2 operational knowledge.

Significant experience designing and operating large-scale Kubernetes environments and associated ecosystem tooling (e.g. Helm, Kustomize, Argo CD). Experience managing stateful workloads on Kubernetes is a significant plus.

Proven experience defining architectural standards and influencing technical direction across teams.

Strong programming skills (Python, Java, or similar). Experience building internal platform APIs and automation tooling is a significant plus.

Extensive experience with Infrastructure as Code (e.g. Terraform) and cloud-native deployment models.

Hands-on experience operating enterprise-scale workloads in AWS.

Experience designing and evolving observability platforms (e.g. Prometheus/Grafana, ELK) to support multi-cluster environments.

Strong understanding of security principles and experience embedding security best practices into production environments and code promotion systems.

Excellent communication skills, with the ability to influence diverse technical and non-technical stakeholders.

Core Responsibilities

Define and evolve the architectural vision and operating standards for large-scale distributed data platforms.

Lead the design and evolution of highly available, scalable clusters supporting core data technologies including MongoDB, Elasticsearch, and Apache Kafka.

Solve complex, ambiguous distributed systems problems, balancing trade-offs across scalability, resilience, performance, security, and cost.

Influence engineering teams to adopt platform standards, automation practices, and self-service capabilities.

Partner with engineering and product leadership to align platform capabilities with strategic business objectives.

Establish reliability targets, operational models, and observability standards for stateful workloads on multi-cluster Kubernetes environments.

Mentor engineers, lead design reviews, and raise the technical bar across the platform organization.

Contribute as a senior escalation point within the on-call rotation.
