Senior DevOps Engineer
webAI • Austin, Texas, United States
About Us:
webAI is pioneering the future of artificial intelligence by establishing the first distributed AI infrastructure dedicated to personalized AI. We recognize the evolving demands of a data-driven society for scalability and flexibility, and we firmly believe that the future of AI lies in distributed processing at the edge, bringing computation closer to the source of data generation. Our mission is to build a future where a company's valuable data and intellectual property remain entirely private, enabling the deployment of large-scale AI models directly on standard consumer hardware without compromising the information embedded within those models. We are developing an end-to-end platform that is secure, scalable, and fully under the control of our users, empowering enterprises with AI that understands their unique business. We are a team driven by truth, ownership, tenacity, and humility, and we seek individuals who resonate with these core values and are passionate about shaping the next generation of AI.
About the Role:
We are seeking a Senior DevOps Engineer to design, build, and scale secure infrastructure supporting AI workloads across cloud and edge environments. This is a high-impact individual contributor role where you will help drive infrastructure architecture, platform reliability, and security best practices across the organization.
You will work closely with engineering teams to implement scalable, automated infrastructure solutions that enable our AI platform to operate efficiently across diverse deployment scenarios, from public cloud to hybrid and edge environments. This role requires strong technical depth, production experience, and the ability to translate complex requirements into resilient infrastructure systems.
Responsibilities:
Design and implement secure, scalable infrastructure across multi-cloud (AWS, Azure, GCP), hybrid, and edge environments
Build and maintain Infrastructure as Code (Terraform, Pulumi, Ansible) using GitOps workflows and automated validation
Deploy and operate Kubernetes clusters optimized for AI/ML workloads, including GPU scheduling and container security best practices
Develop secure CI/CD pipelines with integrated security controls (SAST, DAST, vulnerability scanning, secrets management)
Support MLOps infrastructure initiatives including model deployment automation, versioning, and lifecycle management
Implement observability and monitoring frameworks using tools such as Prometheus, Grafana, ELK, or Datadog
Enforce security best practices including IAM, encryption, network segmentation, and compliance automation
Participate in incident response, reliability improvements, postmortems, and disaster recovery planning
Develop reusable infrastructure modules and documentation (runbooks, architecture docs, standards)
Mentor junior and mid-level engineers on DevOps best practices and infrastructure design
Qualifications:
5–8+ years of experience in DevOps, Site Reliability Engineering, or Infrastructure Engineering supporting production systems
Strong expertise with Docker, Kubernetes, and cloud-native architectures
3–5+ years of hands-on experience implementing Infrastructure as Code (Terraform, Pulumi, Ansible)
Experience working with AWS, Azure, or GCP including compute, networking, and managed services
Proven experience building and maintaining CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, ArgoCD)
Programming experience in Python (preferred), Go, or Bash for automation
Experience implementing monitoring and observability in production environments
Strong understanding of cloud security best practices and access management
Strong communication skills and ability to collaborate cross-functionally
Preferred Skills:
Exposure to multi-cloud or hybrid cloud environments
Experience supporting AI/ML or MLOps workflows
Familiarity with service mesh technologies (Istio, Linkerd)
Experience with edge computing or distributed systems
Understanding of cost optimization and cloud efficiency practices
Relevant certifications (CKA, AWS Solutions Architect, Terraform Associate, etc.)
We at webAI are committed to living out the core values we have put in place as the foundation on which we operate as a team. We seek individuals who exemplify the following:
Truth - Emphasizing transparency and honesty in every interaction and decision.
Ownership - Taking full responsibility for one’s actions and decisions, demonstrating commitment to the success of our clients.
Tenacity - Persisting in the face of challenges and setbacks, continually striving for excellence and improvement.
Humility - Maintaining a respectful and learning-oriented mindset, acknowledging the strengths and contributions of others.
Benefits:
Competitive salary and performance-based incentives
Comprehensive health, dental, and vision benefits package
401k Match (US-based only)
$200/month Health and Wellness Stipend
$400/year Continuing Education Credit
$500/year Function Health subscription (US-based only)
Free parking for in-office employees
Unlimited Approved PTO
Parental Leave for Eligible Employees
Supplemental Life Insurance
webAI is an Equal Opportunity Employer and does not discriminate against any employee or applicant on the basis of age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We adhere to these principles in all aspects of employment, including recruitment, hiring, training, compensation, promotion, benefits, social and recreational programs, and discipline. In addition, it is the policy of webAI to provide reasonable accommodation to qualified employees who have protected disabilities to the extent required by applicable laws, regulations and ordinances where a particular employee works.