About the Role
As a Staff Platform Engineer - AI Infrastructure, you will build and scale the infrastructure behind
Paytm's AI inference platform, serving internal teams and enterprise customers, and supporting
new customer use cases from the ground up. You will own GPU infrastructure, model hosting
and serving, and multi-model routing across modalities. This includes running our own coding
and domain-specific models (voice, vision, risk, fintech workflows) as well as third-party models
on shared GPU and accelerator clusters.
You will also build self-service platforms that let teams provision compute, deploy, and
customize models, and manage resources through APIs and control planes, so they can use AI
without rebuilding infrastructure each time.
Your work will form the AI control plane for Paytm Intelligence (Pi): policy-driven routing, quotas,
observability, and usage and cost visibility. It will directly affect how fast we ship agents and AI
features, how reliably they run, and how efficiently we use our hardware across payments, risk,
fraud, collections, support, and developer experience.
What You'll Do
- Design and operate GPU infrastructure for model hosting, including provisioning, scheduling, and cost optimization across cloud and on-premise environments
- Build and scale model serving systems using vLLM, TensorRT-LLM, Triton, or equivalent, supporting real-time inference with strong latency and availability guarantees
- Implement multi-model routing to serve multiple models across modalities (text, voice, code, vision) on shared infrastructure
- Own the model lifecycle end to end: download, deploy, serve, monitor, swap, and scale
- Drive inference optimization, including quantization strategies (AWQ, GPTQ), batching, caching, and cold-start reduction
- Build self-service infrastructure platforms where teams provision compute, storage, and model endpoints through APIs and control planes
- Implement infrastructure-as-code at scale using Terraform, Pulumi, or CDK
- Build observability and reliability for inference systems: SLIs/SLOs, GPU utilization monitoring, latency tracking, automated capacity planning, and alerting
- Define platform standards and governance, including multi-tenant isolation, cost attribution, and resource quotas
- Lead architectural design and influence engineering direction across the AI infrastructure stack
What You'll Bring
- 8+ years of software engineering experience, including 3+ years building infrastructure platforms or ML/AI infrastructure
- Deep experience with cloud infrastructure (AWS, GCP) and Kubernetes
- Hands-on experience with GPU workloads and model serving (vLLM, TensorRT-LLM, Triton, or similar)
- Strong software engineering fundamentals in Python, Go, or C++
- Experience with infrastructure-as-code (Terraform, Pulumi, CDK)
- Experience designing self-service platforms or internal developer tooling
- Understanding of model optimization: quantization, batching, serving architectures
- Proven ability to lead complex cross-team technical initiatives
- Strong communication skills and the ability to influence technical direction
Nice to Have
- Experience building or operating inference infrastructure at scale
- Experience with CUDA, GPU scheduling, or hardware-level optimization
- Experience with multi-model serving across different modalities
- Experience with edge inference or on-device model deployment
- Experience with model fine-tuning infrastructure (LoRA, QLoRA, PEFT)
- Background in fintech or regulated industries