Speech Software Engineer

ASAPP • US

Python • Hybrid

At ASAPP, our mission is simple: deliver the best AI-powered customer experience—faster than anyone else. We are guided by principles that shape how we think, build, and execute, including deep customer obsession, purposeful speed, ownership, and a relentless focus on outcomes. We work in small, highly skilled teams, prioritize clarity over complexity, and continuously evolve through curiosity, data, and craftsmanship.

We’re building a globally diverse team of technologists and problem solvers who thrive in fast-paced environments, value collaboration, and approach every challenge with a Day 1 mindset, with hubs in New York City, Mountain View, Latin America, and India. If you’re driven by continuous learning, rapid iteration, and the challenge of building in a high-growth startup, this is more than a role; it’s a journey.

We are seeking a Speech Software Engineer to spearhead the architectural evolution of our voice infrastructure. This isn't just a maintenance role; you will be a primary architect in rebuilding our core speech stack from the ground up to support the next generation of real-time customer interactions. You will have the autonomy to make high-level technical decisions and the support of a team that thrives on deep thinking and startup-paced execution.

You will join the GenerativeAgent team, bridging the gap between cutting-edge ASR (Automatic Speech Recognition) research and high-performance production systems. If you are passionate about low-latency streaming, distributed systems, and the intricacies of audio processing, this is your opportunity to make a massive impact for millions of users.

What you'll do

Architect & Modernize: Lead the design and implementation of a scalable, high-availability voice infrastructure that replaces legacy systems.

Optimize Performance: Build and refine multi-threaded server frameworks capable of handling thousands of concurrent, real-time audio streams with minimal jitter and latency.

Build for Scale: Deploy robust ASR > LLM > TTS pipelines that process thousands of calls concurrently.

Stream Engineering: Develop robust logic for handling media streams, ensuring seamless audio data flow between clients and our ML models.

System Observability: Build advanced monitoring and load-testing tools specifically designed to simulate high-concurrency voice traffic.

Collaborate: Partner with Speech Scientists and Research Engineers to integrate state-of-the-art models into a production-ready environment.

What you'll need

Experience: 5+ years of software engineering experience, with a proven track record of building and maintaining production-grade infrastructure.

Industry Knowledge: A background in building ASR/TTS products at scale that interact with foundational LLMs.

Language Mastery: Expert-level proficiency in Golang or Python, or the willingness to learn.

Voice Fundamentals: Deep understanding of audio processing, including sample rates, codecs (Opus, G.711), network protocols, and buffering strategies.

System Design: Strong background in object-oriented design and the ability to architect systems that are both modular and performant.

Growth Mindset: The ability to navigate and refactor large existing codebases while transitioning to new, more efficient architectures.

What we'd like to see

Cloud Native: Hands-on experience with Kubernetes, Docker, and cloud providers (AWS/GCP/Azure) for deploying distributed speech services.

Event-Driven Architecture: Familiarity with event loops (Boost.Asio, uvloop) and asynchronous programming patterns.

Big Data: Experience with Hadoop, Spark, or Hive for analyzing massive datasets of speech logs to improve model accuracy.
