Pryon

Senior Software Engineer, Ingestion Team

Pryon • US
Python • Remote
About Pryon: 
We’re a team of AI, technology, and language experts whose DNA lives in Alexa, Siri, Watson, and virtually every human language technology product on the market. Now we’re building an industry-leading knowledge management and Retrieval-Augmented Generation (RAG) platform. Our proprietary, cutting-edge natural language processing capabilities transform unstructured data into meaningful experiences that increase productivity with unmatched accuracy and speed.

The Opportunity:

The Ingestion team is responsible for everything that happens between content arriving from a connector and that content being ready for search and retrieval. This means document processing pipelines that handle parsing, text extraction, chunking, metadata enrichment, embedding generation, and index population — across every file format and content type our customers throw at us.

We’re in the middle of a significant architectural evolution — migrating from a legacy pipeline to a modern, workflow-orchestrated architecture with cleanly separated processing stages: intake, transformation, enrichment, and indexing. The team is also actively designing the next iteration of the pipeline to push further on throughput and resilience.

This is real systems engineering: the problems are about scale, reliability, and the messy realities of processing millions of documents with wildly different structures.
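The stage separation described above (intake → transformation → enrichment → indexing) can be sketched as a minimal Python pipeline. This is an illustrative sketch only — the class and function names are invented here, not Pryon's actual code:

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A unit of content moving through the ingestion pipeline."""
    raw: bytes
    text: str = ""
    chunks: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def intake(raw: bytes) -> Document:
    # Accept content handed off by a connector and wrap it for processing.
    return Document(raw=raw)


def transform(doc: Document) -> Document:
    # Parse and extract text, then split it into fixed-size chunks.
    doc.text = doc.raw.decode("utf-8", errors="replace")
    doc.chunks = [doc.text[i:i + 100] for i in range(0, len(doc.text), 100)]
    return doc


def enrich(doc: Document) -> Document:
    # Attach metadata; a real pipeline would also generate embeddings here.
    doc.metadata["chunk_count"] = len(doc.chunks)
    return doc


def index(doc: Document, store: dict) -> None:
    # Write chunks to the search index (a plain dict stands in for it here;
    # "doc-0" is a hardcoded placeholder id).
    for i, chunk in enumerate(doc.chunks):
        store[f"doc-0:{i}"] = chunk


def run_pipeline(raw: bytes, store: dict) -> Document:
    doc = enrich(transform(intake(raw)))
    index(doc, store)
    return doc
```

The value of the clean separation is that each stage can be scaled, retried, and observed independently — which is what the workflow-orchestrated architecture buys over a monolithic legacy pipeline.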

The Ideal Candidate:

  • Is self-driven and comfortable operating with autonomy inside a structured team
  • Gets energized by architectural challenges, not just feature work
  • Has the patience and discipline to improve existing systems while building new ones
  • Understands that pipeline engineering is about handling the 10,000 edge cases, not just the happy path
  • Is motivated by the mission: building the processing backbone that makes enterprise AI accurate and reliable
  • Communicates well in a remote-first environment and collaborates naturally across team boundaries

In This Role You Will:

  • Design and build pipeline stages for our modern ingestion architecture - from document intake through embedding generation and index writing
  • Contribute to the design of next-generation pipeline architecture as the system evolves
  • Improve system stability and scale: identify bottlenecks, reduce failure rates, and build observability into every stage
  • Work with workflow orchestration tools to manage complex, multi-step document processing with retry logic, error handling, and state management
  • Handle the realities of document diversity: PDFs, HTML, Office formats, images, structured and semi-structured data - all flowing through the same pipeline
  • Collaborate with the Connectors team (upstream) and Retrieval team (downstream) to ensure data flows cleanly across system boundaries
  • Participate in the ongoing migration from legacy systems, balancing new development with operational stability
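The retry logic and idempotency mentioned above go hand in hand: under at-least-once delivery, a retried or re-delivered task must not produce duplicate writes. A minimal sketch, assuming an in-memory set of completed task ids (a real system would persist this state, e.g. via the orchestration engine):

```python
import time


def process_with_retries(task_id, work, processed, max_attempts=3, base_delay=0.01):
    """Run `work()` at most `max_attempts` times with exponential backoff.

    `processed` is a set of completed task ids, so re-delivery of the same
    task becomes a no-op (idempotency) instead of a duplicate write.
    """
    if task_id in processed:
        return "skipped"          # already done; safe under at-least-once delivery
    for attempt in range(max_attempts):
        try:
            work()
            processed.add(task_id)
            return "done"
        except Exception:
            if attempt == max_attempts - 1:
                raise             # retries exhausted; surface to a dead-letter path
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
```

Orchestration tools like Temporal or Airflow provide the retry and state-management machinery as a service; the pipeline code still has to supply the idempotency.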

What You'll Need to Be Successful:

  • 5+ years of software engineering experience, with meaningful time on data processing pipelines, ETL systems, or similar infrastructure
  • Strong proficiency in Python and/or Go
  • Experience with workflow orchestration tools — Temporal, Airflow, Prefect, Step Functions, or similar
  • Understanding of distributed systems patterns: queues, workers, backpressure, idempotency, retry strategies
  • Hands-on experience with Kubernetes, Docker, Terraform, and Helm
  • Familiarity with message brokers and event streaming (Kafka, RabbitMQ, SQS, or similar)
  • Comfort working across cloud providers (AWS, Azure, GCP)
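Of the distributed-systems patterns listed above, backpressure is the one most often asked about: a bounded buffer between stages throttles a fast producer to its consumer's pace. A toy sketch using Python's standard library (the doubling step is a stand-in for real document processing):

```python
import queue
import threading


def produce(q, items):
    # put() blocks when the queue is full, so the producer is throttled
    # to the consumer's pace — backpressure in its simplest form.
    for item in items:
        q.put(item)
    q.put(None)  # sentinel: no more work


def consume(q, out):
    while True:
        item = q.get()
        if item is None:
            break
        out.append(item * 2)  # stand-in for real per-document processing


def run(items, maxsize=2):
    q = queue.Queue(maxsize=maxsize)  # bounded buffer between the two stages
    out = []
    consumer = threading.Thread(target=consume, args=(q, out))
    consumer.start()
    produce(q, items)
    consumer.join()
    return out
```

The same shape appears at larger scale with Kafka consumer groups or SQS visibility timeouts: the bound on in-flight work is what keeps a slow downstream stage from being overwhelmed.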