Application deadline: We accept submissions until 15 January 2026. We review applications on a rolling basis and encourage early submissions.
ABOUT THE OPPORTUNITY
We’re looking for Full-stack Software Engineers who are excited to build tools for frontier AGI safety research, e.g. building and maintaining evals libraries and tools for monitoring and controlling our own LLM traffic.
REPRESENTATIVE PROJECTS
Your main objective is to develop tooling for analyzing model evaluation results. Here is a list of features that you might build and ship in your first 6 months:
- LLM-powered search that finds interesting fragments in evaluation transcripts
- Comparison views that show how conversations and scores differ between two evaluation runs
- Ability to view and analyse conversations with coding agents (Cursor, Claude Code, etc.) in addition to evaluation transcripts
- Results streaming for evaluations that are currently being run
- Collaborative editing of evaluation logs that automatically updates metrics and other derived data. Think of this as developing an “IDE for evaluations”.
Besides this, here are example auxiliary projects which you might do:
- Automated evaluation pipelines to minimize the time from getting access to a new model for pre-deployment testing to analyzing the most important results and sharing them.
- LLM agents and MCP tools to automate internal software engineering and research tasks, with sandboxes to prevent major failures
- Telemetry API and instrumentation of our existing tools, allowing us to monitor usage and improve reliability
- Upstream improvements to the Inspect framework and ecosystem, e.g. support for evaluating modern agentic scaffolds.
KEY RESPONSIBILITIES
- Balance moving quickly with creating robust and performant software
- Lead the development of major features from ideation to implementation
- Support the entire user journey: running the evaluation, finding interesting results, analysing them, and producing reports and papers
- Make the software configurable and extensible, so that users can adapt it to their needs
- Collaboratively define and shape the software roadmap and priorities
- Establish and advocate for good software design practices, codebase health, and coding agent practices
- Work closely with researchers to understand the challenges they face
- Work closely with the product team to create solutions that satisfy both our researchers and external customers
KEY REQUIREMENTS
You must have experience writing production-quality Python and React code.
We value candidates from diverse backgrounds and recognise that candidates may demonstrate their skills in different ways. For example, we might be impressed if you have:
- Led the development of a successful software tool or product over an extended period (e.g. 1 year or more)
- Started and built the tech stack for a company, e.g. in a start-up
- Worked your way up in a large organisation, repeatedly gaining more responsibility and influencing a large part of the codebase
- Authored and/or maintained a popular open-source tool or library
- Placed in a prestigious programming competition (IOI, ICPC, etc.)
- 5+ years of professional software engineering experience
The following would be a bonus:
- Experience designing rich and intuitive UIs, especially for power users
- Direct work with researchers or customers
- Experience working with LLM agents or LLM evaluations
- Interest in AI Safety
We want to emphasise that people who feel they don't meet all of these requirements, but think they would nonetheless be a good fit for the position, are strongly encouraged to apply. We believe that excellent candidates can come from a variety of backgrounds and are excited to give you opportunities to shine.
LOGISTICS
- Start Date: Target of 2-3 months after the first interview
- Time Allocation: Full-time
- Location: The office is in London, right next to the London Initiative for Safe AI (LISA) offices. This is an in-person role. In rare situations, we may consider partially remote arrangements on a case-by-case basis.
- Work Visas: We can sponsor UK visas
BENEFITS
- Salary: 100k - 200k GBP (~135k - 270k USD)
- Flexible work hours and schedule
- Unlimited vacation
- Unlimited sick leave
- Lunch, dinner, and snacks are provided for all employees on workdays
- Paid work trips, including staff retreats, business trips, and relevant conferences
- A yearly $1,000 (USD) professional development budget