Salary range: $310,000 - $500,000/year + benefits
Description: Transluce is a fast-moving nonprofit research lab building the
public tech stack for scalable AI evaluation and oversight. We specialize in
behavioral evaluations of frontier AI systems, assessing how models actually
behave in deployment, not just how they perform on benchmarks. We are an
independent nonprofit with a mission to steer the development of AI for the
public good.
About the role: We're looking for an engineer to work on measuring and shaping
AI model behaviors: someone who thrives on turning hard questions into
evidence, fast. Think of this as a forward-deployed engineering role working
directly with policymakers, civil society partners, and frontier labs to
rapidly answer key questions about why AI systems act the way they do, and
when and why they fail.
You'll build relationships with external domain experts, adapt our methods to
new contexts, and help ensure our work is both technically credible and
immediately useful to the people making consequential AI governance decisions.
This is a high-autonomy role with direct exposure to senior stakeholders and a
clear line of sight from your work to real-world impact.
Core responsibility: Build and extend Transluce's AI evaluation methods for
measuring important, evolving AI model behaviors. This includes:
* Scope, prototype, and run behavioral evaluations in response to emerging
policy and oversight needs, including rapid-turnaround work for government
and civil society partners.
* Execute on Transluce's contracts with government evaluators, including
building evaluations for harmful manipulation with the EU AI Office.
* Design and run privileged-access evaluations and external oversight exercises
with frontier labs.
* Work with civil society organizations and domain experts to adapt our
behavioral evaluation pipelines to their contexts (e.g., mental health,
persuasion, evaluation awareness).
Qualities of a strong candidate:
* Hands-on experience designing and running AI evaluations, particularly
  behavioral or interactive evaluations (multi-turn, agentic, or red-teaming
  contexts).
* Strong engineering instincts and good judgment about when "good enough to
ship" is actually good enough.
* Experience in customer-facing, consulting, or forward-deployed roles
translating ambiguous stakeholder needs into concrete deliverables.
* Experience running evaluations at scale or in a production context.
* Ability to understand and balance the needs of AI researchers, domain
  experts, and senior decision-makers.
* Strong communication skills, low ego, openness to giving and receiving
feedback.
We are located in San Francisco and enthusiastic about working together in
person. We are open to sponsoring international visas.