Job Description
About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role

The Domain Scaling team has the goal to make Claude world-class at real-world knowledge work in domains like finance, healthcare, and legal. This is a unique role that combines executing directly on applied research and data sourcing (real-world and synthetic) to improve our models. You'll own the end-to-end process of creating RL environments for new capabilities: identifying high-value tasks, designing reward signals, managing vendor relationships, and measuring impact on model performance.

Responsibilities

- Own the data strategy for knowledge work verticals end-to-end, from task sourcing through RL training
- Manage technical relationships with external data vendors, including evaluation of data quality and reward design
- Collaborate with domain experts to design data pipelines and evaluations
- Explore novel ways of creating RL envs for high value tasks
- Develop and improve QA frameworks to catch reward hacking and ensure env quality
- Run generalization experiments to measure how data strategy changes improve model capabilities
- Partner with other RL research teams and product teams to translate capability goals into training envs and evals

You may be a good fit if you

- Have experience with fine-tuning large language models for specific domains or real-world use cases
- Have experience with reinforcement learning, reward design, or training data cura