Job Description
We are looking for research interns to work on foundational areas for coding language models, including pre-training data, mid-training data, synthetic data generation, evaluation, and agentic coding.

Responsibilities
* Explore data-centric methods for improving coding LLMs, including data filtering, quality assessment, deduplication, data mixture, and diversity analysis.
* Build synthetic data and evaluation pipelines for code generation, code editing, repo-level reasoning, tool use, and multi…