Staff Backend Engineer, Voices

Synthesia•Europe•🌍 Remote

Full-time•Senior

Posted 6/16/2026•Expires 7/20/2026


Job Description

Synthesia is the world’s leading AI video platform for business, used by over 90% of the Fortune 100. Founded in 2017, the company is headquartered in London, with offices and teams across Europe and the US. As AI continues to shape the way we live and work, Synthesia develops products to enhance visual communication and enterprise skill development, helping people work better and stay at the center of successful organizations.

Following our recent Series E funding round, where we raised $200 million, our valuation stands at $4 billion. Our total funding exceeds $530 million from premier investors including Accel, NVentures (Nvidia's VC arm), Kleiner Perkins, GV, and Evantic Capital, alongside the founders and operators of Stripe, Datadog, Miro, and Webflow.

About the role

You will work on the core speech and voice generation experience at Synthesia, building the platform that sits at the critical path of script creation and video generation. You will design and deliver features across the script preview and voice orchestration stack, combining frontend user experiences with backend platform reliability. This includes integrating with multiple Text-to-Speech (TTS) providers, building recommendation systems, and ensuring consistency and quality across all voice outputs.

You will take ownership of features from idea through to production, working with loosely defined requirements to scope, prototype, and ship solutions that deliver real user impact.

You will build across the stack, including:

- Backend systems for TTS provider orchestration, handling fallbacks, retries, and load-shedding across multiple providers
- Frontend experiences that allow users to preview scripts, select voices, and control pronunciation with intuitive interfaces (frontend experience is not a must!)
- Voice discovery and recommendation systems that guide users to high-quality voices and help them iterate quickly

You will frequently work on 0 to 1 problems, such as building new voice quality framewor