Job Description
Kafka is mission critical at Datadog, used for sending data in near real-time between the vast majority of our services. The Streaming Platform team is responsible for operating the Kafka fleet and building abstractions on top of it so that companies and engineers across the world continue to trust Datadog functionality and metrics. The Streaming Platform team’s goals are reliability, cost efficiency, and usability for our customers. We support several dozen internal customers, with petabytes of storage and hundreds of millions of messages and MiB per second flowing through our infrastructure. At this scale, first-class reliability is an exciting challenge, meaningful cost optimizations can result from clever improvements, and excellent customer satisfaction requires an emphasis on safe automation.

Our team is looking for a Senior Software Engineer to help us scale and continue improving our Kafka fleet. In this position, you’ll regularly interact with our users, dig deep into open source solutions to understand behavior, and build necessary tooling and enhancements to keep Kafka running smoothly and efficiently for our customers.

At Datadog, we place value in our office culture - the relationships and collaboration it builds and the creativity it brings to the table. We operate as a hybrid workplace to ensure our Datadogs can create a work-life harmony that best fits them.

What You’ll Do:
Operate a bleeding-edge Kafka deployment with exceptional reliability by deeply understanding foundational hardware and cloud-provider building blocks, incident root causes, and performance profiles through detail-oriented investigations.
Develop a high-performance streaming platform composed of our control and data planes to ease customer adoption and reduce operational burden.
Integrate open source solutions with internal s