Job Description
The Observability Visibility SRE Team is part of the Observability and Resilience Enablement group within the SRE/Security organization. Observability and Resilience Enablement focuses on closing the loop between how Datadog engineers detect and respond to issues and incidents and how those learnings translate into measurable risk reduction and lower customer impact. The Observability Visibility team carries the organization's 100% visibility priority, defining observability and reliability baselines and ensuring services consistently meet them by default through scalable, automated, and sustainable solutions.

As a Senior Software Engineer on this team, you will help define, implement and evolve observability and resilience standards across Datadog's engineering organization. You will build systems, tooling, libraries, and automation that make observability and reliability the default experience for service owners, reducing operational risk while driving adoption and consistency. This role combines software engineering and site reliability engineering to drive measurable improvements in engineering effectiveness and service resilience. You will work closely with SRE, platform and product teams to identify gaps, deliver scalable solutions and ensure long-term coverage and compliance with established standards.

At Datadog, we place value in our office culture - the relationships and collaboration it builds and the creativity it brings to the table. We operate as a hybrid workplace to ensure our Datadogs can create a work-life harmony that best fits them.

What You'll Do:
Define and evolve observability and resilience baselines, ensuring alignment with measurable risk reduction goals across Datadog services.
Measure service compliance against established standards, assess risk and remediation complexity and drive sustainable solutions to close identified gaps.
Design and deliver scalable o