Job Description
ABOUT THE ROLE

As a Principal Site Reliability Engineer, you will serve as a technical leader responsible for the reliability, scalability, performance, and operational excellence of Accela's Civic Platform. You will partner closely with Engineering, DevOps, Database Engineering, Security, and Architecture teams to evolve our cloud platform, modernize infrastructure, and ensure our SaaS offerings remain highly available, secure, and cost-effective at scale.

This role combines deep technical expertise with strategic influence. You will drive reliability initiatives, define operational standards, mentor engineers, and lead complex technical efforts that improve the resiliency and efficiency of our platform. Your focus is simple: keep systems resilient, scalable, secure, and continuously improving.

SPECIFIC RESPONSIBILITIES

Serve as a technical leader for reliability engineering, operational excellence, and platform modernization across the Civic Platform.
Drive platform modernization initiatives, including the continued evolution from VM-based architectures toward containerized and cloud-native services, in partnership with DevOps Engineering, Database Engineering, Security, and Development teams.
Lead efforts that improve and sustain the availability, performance, scalability, security, and cost efficiency of Accela's SaaS offerings.
Define, implement, and operate service level objectives (SLOs), service level agreements (SLAs), and error budgets for critical platform services, using data to drive prioritization and risk-based decision making.
Lead observability initiatives across metrics, distributed tracing, logging, and monitoring platforms to improve system visibility and accelerate issue detection and resolution.
Drive Root Cause Analysis (RCA) efforts for complex production incidents, facilitate blameless postmortems, and ensure corrective actions are implemented and tracked to completion.
Design, develop, and maintain automation, tooling, and software soluti