Job Description
Site Reliability Engineer
Pune - Kharadi (Hybrid, 3 days/week in office)
Full time - Creospan

Role Overview

We are seeking a highly motivated Site Reliability Engineer (SRE) with strong expertise in Dynatrace, AWS Cloud, monitoring, observability, and production support. The ideal candidate will be responsible for ensuring application availability, system reliability, performance optimization, and operational excellence across enterprise-scale environments.

This role requires hands-on experience in application monitoring, incident management, troubleshooting, automation, and collaboration with Development, DevOps, and Infrastructure teams to maintain highly available and resilient systems.

Key Responsibilities

Monitoring & Observability
Design, develop, and maintain Dynatrace dashboards, alerts, monitoring profiles, and observability solutions.
Configure and manage application performance monitoring (APM), infrastructure monitoring, and distributed tracing.
Create and maintain operational dashboards, reports, and service health metrics.
Establish proactive alerting and monitoring strategies to identify issues before they impact users.

Production Support & Incident Management
Monitor application and infrastructure performance to identify bottlenecks, anomalies, and system issues.
Investigate and resolve production incidents, defects, and performance-related problems.
Participate in critical incident management and on-call support rotations.
Perform Root Cause Analysis (RCA) and implement corrective and preventive actions.
Ensure adherence to SLAs, SLOs, and operational excellence standards.

AWS Cloud & Infrastructure Reliability
Support and maintain cloud-native applications hosted on AWS.
Analyze system performance, scalability, and reliability within AWS environments.
Collaborate with infrastructure teams to optimize cloud resources and improve system resilience.
Support high-availability and disaster recovery strategies.

DevOps & Automation
Support CI/CD deployments, r