Senior Manager (Infrastructure and SRE Engineering)

at SolarWinds

Bangalore, India

Req ID: 100135

At SolarWinds, we’re a people-first company. Our purpose is to enrich the lives of the people we serve, including our employees, customers, shareholders, partners, and communities. Join us in our mission to help customers accelerate business transformation with simple, powerful, and secure solutions.

The ideal candidate thrives in an innovative, fast-paced environment and is collaborative, accountable, ready, and empathetic. We’re looking for individuals who believe they can accomplish more as a team and create lasting growth for themselves and others. We hire based on attitude, competency, and commitment. Solarians are ready to advance our world-class solutions and accept the challenge to lead with purpose. If you’re looking to build your career with an exceptional team, you’ve come to the right place. Join SolarWinds and grow with us!

Your Role:

We are seeking a Senior Manager (Infrastructure & Site Reliability Engineering) with extensive experience in AWS, Kubernetes, GitOps, MySQL, and Elasticsearch to lead our Site Reliability Engineering (SRE) team. The ideal candidate will manage a team of SREs and ensure the availability, reliability, and scalability of our cloud infrastructure and services. The successful candidate will have a deep understanding of SRE practices and a proven track record of implementing them to a high standard, including SLAs, SLOs, proactive alert management, incident response and review, postmortems, capacity planning, and cost management.

Your Impact:

  • Manage and lead a team of SREs responsible for ensuring the reliability and availability of our cloud infrastructure and services
  • Develop and implement site reliability engineering practices to improve service availability, performance, and scalability
  • Collaborate with cross-functional teams to design and implement new features and services that meet customer needs and business requirements
  • Develop and implement incident response plans and post-incident reviews to identify root causes and prevent future incidents
  • Monitor and analyze system performance metrics to identify and resolve performance bottlenecks
  • Build and maintain relationships with key stakeholders, including internal customers and vendors
  • Stay up to date with industry trends and emerging technologies in site reliability engineering and cloud infrastructure

Your Experience:

  • Bachelor's degree in Computer Science or related field, or equivalent work experience
  • 8+ years of experience in site reliability engineering, infrastructure engineering, or a related field
  • 4+ years of experience managing and leading a global team of SREs or infrastructure engineers
  • Extensive experience with AWS, Kubernetes, GitOps, MySQL, and Elasticsearch is required, preferably with a SaaS product generating more than $100M in annual recurring revenue (ARR)
  • Experience with developing and implementing site reliability engineering practices and incident response plans
  • Strong analytical and problem-solving skills
  • Excellent communication and interpersonal skills
  • Demonstrated ability to build and maintain relationships with internal and external stakeholders

SolarWinds is an Equal Employment Opportunity Employer. SolarWinds will consider all qualified applicants for employment without regard to race, color, religion, sex, age, national origin, sexual orientation, gender identity, marital status, disability, veteran status, or any other characteristic protected by law.

All applications are treated in accordance with the SolarWinds Privacy Notice: https://www.solarwinds.com/applicant-privacy-notice