Senior Manager, Infrastructure and SRE
at SolarWinds
Austin, Texas
Req ID: 201956
At SolarWinds, we’re a people-first company. Our purpose is to enrich the lives of the people we serve—including our employees, customers, shareholders, Partners, and communities. Join us in our mission to help customers accelerate business transformation with simple, powerful, and secure solutions.
The ideal candidate thrives in an innovative, fast-paced environment and is collaborative, accountable, ready, and empathetic. We’re looking for individuals who believe they can accomplish more as a team and create lasting growth for themselves and others. We hire based on attitude, competency, and commitment. Solarians are ready to advance our world-class solutions in a fast-paced environment and accept the challenge to lead with purpose. If you’re looking to build your career with an exceptional team, you’ve come to the right place. Join SolarWinds and grow with us!
We seek a Senior Manager (Infrastructure & Site Reliability Engineering) with extensive experience in AWS, Azure, and Kubernetes to lead our Site Reliability Engineering (SRE) team. The ideal candidate will manage a team of SREs and ensure the availability, reliability, and scalability of our cloud infrastructure and services. The successful candidate will have a deep understanding of SRE practices and a track record of implementing them to a high standard (SLAs, SLOs, Proactive Alert Management, Incident Response/Review, Post Mortems, Capacity Planning, Cost Management, etc.).
Responsibilities:
- Manage and lead a team of SREs responsible for ensuring the reliability and availability of our cloud infrastructure and services
- Develop and implement site reliability engineering practices to improve service availability, performance, and scalability
- Collaborate with cross-functional teams to design and implement new features and services that meet customer needs and business requirements
- Develop and implement incident response plans and post-incident reviews to identify root causes and prevent future incidents
- Monitor and analyze system performance metrics to identify and resolve performance bottlenecks
- Build and maintain relationships with key stakeholders, including internal customers and vendors
- Stay up to date with industry trends and emerging technologies related to site reliability engineering and cloud infrastructure
Qualifications:
- Bachelor’s degree in Computer Science or related field, or equivalent work experience
- 8+ years of experience in site reliability engineering, infrastructure engineering, or a related field
- 4+ years of experience managing and leading a global team of SREs or infrastructure engineers
- Extensive experience with AWS and Kubernetes in a SaaS product with more than $100M ARR
- Experience with developing and implementing site reliability engineering practices and incident response plans
- Programming skills in a high-level language such as Python or Go, and experience with IaC tools such as Terraform or Pulumi
- Expertise with distributed systems for large-scale data processing and stream processing
- Strong analytical and problem-solving skills
- Excellent communication and interpersonal skills
- Demonstrated ability to build and maintain relationships with internal and external stakeholders
SolarWinds is an Equal Employment Opportunity Employer. SolarWinds will consider all qualified applicants for employment without regard to race, color, religion, sex, age, national origin, sexual orientation, gender identity, marital status, disability, veteran status or any other characteristic protected by law.
All applications are treated in accordance with the SolarWinds Privacy Notice: https://www.solarwinds.com/applicant-privacy-notice