Full-Time Senior Manager of Reliability Engineering
PrizePicks is hiring a remote Full-Time Senior Manager of Reliability Engineering. The career level for this opening is Senior Manager, and the role is open to remote applicants based in the USA. Read the complete job description before applying.
Job Details
At PrizePicks, we are the fastest-growing sports company in North America, as recognized by Inc. 5000. As the leading platform for Daily Fantasy Sports, we cover a diverse range of sports leagues, including the NFL, NBA, and Esports titles like League of Legends and Counter-Strike. Our team of over 450 employees thrives in an inclusive culture that values individuals from diverse backgrounds, regardless of their level of sports fandom. Ready to reimagine the DFS industry together?
Job Overview:
As the Senior Manager of Reliability Engineering, you will play a pivotal role in shaping and implementing our reliability strategy, overseeing the performance, stability, and resilience of our critical infrastructure. You will provide leadership and mentorship to a talented team of Database Reliability Engineers (DBREs) and Site Reliability Engineers (SREs), fostering a culture of collaboration, innovation, and continuous improvement.
What you’ll do:
- Lead Reliability Strategy: Develop and execute a comprehensive reliability strategy that aligns with our business objectives and ensures the high availability, performance, and scalability of our systems and applications.
- Team Leadership: Provide inspirational leadership and guidance to a team of DBREs and SREs, fostering their professional growth and development.
- Monitoring and Observability: Implement comprehensive monitoring and observability solutions to gain real-time insights into system health and performance.
- Incident Response: Oversee the incident response process, ensuring swift resolution of incidents, review and documentation of their causes and contributing factors, and implementation of measures to prevent recurrence.
- Performance Optimization: Drive initiatives to optimize system performance, identify and resolve bottlenecks, and proactively address potential issues.
- Capacity Planning: Conduct capacity planning and forecasting to ensure adequate resources are available to meet current and future demands.
- Automation and Tooling: Champion the adoption of automation and tooling to enhance efficiency, reduce toil, and improve operational effectiveness.
- Collaboration: Foster strong partnerships with cross-functional teams, including development, operations, and security, to keep reliability initiatives aligned across the organization.
- Continuous Improvement: Cultivate a culture of continuous improvement, encouraging experimentation, learning, and the adoption of best practices.