Full-Time Director of Reliability
Upstart is hiring a remote Full-Time Director of Reliability. The career level for this opening is Manager, and the role is open to remote applicants based in the USA. Read the complete job description before applying.
Job Details
About Upstart: Upstart is a leading AI lending marketplace that partners with banks and credit unions to increase access to affordable credit. Upstart-powered banks and credit unions can improve approval rates and reduce loss rates across demographics, while delivering a superior digital lending experience. Over 80% of borrowers are approved instantly with zero documentation.
About the Role: As the Director of Reliability, you'll be a strategic leader ensuring our platform is consistently online, highly performant, and scalable. You will lead the Site Reliability Engineering (SRE), Compute, Quality, Runtime, and Deployment teams to build resilient systems, promoting automation, observability, and excellence in incident management. This role is crucial to Upstart's mission of making credit more accessible and fair.
Impact:
- Proactively prevent downtime and service disruptions by implementing robust monitoring, alerting, and automation.
- Optimize system performance to improve response times and enhance customer satisfaction.
- Champion automation and observability to reduce manual toil and free up engineering teams.
- Lead the development of a world-class incident response process, ensuring quick resolution of outages.
- Empower teams with SRE best practices, breaking down silos between development, operations, and security.
- Align SRE initiatives with business objectives to balance reliability with innovation.
Requirements:
- 10+ years of experience in software engineering, DevOps, or SRE, with 5+ years in a leadership role.
- Proven experience leading large-scale, mission-critical distributed systems with a focus on reliability, scalability, and security.
- Expertise in cloud platforms (AWS, Azure, or Google Cloud).
- Strong background in observability tools (Prometheus, Grafana, Datadog, New Relic, or Splunk).
- Experience with infrastructure as code (Terraform, CloudFormation) and containerization (Docker, Kubernetes).
- Strong understanding of networking, security, and performance optimization.
- Demonstrated success in building high-performing SRE teams and implementing best practices.
Preferred Qualifications:
- Experience building and leading high-impact teams.
- Experience with large-scale distributed systems in AWS.
- Ability to influence and lead without direct authority.
- Strong product and analytical mindset.
- Experience in rapid growth environments.
Location and Compensation: This role is available remotely or in San Mateo, Columbus, or Austin. The team works across East and West Coast time zones. The majority of the work can be done remotely, though occasional onsite attendance is required. Compensation ranges from $217,400 to $300,900 USD.