Full-Time Principal DevOps Engineer
Upstart is hiring a remote Full-Time Principal DevOps Engineer. The career level for this job opening is Senior Manager, and it is open to USA-based applicants working remotely. Read the complete job description before applying.
Job Details
As a Principal DevOps Engineer on our Cloud Platform Team at Upstart, you will be a technical leader responsible for driving major architectural changes, executing on technical vision, and fostering healthy collaboration within and across teams. You will play a critical role in shaping the future of Upstart’s Cloud Platform while balancing business-driven initiatives that support our mission of providing affordable credit through data-driven decisions.
Position Location - This role is available in the following locations: Remote, San Mateo, Columbus, Austin
Time Zone Requirements - This team operates in the East and West Coast time zones.
Travel Requirements - This team holds regular on-site collaboration sessions. These occur 2-4 days per quarter at one of our company offices. If you need to travel to attend these meetups, Upstart will cover all travel-related expenses.
How you’ll make an impact:
- Strategic Technical Leadership: Define and drive the near-, mid-, and long-term technical vision for Upstart’s Cloud Infrastructure, Compute/Kubernetes, and Service Mesh technologies. Ensure that the Cloud Platform is aligned with the company’s broader technical and business goals.
- Cross-functional Collaboration: Work closely with adjacent platform teams to ensure technical alignment and foster collaboration across the organization. Build and maintain strong relationships with other teams to ensure cohesive systems architecture and technical implementations.
- Infrastructure and Compute Ownership: Lead the management and continuous improvement of Infrastructure as Code, Kubernetes/EKS compute, GitOps tooling and pipelines, and service meshes. Implement and refine self-service solutions that empower product teams to efficiently deploy services and infrastructure.
- Architectural Excellence: Drive innovation in distributed computing, orchestration, and service mesh technologies. Ensure that Upstart’s Cloud Platform is scalable, reliable, and efficient, meeting the demands of Upstart’s rapidly growing product offerings.
- Mentorship and Influence: Mentor engineers across the platform teams, sharing knowledge and best practices in cloud infrastructure, service mesh technologies, and DevOps methodologies. Serve as a role model for operational excellence and continuous improvement.
- Operational Excellence: Establish and drive the development of key performance indicators (KPIs), service level indicators (SLIs), and service level objectives (SLOs) that are crucial to the reliability, scalability, and efficiency of the Cloud Platform. Collaborate closely with SRE and adjacent teams to ensure these metrics are aligned with broader business objectives and provide actionable insights for continuous improvement. Lead initiatives to monitor and optimize these metrics, ensuring the Cloud Platform consistently meets or exceeds its operational targets.
What we’re looking for:
- Minimum requirements:
  - 8+ years of experience in a DevOps or related role
  - 3+ years of experience working with Ruby, Python, or similar programming languages
  - Expertise with major public cloud providers, particularly AWS
  - In-depth knowledge of distributed computing and orchestration (e.g., Kubernetes) and managed Kubernetes services
  - Experience with Kubernetes scaling technologies and with monitoring and logging tool stacks
  - Expertise in Infrastructure as Code tools (e.g., CDK, Terraform) and GitOps practices
  - Strong understanding of service mesh technologies and their role in service-oriented architectures
  - Deep knowledge of Linux architecture, administration, troubleshooting, and production operations
  - Proven experience as a technical leader, with the ability to work across domains and influence technical direction
  - Aptitude and willingness to mentor others, with a balance of humility and confidence
  - Experience leading incident response and remediation efforts, with a strong grasp of the SLI and KPI metrics that inform SLO definitions
- Preferred qualifications:
  - Experience working with Developer Experience teams to create low-friction interfaces for cloud infrastructure and service deployment
  - Strong knowledge of SRE best practices and experience collaborating with SRE teams
  - Experience with performance and load testing tooling
  - Familiarity with Ruby, Python, and Kotlin applications