Full-Time Lead Site Reliability Engineer
Livepeer is hiring a remote, full-time Lead Site Reliability Engineer. This is an experienced-level position open to remote applicants based in the USA. Read the complete job description before applying.
Livepeer
Job Details
Lead Site Reliability Engineer
About Livepeer: Livepeer is building the world’s open video infrastructure. Founded in 2017, it is the world’s first open-source protocol for decentralized video streaming, built on Ethereum. The Livepeer network has transcoded billions of minutes, serving Web3 and Web2 platforms. Livepeer AI unlocks Livepeer’s compute network for AI inference workflows, reducing costs and enabling richer video experiences.
Your Role: This position is for an experienced, self-driven Site Reliability Engineer who loves building tools and automating processes to ensure high-quality, smooth production experiences for end-users. You’ll specialize in systems (operating systems, storage, networking, GPU clusters, Docker) and implement best practices for availability, reliability, and scalability. You should also have an interest in algorithms and distributed systems.
Responsibilities:
- Provide technical leadership in SRE execution and planning.
- Lead complex infrastructure projects for internal and external stakeholders.
- Orchestrate and run our infrastructure.
- Improve monitoring and reduce/automate manual processes.
- Be on-call (PagerDuty) for incident response.
- Plan infrastructure growth for continued scaling.
- Manage vendor relationships.
- Manage the technical roadmap for the SRE team.
- Monitor and optimize infrastructure costs.
- Support engineers and improve development workflows.
- Communicate directly with large customers.
- Coordinate with team members across time zones.
Experience Required:
- Building a technically competent SRE team through OKRs.
- Developing essential tooling to improve infrastructure operations.
- Running global, mission-critical infrastructure.
- Managing systems handling high request volumes.
- Proficiency with Linux, Unix shell, configuration management systems, infrastructure automation tools, and CI/CD pipelines.
- Experience with Kubernetes, Docker, Terraform, Ansible, Nginx, GitHub Actions, Grafana, Prometheus, Loki, AWS, Google Cloud, major CDN vendors, and video streaming technologies (HLS, RTMP, transcoding).
- Knowledge of Web3/Blockchain, particularly the Ethereum ecosystem.