Full-Time Software Engineer, Site Reliability (Senior or Staff)
BioRender is hiring a remote, full-time Software Engineer, Site Reliability (Senior or Staff). The career level for this opening is Expert, and the role is open to remote applicants based in Northern America. Read the complete job description before applying.
Job Details
At BioRender, we’re on a mission to accelerate the world’s ability to learn, discover, and communicate science — transforming how knowledge is shared and making science open, collaborative, and easily understandable by all. We’re shaping the future of science communication and are looking for talented individuals to help bring this vision to life! 🚀
As our Sr/Staff SRE on the Platform Engineering team, you'll get in on the ground floor and play a pivotal role in developing and shaping a resilient, high-performance, and secure platform that underpins BioRender’s engineering work.
Your objective is to design, implement, and maintain robust, scalable, and fault-tolerant systems that our customers rely on. Harnessing the power of automation, CI/CD, and Infrastructure as Code, you'll seamlessly integrate and deploy our applications into the cloud while establishing observability enhanced with actionable alerts and automation to detect performance bottlenecks.
You'll adeptly address production issues, promptly restore services, and lead post-mortems to continually enhance our engineering excellence.
What you’ll be doing:
- Enhance platform resilience by constantly seeking ways to improve the reliability, scalability, and release efficiency of the platform.
- Define, build, deploy, maintain, and extend advanced observability and monitoring tools to bolster system reliability and availability.
- Formulate and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to establish precise benchmarks for system performance.
- Swiftly respond to escalated incidents, troubleshoot intricate system and application problems, and conduct thorough root cause analyses to implement corrective measures.
- Stay up to date with the latest industry trends and emerging technologies and iterate on best practices to increase the quality & velocity of development and deliverables.
- Lead in the design and architecture of scalable, distributed, fault-tolerant systems that uphold performance and reliability standards.
- Champion the adoption of new technologies, disseminate best practices, and advocate for architectural patterns. Mentor and guide fellow engineers.
What you bring to the table:
- 10-12+ years of experience in the software/DevOps/SRE realm.
- Strong programming skills in two or more of these languages: JavaScript, TypeScript, Python, Go.
- Ability to troubleshoot complex distributed systems at scale.
- Database performance monitoring and best practices.
- Comfortable innovating and establishing new practices, processes, and tooling.
- Strong analytical, system design, and architecture skills for cloud applications.
- CI/CD, configuration management, monitoring, and automation expertise.
- Advanced knowledge of observability tools and best practices (ELK, Datadog, OpenTelemetry, Prometheus, Grafana).
- Deployment and orchestration via AWS ECS, Kubernetes, Cloud Run, etc.
- Understanding of Linux, virtualization, networking, VPCs, firewalls, security groups.
- Hands-on knowledge of AWS and resource provisioning via CLI/API/IaC.
- Bachelor's degree in Computer Science, similar technical field of study, or equivalent practical experience.