Full-Time Site Reliability Engineer
Neondatabase is hiring a remote, full-time Site Reliability Engineer. The career level for this opening is Experienced, and applicants based anywhere in the world are accepted. Read the complete job description before applying.
Neondatabase
Job Details
Neon aims to be the go-to platform for serverless Postgres with additional features like branching and autoscaling.
Currently, we are serving 750k databases and want to grow that number, along with delivering more features, without compromising on reliability and scalability.
This is where our SRE team comes into the picture.
The SRE team is responsible for managing Neon’s multi-region, multi-cloud deployment in close collaboration with the broader engineering team, as well as improving the reliability of the overall platform.
All the features we want to implement can only reach our customers if the changes are delivered in a reliable way, which means the SRE team plays a significant role in defining our pace of development.
Successful candidates will help evolve Neon into a multi-cloud platform so that we can be as close as possible to our customers, and will help decide how best to use different cloud technologies.
They will also take part in refining and improving our existing infrastructure so that stability and scalability complement the delivery of new features and services.
Neon's foundations are built on open-source software. If you want to take a look into what makes Neon work, feel free to browse https://github.com/neondatabase/neon (storage layer of databases) and https://github.com/neondatabase/autoscaling (autoscaling of databases), as well as our engineering blog.
These repos provide a sneak peek of what the Neon engineering team is capable of producing. SREs frequently work with stakeholders in different teams.
You will:
- Join an experienced team and contribute to the foundation all of Neon is built upon
- Contribute to building a stable and cost-efficient infrastructure foundation
- Play a key role in ensuring we are proactive instead of reactive on infrastructure and reliability
- Coach your fellow engineers on cloud, infrastructure, and reliability topics
- Be ready to join an on-call rotation
Requirements:
- 4+ years of experience working in Site Reliability Engineering
- Experience with cloud infrastructure components in Azure and/or AWS
- Experience in a complex Linux infrastructure environment
- Experience focusing on building repeatable and cost-efficient infrastructure
- Experience building solutions for problems with no answers on Google
- Experience working with monitoring solutions in the Prometheus ecosystem: Grafana, Loki, Tempo, VictoriaMetrics
- Experience managing multi-cluster, multi-cloud Kubernetes deployments
Nice to have:
- Familiarity with Go, GitOps (e.g., Flux, ArgoCD), Postgres, virtualization (QEMU/KVM)
Our stack: AWS, Azure, Terraform, Grafana Cloud, VictoriaMetrics, Flux, EKS/AKS