Full-Time Site Reliability Engineer
Sinch is hiring a remote Full-Time Site Reliability Engineer. The career level for this opening is Senior Manager, and the role is open to USA-based applicants working remotely. Read the complete job description before applying.
Sinch
Job Details
At Sinch Mailgun, we're building the infrastructure that powers communication at internet scale.
As one of the largest email providers in the world, our platform delivers billions of emails every day for developers, startups, and global enterprises alike.
We’re looking for a Senior Site Reliability Engineer to join our SRE team.
In this role, you will help shape, scale, and optimize the critical infrastructure that underpins every Mailgun service.
You’ll work closely with product engineering teams to drive improvements, automate workflows, and ensure our systems meet the highest reliability standards.
This is more than just keeping the lights on. You’ll be engineering the future of a platform trusted by developers and companies around the globe, solving complex distributed systems challenges, and driving real-world innovation in how email infrastructure is built and operated.
Responsibilities
- Collaborate with other teams to define and implement system requirements.
- Design, build, and maintain cloud-based microservices infrastructure.
- Automate routine operational tasks and remediation processes to improve efficiency and reliability.
- Proactively resolve issues, working with support teams and other engineering teams and using monitoring tools to ensure system health.
- Ensure that datastores operate efficiently and meet performance and availability goals.
- Contribute to the team’s growth by mentoring junior engineers and sharing standard methodologies.
- Plan and execute strategies for scaling systems and infrastructure as needs grow.
Requirements
- Strong background in infrastructure, operations, or software engineering with a focus on reliability.
- Extensive experience working with cloud platforms such as Google Cloud Platform (GCP) or Amazon Web Services (AWS).
- Proficiency with infrastructure-as-code and configuration management tools like Terraform and Ansible.
- Hands-on experience with modern monitoring and observability tools such as Prometheus, Grafana, and similar technologies.
- Proven experience with distributed databases (e.g. Cassandra, Elasticsearch) and maintaining their health at scale.
- Familiarity with distributed event stores and stream-processing platforms.
- Strong coding skills in at least one modern programming language (Python, Go, etc.).
- Expertise in running and maintaining production systems in a Linux environment and public cloud infrastructure.
- Demonstrated expertise in architecting solutions for complex technical challenges, and the ability to lead initiatives from conception through to execution.
- Strong interpersonal and communication skills, with a history of building effective relationships with cross-functional teams.
- Ability to mentor and guide junior engineers, fostering a collaborative and inclusive team culture.
Preferred Experience
- Container orchestration platforms
- CI/CD pipeline automation and infrastructure as code practices
- Network architecture and security best practices in cloud environments
- Containerization and microservices architectures
- Advanced problem-solving skills, particularly in highly complex distributed systems