Full-Time Senior Site Reliability Engineer
Sigma Software is hiring a remote Full-Time Senior Site Reliability Engineer. The career level for this opening is Senior Manager, and the role accepts Brazil-based applicants working remotely. Read the complete job description before applying.
Job Details
Company Description
We have an excellent opportunity for a bright and highly motivated Senior Site Reliability Engineer to join our mature project team. You have a unique chance to become part of our team and work with best practices and methodologies. This role empowers you to take the lead and excel to your fullest potential.
Customer
Our customer is a renowned technology company based in New York, specializing in providing cutting-edge solutions in the realm of video advertising. The products field millions of queries per second and consume 125 GB of data per minute.
Project
We're seeking a skilled Senior Site Reliability Engineer responsible for the Cloud Infrastructure and Observability solutions for the Client's platform. Ensuring all systems run smoothly is a key responsibility. The project is an easy-to-use, massive-scale, and highly available demand-side platform. Backed by Amazon Web Services and Kubernetes, the team has embraced Infrastructure as code to manage thousands of applications, servers, and containers running in multiple regions worldwide.
Job Description
- Design and build infrastructure and tooling for high scalability, reliability, and sub-second performance levels using security industry best practices.
- Write code and scripts to support Infrastructure as code (IaC), configuration management, and automated incident resolution.
- Support and extend the observability stack to capture and alert on any system issues.
- Participate in on-call rotations and be an escalation contact for service incidents.
- Write systems documentation, troubleshooting playbooks, and other instruction manuals.
- Other duties and responsibilities as assigned.
Qualifications
- Bachelor's or higher degree in computer science, computer engineering, relevant technical field, or equivalent practical experience
- Expertise in solution architecture and system design
- Experience in analyzing and troubleshooting large-scale distributed systems
- At least 6 years of administration experience with Linux, AWS, and Kubernetes
- At least 6 years of experience in configuration management using CloudFormation, Terraform, and Ansible, or similar tools
- At least 3 years of experience with Python
- Strong problem-solving skills
- Strong verbal/written communication skills
- At least an Upper-Intermediate level of English