Full-Time Site Reliability Engineer
SGS is hiring a full-time, remote Site Reliability Engineer. The career level for this opening is Experienced, and remote applicants are accepted. Read the complete job description before applying.
SGS
Job Details
The Site Reliability Engineer will ensure the reliability, supportability, scalability, and performance of our .NET stack applications. Collaborate with developers and product operations to understand application requirements and translate them into operational practices.
Design, implement, and maintain infrastructure automation tools using Infrastructure as Code (IaC) methodologies.
Monitor application health and performance metrics, proactively identifying and resolving issues. Implement incident response procedures to ensure timely resolution of outages.
Establish and improve best practices for product solution design/architecture and development.
Participate in peer and team code reviews, and develop comprehensive coding standards and guidelines to ensure consistency, maintainability, and quality in software development.
Collaborate with engineers to develop and implement disaster recovery plans.
Continuously improve monitoring and alerting processes to ensure efficient problem identification and resolution.
Stay updated on the latest advancements in .NET infrastructure and SRE best practices.
Must Haves:
- Active involvement in applying established architectural and coding best practices and in conducting code reviews.
- Critical, advanced understanding of supportability and maintainability KPIs.
- Strong development background in and knowledge of C#/.NET is required.
- Experienced with at least one programming language.
- Senior-level understanding of Azure cloud services and Azure DevOps.
- Deep knowledge of at least one of Azure Pipelines, Releases, Ansible, Puppet, or Chef.
- Bachelor's degree.
- Minimum of 3 years of experience in a related technical role (e.g., Systems Administrator, Network Engineer).
- Experience with configuration management tools (Ansible, Puppet, or Chef).
- Azure experience required.
- Familiarity with monitoring and alerting tools (.NET performance counters, Azure Application Insights, Prometheus, Grafana) is a plus.
- Ability to manage and coordinate multiple projects.
- Strong understanding of system administration principles, including operating systems (Windows Server) and networking concepts.