Contractor Site Reliability Engineer
Nordsec is hiring a remote Contractor Site Reliability Engineer. This opening is at the Experienced career level and accepts remote applicants based in Europe. Read the complete job description before applying.
Nordsec
Job Details
The Infrastructure department is responsible for influencing and tracking change, providing frontline support, and delivering software-defined solutions.
Are you excited by the challenge of managing large-scale systems, automating infrastructure, and ensuring seamless service reliability?
We’re seeking a Site Reliability Engineer (SRE) to play a key role in shaping the future of our global infrastructure.
Overseeing a global infrastructure of ~10,000 on-prem servers, you’ll tackle unique technical challenges, engineer scalable systems, and have a direct impact on the reliability and performance of our products.
Main Responsibilities
- Build Reliable Infrastructure: Design, develop, and maintain highly available, scalable systems.
- Automate Everything: Create and optimize automation workflows to streamline deployments, improve speed, and eliminate manual overhead.
- Ensure Observability: Build monitoring and alerting systems that provide deep visibility into performance, reliability, and health.
- Solve Complex Issues: Troubleshoot, debug and resolve critical issues in complex systems.
- Collaborate & Innovate: Work closely with QoS and operations teams to enhance reliability, develop new features, and drive technical excellence.
Core Requirements
- Linux Expertise: Good knowledge of Linux systems, particularly Debian-based distributions.
- Automation Skills: Hands-on experience with configuration management tools such as SaltStack/Ansible or similar solutions.
- Programming: Proficiency in Python for building automation scripts and tools.
- Observability Knowledge: Experience with monitoring tools and frameworks to enhance system observability.
- TCP/IP Networking Concepts: A solid understanding of TCP/IP networking protocols and concepts.
- Problem-Solving Skills: Proven ability to debug and troubleshoot complex systems effectively.
Tools You Will Use
- Operating Systems: Linux (Debian)
- Firewalls: nftables
- Load Balancing & Proxying: HAProxy, NGINX
- Containers: Docker
- Automation Frameworks: SaltStack
- Key-Value Stores: Redis, Consul
- Analytics: Elasticsearch, VictoriaMetrics, Grafana
- Programming Languages: Python