Full-Time Director of Site Reliability Engineers
Cyware is hiring a remote Full-Time Director of Site Reliability Engineers. The career level for this opening is Manager, and the role is open to US-based applicants working remotely. Read the complete job description before applying.
Cyware
Job Details
About Cyware: Cyware delivers an innovative approach to cybersecurity that unifies threat intelligence, automation, threat response, and vulnerability management with data insights gleaned from assets, users, malware, attackers, and vulnerabilities. Cyware’s Cyber Fusion platform integrates SOAR and TIP technology, enabling collaboration across siloed security teams. Cyware is widely deployed by enterprises, government agencies, and MSSPs, and is the leading threat intelligence sharing platform for global ISACs and CERTs.
About you: You are driven, inquisitive, proactive, and energetic. You have a growth mindset and are committed to delivering results. You thrive in a fast-paced, collaborative environment.
Why We Are Hiring: The Director of Site Reliability Engineers (SREs) is responsible for managing the entire SRE function at Cyware. The SREs are responsible for keeping all user-facing services and other Cyware production systems running smoothly.
The Director and the SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our production operating environments.
What You Will Do:
- Team Leadership & Mentorship: Guide and develop SREs, setting clear goals and fostering a high-performance culture.
- Observability & Incident Response: Ensure system monitoring, drive root cause analysis, and support on-call teams to meet SLAs.
- Automation & Efficiency: Lead efforts to automate deployments, infrastructure provisioning, and operational tasks to minimize human error.
- Data-Driven Decision Making: Define and measure SRE metrics (SLIs, SLOs, SLAs) and drive continuous improvement.
- Production Governance: Oversee high availability (HA), disaster recovery (DR), and compliance monitoring.
- Infrastructure & Cost Optimization: Manage and optimize cloud infrastructure using tools like Terraform, Kubernetes, and Jenkins.
- Release & Maintenance Management: Ensure smooth deployments, operational readiness, and security compliance.
- Cross-Team Collaboration: Work across time zones to coordinate with engineering, security, and operations teams.
Who You Are
- US citizenship is a requirement of this position in accordance with 8 U.S.C. 1324b(a)(2)(C)
- Bachelor's degree or higher in Computer Science, Engineering, IT, or a related discipline
- 7 to 10 years of total experience as an SRE
- 4 to 6 years of experience managing a team of SREs
- Experience in knowledge sharing and mentoring team members
- Self-awareness, with the ability to handle conflict within the team and to give and receive feedback
- Accountability: willing to proactively step in and do the right thing while providing candid and constructive feedback
- Cloud: AWS/Azure/GCP
- Linux: Solid understanding of Linux systems, including sed/awk/grep/egrep, vi/Vim/Emacs, netstat, lsof, strace, ps/top/atop/dstat, GRUB boot configuration and system rescue, fstab/disk labels, ext3/ext4, iptables, sysstat (sar/vmstat/iostat, etc.), run levels and startup scripts, sudo/chroot
- Scripting: Bash/Python
- Development Languages and Frameworks: Python/Django, Vue, React, Go
- Fundamentals: Basic DNS and networking, TCP/UDP, IP routing, HA and load balancing concepts
- Application Protocols: SMTP, HTTP, HTTPS, FTP, IMAP, POP
Good to have
- Applications: Database Systems Fundamentals (MySQL/Postgres), Redis, Nginx/Apache
- Tools/Utilities: Nagios, Yum, RPM, Git, Grafana, Prometheus, New Relic, ELK, Docker, Jenkins
- Certifications: RHCSA/RHCE/AWS (SysOps)