Full-Time Site Reliability Engineering Manager
Daxko is hiring a remote Full-Time Site Reliability Engineering Manager. The career level for this opening is Manager, and the role accepts remote applicants based in Birmingham, AL. Read the complete job description before applying.
Daxko
Job Details
Site Reliability Engineering Manager
Manage all production assets for each product. Responsibilities include patching, upgrading, deploying new servers, organizing team workload, supporting engineering efforts, compliance, uptime, and performance monitoring.
Prioritize, organize, and lead team execution. Assess operational capabilities and performance to ensure on-time delivery of quality products to internal and external customers.
- Set and communicate performance targets and goals
- Evaluate and provide real-time feedback
- Train team members for their specific roles
- Coordinate on-call rotation
- Coordinate staff training
- Assist in resolving emergencies (infra/software outages)
- Manage headcount and staffing decisions (hiring/terminations)
Daily responsibilities include:
- Oversee progress in achieving operational goals (quality, cost, customer service)
- Own uptime, data accuracy, and data integrity
- Interact with Engineering Leads for team alignment
- Maintain business continuity for all production assets
- Plan and prioritize work using agile practices
- Ensure operations comply with company and regulatory requirements
- Serve as the technical escalation point for your team
- Provide weekly reports on system availability, response, and capacity
- Manage on-call rotation
- Manage budgets (hosting and software licensing)
Requirements:
- Bachelor's degree (technical discipline preferred) or equivalent experience
- 3-5 years managing globally distributed team members
- 3-5 years in Site Reliability Engineering
- Strong foundation in:
  - Linux
  - Web Servers (NGINX, PHP, Traefik, F5)
  - Virtualization Technologies (VMware)
  - Cloud Platforms (AWS, Azure)
  - Containerization Systems (Docker, Kubernetes, Dynos)
  - Caching technology (Redis, RabbitMQ)
Other Skills
- Strong security mindset and experience implementing security controls
- Excellent organizational and time management skills
- Proven ability to meet deadlines
- Strong analytical and problem-solving skills
- Strong supervisory and leadership skills
- Ability to prioritize and delegate tasks
- Bonus: Experience with Monitoring Technologies (creating custom checks, managing alert profiles, OpenTelemetry, Instana, LogicMonitor, PagerDuty, OpsGenie)
- Bonus: Experience with Tooling (GitLab CI, Jenkins, Chef, Terraform, Elasticsearch, Kubernetes, Rancher)
- Bonus: Scripting experience (Ruby, Python, Bash)
- Bonus: Experience with SOC, PCI, GDPR standards and regulations
- Bonus: Experience working tickets and managing priorities (Atlassian Suite, etc.)
- Bonus: Experience developing or supporting Java, PHP, or Node applications
- Bonus: Experience automating repetitive tasks