Full-Time Senior Manager Site Reliability Engineer
Blackpoint Cyber is hiring a remote, full-time Senior Manager Site Reliability Engineer. The role is at the Senior Manager career level and is open to remote applicants based in Canada. Read the complete job description before applying.
Blackpoint Cyber
Job Details
Blackpoint Cyber is a leading provider of cybersecurity threat hunting, detection, and remediation technology. Founded by former NSA experts, they offer national security-grade solutions to global clients. They are experiencing significant growth.
As a Senior SRE Manager, you will lead infrastructure, reliability, and cost optimization efforts for Blackpoint Cyber's critical services.
Responsibilities include:
- Leading the design, implementation, and management of scalable, reliable cloud infrastructure (AWS/Azure/GCP).
- Establishing SRE best practices (monitoring, incident response, capacity planning, performance tuning).
- Improving observability, monitoring, and alerting for quick issue resolution.
- Driving automation (IaC, CI/CD pipelines) to reduce manual intervention.
- Leading a team of SREs using a Coach, Model, Care management style.
- Contributing hands-on SRE work for 50% of the role.
- Designing, implementing, and supporting key infrastructure (automated attack infrastructure, isolated environments, secure data storage).
- Establishing security hygiene and monitoring policies.
- Monitoring and optimizing cloud spending for cost-effectiveness without sacrificing reliability.
- Defining and implementing cost-saving strategies (right-sizing, spot instances, optimized storage).
- Collaborating with finance and procurement for infrastructure cost forecasting and alignment with business objectives.
- Managing and mentoring a team of SREs, DevOps engineers, and cloud infrastructure specialists.
- Collaborating with engineering and security teams to design reliable, scalable architectures and embed reliability in development workflows.
- Ensuring compliance, security hardening, and disaster recovery readiness.
- Driving post-incident reviews for continuous improvement.
Qualifications include:
- 10+ years of SRE/DevOps/cloud infrastructure experience.
- 3+ years of experience leading SRE teams.
- Strong AWS/Azure/GCP expertise, including cost management and scaling strategies.
- Proficiency in IaC (Terraform, CloudFormation, Pulumi).
- Hands-on CI/CD pipeline, Kubernetes, and container orchestration experience.
- Monitoring, logging, and observability tools expertise (Prometheus, Grafana, Datadog, Splunk).
- Proven ability to optimize cloud costs while maintaining reliability.
- Strong leadership, collaboration, and problem-solving skills.
Nice-to-haves:
- Cybersecurity/high-security environment experience.
- Understanding of compliance frameworks (SOC 2, ISO 27001, FedRAMP).
- Serverless architectures and edge computing knowledge.
- FinOps team experience.