Full-Time Sr Site Reliability Engineer
Blackpoint Cyber is hiring a remote, full-time Sr Site Reliability Engineer. The career level for this opening is Experienced, and the role accepts remote applicants based in Canada. Read the complete job description before applying.
Job Details
Blackpoint Cyber is a leading cybersecurity firm providing threat hunting, detection, and remediation technology. Founded by NSA experts, the company offers national security-grade solutions. It is seeking an experienced Senior Site Reliability Engineer (SRE) to join its hyper-growth team.
Job Overview: As a Senior SRE, you'll design, implement, and maintain infrastructure and CI/CD pipelines, focusing on automation, scalability, and performance. Collaborating with cross-functional teams, you'll ensure system reliability and foster continuous improvement.
Key Responsibilities:
- Design, build, and maintain scalable infrastructure using Terraform and Terragrunt for cloud resource automation.
- Manage cloud environments (especially AWS), optimizing cost, security, and high availability.
- Manage and scale data streaming platforms using Confluent Cloud and Kafka.
- Deploy and manage Redis instances for caching and real-time data processing.
- Implement and maintain monitoring and alerting solutions (Prometheus, Grafana, Alertmanager, Opsgenie) for system reliability.
- Enable feature flag management and controlled rollouts using LaunchDarkly.
- Manage Kubernetes clusters using Helm, ArgoCD, Istio, and Kustomize for continuous delivery and infrastructure as code.
- Collaborate with development teams for seamless service integration.
- Troubleshoot complex system issues to maintain high performance and uptime.
- Continuously improve automation tools, processes, and methodologies.
- Stay updated on emerging SRE trends and technologies.
Skills & Qualifications:
- 8+ years of experience as a Senior SRE, focused on cloud infrastructure and automation.
- Expertise in Infrastructure as Code (IaC) using Terraform and Terragrunt.
- Deep knowledge of AWS cloud services and secure, scalable architectures.
- Experience with Confluent Cloud and Kafka for distributed data streaming.
- Strong experience with Redis and RDS data storage.
- Experience with OpenSearch/Elasticsearch/ChaosSearch.
- Proficiency in monitoring and alerting (Prometheus, Grafana, Alertmanager, Opsgenie).
- LaunchDarkly experience for feature flag management.
- Extensive Kubernetes cluster management, including Helm, ArgoCD, Istio, and Kustomize.
- Strong problem-solving and communication skills.
Nice to Have:
- Multi-cloud experience (GCP, Azure)
- Security best practices in cloud and containerized environments
- Serverless architectures and CI/CD tools (Jenkins, GitHub Actions)
- Development experience in Node.js/Python/Go