Full-Time Sr Staff Site Reliability Engineer (Prisma Access)
Palo Alto Networks is hiring a remote Full-Time Sr Staff Site Reliability Engineer (Prisma Access). This opening is at the Experienced career level and accepts remote applicants based in Santa Clara, CA. Read the complete job description before applying.
Job Details
Palo Alto Networks runs large-scale infrastructure and is one of the biggest GCP customers. As a Principal SRE, you'll be at the forefront of building and maintaining highly reliable, scalable, and secure cloud infrastructure within a FedRAMP-compliant environment. You'll drive operational excellence, champion SRE best practices, and work collaboratively to ensure our systems are robust and performant.
The role spans automation, architecture, performance, observability, troubleshooting, security, and reliability. Our Infrastructure Platform stack includes Terraform, Kubernetes, GitLab CI/CD, GitOps, Prometheus, Grafana, Loki, Docker, GCP, Backstage, MySQL, PagerDuty, FireHydrant, Python, Bash, Java, Node.js, and Go.
Your Impact
- Design, build, and operate reliable, secure cloud infrastructure across multi-cloud environments for our federal customers.
- Lead cross-functional initiatives to ensure applications are production-ready, scalable, secure, and resilient.
- Develop expertise in new technologies, embracing continuous learning and the adoption of AI tools.
- Develop tools and automation frameworks, championing Infrastructure as Code (IaC) and Monitoring as Code (MaC) principles.
- Automate robust deployments and orchestrate end-to-end monitoring and alerting solutions.
- Participate in 24/7/365 on-call rotations, including shift and weekend coverage, to support critical business operations and production systems and to respond to incidents.
- Lead root cause analysis of critical issues, driving improvements and preventing recurrence.
- Champion the success of SRE and DevOps initiatives, aligning technical decisions with business goals.
Your Experience
- Must be a US Citizen to be considered.
- 5+ years of experience in Infrastructure, SRE, or DevOps roles.
- BS or MS in Computer Science, a related field, or equivalent professional experience.
- 4+ years of experience with AWS and GCP, and expertise in their architecture, services, and PKI concepts for cloud security.
- Expert troubleshooting skills for resolving cloud infrastructure and service issues, identifying root causes and devising effective solutions.
- Proficiency in automation using Python and shell scripting; Golang is a plus.
- Expertise in Infrastructure as Code (IaC) with Terraform and Helm, leveraging AI tools for development.
- Solid experience with Kubernetes, container networking, and container workloads.
- Strong Linux administration skills.
- Proficiency with CI/CD pipelines, GitOps principles, and tooling like GitLab and Jenkins.
- Excellent written and verbal communication skills, with the ability to collaborate effectively to drive outcomes.
- Self-disciplined, self-managed, and highly driven with a strong sense of ownership and urgency.
- Ability to adapt quickly to evolving cloud technologies, security threats, and advancements through continuous learning.
- Ability to address customer needs effectively and provide clear Root Cause Analyses (RCAs) to customers.
- A deep understanding of how technical decisions impact the business and an ability to align cloud operations with business goals.