Full-Time Senior Site Reliability Engineer
Veracode is hiring a remote, full-time Senior Site Reliability Engineer. The career level for this opening is listed as Senior Manager, and the role accepts US-based applicants working remotely. Read the complete job description before applying.
Veracode
Job Details
Senior Site Reliability Engineer
Looking for an innovative, high-growth company in one of the hottest segments of the security market? Look no further than Veracode!
Veracode is recognized as a premier provider of SaaS-based application security solutions, transforming the way companies secure applications in today’s software-driven world. We provide our customers with a solid foundation on which to build security into their modern agile development processes. Learn more about us at www.veracode.com!
We are seeking a skilled Senior Site Reliability Engineer to join our team and play a role in the evolution of the team to a full GitOps model. The ideal candidate will have a strong background in DevOps with a focus on security, experience in managing containerized infrastructures, and a deep understanding of CI/CD pipelines. As a Senior Site Reliability Engineer, you will be responsible for ensuring the reliability, security, and scalability of our software security product by using your expertise in DevOps practices and tooling. You will work closely with our engineering, product, and security teams to develop and implement best practices for building, testing, and deploying applications and microservices.
Key Aspects of Role:
- Utilize AWS services to design scalable cloud solutions that support critical systems.
- Partner with engineering teams to ensure monitoring and alerting are in place, enabling consistent, scalable, and automated service delivery.
- Develop and improve monitoring and alerting solutions to guarantee the reliability of applications and services, using tools like Datadog and Sumologic.
- Lead efforts to automate infrastructure deployment and management using Terraform, Kubernetes, and other cloud-native tools.
- Create automated incident response workflows to handle common infrastructure and application issues.
- Collaborate with security teams to ensure systems adhere to industry-standard security practices and policies.
- Document and train engineering teams on best practices in reliability, scalability, and operational excellence.
- Participate in 24x7 on-call rotations to respond to incidents and triage production issues.
- Contribute to incident and process post-mortems.
- Ensure uptime, SLAs, and availability of critical platform components through process improvements and automation.
- Monitor existing applications and infrastructure while working to improve existing monitoring.
- Communicate effectively with project stakeholders and management.
- Develop and support processes to maintain uptime, SLAs and availability of critical platform components.
- Troubleshoot and resolve production issues related to systems, network, and application.
- Ensure that our systems and processes adhere to industry-standard security practices and policies.
Required Skills/Experience:
- Bachelor's Degree in Computer Science, Information Science, Engineering, or related/relevant field or equivalent experience.
- 5+ years working in an SRE, DevOps, Cloud Engineering, or similar role.
- Experience with AWS and automation tools like Terraform, CloudFormation, or Ansible.
- Hands-on experience deploying, managing, and troubleshooting Kubernetes clusters.
- Proficiency with observability, monitoring, and alerting tools (Datadog, Sumologic, Prometheus, Grafana, etc.).
- Familiarity with CI/CD pipelines and repository management tools (e.g., GitLab, Jenkins, GitHub).
- Strong programming skills for automation (Python, Go, or similar languages).
- Solid understanding of infrastructure as code (IaC) and GitOps methodologies.
- Strong communication skills with the ability to collaborate effectively across different teams.
- Ability to work in an Agile environment.
- Proven experience in troubleshooting production environments and improving system reliability.
- Experience with on-call/incident management systems such as PagerDuty, VictorOps or OpsGenie.
Desired Experience:
- Experience with service meshes (e.g., Istio) to enhance application observability and security.
- Familiarity with advanced Kubernetes features (e.g., StatefulSets, Helm, Operators).
- Knowledge of database management and migration processes, including RDS and DMS.
Compensation Transparency
In accordance with U.S. pay transparency laws, Veracode provides compensation transparency for roles based in the United States. Click here to view our compensation ranges by grade. Please note, specific compensation may be influenced by various factors, including the candidate's experience, education, and work location.
Job Grade: Senior
Employment opportunities are available to all applicants without regard to race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
At Veracode, we prioritize a secure recruitment process. Unfortunately, fake recruitment and job offer scams are on the rise. They aim to deceive candidates through emails and calls to obtain sensitive information.
Here’s our recruitment promise to you:
- Comprehensive Interview Process: We never extend job offers without a comprehensive interview process involving our recruitment team and hiring managers.
- Offer Communications: Our job offers are not sent solely through email, and we will never ask you to pay for your own hardware.
- Email Verification: Recruiting emails from Veracode will always originate from an “@veracode.com” email address.
If you have any doubts about the authenticity of an email, letter, or telephone communication claiming to be from Veracode, please reach out to us at careers@veracode.com before taking any further action.