Full-Time Principal Dev-Ops Engineer
QAD, Inc. is hiring a remote Full-Time Principal Dev-Ops Engineer. This is an Experienced-level position open to remote applicants based in Mexico City, Mexico. Read the complete job description before applying.
QAD, Inc.
Job Details
As a DevOps engineer, you will maintain and enhance established AWS Cloud infrastructure while collaborating with our existing DevOps team. You will take ownership of specific components to distribute workload across the team, and develop a comprehensive understanding of our applications to improve infrastructure design, optimize deployment pipelines, and ensure SaaS environment reliability.
What you will do
- AWS resource maintenance and enhancement using Terraform and CloudFormation, implementing incremental infrastructure improvements.
- Kubernetes cluster management and optimization on EKS, improving container orchestration with Helm and Kustomize.
- CI/CD pipeline refinement in GitHub Actions, Jenkins, and Flux2 to improve deployment reliability and speed.
- System health monitoring with DataDog and AWS CloudWatch, enhancing alerting rules and dashboards as needed.
- Development team collaboration to understand application requirements and behaviors, using insights to improve infrastructure designs.
- Complex issue troubleshooting within multi-environment AWS infrastructure.
What you will need
- Professional degree: Optional
- 5+ years of experience in the field.
- AWS cloud account and resource management including security policies, IAM, VPC, EC2, EKS, RDS, S3, MSK, WAF, and Load Balancers
- Kubernetes administration, troubleshooting, and optimization
- Docker container image creation and troubleshooting
- Infrastructure-as-code implementation using Terraform and/or CloudFormation
- GitOps workflows and Git source control, including tools like Flux2 or ArgoCD
- CI/CD tooling proficiency, specifically GitHub Actions and Jenkins pipelines
- Go or Python scripting for operational workflow automation
- DataDog and CloudWatch monitoring implementation
- Application architecture understanding to inform infrastructure decisions
Nice to Have
- SQL database operations and troubleshooting
- AWS resource optimization for cost efficiency
- Container image security scanning implementation
- Production infrastructure change management
Other knowledge
- Go
- Python
- SQL
- YAML
- HCL (Terraform)