Contractor System Reliability Engineer
Tech Holding is hiring a remote Contractor System Reliability Engineer. This is an Experienced-level position open to Mexico-based applicants working remotely. Read the complete job description before applying.
Job Details
About us: Working at Tech Holding isn't just a job, it's an opportunity to be a part of something bigger. We are a full-service consulting firm focused on delivering predictable outcomes and high-quality solutions to our clients. Our team has extensive industry experience, having held senior positions in various companies, from startups to Fortune 50 firms. We've combined our experiences into a unique approach supported by deep expertise, integrity, transparency, and dependability.
The Role: As a System Reliability Engineer, you will play a crucial role in managing Linux and Windows environments, automating processes, and implementing robust monitoring and security practices. Your expertise will help us maintain high availability and performance across our clients' systems. If you thrive on solving complex problems and optimizing systems, we want to hear from you!
Responsibilities:
- Manage, configure, and maintain Linux and Windows Server environments.
- Perform regular system updates, patches, and security configurations.
- Implement and maintain monitoring tools to track system performance, availability, and reliability.
- Analyze performance metrics and logs to identify and resolve issues proactively.
- Collaborate with stakeholders to create dashboards and alerts for proactive performance monitoring.
- Develop and maintain automation scripts for routine tasks, deployments, and incident responses.
- Use configuration management tools to ensure consistent and repeatable system setups.
- Implement and enforce security best practices for system configurations and network setups.
- Conduct regular vulnerability assessments and apply necessary patches to mitigate risks.
- Work closely with development, DevSecOps, and cloud engineering teams to support application deployments and infrastructure changes.
- Provide technical guidance and support for resolving complex system issues.
- Create and maintain detailed documentation for system configurations, procedures, and incident reports.
- Identify opportunities for process improvements and implement changes to enhance system reliability and performance.
Required Skills:
- Proficiency in managing and troubleshooting Linux and Windows Server systems.
- Experience with automation tools (Ansible, Puppet, or Chef).
- Familiarity with monitoring solutions (AWS CloudWatch, Dynatrace, Datadog).
- Ability to analyze system performance metrics and implement optimizations.
- Experience with patch management, vulnerability assessment, and remediation.
- Proficiency in scripting languages (Bash, Python, PowerShell).
- Experience with version control systems (Git).
- Familiarity with AWS (EC2, Lambda, Containers).
- Familiarity with AWS Systems Manager features (Patch Manager, Run Command).
- Familiarity with incident response, troubleshooting, and root cause analysis.
- Familiarity with infrastructure as code (IaC) tools (Terraform, AWS CloudFormation).
Nice to have: Familiarity with AWS Well-Architected principles.
AWS Certifications (Nice to have): DevOps Engineer, Solutions Architect - Associate, SysOps Administrator - Associate, Developer - Associate