Full-Time Sr. HPC Consultant
Burwood Group, Inc is hiring a remote, full-time Sr. HPC Consultant. This is an Expert-level position open to remote applicants based in Chicago, IL. Read the complete job description before applying.
Burwood Group, Inc
Job Details
We are seeking a highly motivated and skilled High-Performance Computing (HPC) Consultant to join our team and support cutting-edge research and innovation. The ideal candidate will have a strong background in managing HPC resources, preferably within a higher education setting. Experience with Slurm and Google Cloud Platform is preferred.
Working with other team members, you will help mature our cloud HPC offerings and service capabilities across all aspects of large-scale, distributed, cloud-powered computing.
Responsibilities include:
- Supporting clients by designing, implementing, and improving proprietary libraries and machine-learning pipelines
- Leveraging modern computer architecture to improve throughput or reduce costs
- Identifying compute waste and IO bottlenecks and delivering cost-efficient solutions at scale
- Establishing relationships with customers, assisting them with technical solutions and solving problems
- Delivering value through project work and exceptional operational support to customers
- Maintaining awareness of and leveraging the newest technologies, primarily from Google Cloud Platform but also from AWS
Skills Required:
- Expert-level Linux/Unix Administration
- Deep knowledge of at least one major workload manager (Slurm highly preferred)
- Hands-on experience with underlying technologies, including high-speed networking and parallel file systems
- Working knowledge of containerization
- Experience working with and configuring GPU environments; TPU experience is a plus
- Familiarity with CI/CD (GitHub Actions, Jenkins, etc.)
- Experience with Git/GitHub
- Public cloud (Google, Azure, AWS) certifications are a plus
- Hands-on experience supporting research by designing, implementing, and improving proprietary libraries and machine-learning pipelines
- Strong scripting skills working in Python and Bash
- Experience identifying performance bottlenecks, whether at a low level, induced by the OS, arising from software architecture, or in a distributed system
- Demonstrable skills in system administration and the architecture, deployment, and optimization of HPC clusters