Full-Time Data Reliability Engineer - Remote
PayNearMe is hiring a Full-Time Data Reliability Engineer (Remote). The career level for this opening is Experienced, and remote applicants based in Santa Clara, CA are accepted. Read the complete job description before applying.
Job Details
We’re looking for a Data Reliability Engineer, reporting to our Manager of Data Operations.
About our Data Stack:
- Cloud Provider: AWS
- Database: MySQL, PostgreSQL
- Extract/Load: Fivetran
- Transform: dbt
- Data Warehouse: Snowflake
- BI Visualization: Looker
- Code versioning: GitLab
- Preferred languages: SQL, Python
- Infrastructure as Code: Terraform/OpenTofu
- Observability: Monte Carlo, Datadog
As our Data Reliability Engineer, you will design, build, and maintain the data infrastructure that powers our data platform, ensuring reliability, scalability, and performance.
You will bring an SRE approach to data operations, automating workflows and continuously improving the data infrastructure and tools to support our business needs.
What you’ll do:
- Infrastructure Management: Design, build, and maintain scalable, reliable data infrastructure, applying CI/CD and continuous-improvement practices and using IaC to manage both SaaS and cloud platform infrastructure.
- Automation & Enabling Self-Service: Automate manual data operations tasks to improve efficiency and ensure consistent, repeatable processes. Make self-service a core tenet of infrastructure design and architecture.
- Observability: Develop and implement observability solutions to ensure data platform reliability. Build metrics, SLIs, and SLOs to measure build success and failure rates, infrastructure stability, capacity, and more.
- Data Reliability Engineering: Drive the expansion of SRE practices such as failure analysis, redundancy, automated QA and security integration, and continuous improvement, with the goal of minimizing toil and improving the uptime and performance of our data platform.
- Data Platform Team Support: Partner with members of the Data Platform Team to design and deliver solutions that align with their requirements, fostering collaboration to ensure all infrastructure and systems meet their functional and technical needs.
- Collaboration Across Teams: Coordinate with data engineers, analysts, and platform teams to ensure the usability, reliability, and scalability of data solutions.
- Data Platform Optimization: Optimize the performance and scalability of the data platform to control costs and improve reliability.
- Security and Compliance: Partner with security teams to implement industry best practices and standards such as PCI and SOC 2.
- On-Call and Maintenance: Ensure data platform uptime by actively monitoring for and responding to issues, including through participation in on-call rotations. Manage incidents impacting the data platform, perform root cause analysis, and implement solutions to prevent recurrence.
- Incident Response: Take ownership of and continuously improve the incident response processes for the data platform.
Experience: 3+ years of experience in DataOps, data engineering, SRE, or a related role with a focus on maintaining and scaling data platforms.
Site Reliability Engineering: Experience applying SRE principles such as CI/CD, automation, and IaC.
Cloud Data Platforms: Hands-on experience with cloud-based data platform tools such as Fivetran, Monte Carlo, Snowflake, AWS, and BI tools.
Infrastructure as Code: Experience using IaC design principles to manage data infrastructure.
CI/CD Practices: Expertise designing and implementing continuous integration and deployment processes.
Observability: Proficiency using tools such as Datadog and Monte Carlo to ensure the reliability of the data platform.
Programming Skills: Expertise with scripting, configuration, and automation tools such as Terraform, GitLab CI, and Python for data operations and automation.
Data Expertise: Strong understanding of data principles and technologies including their design, management, and optimization.
Problem-Solving Skills: Demonstrated ability to troubleshoot and resolve complex data platform issues.
Communication: Excellent collaboration and communication skills to work effectively with cross-functional teams and document technical details.