Full-Time Data Engineer
RStudio is hiring a remote Full-Time Data Engineer. The career level for this opening is Expert, and it is accepting USA-based applicants working remotely. Read the complete job description before applying.
Job Details
Posit is seeking our first Data Engineer as part of our Data Science Center of Excellence team.
In this position, you will be responsible for the infrastructure that stores Posit’s corporate data, operating the software used for corporate data science, the third-party integrations that bring data in, and the ETL/ELT code and tools that process the data. As part of the Data Science Center of Excellence team, you will work closely with the CloudOps, Security, and Enterprise Information Management teams to ensure the integrity and governance of our data, and with other teams throughout the organization to ensure the right data is in the right place so we can make data-driven decisions that help us achieve our goals. You will also work closely with end users of data pipelines across the organization to understand their specific problems and collaborate with them to plan and develop the ETL/ELT pipelines that contribute to solutions.
What you’ll own:
- the operational excellence, reliability, and security of our data infrastructure and services, and advocate for investments to improve them
- the infrastructure, operations, and deployment pipelines of our data infrastructure
- ensuring that new features and functionality are designed and built with operational considerations, scalability, cost effectiveness, and sustainability in mind
What you’ll help with:
- ensuring appropriate metrics and monitoring are in place to provide actionable alerting with a high signal to noise ratio
- improving our infrastructure-as-code and continuous integration and deployment pipelines on a consistent basis
- planning and executing tasks related to the company’s data governance strategy
- giving and receiving feedback from other engineers in the form of code reviews and blameless post-mortems
- cross-functional collaboration: working closely with teams across the organization to ensure data availability and to support critical company operations, reporting, and data science projects
What you’ll teach:
- anti-patterns learned from prior experiences handling operational incidents
- the tools, tips, and tricks that make your professional life easier
What you’ll learn:
- our current ETL/ELT toolchain of Fivetran, dbt, and Glue, along with data storage tools including Amazon Redshift and S3
- metrics and monitoring using Datadog
- infrastructure as code using Pulumi and Python
- the Posit products and how their data science customers work
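For candidates unfamiliar with the extract-transform-load pattern named above, here is a minimal, hypothetical sketch in plain Python. It is illustrative only: the team's actual toolchain uses Fivetran, dbt, and Glue, and all names and fields below are invented for the example.

```python
import json
from typing import Iterable


def extract(raw_records: Iterable[str]) -> list[dict]:
    # Parse raw JSON lines as they might arrive from a source system.
    return [json.loads(line) for line in raw_records]


def transform(records: list[dict]) -> list[dict]:
    # Normalize field names, lowercase emails, and drop records
    # that are missing the primary key.
    return [
        {"user_id": r["id"], "email": r.get("email", "").lower()}
        for r in records
        if "id" in r
    ]


def load(records: list[dict], destination: list[dict]) -> int:
    # Append to an in-memory "warehouse" stand-in; a real pipeline
    # would write to a store such as Redshift or S3.
    destination.extend(records)
    return len(records)


# Usage: run the three stages end to end on two sample records.
raw = ['{"id": 1, "email": "A@Example.com"}', '{"email": "no-id@example.com"}']
warehouse: list[dict] = []
load(transform(extract(raw)), warehouse)
# warehouse now holds one normalized record
```

In an ELT variant, the raw records would be loaded into the warehouse first and the transform step would run there (for example, as a dbt model) rather than in application code.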
About you:
You have 5+ years of professional experience writing software to manage data pipelines and infrastructure. You are user-focused and driven by our mission to facilitate data science and education for everyone. You share our commitment to building great software by striving for robust design, clean and well-tested code, and delightful user experiences. You excel at breaking complex problems into bite-sized tasks and driving them to completion. You can act as a champion of data engineering and lead related internal efforts. You are strongly proficient in Python, SQL, and database management (both relational and non-relational), and you can become an expert in our current toolchain, including dbt and Amazon S3. You are familiar with a broad set of data engineering tools, including Parquet and Iceberg, and are comfortable recommending new tools when they are the best solution for a business problem. You love to learn and help others succeed through code reviews and other forms of mentorship. You are humble, pragmatic, and deliberate, and you have a keen sense of empathy for your co-workers and users.
Within 1 month you will:
- get to know the larger Data Science COE and CloudOps teams and how we create, deliver, and maintain software
- build and prioritize a backlog of work
Within 3 months you will:
- begin replacing existing business-critical data pipelines using our ETL/ELT toolchain
- establish a process for taking ownership of existing data pipelines, actively maintaining and monitoring them for data consistency and accuracy
Within 6 months you will:
- own or demonstrate expertise in multiple areas of our data infrastructure and pipelines
- research problems and new technologies and effectively communicate findings to the team
- identify underutilized or challenging data sources and develop enhanced data pipelines that unlock their value for improved business outcomes
- propose significant projects and lead them