Archetypes in Data Engineering

Preeti Hemant
2 min read · Mar 16, 2022

This is a short read, part of the “Short Byte: Data Engineering” series, which introduces data engineering topics.

What flavour of Data Engineer are you? Leave your replies in the comment section

Today’s data systems are complex enough to warrant more than just traditional data engineering. There are three popular archetypes in data engineering, each with a different objective.

Data Engineer (Pipelines)

A data pipeline engineer fills the traditional data engineering role; the primary function is ingesting data in an Extract-Transform-Load (ETL) sequence.

Data pipeline engineers automate data flows by building ETL pipelines. Because a company’s data comes from a variety of sources, custom pipelines are often built for each, with a tool like Airflow handling orchestration. These engineers monitor pipelines, debug data quality and SLA issues as they emerge, and maintain healthy data flows.
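To make this concrete, here is a minimal sketch of what such a pipeline might look like as an Airflow DAG (using Airflow 2’s PythonOperator). The source, transformation and warehouse steps are hypothetical stubs, not a real pipeline:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Pull raw records from a (hypothetical) source system.
    return [{"user_id": 1, "amount": "42.50"}]


def transform(ti):
    # Clean and type-cast what the extract step produced.
    rows = ti.xcom_pull(task_ids="extract")
    return [{**r, "amount": float(r["amount"])} for r in rows]


def load(ti):
    # Write the transformed rows to the warehouse (stubbed out here).
    rows = ti.xcom_pull(task_ids="transform")
    print(f"loading {len(rows)} rows")


with DAG(
    dag_id="example_etl",
    start_date=datetime(2022, 3, 1),
    schedule_interval="@daily",
    catchup=False,
):
    # Extract -> Transform -> Load, once a day.
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> transform_task >> load_task
```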

Generally speaking, the endpoints of these jobs are a data lake and a data warehouse. Hence, these engineers are also proficient in transformations and schema design, with optimal storage as the objective.

Data Engineer (Platform)

A data platform becomes a necessity as the scale of data expands. Platform engineers build the tools, frameworks and generalized systems that meet the pipeline-development needs of data scientists and data pipeline engineers. In some cases, frameworks are built for a specific use case, such as experimentation. Data security is one of the architectural considerations in these platforms.

Data platforms aim to standardize how data is ingested from the outside and then handled internally. These platforms create a foundation that brings consistency to downstream workflows.
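As a toy illustration, a platform might expose a single ingestion entry point behind a pluggable connector interface, so every source is handled the same way downstream. All the names below are hypothetical, not from any specific platform:

```python
from abc import ABC, abstractmethod
from typing import Iterable, List
import csv


class Source(ABC):
    """A pluggable connector to one external system."""

    @abstractmethod
    def read(self) -> Iterable[dict]:
        """Yield raw records from the source."""


class CsvSource(Source):
    # One concrete connector; teams add new ones without
    # touching the platform's core.
    def __init__(self, path: str):
        self.path = path

    def read(self) -> Iterable[dict]:
        with open(self.path, newline="") as f:
            yield from csv.DictReader(f)


def ingest(source: Source) -> List[dict]:
    # The platform's single entry point: schema validation, lineage
    # tagging and access control would all hook in here, identically
    # for every source.
    return list(source.read())
```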

Data platform engineers have a larger overlap with traditional software engineering than data pipeline engineers do. They understand good architectural design patterns and their tradeoffs: for example, a platform that offers flexibility in scheduling job runs but is opinionated about how the load is distributed.

Analytics Engineer

What happens when simply ingesting data into the warehouse doesn’t suffice? Data needs to be modelled to incorporate complex business logic and to ensure consistency in metrics, reports and dashboards. This is Analytics Engineering.

A newer role in the industry, it meets needs specific to business intelligence and implements an ELT (Extract, Load, Transform) data flow. dbt is one of the favourite tools in this space.
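As a toy illustration of the ELT ordering, the snippet below lands raw data in the warehouse first and expresses the business logic as a SQL model inside it, the pattern dbt manages at scale. sqlite3 stands in for the warehouse here, and the tables are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: land the raw data as-is, no transformation yet.
conn.execute("CREATE TABLE raw_orders (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [(1, 42.5), (1, 10.0), (2, 99.9)],
)

# Transform: the business logic lives inside the warehouse as a SQL model.
conn.execute("""
    CREATE VIEW revenue_per_user AS
    SELECT user_id, SUM(amount) AS revenue
    FROM raw_orders
    GROUP BY user_id
""")

print(conn.execute("SELECT * FROM revenue_per_user").fetchall())
# [(1, 52.5), (2, 99.9)]
```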

Although the boundaries between data pipeline engineering and analytics engineering can be hazy, this role primarily focuses on how data is transformed after it lands in the warehouse.

Further reading

  1. https://www.getdbt.com/what-is-analytics-engineering/
  2. https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/how-to-build-a-data-architecture-to-drive-innovation-today-and-tomorrow
