Virginia Tech

ECE 5874 - Data Engineering Project (3C)

Course Description

Fundamentals of data engineering. The role of the data engineer. The data engineering lifecycle: data generation, ingestion, transformation, storage, and serving for Artificial Intelligence and Machine Learning (AI/ML), visualization, and business analytics. Automation and task orchestration. Data governance, data quality, and the role of data provenance. Data systems. Extract, Transform, Load (ETL) of data. Building, testing, and maintaining data pipelines. Data lakes. Real-world problems with emphasis on end-to-end engineering solutions. Cloud services and open-source data engines/platforms. Engineering portfolio.

Master of Information Technology (MIT) students only.

Why take this course?

Data engineering is a core component of today's data infrastructure. It encompasses collecting, collating, extracting, moving, transforming, cleaning, integrating, organizing, representing, storing, and processing data, with a focus on data systems, i.e., tools and platforms. It is the backbone of data science and data-intensive projects, as it builds pipelines of data-related tasks that lead from data collection to data-driven solutions for AI/ML, visualization, and business analytics. Currently, most of the time spent in real-world data science projects goes into data engineering.

This Data Engineering Project course adds to the MIT curriculum a complete, end-to-end data problem-solving experience, from requirements to solution. It considers a wide breadth of technological choices while addressing emerging issues such as data quality and valuation, automation, and task orchestration. In this course, students will have the opportunity to synthesize, integrate, and apply knowledge from previous MIT courses (ECE 5494) to address the capstone project, gaining hands-on experience that includes technical, business, and professional development. The course will further allow students to construct their professional portfolio.

Learning Objectives

  • Identify the components of the data engineering lifecycle
  • Identify data quality and provenance issues
  • Apply data validation methods
  • Implement data pipeline orchestration environments
  • Evaluate data storage environments
  • Analyze the role of data lakes and data warehouses in data engineering
  • Use free, open-source data engines/platforms towards end-to-end engineering solutions
  • Construct an engineering deliverable portfolio