Master of Science in Computer Engineering

Data Engineering (INF 517)

Course Code | Course Name | Semester | Theory | Practice | Lab | Credit | ECTS
INF 517 | Data Engineering | 2 | 3 | 0 | 0 | 3 | 6
Prerequisites
Admission Requirements
Language of Instruction English
Course Type Elective
Course Level Masters Degree
Course Instructor(s) Sultan Nezihe TURHAN (sturhan@gsu.edu.tr)
Assistant
Objective Data engineering is the discipline concerned with designing systems and applying analysis methods for the acquisition, storage, management, security, and processing of data. Rich data management schemes are needed to handle the large volumes of “Big Data” available for processing. This is a foundational course in data engineering principles and practices and covers the following topics:
i. The data engineering lifecycle
ii. Data modelling techniques for organizing and managing data
iii. Building data pipelines to collect, transform, analyse, and visualise data from multiple source systems
iv. Manipulating data with different query languages
v. Data analytics applications and algorithms
vi. Engineering non-traditional data types
vii. Data standards and data quality
Content 1. Introduction to Data Engineering: General Concepts
2. Data Storage Technologies
3. Cloud Data Platforms (AWS/Azure/GCP)
4. Data Integration Methods & Data Pipeline Architectures
5. Workflow Orchestration with Apache Airflow
6. Data Transformation with dbt (data build tool)
7. Batch Processing with Spark
8. Stream Processing Fundamentals & Apache Kafka
9. Search and Information Retrieval: Elasticsearch
10. Data Lakehouse: Architecture and Principles
11. Data Mesh: Architecture and Principles
12. Data Governance - 1: Metadata Management
13. Data Governance - 2: Data Quality and Testing
14. Data Governance - 3: Data Lineage and Observability
Course Learning Outcomes Students who successfully complete this course will have acquired the following skills:
- Distinguishes data engineering from data science and recognises it as a distinct field of study
- Explains and applies the component steps of the data lifecycle
- Explains data engineering techniques; applies and documents large-scale data engineering techniques for a specific task involving various types of multidimensional data
- Explains and addresses the technical, ethical, and societal issues related to data engineering, storage, access, and maintenance
- Explains the fundamental principles of big data analytics and algorithms and applies them to different fields
- Explains relevant standards and best practices in data engineering, analyses shortcomings, and identifies possible strategies and approaches to overcome them.
Teaching and Learning Methods Lectures, presentations, discussions, case studies, assignments, projects, practical application
References 1. Reis, J., & Housley, M. (2022). Fundamentals of Data Engineering: Plan and Build Robust Data Systems (1st ed.). O'Reilly Media. ISBN 978-1098108304.
2. Marz, N., & Warren, J. (2015). Big Data: Principles and Best Practices of Scalable Realtime Data Systems. Manning Publications.
3. Karau, H., Konwinski, A., Wendell, P., & Zaharia, M. (2015). Learning Spark: Lightning-Fast Big Data Analysis. O'Reilly Media.
4. White, T. (2015). Hadoop: The Definitive Guide (4th ed.). O'Reilly Media.
5. Gorelik, A. (2019). The Enterprise Big Data Lake: Delivering the Promise of Big Data and Data Science. O'Reilly Media.
Theory Topics
Week Weekly Contents
1 Data Engineering: General Concepts
2 Data Storage Technologies
3 Data Integration Methods & Data Pipeline Architectures
4 Data Integration Methods & Data Pipeline Architectures
5 Workflow Orchestration with Apache Airflow
6 Data Transformation with dbt (data build tool)
7 Batch Processing with Spark
8 Stream Processing Fundamentals & Apache Kafka
9 Search and Information Retrieval: Elasticsearch
10 Data Lakehouse: Architecture and Principles
11 Data Mesh: Architecture and Principles
12 Data Governance - 1: Metadata Management
13 Data Governance - 2: Data Quality and Testing
14 Data Governance - 3: Data Lineage and Observability
Practice Topics
Week Weekly Contents
Contribution to Overall Grade
Component | Number | Contribution (%)
Contribution of in-term studies to overall grade | 7 | 50
Contribution of final exam to overall grade | 1 | 50
Total | 8 | 100
In-Term Studies
Component | Number | Contribution (%)
Assignments | 5 | 15
Presentation | 1 | 15
Midterm Examinations (including preparation) | 1 | 20
Project | 0 | 0
Laboratory | 0 | 0
Other Applications | 0 | 0
Quiz | 0 | 0
Term Paper/Project | 0 | 0
Portfolio Study | 0 | 0
Reports | 0 | 0
Learning Diary | 0 | 0
Thesis/Project | 0 | 0
Seminar | 0 | 0
Other | 0 | 0
Make-up | 0 | 0
Total | 7 | 50
No Program Learning Outcomes Contribution
1 2 3 4 5
1 X
2 X
3 X
4 X
5 X
6 X
7 X
8 X
9 X
10 X
11 X
12 X
13 X
Activities | Number | Duration (hours) | Total Workload (hours)
Class Hours | 14 | 3 | 42
Working Hours out of Class | 14 | 2 | 28
Assignments | 5 | 1 | 5
Presentation | 1 | 1 | 1
Midterm Examinations (including preparation) | 1 | 1 | 1
Project | 0 | 0 | 0
Laboratory | 0 | 0 | 0
Other Applications | 0 | 0 | 0
Final Examinations (including preparation) | 1 | 5 | 5
Quiz | 0 | 0 | 0
Term Paper/Project | 0 | 0 | 0
Portfolio Study | 0 | 0 | 0
Reports | 0 | 0 | 0
Learning Diary | 0 | 0 | 0
Thesis/Project | 0 | 0 | 0
Seminar | 0 | 0 | 0
Other | 0 | 0 | 0
Make-up | 0 | 0 | 0
Year-End Examination | 0 | 0 | 0
Preparatory Year-End Examination | 0 | 0 | 0
Preparatory Make-up Examination | 0 | 0 | 0
Total Workload | 82
Total Workload / 25 | 3.28
ECTS Credits | 3