Data Engineering (INF 517)
| Course Code | Course Name | Semester | Theory | Practice | Lab | Credit | ECTS |
|---|---|---|---|---|---|---|---|
| INF 517 | Data Engineering | 2 | 3 | 0 | 0 | 3 | 6 |
| Prerequisites | |
| Admission Requirements |
| Language of Instruction | English |
| Course Type | Elective |
| Course Level | Master's Degree |
| Course Instructor(s) | Sultan Nezihe TURHAN (sturhan@gsu.edu.tr) |
| Assistant | |
| Objective |
Data engineering is the discipline concerned with the design of systems and the use of analysis methods for the acquisition, storage, management, security, and processing of data. Rich data management schemes are needed to handle the large volumes of “Big Data” available for processing. This class is a foundational course in data engineering principles and practices and covers the following topics: i. The data engineering lifecycle; ii. Data modelling techniques for organizing and managing data; iii. Building data pipelines to collect, transform, analyse, and visualize data from multiple source systems; iv. Manipulating data with different query languages; v. Data analytics applications and algorithms; vi. Engineering non-traditional data types; vii. Data standards and data quality. |
| Content |
1. Introduction to Data Engineering: General Concepts 2. Data Storage Technologies 3. Cloud Data Platforms (AWS/Azure/GCP) 4. Data Integration Methods & Data Pipeline Architectures 5. Workflow Orchestration with Apache Airflow 6. Data Transformation with dbt (data build tool) 7. Batch Processing with Spark 8. Stream Processing Fundamentals & Apache Kafka 9. Search and Information Retrieval: Elasticsearch 10. Data Lakehouse: Architecture and Principles 11. Data Mesh: Architecture and Principles 12. Data Governance - 1: Metadata Management 13. Data Governance - 2: Data Quality and Testing 14. Data Governance - 3: Data Lineage and Observability |
| Course Learning Outcomes |
Students who successfully complete this course will be able to: - Distinguish data engineering from data science and recognise it as a distinct field of study - Explain and apply the component steps of the data lifecycle - Explain data engineering techniques, and apply and document large-scale data engineering techniques for a specific task involving various types of multidimensional data - Explain and address the technical, ethical and societal issues related to data engineering, storage, access and maintenance - Explain the fundamental principles of big data analytics and algorithms and apply them to different fields - Explain relevant standards and best practices in data engineering, analyse their shortcomings, and identify possible strategies and approaches to overcome them. |
| Teaching and Learning Methods | Lectures, presentations, discussions, case studies, assignments, projects, practical application |
| References |
1. Reis, J., & Housley, M. (2022). Fundamentals of Data Engineering: Plan and Build Robust Data Systems (1st ed.). O'Reilly Media. ISBN 978-1098108304. 2. Marz, N., & Warren, J. (2015). Big Data: Principles and Best Practices of Scalable Realtime Data Systems. Manning Publications. 3. Karau, H., Konwinski, A., Wendell, P., & Zaharia, M. (2015). Learning Spark: Lightning-Fast Big Data Analysis. O'Reilly Media. 4. White, T. (2015). Hadoop: The Definitive Guide (4th ed.). O'Reilly Media. 5. Gorelik, A. (2019). The Enterprise Big Data Lake: Delivering the Promise of Big Data and Data Science. O'Reilly Media. |
Theory Topics
| Week | Weekly Contents |
|---|---|
| 1 | Data Engineering: General Concepts |
| 2 | Data Storage Technologies |
| 3 | Data Integration Methods & Data Pipeline Architectures |
| 4 | Data Integration Methods & Data Pipeline Architectures |
| 5 | Workflow Orchestration with Apache Airflow |
| 6 | Data Transformation with dbt (data build tool) |
| 7 | Batch Processing with Spark |
| 8 | Stream Processing Fundamentals & Apache Kafka |
| 9 | Search and Information Retrieval: Elasticsearch |
| 10 | Data Lakehouse: Architecture and Principles |
| 11 | Data Mesh: Architecture and Principles |
| 12 | Data Governance - 1: Metadata Management |
| 13 | Data Governance - 2: Data Quality and Testing |
| 14 | Data Governance - 3: Data Lineage and Observability |
Practice Topics
| Week | Weekly Contents |
|---|---|
Contribution to Overall Grade
| | Number | Contribution (%) |
|---|---|---|
| Contribution of in-term studies to overall grade | 7 | 50 |
| Contribution of final exam to overall grade | 1 | 50 |
| Total | 8 | 100 |
In-Term Studies
| | Number | Contribution (%) |
|---|---|---|
| Assignments | 5 | 15 |
| Presentation | 1 | 15 |
| Midterm Examinations (including preparation) | 1 | 20 |
| Project | 0 | 0 |
| Laboratory | 0 | 0 |
| Other Applications | 0 | 0 |
| Quiz | 0 | 0 |
| Term Paper/ Project | 0 | 0 |
| Portfolio Study | 0 | 0 |
| Reports | 0 | 0 |
| Learning Diary | 0 | 0 |
| Thesis/ Project | 0 | 0 |
| Seminar | 0 | 0 |
| Other | 0 | 0 |
| Make-up | 0 | 0 |
| Total | 7 | 50 |
| No | Program Learning Outcomes | Contribution | ||||
|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | ||
| 1 | X | |||||
| 2 | X | |||||
| 3 | X | |||||
| 4 | X | |||||
| 5 | X | |||||
| 6 | X | |||||
| 7 | X | |||||
| 8 | X | |||||
| 9 | X | |||||
| 10 | X | |||||
| 11 | X | |||||
| 12 | X | |||||
| 13 | X | |||||
| Activities | Number | Period | Total Workload |
|---|---|---|---|
| Class Hours | 14 | 3 | 42 |
| Working Hours out of Class | 14 | 2 | 28 |
| Assignments | 5 | 1 | 5 |
| Presentation | 1 | 1 | 1 |
| Midterm Examinations (including preparation) | 1 | 1 | 1 |
| Project | 0 | 0 | 0 |
| Laboratory | 0 | 0 | 0 |
| Other Applications | 0 | 0 | 0 |
| Final Examinations (including preparation) | 1 | 5 | 5 |
| Quiz | 0 | 0 | 0 |
| Term Paper/ Project | 0 | 0 | 0 |
| Portfolio Study | 0 | 0 | 0 |
| Reports | 0 | 0 | 0 |
| Learning Diary | 0 | 0 | 0 |
| Thesis/ Project | 0 | 0 | 0 |
| Seminar | 0 | 0 | 0 |
| Other | 0 | 0 | 0 |
| Make-up | 0 | 0 | 0 |
| Year-End | 0 | 0 | 0 |
| Preparatory Year-End | 0 | 0 | 0 |
| Preparatory Make-up | 0 | 0 | 0 |
| Total Workload | 82 | ||
| Total Workload / 25 | 3.28 | ||
| Credits ECTS | 3 | ||


