gtolomei/big-data-computing

Big Data Computing 2025-26

News | General Information | Syllabus | Class Schedules | Previous Years

News

  • IMPORTANT ANNOUNCEMENT: Due to an institutional commitment, tomorrow, December 11, the class will end at 11:00 AM.
  • First Exam Session: January 21, 2026
    All students wishing to take part in this session must register on Infostud (1027099). Registration is open until January 18, 2026. Further details will be communicated closer to the exam date.
  • OPIS SURVEY: Students who have not yet completed the OPIS evaluation survey are strongly encouraged to do so using the following code: XD4D7LD3. Please note that the official guidelines for completing the questionnaire are available online at this link.
  • IMPORTANT ANNOUNCEMENT: Classes are suspended from Thursday, October 30 to Wednesday, November 5, inclusive.
  • Dear all, I'll be away at a conference next week, so there will be no class on October 15th and October 16th.
  • Contrary to what was previously communicated, there WILL BE a lecture on Wednesday, October 1st.
  • Dear all, unfortunately, due to health issues, we won't be having the class tomorrow (25.09.2025).

General Information

Welcome to the 2025-26 Big Data Computing class!

This is a first-semester course of the MSc in Computer Science at the Sapienza University of Rome.

This repository contains class material along with any useful information for the 2025-2026 academic year.

The Big Data Computing course is divided into two distinct modules, each one carrying 3 CFUs (credits).
Prof. Daniele De Sensi will lead the first module, while the second module will be taught by Prof. Gabriele Tolomei.
Importantly, these modules will not run concurrently; once the first module concludes, the second will begin.

Class Schedule

  • Wednesday from 8:00 AM to 11:00 AM (Aula Magna RM111 - "Building C" - Viale Regina Elena 295 [map])
  • Thursday from 10:00 AM to 12:00 PM (Room 1L RM018 - Via Del Castro Laurenziano, 7a [map])

Moodle Web Page

Students must subscribe to the Moodle web page, using the same credentials (username/password) they use to access the Wi-Fi network and Infostud services, at the following link: https://elearning.uniroma1.it/enrol/index.php?id=19950

Contacts

Prof. Daniele De Sensi

Prof. Gabriele Tolomei

Office Hours

Prof. Daniele De Sensi
Please drop me a message at [email protected] if you would like to schedule a meeting, either online (i.e., via Google Meet or Zoom) or in person (i.e., in Room 306, on the 3rd floor of Building E in Viale Regina Elena 295).

Prof. Gabriele Tolomei
Please drop me a message at [email protected] if you would like to schedule a meeting, either online (i.e., via Google Meet or Zoom) or in person (i.e., in Room 106, on the 1st floor of Building E in Viale Regina Elena 295).

Description and Goals

The amount, variety, and rate at which data is being generated nowadays, both by humans and machines, are unprecedented. This raises a number of challenges in dealing with such data, as traditional computing paradigms were not conceived to operate at this scale.

"Big Data" is the umbrella term that has rapidly become popular to describe methodologies and tools specifically designed for collecting, storing, and processing very large or complex data sets. In addition to addressing foundational computer science problems, such as searching and sorting, big data computing mainly focuses on extracting knowledge, and thereby value, from large-scale data sets using advanced data analysis techniques, such as machine learning.

This course is intended to provide graduate-level students with a deep understanding of programming models and computer architectures that are suitable for the large-scale analysis of data. More specifically, the course will give students the ability to understand challenges and solutions in developing big data/machine learning workloads, and to tackle real-world problems faced by the so-called "Big Five" tech companies (i.e., Apple, Amazon, Google, Microsoft, and Facebook): text/graph analysis, classification/regression, and recommendation, just to name a few.

Prerequisites

The course assumes that students are familiar with the basics of data analysis and machine learning, properly supported by a strong knowledge of foundational concepts of calculus, linear algebra, probability, statistics, and computer architectures.

Exams

The exam is oral and covers all topics presented during the course.

Recommended Textbooks

No textbooks are mandatory to successfully follow this course. However, there is a large set of references worth mentioning, especially for those who want to dig deeper into specific topics. Among those, I would suggest the following readings:

  • Mining of Massive Datasets [Leskovec, Rajaraman, Ullman] available online.
  • Big Data Analysis with Python [Marin, Shukla, VK]
  • Large Scale Machine Learning with Python [Sjardin, Massaron, Boschetti]
  • Spark: The Definitive Guide [Chambers, Zaharia]
  • Learning Spark: Lightning-Fast Big Data Analysis [Karau, Konwinski, Wendell, Zaharia]
  • Hadoop: The Definitive Guide [White]
  • Python for Data Analysis [McKinney]

Class Schedules

Lecture # | Date | Topic | Material
Lecture 1 | 25/09/2025 | Introduction to Big Data: Motivations and Challenges | [slides: PPT]
Lecture 2 | 01/10/2025 | Distributed Deep Learning | [slides: PPT]
Lecture 3 | 02/10/2025 | Collective Communication Algorithms | [slides: PPT]
Lecture 4 | 08/10/2025 | Collective Communication Algorithms, Network Topologies | [slides: PPT]
Lecture 5 | 09/10/2025 | Network Topologies, Load Balancing | [slides: PPT]
Lecture 6 | 22/10/2025 | Load Balancing, Congestion Control, In-Network Compute | [slides: PPT]
Lecture 7 | 23/10/2025 | GFS, HDFS, MapReduce | [slides: PPT]
Lecture 8 | 29/10/2025 | Spark | [slides: PPT]
Lecture 9 | 06/11/2025 | Recap & Outlook | [slides: PPT]
Lecture 10 | 12/11/2025 | Introduction to Big Data (Part II) | [slides: PDF]
Lecture 11 | 13/11/2025 | The Curse of Dimensionality | [slides: PDF, notebook: ipynb]
Lecture 12 | 19/11/2025 | Clustering: K-means | [slides: PDF]
Lecture 13 | 20/11/2025 | Clustering: Evaluation | [slides: PDF]
Lecture 14 | 26/11/2025 | Dimensionality Reduction: Principal Component Analysis | [slides: PDF, notes: PDF]
Lecture 15 | 27/11/2025 | Recommender Systems (Part I) | [slides: PDF]
Lecture 16 | 03/12/2025 | Recommender Systems (Part II) | [slides: PDF]
Lecture 17 | 04/12/2025 | Recommender Systems (Part III) | [slides: PDF]
Lecture 18 | 10/12/2025 | Graph Link Analysis | [slides: PDF]
Lecture 19 | 11/12/2025 | PageRank (Part I) | [slides: PDF]
Lecture 20 | 17/12/2025 | PageRank (Part II) | [slides: PDF, notes: PDF]
Lecture 21 | 18/12/2025 | The Last Take Home Message | [slides: PDF]

Previous Years

In the following, you can quickly navigate through Big Data Computing class information and material from previous years.

NOTE: The class material is kept in a single folder, which is subject to changes and/or updates; as such, there may be differences between the content displayed on this website and what has been shown in class in the past.
