feat: module 2 for 2026 cohort #749
Merged: alexeygrigorev merged 21 commits into DataTalksClub:main from kestra-io:feat/2026-cohort-with-ai-unit on Dec 2, 2025
Commits
1e8d5fe feat: pin docker image versions, add 2025 cohort (anna-geller)
b22cbd3 feat: cut dbt, add AI Workflows & AI Agents section (anna-geller)
f3c65d2 fix: simplify docker set up (wrussell1999)
a531b9a fix: pg db config (wrussell1999)
4fec2da fix: amend pg db name (wrussell1999)
ccbc479 feat: reduce the scope (anna-geller)
03bca14 fix: update structure for 2026 (wrussell1999)
514c8bd simplify docker compose set up (wrussell1999)
82a8599 fix: postgres volume (wrussell1999)
3707556 fix: install instructions (wrussell1999)
4a896c9 make module 1 pg work with module 2 (wrussell1999)
28c3bdd make sure docker compose matches readme (wrussell1999)
3eede85 add more introductory stages (wrussell1999)
1194637 flesh out readme (wrussell1999)
995af3d 2.2 details (wrussell1999)
0df43f2 fix: headers (wrussell1999)
458e757 2.1 complete (wrussell1999)
e129673 fix heading (wrussell1999)
183866f feat: 2.2.1 video (wrussell1999)
c0161e8 Add 2.2.2 to 2.3.1 (wrussell1999)
aea4039 fix: placeholders for videos (wrussell1999)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,61 +17,21 @@ wget https://github.com/DataTalksClub/nyc-tlc-data/releases/download/yellow/yell | |
|
|
||
| ### Running Postgres with Docker | ||
|
|
||
| #### Windows | ||
|
|
||
| Running Postgres on Windows (note the full path) | ||
| Running Postgres on Windows, macOS and Linux | ||
|
|
||
| ```bash | ||
| docker run -it \ | ||
| -e POSTGRES_USER="root" \ | ||
| -e POSTGRES_PASSWORD="root" \ | ||
| -e POSTGRES_DB="ny_taxi" \ | ||
| -v c:/Users/alexe/git/data-engineering-zoomcamp/week_1_basics_n_setup/2_docker_sql/ny_taxi_postgres_data:/var/lib/postgresql/data \ | ||
| -p 5432:5432 \ | ||
| postgres:13 | ||
| ``` | ||
|
|
||
| If you have the following error: | ||
|
|
||
| ``` | ||
| docker run -it \ | ||
| -e POSTGRES_USER="root" \ | ||
| -e POSTGRES_PASSWORD="root" \ | ||
| -e POSTGRES_DB="ny_taxi" \ | ||
| -v e:/zoomcamp/data_engineer/week_1_fundamentals/2_docker_sql/ny_taxi_postgres_data:/var/lib/postgresql/data \ | ||
| -v ny_taxi_postgres_data:/var/lib/postgresql \ | ||
| -p 5432:5432 \ | ||
| postgres:13 | ||
|
|
||
| docker: Error response from daemon: invalid mode: \Program Files\Git\var\lib\postgresql\data. | ||
| See 'docker run --help'. | ||
| ``` | ||
|
|
||
| Change the mounting path. Replace it with the following: | ||
|
|
||
| ``` | ||
| -v /e/zoomcamp/...:/var/lib/postgresql/data | ||
| ``` | ||
|
|
||
| #### Linux and MacOS | ||
|
|
||
|
|
||
| ```bash | ||
| docker run -it \ | ||
| -e POSTGRES_USER="root" \ | ||
| -e POSTGRES_PASSWORD="root" \ | ||
| -e POSTGRES_DB="ny_taxi" \ | ||
| -v $(pwd)/ny_taxi_postgres_data:/var/lib/postgresql/data \ | ||
| -p 5432:5432 \ | ||
| postgres:13 | ||
| postgres:18 | ||
| ``` | ||
|
|
||
| If you see that `ny_taxi_postgres_data` is empty after running | ||
| the container, try these: | ||
|
|
||
| * Deleting the folder and running Docker again (Docker will re-create the folder) | ||
| * Adjust the permissions of the folder by running `sudo chmod a+rwx ny_taxi_postgres_data` | ||
|
|
||
|
|
||
| ### CLI for Postgres | ||
|
|
||
| Installing `pgcli` | ||
|
|
@@ -125,7 +85,7 @@ Running pgAdmin | |
| docker run -it \ | ||
| -e PGADMIN_DEFAULT_EMAIL="[email protected]" \ | ||
| -e PGADMIN_DEFAULT_PASSWORD="root" \ | ||
| -p 8080:80 \ | ||
| -p 8085:80 \ | ||
| dpage/pgadmin4 | ||
| ``` | ||
|
|
||
|
|
@@ -144,11 +104,11 @@ docker run -it \ | |
| -e POSTGRES_USER="root" \ | ||
| -e POSTGRES_PASSWORD="root" \ | ||
| -e POSTGRES_DB="ny_taxi" \ | ||
| -v c:/Users/alexe/git/data-engineering-zoomcamp/week_1_basics_n_setup/2_docker_sql/ny_taxi_postgres_data:/var/lib/postgresql/data \ | ||
| -v ny_taxi_postgres_data:/var/lib/postgresql \ | ||
| -p 5432:5432 \ | ||
| --network=pg-network \ | ||
| --name pg-database \ | ||
| postgres:13 | ||
| --name pgdatabase \ | ||
| postgres:18 | ||
| ``` | ||
|
|
||
| Run pgAdmin | ||
|
|
@@ -157,14 +117,14 @@ Run pgAdmin | |
| docker run -it \ | ||
| -e PGADMIN_DEFAULT_EMAIL="[email protected]" \ | ||
| -e PGADMIN_DEFAULT_PASSWORD="root" \ | ||
| -p 8080:80 \ | ||
| -p 8085:80 \ | ||
| --network=pg-network \ | ||
| --name pgadmin-2 \ | ||
| dpage/pgadmin4 | ||
| ``` | ||
|
|
||
|
|
||
| ### Data ingestion | ||
| ### Data Ingestion | ||
|
|
||
| Running locally | ||
|
|
||
|
|
@@ -187,22 +147,6 @@ Build the image | |
| docker build -t taxi_ingest:v001 . | ||
| ``` | ||
|
|
||
| On Linux you may have a problem building it: | ||
|
|
||
| ``` | ||
| error checking context: 'can't stat '/home/name/data_engineering/ny_taxi_postgres_data''. | ||
| ``` | ||
|
|
||
| You can solve it with `.dockerignore`: | ||
|
|
||
| * Create a folder `data` | ||
| * Move `ny_taxi_postgres_data` to `data` (you might need to use `sudo` for that) | ||
| * Map `-v $(pwd)/data/ny_taxi_postgres_data:/var/lib/postgresql/data` | ||
| * Create a file `.dockerignore` and add `data` there | ||
| * Check [this video](https://www.youtube.com/watch?v=tOr4hTsHOzU&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb) (the middle) for more details | ||
|
|
||
|
|
||
|
|
||
| Run the script with Docker | ||
|
|
||
| ```bash | ||
|
|
@@ -213,47 +157,41 @@ docker run -it \ | |
| taxi_ingest:v001 \ | ||
| --user=root \ | ||
| --password=root \ | ||
| --host=pg-database \ | ||
| --host=pgdatabase \ | ||
| --port=5432 \ | ||
| --db=ny_taxi \ | ||
| --table_name=yellow_taxi_trips \ | ||
| --url=${URL} | ||
| ``` | ||
|
|
||
| ### Docker-Compose | ||
| ### Docker Compose | ||
|
|
||
| Run it: | ||
|
|
||
| ```bash | ||
| docker-compose up | ||
| docker compose up | ||
| ``` | ||
|
|
||
| Run in detached mode: | ||
|
|
||
| ```bash | ||
| docker-compose up -d | ||
| docker compose up -d | ||
| ``` | ||
|
|
||
| Shutting it down: | ||
|
|
||
| ```bash | ||
| docker-compose down | ||
| ``` | ||
|
|
||
| Note: to make pgAdmin configuration persistent, create a folder `data_pgadmin`. Change its permission via | ||
|
|
||
| ```bash | ||
| sudo chown 5050:5050 data_pgadmin | ||
| docker compose down | ||
| ``` | ||
|
|
||
| and mount it to the `/var/lib/pgadmin` folder: | ||
| Add a docker volume to the `pgadmin` container: | ||
|
|
||
| ```yaml | ||
| services: | ||
| pgadmin: | ||
| image: dpage/pgadmin4 | ||
| volumes: | ||
| - ./data_pgadmin:/var/lib/pgadmin | ||
| - data_pgadmin:/var/lib/pgadmin | ||
| ... | ||
| ``` | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,12 +1,16 @@ | ||
| volumes: | ||
| ny_taxi_postgres_data: | ||
| driver: local | ||
|
|
||
| services: | ||
| pgdatabase: | ||
| image: postgres:13 | ||
| image: postgres:18 | ||
| environment: | ||
| - POSTGRES_USER=root | ||
| - POSTGRES_PASSWORD=root | ||
| - POSTGRES_DB=ny_taxi | ||
| volumes: | ||
| - "./ny_taxi_postgres_data:/var/lib/postgresql/data:rw" | ||
| - ny_taxi_postgres_data:/var/lib/postgresql | ||
| ports: | ||
| - "5432:5432" | ||
| pgadmin: | ||
|
|
@@ -15,5 +19,5 @@ services: | |
| - [email protected] | ||
| - PGADMIN_DEFAULT_PASSWORD=root | ||
| ports: | ||
| - "8080:80" | ||
| - "8085:80" | ||
|
|
||
By using Docker-managed volumes, the commands should work the same across operating systems.
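
For illustration, here is a minimal sketch of how the two named volumes could sit together in the Compose file, using the service and volume names from the diff. The top-level `data_pgadmin` entry is an assumption: the diff shows the pgadmin mount but not its declaration, and Compose expects every named volume a service references to be declared under the top-level `volumes:` key. Environment variables and ports are left out to keep the focus on the volumes.

```yaml
# Sketch only: volume-related parts of the Compose file, not the full file.
volumes:
  ny_taxi_postgres_data:
    driver: local
  data_pgadmin:          # assumption: not shown in the diff, declared so the mount below resolves
    driver: local

services:
  pgdatabase:
    image: postgres:18
    volumes:
      # Named volume managed by Docker: no host path, so no OS-specific path syntax
      - ny_taxi_postgres_data:/var/lib/postgresql
  pgadmin:
    image: dpage/pgadmin4
    volumes:
      # Named volume for pgAdmin configuration, as in the README change
      - data_pgadmin:/var/lib/pgadmin
```

Because neither mount refers to a host directory, the Windows-specific path forms (`c:/...`, `/e/...`) and the `chown`/`chmod` workarounds removed in this PR should no longer be needed.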