yt-archive

YT-Archive is a Python tool that allows you to download videos from a YouTube channel and scrape comments from those videos.

Introduction

This project provides a straightforward tool for bulk downloading videos from a YouTube channel and scraping comments from those videos. Whether you want to archive videos for offline viewing or analyze the comments for research, this tool can help you automate the process.

Features

Download videos from a YouTube channel.
Scrape comments from downloaded videos (in development).
Supports multithreading and multiprocessing for efficient downloading.
Save metadata and comments to MongoDB or JSON files.
Headless browsing using Selenium for web scraping.

Requirements

Python 3.x
Chrome WebDriver (should be automatically installed by the tool, otherwise see https://chromedriver.chromium.org/)
Various Python libraries (see requirements.txt)

Installation

Clone the repository to your local machine and enter its folder:
```
git clone https://github.com/ilumn/yt-archive.git
cd yt-archive
```
Install the required Python packages using pip:
```
pip install -r requirements.txt
```
Download and configure the Chrome WebDriver if the tool does not install it automatically. Make sure the WebDriver executable is in your system's PATH or in the project directory.
Configure the environment: edit .env.template with your parameters and save it as .env (see Configuration).
Run the tool (see Usage):
```
python main.py
```

Usage

You can use this tool to download videos and scrape comments from a YouTube channel. Here are some usage examples.

Downloading Videos and Scraping Comments

```
python main.py --use-mongodb --processes 4
```

--use-mongodb: Use MongoDB to track downloaded videos and cache comments.
--processes 4: Define the number of simultaneous video downloads (default is 4, minimum 2, maximum 256).
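
Without --use-mongodb, the Features list says metadata can be saved to JSON files instead. The sketch below shows one way a JSON file could play the role that MongoDB plays in tracking downloaded videos, so that a re-run skips finished downloads. The `DownloadTracker` class, the `pending` helper, and the idea of a JSON file of video IDs are hypothetical illustrations, not yt-archive's actual storage code.

```python
import json
import os
import tempfile
from pathlib import Path


class DownloadTracker:
    """Hypothetical JSON-backed record of already-downloaded video IDs.

    Illustrative stand-in for the MongoDB collection used with
    --use-mongodb; not yt-archive's own storage code.
    """

    def __init__(self, path):
        self.path = Path(path)
        # Load IDs recorded by earlier runs, or start empty on the first run.
        if self.path.exists():
            with self.path.open(encoding="utf-8") as f:
                self._ids = set(json.load(f))
        else:
            self._ids = set()

    def is_downloaded(self, video_id):
        return video_id in self._ids

    def mark_downloaded(self, video_id):
        self._ids.add(video_id)
        # Write to a temporary file in the same folder, then atomically
        # replace the old file so an interrupted run cannot corrupt it.
        fd, tmp_path = tempfile.mkstemp(dir=str(self.path.parent), suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(sorted(self._ids), f)
        os.replace(tmp_path, self.path)


def pending(video_ids, tracker):
    """Return the IDs that still need downloading, keeping their order."""
    return [vid for vid in video_ids if not tracker.is_downloaded(vid)]
```

A download loop would call `pending()` on the channel's video IDs and `mark_downloaded()` after each successful download, so an interrupted archive run can resume where it stopped.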

Downloading Videos and Scraping Comments (Single-Processing)

```
python main.py --use-mongodb --single-processing
```

--single-processing: Skip multiprocessing and download videos one at a time. This mode is more stable on sub-gigabit internet connections.
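
The three flags documented above map naturally onto the standard library's argparse. The sketch below is an assumption about how such a parser might look, not a copy of main.py: it enforces the documented default of 4 and the 2 to 256 range for --processes, and treats --single-processing as reducing the number of concurrent downloads to one. The helper names `process_count`, `build_parser`, and `workers` are hypothetical.

```python
import argparse


def process_count(value):
    """argparse type accepting integers from 2 to 256, as the README states."""
    try:
        n = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{value!r} is not an integer") from None
    if not 2 <= n <= 256:
        raise argparse.ArgumentTypeError("--processes must be between 2 and 256")
    return n


def build_parser():
    parser = argparse.ArgumentParser(description="Archive a YouTube channel.")
    parser.add_argument(
        "--use-mongodb",
        action="store_true",
        help="track downloaded videos and cache comments in MongoDB",
    )
    parser.add_argument(
        "--processes",
        type=process_count,
        default=4,  # documented default; int defaults bypass the type check
        help="number of simultaneous video downloads (2-256, default 4)",
    )
    parser.add_argument(
        "--single-processing",
        action="store_true",
        help="download videos one at a time instead of in parallel",
    )
    return parser


def workers(args):
    """Effective number of concurrent downloads for the parsed arguments."""
    return 1 if args.single_processing else args.processes
```

Validating the range inside the `type` callable means argparse reports a bad --processes value with a usage message and exits, before any downloads start.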

Configuration

Before running the tool, configure it by setting environment variables. Create a .env file and add the following variables (a template is provided in .env.template):

MONGODB_URI: MongoDB connection URI (if using MongoDB).
MONGODB_DB: MongoDB database name (if using MongoDB).
MONGODB_COLLECTION: MongoDB collection name (if using MongoDB).
DOWNLOAD_FOLDER: Folder where videos will be downloaded.
CHANNEL_URL: URL of the videos page of the YouTube channel you want to scrape (the URL ends in /videos).
USE_TITLES_AS_FILENAMES: true or false. If false, the video ID is used as the filename instead of the title.
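
The sketch below assumes these variables have already been loaded into the process environment (for example by a .env loader such as python-dotenv; this README does not say which loader the tool uses). It shows how they might be validated and how USE_TITLES_AS_FILENAMES could decide a video's filename. `Config`, `load_config`, and `output_filename` are hypothetical names, and the `.mp4` extension and the character-stripping rule are illustrative assumptions.

```python
import os
import re
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass
class Config:
    download_folder: str
    channel_url: str
    use_titles_as_filenames: bool
    mongodb_uri: Optional[str] = None
    mongodb_db: Optional[str] = None
    mongodb_collection: Optional[str] = None


def load_config(env: Mapping = os.environ, use_mongodb: bool = False) -> Config:
    """Validate the .env variables described above (illustrative sketch).

    Missing required variables raise KeyError.
    """
    channel_url = env["CHANNEL_URL"].rstrip("/")
    if not channel_url.endswith("/videos"):
        raise ValueError("CHANNEL_URL must point at the channel's /videos page")

    flag = env["USE_TITLES_AS_FILENAMES"].strip().lower()
    if flag not in ("true", "false"):
        raise ValueError("USE_TITLES_AS_FILENAMES must be true or false")

    config = Config(
        download_folder=env["DOWNLOAD_FOLDER"],
        channel_url=channel_url,
        use_titles_as_filenames=(flag == "true"),
    )
    if use_mongodb:
        # The MongoDB variables are only needed when --use-mongodb is set.
        config.mongodb_uri = env["MONGODB_URI"]
        config.mongodb_db = env["MONGODB_DB"]
        config.mongodb_collection = env["MONGODB_COLLECTION"]
    return config


def output_filename(config: Config, video_id: str, title: str) -> str:
    """Use the sanitized title if enabled (and non-empty), else the video ID."""
    if config.use_titles_as_filenames:
        # Drop characters that are invalid in Windows or POSIX filenames.
        safe = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "", title).strip()
        if safe:
            return safe + ".mp4"
    return video_id + ".mp4"
```

Falling back to the video ID when a title sanitizes to an empty string keeps every download addressable even with USE_TITLES_AS_FILENAMES set to true.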

Contributing

Contributions are encouraged. If you would like to contribute to this project, please open an issue or submit a pull request.

TODO

Listed in order of priority, highest first.

fix comment scraping
add age gate bypassing (for downloading age restricted or content restricted videos)
add video description to metadata
add a command-line flag to save more metadata, such as the YouTube channel's subscriber count at the time of download, pinned comments, video advertising ID/hotspots, available codecs/resolutions, etc.
add support for using this tool (especially the comment scraper) as a library in other projects
possibly switch to concurrent.futures for multithreading
possibly rewrite using pafy, yt-dl, or youtube-dl instead of pytube
configure GitHub application testing

License

This project is licensed under the GNU GPLv3 License. See the LICENSE file for details.
