Automated SQL PIDM and WHKEY Search Breakups

Purpose

SQL has a built-in limit that prevents more than 1,000 values in a WHERE student_id IN (...) clause. This workspace provides a Python script that automates breaking up large lists of PIDM, WHKEY, or PantherID values into properly formatted SQL queries.

With this tool, you can input a CSV or Excel file containing student IDs, specify the ID type, and generate a .txt file containing multiple SQL-ready IN clauses, making it easy to copy and paste directly into your SQL queries.

Project Structure

your_project/
│── uploads/        # Place your input CSV/XLSX files here
│── outputs/        # Processed output files will be stored here (auto-created)
│── scripts/        # Contains the main scripts
│   ├── main.py     # CLI script for processing files
│   ├── utils.py    # Utility functions for formatting IDs
│── README.md       # Instructions on how to use this workspace
│── environment.yaml # Conda environment configuration

Installation Instructions

To set up the required dependencies, use the id_sql_prep.yaml file with Conda:

conda env create -f id_sql_prep.yaml

Once created, activate the environment:

conda activate id_sql_prep

Usage

Run the script from the command line, specifying the input file and ID type. Make sure that your Excel file or csv file is properly formatted (Single Column with a header formatted as: PIDM, WHKEY, and pantherid).

python scripts/main.py --file sample.csv --idtype whkey
python scripts/main.py --file sample.xlsx --idtype pidm
python scripts/main.py --file sample.csv --idtype pantherid

The script will output a .txt file in the outputs/ directory, containing properly ending part of a WHERE student_id IN (...) broken into chunks of 1,000 IDs per IN clause.

Example

If the input file contains the following WHKEYs:

123456
789012
345678
901234
... (more IDs)

The script will generate an output file (outputs/query_WHKEY.txt) containing:

OR k.whkey IN (
    '123456', '789012', '345678', '901234', ...
);

If the input file contains PantherIDs, the script ensures lowercase formatting:

OR k.whkey IN (
    'p12345', 'p67890', 'p24680', 'p13579', ...
);

For PIDMs:

OR k.pidm IN (
    '1000001', '1000002', '1000003', ...
);

The script automatically handles formatting and ensures each query block contains no more than 1,000 IDs.

If you need to reformat to match your overall query, then please do so in the query or outputted .txt file!

This tool simplifies SQL query preparation when working with large lists of student IDs, saving time and reducing manual errors.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Automated SQL PIDM and WHKEY Search Breakups

Purpose

Project Structure

Installation Instructions

Usage

Example

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
outputs		outputs
scripts		scripts
uploads		uploads
.gitignore		.gitignore
README.md		README.md
id_sql_prep.yaml		id_sql_prep.yaml

GSU-Analytics/studentids_break_for_sql

Folders and files

Latest commit

History

Repository files navigation

Automated SQL PIDM and WHKEY Search Breakups

Purpose

Project Structure

Installation Instructions

Usage

Example

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages