Data Generator Utility Project

The Data Generator Utility is a command-line tool that helps test data pipelines by generating customizable JSON test data. Whether you are validating data transformations, testing data validation rules, or simply need mock data for development, this utility simplifies the process.

Features

Flexible Configuration: Customize file count, file name prefix, data schema, and more through command-line arguments or a configuration file.

Multiple Prefix Options: Generate files with sequential counts, random prefixes, or UUID prefixes for easy organization.

Supports Multiprocessing: Utilize multiple CPU cores to generate data files efficiently, speeding up the process for larger datasets.

Schema Validation: Check the data schema for correctness so that the generated data follows the specified rules.
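To illustrate how the multiprocessing option can spread work across CPU cores, here is a minimal sketch using Python's standard `multiprocessing.Pool`. It is not the code in cli.py: the function names `write_file` and `generate` are hypothetical, each file is assumed to contain one JSON object per line (JSON Lines), and the record written is a placeholder rather than data built from a schema.

```python
import json
import os
import tempfile
from multiprocessing import Pool


def write_file(task):
    """Write one output file; `task` is a (path, data_lines) tuple."""
    path, data_lines = task
    with open(path, "w", encoding="utf-8") as fh:
        for i in range(data_lines):
            # Placeholder record (assumption): the real utility builds
            # records from data_schema.
            fh.write(json.dumps({"id": i}) + "\n")
    return path


def generate(paths, data_lines, processes=1):
    """Write every path, in parallel when processes > 1."""
    tasks = [(path, data_lines) for path in paths]
    if processes <= 1:
        # A single process needs no pool at all.
        return [write_file(task) for task in tasks]
    with Pool(processes=processes) as pool:
        # Each worker handles whole files, so no two workers share a file.
        return pool.map(write_file, tasks)


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    paths = [os.path.join(out_dir, f"super_data_{n}.json") for n in range(1, 4)]
    for written in generate(paths, data_lines=100, processes=4):
        print(written)
```

Splitting by whole files keeps the workers independent, so the speed-up comes mostly from generating many files; a single large file would still be written by one process in this sketch.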

How to use

Fork or clone the repository.

Run the utility with the desired command-line arguments, or customize the configuration in 'default.ini'.

Generated JSON files will be saved to the specified directory.
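The combination of 'default.ini' and command-line arguments described above can be sketched with the standard `configparser` and `argparse` modules: defaults come from the ini file, and only options actually passed on the command line override them. The contents of default.ini are not shown in this README, so the `[DEFAULT]` section, its keys (borrowed from the parameter table), and the helper `load_settings` are all assumptions for illustration.

```python
import argparse
import configparser

# Hypothetical default.ini contents; the real file's sections and keys may differ.
EXAMPLE_INI = """
[DEFAULT]
path_to_save_files = .
files_count = 4
file_name = output
file_prefix = count
data_lines = 100
multiprocessing = 1
"""

# Keys that should be converted from strings to integers (assumption).
INT_KEYS = ("files_count", "data_lines", "multiprocessing")


def load_settings(ini_text, argv):
    """Return settings from the ini text, overridden by any CLI options in argv."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    settings = dict(config["DEFAULT"])

    cli = argparse.ArgumentParser()
    for key in settings:
        # default=None lets us tell which options were really passed.
        cli.add_argument(f"--{key}", default=None)
    args = vars(cli.parse_args(argv))

    for key, value in args.items():
        if value is not None:
            settings[key] = value
    for key in INT_KEYS:
        if key in settings:
            settings[key] = int(settings[key])
    return settings


if __name__ == "__main__":
    print(load_settings(EXAMPLE_INI, ["--files_count=3", "--file_name=super_data"]))
```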

List of input parameters for the CU:

Name	Description	Behaviour
path_to_save_files	Directory where the generated files are saved	Can be relative to the current working directory or absolute; '.' means the current directory (default)
files_count	Number of JSON files to generate	Default: 4
file_name	Base file name	Without a prefix, the file name is file_name.json; with a prefix, the full file name is file_name_file_prefix.json
file_prefix	Prefix added to the file name when more than one file is generated	Possible choices: count, random, uuid
data_schema	JSON schema as a string	Can be provided in two ways: as a path to a JSON file containing the schema, or as a schema string entered on the command line
data_lines	Number of lines in each file	Default: 100
multiprocessing	Number of processes used to create files	Default: 1
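The path_to_save_files, file_name, and file_prefix rules in the table can be expressed as a small function. This is a sketch, not the utility's implementation: `build_file_paths` and `make_prefix` are hypothetical names, and it assumes that count prefixes start at 1, that a random prefix is a random integer, and that asking for several files without a prefix is an error.

```python
import os
import random
import uuid


def make_prefix(kind, index):
    """Return the prefix for the index-th file (0-based)."""
    if kind == "count":
        return str(index + 1)  # assumption: counting starts at 1
    if kind == "random":
        return str(random.randint(0, 999_999))  # assumption: random integer
    if kind == "uuid":
        return str(uuid.uuid4())
    raise ValueError(f"unknown file_prefix: {kind!r}")


def build_file_paths(path_to_save_files, file_name, files_count, file_prefix=None):
    """Return absolute output paths following the naming rules in the table."""
    # '.' and other relative paths are resolved against the current working directory.
    base_dir = os.path.abspath(path_to_save_files)
    if file_prefix is None:
        if files_count != 1:
            # Assumption: without a prefix, several files would share one name.
            raise ValueError("file_prefix is required when files_count > 1")
        return [os.path.join(base_dir, f"{file_name}.json")]

    names = []
    while len(names) < files_count:
        name = f"{file_name}_{make_prefix(file_prefix, len(names))}.json"
        if name not in names:  # re-draw on a (rare) random collision
            names.append(name)
    return [os.path.join(base_dir, name) for name in names]


if __name__ == "__main__":
    for path in build_file_paths(".", "super_data", 3, "count"):
        print(path)
```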

Example usage

Example of data schema:
{"date":"timestamp:", "name": "str:rand", "type": "str:['client', 'partner', 'government']", "age": "int:rand(1, 90)"}
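Each value in the schema above appears to follow a 'type:value' pattern. The sketch below shows one way to turn such a schema into a record generator and to reject unsupported entries up front, which is a basic form of the schema validation mentioned under Features. The exact semantics are assumptions, not taken from cli.py: 'timestamp:' is read as the current Unix time, 'str:rand' as a random UUID hex string, 'str:[...]' as a random choice from the list, 'int:rand(a, b)' as a random integer with inclusive bounds, and any other 'str:' or 'int:' value as a constant. The names `make_generator` and `make_record_factory` are hypothetical.

```python
import ast
import json
import random
import re
import time
import uuid

# The example schema from this README, unchanged.
SCHEMA_TEXT = """{"date":"timestamp:", "name": "str:rand", "type": "str:['client', 'partner', 'government']", "age": "int:rand(1, 90)"}"""

RAND_RANGE = re.compile(r"^rand\((-?\d+),\s*(-?\d+)\)$")


def make_generator(spec):
    """Turn one 'type:value' spec into a zero-argument value generator.

    Raises ValueError for specs it does not understand, so building the
    generators up front doubles as a basic schema check.
    """
    kind, sep, value = spec.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in {spec!r}")
    value = value.strip()
    if kind == "timestamp":
        return time.time  # assumption: value ignored, Unix time in seconds
    if kind == "str":
        if value == "rand":
            return lambda: uuid.uuid4().hex  # assumption: random string = UUID4 hex
        if value.startswith("["):
            try:
                choices = ast.literal_eval(value)  # safely parse "['a', 'b']"
            except (ValueError, SyntaxError) as exc:
                raise ValueError(f"bad choice list in {spec!r}") from exc
            if not isinstance(choices, list) or not choices:
                raise ValueError(f"empty or invalid choice list in {spec!r}")
            return lambda: random.choice(choices)
        return lambda: value  # assumption: any other value is a constant
    if kind == "int":
        match = RAND_RANGE.match(value)
        if match:
            low, high = int(match.group(1)), int(match.group(2))
            if low > high:
                raise ValueError(f"empty range in {spec!r}")
            return lambda: random.randint(low, high)  # both bounds inclusive
        number = int(value)  # assumption: a constant; raises ValueError if not an int
        return lambda: number
    raise ValueError(f"unsupported type {kind!r} in {spec!r}")


def make_record_factory(schema):
    """Validate the whole schema once, then return a function producing records."""
    generators = {key: make_generator(spec) for key, spec in schema.items()}
    return lambda: {key: gen() for key, gen in generators.items()}


if __name__ == "__main__":
    make_record = make_record_factory(json.loads(SCHEMA_TEXT))
    for _ in range(3):
        print(json.dumps(make_record()))
```

Building all generators before writing any data means a typo in the schema fails immediately instead of partway through a large run.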

Simple usage with default values:
cli.py

Usage with data schema from file:
cli.py --file_count=3 --file_name=super_data --prefix=count --data_schema=./path/to/schema.json

Usage with data_schema provided on the command line:

cli.py --file_count=3 --file_name=super_data --prefix=count --multiprocessing=4 --data_schema="{\"date\": \"timestamp:\", \"name\": \"str:rand\", \"type\": \"str:['client', 'partner', 'government']\", \"age\": \"int:rand(1, 90)\"}"
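Because --data_schema accepts either a file path or an inline JSON string, the program has to decide which one it received. After the shell removes the backslash escapes in the command above, the program sees plain JSON. A minimal sketch of that decision, assuming an existing file takes precedence over inline text (the helper name `load_schema` is hypothetical):

```python
import json
import os


def load_schema(data_schema):
    """Accept either a path to a JSON schema file or an inline JSON string."""
    if os.path.isfile(data_schema):
        with open(data_schema, encoding="utf-8") as fh:
            schema = json.load(fh)
    else:
        try:
            schema = json.loads(data_schema)
        except json.JSONDecodeError as exc:
            raise ValueError(f"data_schema is neither a file nor valid JSON: {exc}") from exc
    if not isinstance(schema, dict):
        raise ValueError("data_schema must be a JSON object")
    return schema


if __name__ == "__main__":
    print(load_schema('{"date": "timestamp:", "age": "int:rand(1, 90)"}'))
```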

TODO:

Add tests

Add a clear_path flag

Repository files:

README.md
cli.py
config_generator.py
default.ini
MikJarz/Data_Generator_Project

Folders and files

Latest commit

History

Repository files navigation

Data Generator Utility Project

Features

How to use

List of input params for CU:

Example usage

TODO in future:

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages