MikJarz/Data_Generator_Project

Data Generator Utility Project

The Data Generation Utility is a command-line tool designed to aid in testing data pipelines by generating customizable JSON test data. Whether you're validating data transformations, testing data validations, or simply need mock data for development purposes, this utility simplifies the process.

Features

  • Flexible Configuration: Customize file count, file name prefix, data schema, and more through command-line arguments or a configuration file.
  • Multiple Prefix Options: Generate files with sequential counts, random prefixes, or UUID prefixes for easy organization.
  • Supports Multiprocessing: Utilize multiple CPU cores to generate data files efficiently, speeding up the process for larger datasets.
  • Schema Validation: Validate data schema correctness to ensure generated data adheres to specified rules.

How to use

  • Fork or Clone the repository
  • Run the utility with the desired command-line arguments, or customize the configuration in 'default.ini'.
  • Generated JSON files will be saved to the specified directory.
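
The interplay between 'default.ini' and the command line can be sketched as follows. This is a minimal illustration, not the project's actual code: the `[DEFAULT]` section, the key names inside the INI text, the `file_name` default of `data`, and the helper `build_parser` are all assumptions. Flag names follow the parameter list in this README, and the defaults for the path, file count, line count, and process count are the ones documented there.

```python
import argparse
import configparser

# Hypothetical contents of default.ini; the real file's section and keys may differ.
DEFAULT_INI = """
[DEFAULT]
path_to_save_files = .
files_count = 4
file_name = data
data_lines = 100
multiprocessing = 1
"""


def build_parser(ini_text: str) -> argparse.ArgumentParser:
    """Build a parser whose defaults come from the INI text (assumed layout)."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    defaults = config["DEFAULT"]

    parser = argparse.ArgumentParser(description="Generate JSON test data")
    parser.add_argument("--path_to_save_files",
                        default=defaults.get("path_to_save_files", fallback="."))
    parser.add_argument("--files_count", type=int,
                        default=defaults.getint("files_count", fallback=4))
    parser.add_argument("--file_name",
                        default=defaults.get("file_name", fallback="data"))
    parser.add_argument("--file_prefix", choices=["count", "random", "uuid"],
                        default=None)
    parser.add_argument("--data_lines", type=int,
                        default=defaults.getint("data_lines", fallback=100))
    parser.add_argument("--multiprocessing", type=int,
                        default=defaults.getint("multiprocessing", fallback=1))
    return parser


# Command-line values win; anything not passed falls back to the INI defaults.
args = build_parser(DEFAULT_INI).parse_args(["--files_count", "3"])
print(args.files_count, args.data_lines)
```

With this layout, a value passed on the command line overrides the configuration file, and the file only supplies whatever was not passed.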

    List of input parameters for the CU:

    | Name | Description | Behaviour |
    |------|-------------|-----------|
    | path_to_save_files | Where the files are saved | The path can be relative to the cwd or absolute. '.' means the current path (default) |
    | files_count | How many JSON files to generate | Default: 4 |
    | file_name | Base file name | Without a prefix, the file name is file_name.json. With a prefix, the full file name is file_name_file_prefix.json |
    | file_prefix | Prefix for the file name when more than one file is generated | Possible choices: count, random, uuid |
    | data_schema | String with the JSON schema | Can be loaded in two ways: from a path to a JSON file containing the schema, or from a schema entered on the command line |
    | data_lines | Number of lines in each file | Default: 100 |
    | multiprocessing | Number of processes used to create files | Default: 1 |
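
The naming rule for file_name and file_prefix can be sketched as below. This is only an illustration under stated assumptions: the function `build_file_names` is hypothetical, the count prefix is assumed to start at 1, and the random prefix is assumed to be a distinct integer. The project may pick different values.

```python
import random
import uuid
from typing import List, Optional


def build_file_names(file_name: str, files_count: int,
                     file_prefix: Optional[str] = None) -> List[str]:
    """Return output file names following the file_name_file_prefix.json pattern."""
    if file_prefix is None:
        # No prefix: a single file named file_name.json.
        return [f"{file_name}.json"]

    if file_prefix == "count":
        # Assumption: sequential numbering starts at 1.
        parts = [str(i) for i in range(1, files_count + 1)]
    elif file_prefix == "random":
        # Assumption: distinct random integers, so names never collide.
        parts = [str(n) for n in random.sample(range(100_000), files_count)]
    elif file_prefix == "uuid":
        parts = [str(uuid.uuid4()) for _ in range(files_count)]
    else:
        raise ValueError(f"unknown file_prefix: {file_prefix!r}")

    return [f"{file_name}_{part}.json" for part in parts]


print(build_file_names("super_data", 3, "count"))
```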

    Example usage

  • Example of data schema:
    {"date":"timestamp:", "name": "str:rand", "type": "str:['client', 'partner', 'government']", "age": "int:rand(1, 90)"}
  • Simple usage with default values:
    cli.py
  • Usage with data schema from file:
    cli.py --file_count=3 --file_name=super_data --prefix=count --data_schema=./path/to/schema.json
  • Usage with a data schema provided on the command line:
    cli.py --file_count=3 --file_name=super_data --prefix=count --multiprocessing=4 --data_schema="{\"date\": \"timestamp:\", \"name\": \"str:rand\", \"type\": \"str:['client', 'partner', 'government']\", \"age\": \"int:rand(1, 90)\"}"
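
To make the schema notation in the examples above more concrete, here is a rough sketch of how a "type:rule" value might be interpreted. The semantics are inferred from the example schema only and are assumptions: `timestamp:` is taken to mean the current Unix time, `str:rand` a random UUID string, a bracketed list a random choice, and `int:rand(a, b)` a random integer in that range. The helpers `load_schema`, `generate_value`, and `generate_record` are hypothetical names, not the project's API.

```python
import ast
import json
import os
import random
import re
import time
import uuid

RAND_RANGE = re.compile(r"rand\(\s*(-?\d+)\s*,\s*(-?\d+)\s*\)")


def load_schema(value: str) -> dict:
    """Accept either a path to a JSON file or an inline JSON string."""
    if os.path.isfile(value):
        with open(value, encoding="utf-8") as handle:
            return json.load(handle)
    return json.loads(value)


def generate_value(spec: str):
    """Produce one value for a 'type:rule' spec (assumed semantics)."""
    type_name, _, rule = spec.partition(":")
    rule = rule.strip()
    if type_name == "timestamp":
        return time.time()  # any rule after 'timestamp:' is ignored here
    if type_name not in ("str", "int"):
        raise ValueError(f"unsupported type: {type_name!r}")
    if rule.startswith("["):
        # A literal list means: pick one of the listed values.
        return random.choice(ast.literal_eval(rule))
    if rule == "rand":
        return str(uuid.uuid4()) if type_name == "str" else random.randint(0, 10_000)
    match = RAND_RANGE.fullmatch(rule)
    if match and type_name == "int":
        low, high = map(int, match.groups())
        return random.randint(low, high)
    raise ValueError(f"unsupported rule: {spec!r}")


def generate_record(schema: dict) -> dict:
    """Build one JSON-serializable record from the schema."""
    return {key: generate_value(spec) for key, spec in schema.items()}


schema = {
    "date": "timestamp:",
    "name": "str:rand",
    "type": "str:['client', 'partner', 'government']",
    "age": "int:rand(1, 90)",
}
print(json.dumps(generate_record(schema)))
```

Rejecting unknown types and rules with a `ValueError` is one simple way to catch a malformed schema before any file is written.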

    TODO in future:

  • Add testing
  • Add clear_path flag