The Data Generation Utility is a command-line tool for testing data pipelines with customizable JSON test data. Whether you are validating data transformations, testing validation rules, or just need mock data for development, the utility generates it for you.
Name | Description | Behaviour |
---|---|---|
path_to_save_files | Directory where the generated files are saved | The path can be relative to the current working directory or absolute. `.` means the current directory (default). |
files_count | How many JSON files to generate | Default: 4 |
file_name | Base file name | Without a prefix, the file name is `file_name.json`. With a prefix, the full file name is `file_name_file_prefix.json`. |
file_prefix | Prefix added to the file name when more than one file is generated | Possible choices: |
data_schema | JSON schema for the generated records | It can be loaded in two ways: as a path to a JSON file, or as an inline JSON string. |
data_lines | Number of lines in each file | Default: 100 |
multiprocessing | The number of processes used to create files | Default: 1 |
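The path and naming rules in the table can be summarized as a small helper. This is a minimal sketch, not the utility's own code: `output_path` is a hypothetical name, and it takes the prefix as an already-resolved string because the valid `file_prefix` choices are not listed here.

```python
from pathlib import Path
from typing import Optional


def output_path(path_to_save_files: str, file_name: str, file_prefix: Optional[str] = None) -> Path:
    """Hypothetical helper: build the full path of one generated file."""
    # A relative path is resolved against the current working directory;
    # an absolute path is used unchanged. "." (the default) is the cwd itself.
    directory = Path(path_to_save_files)
    if not directory.is_absolute():
        directory = Path.cwd() / directory
    # Without a prefix: file_name.json; with a prefix: file_name_<prefix>.json
    if file_prefix:
        return directory / f"{file_name}_{file_prefix}.json"
    return directory / f"{file_name}.json"
```

For example, `output_path(".", "super_data")` points to `super_data.json` in the current directory, and `output_path("/tmp/out", "super_data", "1")` would give `/tmp/out/super_data_1.json` if a prefix such as `1` were supplied.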
Example `data_schema`:

```json
{"date": "timestamp:", "name": "str:rand", "type": "str:['client', 'partner', 'government']", "age": "int:rand(1, 90)"}
```
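The schema values in this example appear to follow a `type:generator` pattern. The sketch below shows one way such specs could be turned into records; it is not the utility's implementation. `make_generator` and `generate_records` are hypothetical names, and the outputs chosen for `timestamp:` (current UTC time as ISO-8601) and `str:rand` (an 8-letter random string) are assumptions.

```python
import ast
import random
import re
import string
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

# Matches "rand(low, high)" with integer bounds.
_RAND_RANGE = re.compile(r"rand\(\s*(-?\d+)\s*,\s*(-?\d+)\s*\)")


def make_generator(spec: str) -> Callable[[], Any]:
    """Hypothetical: turn one "<type>:<generator>" spec into a value factory."""
    kind, _, arg = spec.partition(":")
    arg = arg.strip()
    if kind == "timestamp":
        # Assumption: an empty argument means "now", as an ISO-8601 string.
        return lambda: datetime.now(timezone.utc).isoformat()
    if kind == "str" and arg == "rand":
        # Assumption: a random string of 8 ASCII letters.
        return lambda: "".join(random.choices(string.ascii_letters, k=8))
    if kind == "str" and arg.startswith("["):
        # A Python-style list literal: pick one of the listed values.
        choices = ast.literal_eval(arg)
        return lambda: random.choice(choices)
    if kind == "int":
        match = _RAND_RANGE.fullmatch(arg)
        if match:
            low, high = int(match.group(1)), int(match.group(2))
            # Assumption: inclusive bounds, as with random.randint.
            return lambda: random.randint(low, high)
    raise ValueError(f"unsupported field spec: {spec!r}")


def generate_records(schema: Dict[str, str], count: int) -> List[Dict[str, Any]]:
    """Build each field's generator once, then produce `count` records."""
    generators = {field: make_generator(spec) for field, spec in schema.items()}
    return [{field: gen() for field, gen in generators.items()} for _ in range(count)]
```

Compiling the generators once per schema, rather than once per record, keeps the per-line cost low when `data_lines` is large.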
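Run with the default settings:

```shell
cli.py
```

Generate three files based on the name `super_data`, reading the schema from a file:

```shell
cli.py --file_count=3 --file_name=super_data --prefix=count --data_schema=./path/to/schema.json
```

Generate the same files with four processes, passing the schema as an inline JSON string:

```shell
cli.py --file_count=3 --file_name=super_data --prefix=count --multiprocessing=4 --data_schema="{\"date\": \"timestamp:\", \"name\": \"str:rand\", \"type\": \"str:['client', 'partner', 'government']\", \"age\": \"int:rand(1, 90)\"}"
```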
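These invocations pass `data_schema` in two forms: a path to a JSON file and an inline JSON string. A minimal sketch of how one argument could accept both follows; `load_schema` is a hypothetical name, and the rule "an existing file wins, otherwise parse as JSON text" is an assumption rather than the utility's documented behaviour.

```python
import json
import os
from typing import Any, Dict


def load_schema(data_schema: str) -> Dict[str, Any]:
    """Hypothetical loader: accept a JSON file path or an inline JSON string."""
    # Assumption: if the argument names an existing file, read the schema from it.
    if os.path.isfile(data_schema):
        with open(data_schema, encoding="utf-8") as fh:
            return json.load(fh)
    # Otherwise treat the argument itself as JSON text.
    return json.loads(data_schema)
```

Checking for a file first means an inline string is parsed only when no file of that name exists; a malformed inline schema then surfaces as a `json.JSONDecodeError`.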