Contains the code used to train a close relative of the neural network autocoder
described in Deep neural networks for worker injury autocoding.
Compared to the model described in the paper, examples/big_single_seq_180_lr4e-4.py
is significantly faster with similar performance, made possible by more efficient
batching due to the concatenation of all text inputs.
Note: more recent versions of these libraries will likely also work.
- Anaconda (with Python 3.6)
- Tensorflow 1.8
- Keras 2.1.6
- NLTK 3.2.5
- Sklearn
The model expects a CSV training set with the following columns and data:
| column header | description | example |
|---|---|---|
| survey_year | The year of the incident | 2017 |
| occupation_text | The worker's job title | RN |
| other_text | Optional field indicating the worker's job category | elder care |
| company_name | The primary name of the worker's establishment | ACME hospitals inc. |
| secondary_name | The secondary name of the worker's establishment | ACME holding corp. |
| unit_description | Description of the sampled establishment | hospital staff only |
| nar_activity | A narrative answering "What was the worker doing before the incident occurred" | Helping patient get out of bed |
| nar_event | A narrative answering "What happened" | The patient slipped and employee tried to catch her |
| nar_nature | A narrative answering "What was the injury or illness?" | Employee strained lower back |
| nar_source | A narrative answering "What obect or substance directly harmed the employee?" | Patient and floor |
| naics | The 6 digit 2012 North American Industry Classification System (NAICS) code for the establishment | 622110 |
| soc | The 6 digit 2010 Standard Occupational Classification (SOC) code for the worker | 29-1141 |
| nature_code | The 2.01 OIICS nature code | 1233 |
| part_code | The 2.01 OIICS part code | 322 |
| event_code | The 2.01 OIICS event code | 7143 |
| source_code | The 2.01 OIICS source code | 574 |
| sec_source_code | The 2.01 OIICS secondary source code (blank means none present) | |
| office | Checkbox indicating the job category of the worker (X or blank) | X |
| sales | Checkbox indicating the job category of the worker (X or blank) | X |
| assembly | Checkbox indicating the job category of the worker (X or blank) | X |
| repair | Checkbox indicating the job category of the worker (X or blank) | X |
| construction | Checkbox indicating the job category of the worker (X or blank) | X |
| health | Checkbox indicating the job category of the worker (X or blank) | X |
| driving | Checkbox indicating the job category of the worker (X or blank) | X |
| food | Checkbox indicating the job category of the worker (X or blank) | X |
| maintenance | Checkbox indicating the job category of the worker (X or blank) | X |
| material_handling | Checkbox indicating the job category of the worker (X or blank) | X |
| farming | Checkbox indicating the job category of the worker (X or blank) | X |
| other | Checkbox indicating the job category of the worker (X or blank) | X |
pip install git+https://github.com/USDepartmentofLabor/soii_neural_autocoder.git
Modify the data_file variable in examples/big_single_seq_180_lr4e-4.py to point
to an appropriately formatted training dataset, or leave as is to use the dummy
training set included in the module. Then run big_single_seq_180_lr4e-4.py.
Model checkpoints and a log of training results will be saved in the same directory.