Contributor

@dhensle dhensle commented May 9, 2024

Adds the option of explicit chunking to all interaction simulate models that were not already hooked up. These include destination choice, location choice, and scheduling.

Also implemented a feature where the explicit_chunk setting can be less than 1. If less than 1, it specifies the fraction of the chooser table to include in each chunk, so explicit_chunk: 0.1 would mean that there would be 10 chunks. If greater than 1, explicit_chunk remains a number of rows in the chooser table.
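As a rough illustration of the fractional setting described above, here is a minimal sketch of how an explicit_chunk value could be turned into a row count. The helper names `rows_per_chunk_from_setting` and `number_of_chunks` are hypothetical and not taken from this PR. Rounding up and treating a value of exactly 1 as a row count are assumptions; the actual implementation may differ.

```python
import math


def rows_per_chunk_from_setting(explicit_chunk: float, num_choosers: int) -> int:
    """Hypothetical helper: convert an explicit_chunk setting to rows per chunk."""
    if explicit_chunk <= 0:
        raise ValueError("explicit_chunk must be positive")
    if explicit_chunk < 1:
        # Fractional setting: each chunk holds this share of the choosers.
        # Rounding up is an assumption; it keeps the chunk count at or
        # below 1 / explicit_chunk.
        return max(1, math.ceil(explicit_chunk * num_choosers))
    # A setting of 1 or more is taken as a row count (assumption for exactly 1).
    return int(explicit_chunk)


def number_of_chunks(rows_per_chunk: int, num_choosers: int) -> int:
    """Number of chunks needed to cover every chooser row."""
    return math.ceil(num_choosers / rows_per_chunk)


rows = rows_per_chunk_from_setting(0.25, 1000)  # 250 rows per chunk
print(rows, number_of_chunks(rows, 1000))       # 250 4
```

With this rounding, a chooser table of 1001 rows and a setting of 0.5 gives 501 rows per chunk and still only 2 chunks, the second one being slightly smaller.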

@dhensle dhensle marked this pull request as ready for review May 21, 2024 17:43
Contributor Author

dhensle commented May 21, 2024

Sharing some testing results for posterity.

Used TransLink's model at a 10% sample size. No chunking looks like this:
[image: translink_10pct_no_chunk]

I then set chunk_training to explicit with the following explicit_chunk settings for submodels:

  • workplace_location: 0.5
  • mandatory_tour_scheduling: 0.25
  • non_mandatory_tour_destination: 0.5
  • non_mandatory_tour_scheduling: 0.5
  • trip_destination: 0.5

(notice the y-axis scale difference from the above plot)
[image: translink_10pct_exp_chunk]

Runtime was 113.9 minutes with no chunking and 115 minutes with explicit chunking, a minimal increase.

Member

@jpn-- jpn-- left a comment

This looks great. Couple minor changes to simplify.


num_choosers = len(choosers.index)

explicit_and_odd_num_choosers = False
Member

The value of explicit_and_odd_num_choosers is not needed. It's perfectly fine for some multiple of rows_per_chunk to overrun the end of the choosers by a bit, slicing beyond the end of the range. If it were needed, checking for odd wouldn't be enough; we'd need to adjust based on the inverse of the number of chunks (e.g. 0.25 won't line up unless the total is divisible by 4, not 2).
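To illustrate the point about overrunning the end of the range, here is a minimal sketch using a plain Python list as a stand-in for the chooser table. The helper `iter_chunks` is hypothetical and not part of this PR. Slicing past the end of a sequence simply returns the rows that remain, and slice-based `.iloc` indexing in pandas behaves the same way.

```python
def iter_chunks(rows, rows_per_chunk):
    """Hypothetical helper: yield consecutive slices of at most rows_per_chunk rows."""
    for start in range(0, len(rows), rows_per_chunk):
        # The stop index may exceed len(rows); the slice is truncated, not an error.
        yield rows[start : start + rows_per_chunk]


choosers = list(range(10))  # 10 rows: an even total, but not divisible by 4

# A 0.25 fraction of 10 rows rounds up to 3 rows per chunk (rounding is an assumption).
print([len(chunk) for chunk in iter_chunks(choosers, 3)])  # [3, 3, 3, 1]
```

The last chunk is just shorter than the others, so no adjustment to rows_per_chunk is needed to cover every chooser exactly once.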

Contributor Author

Please take a look at the changes in this commit: dhensle@1ed94c5

I was hitting the assert statement for the alts (not the choosers), which prompted me to try to adjust the rows_per_chunk. But good point about odd not being good enough. I removed this functionality and replaced the assert statement with a simple check on overflow of the alt index. I think the solution in the above commit fixes the issue (it runs successfully), but I suggest you take a look. Thanks!

& (i == estimated_number_of_chunks)
& (rows_per_chunk > 1)
):
# last chunk may be smaller than chunk_size due to rounding error
Member

We don't need to update the rows_per_chunk here; as noted above, we can overrun the end of the choosers and be fine.

Contributor Author

See the response above.

2 participants