Description
There may be a hidden math error in some scheduling models: generally, any model where there is a shift term that is the square of a time period value (start time, end time, duration).
Trip and tour start and end times are stored as int8 in the various tables where they appear. By extension, durations may also be int8. This seems superficially fine, as even the most ambitious model probably won't have time periods shorter than 15 minutes, so the max number of periods in a day would be 96, less than the max int8 value of 127.
BUT: if you write an expression in a spec file that does math with time periods, AND that expression might overflow int8 (e.g. by squaring), AND you don't explicitly upcast to a more appropriate dtype, THEN the math can overflow and wrap around. We observe this if we take the square of an int8 array of time period values:
    >>> pd.Series(np.arange(24), dtype=np.int8) ** 2
    0       0
    1       1
    2       4
    3       9
    4      16
    5      25
    6      36
    7      49
    8      64
    9      81
    10    100
    11    121
    12   -112
    13    -87
    14    -60
    15    -31
    16      0
    17     33
    18     68
    19    105
    20   -112
    21    -71
    22    -28
    23     17
    dtype: int8
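For comparison, here is a minimal sketch of the explicit upcast mentioned above. The `periods` series is a stand-in for a real time period column, and `int32` is just one reasonable choice of wider dtype; how the cast would be written inside an actual spec expression depends on how that expression is evaluated, so treat this as an illustration rather than a drop-in fix.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a column of time period values stored as int8.
periods = pd.Series(np.arange(24), dtype=np.int8)

# Squaring directly stays in int8 and wraps around past 127.
wrapped = periods ** 2

# Upcasting before squaring keeps the arithmetic in a wider dtype.
safe = periods.astype(np.int32) ** 2

print(wrapped.dtype, wrapped.iloc[12])  # int8 -112
print(safe.dtype, safe.iloc[12])        # int32 144
```

The cast has to happen before the squaring; casting the already-squared result would only widen the wrapped values.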
The consequences of this math error may wash out if the associated coefficients are zero, or if the values being squared are always 11 or less by construction or coincidence (11 squared is 121, which still fits in int8), or potentially for other reasons. Some study will be required to determine whether that is the case. My preliminary guess is that it does not wash out, at least sometimes: I discovered this through anomalies between sharrow-with-time-windows and legacy code on the prototype MWCOG model.
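The threshold of 11 can be checked directly. The sketch below assumes up to 96 periods per day (the 15-minute case discussed above) and compares int8 squaring against the same calculation in int64; the variable names are only illustrative.

```python
import numpy as np

int8_max = np.iinfo(np.int8).max  # 127

# Assumed range of period values: 0..95, i.e. 96 periods per day.
values = np.arange(96, dtype=np.int64)

exact = values ** 2                      # computed in int64, no overflow
wrapped = values.astype(np.int8) ** 2    # computed in int8, wraps around

# Period values whose int8 square differs from the true square.
mismatch = values[exact != wrapped]

largest_safe = int(values[exact <= int8_max].max())
first_bad = int(mismatch.min())

print(largest_safe, first_bad, len(mismatch))  # 11 12 84
```

Under that assumption, 84 of the 96 possible period values give a wrong square in int8, so any expression squaring a start time, end time, or duration later than period 11 would be affected.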
Based on a cursory review of model spec files in our repository, I observe that scheduling for joint and non-mandatory tours may be affected in these models:
- ARC
- MWCOG
- SEMCOG
- anyone using one of these models as a donor (probably CMAP, Met Council)
cc: @guyrousseau @JilanChen @mmoran3 @ray-ngo @jfdman @i-am-sijia