
Utility terms that compare a pandas categorical variable to strings are not evaluated correctly with Sharrow #766

@i-am-sijia

Describe the bug
After implementing the string-to-pandas-categorical conversion, some of our current CI tests failed. They all had Sharrow turned on and set to test mode. The utilities calculated with and without Sharrow differ.

```
[48:39.10] INFO: completed flow_LQLDEWSFEGQ5W2NJNONCWPNFSCB7O5RD.load in 0:00:18.710844 stop_frequency.work.simple_simulate.eval_mnl
[48:39.10] INFO: completed apply_flow in 0:00:21.651810 
[48:39.10] INFO: elapsed time sharrow flow 0:00:21.659376 stop_frequency.work.simple_simulate.eval_mnl
[48:39.31] INFO: elapsed time simple flow 0:00:00.207622 stop_frequency.work.simple_simulate.eval_mnl.eval_utils

Not equal to tolerance rtol=0.01, atol=0
utility not aligned
Mismatched elements: 132 / 144 (91.7%)
Max absolute difference: 1998.00011762
Max relative difference: 1729.2712081
 x: array([[    0.     , -1000.9582 , -1002.2882 , -1002.6522 , -1001.3462 ,
        -2000.7913 , -2002.1212 , -2002.4852 , -1003.1262 , -2002.5713 ,
        -2003.9012 , -2003.5703 , -1004.4472 , -2003.8922 , -2004.5272 ,...
 y: array([[ 0.000000e+00, -1.958200e+00, -3.288200e+00, -3.652200e+00,
        -2.346200e+00, -2.791200e+00, -4.121200e+00, -4.485200e+00,
        -4.126200e+00, -4.571200e+00, -5.901200e+00, -5.570200e+00,...
big problem: 132 missed close values out of 144 (91.67%)
sh_util.shape=(9, 16)
(array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4,
       4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6,
       6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7,
       7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8]), array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,  1,  2,
        3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,  1,  2,  3,  4,
        5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,  1,  2,  3,  5,  6,  7,
        9, 10, 11, 13, 14, 15,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11,
       12, 13, 14, 15,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13,
       14, 15,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,  1,  2,
        3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15]))
possible problematic expressions:
  11.1% [043] (school_esc_outbound.isin(['ride_share', 'pure_escort']))
  00.0% [044] (school_esc_inbound.isin(['ride_share', 'pure_escort']))
[48:43.24] ERROR: ===== ERROR IN stop_frequency =====
[48:43.24] ERROR: 
Not equal to tolerance rtol=0.01, atol=0
utility not aligned
```
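
For context, the two expressions flagged above compare a pandas categorical column against string values. Below is a minimal sketch of how such a comparison can diverge, under the assumption that a compiled flow sees the underlying integer category codes rather than the string labels; this is an illustration, not Sharrow's actual internals:

```python
# Illustration only (not Sharrow's actual internals): comparing a pandas
# categorical to strings works on the string labels in plain pandas, but
# an evaluator that sees the underlying integer codes gets it wrong.
import pandas as pd

s = pd.Series(["ride_share", "no_escort", "pure_escort"], dtype="category")

# Plain pandas compares against the category labels: correct.
print(s.isin(["ride_share", "pure_escort"]).tolist())
# [True, False, True]

# The underlying integer codes ([2, 0, 1] here, since categories are
# stored sorted). Comparing ints against strings is always False.
print(s.cat.codes.isin(["ride_share", "pure_escort"]).tolist())
# [False, False, False]
```

The offsets in the mismatched utilities above (roughly multiples of 999) are consistent with boolean terms like these flipping value and picking up a large negative coefficient.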

To Reproduce
Steps to reproduce the behavior:

  1. Check out ...
  2. Run test_mtc_extended.py::test_prototype_mtc_extended_sharrow()
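
Step 2 can also be invoked from Python through pytest's API, e.g. (run from the directory containing the test file):

```python
# Run the failing Sharrow comparison test by its pytest node ID.
import pytest

exit_code = pytest.main(
    ["test_mtc_extended.py::test_prototype_mtc_extended_sharrow", "-v"]
)
```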

Expected behavior
The utilities calculated with and without Sharrow should be the same.

Screenshots
Result of tracing the failed tour in stop_frequency.work:

Chooser: [screenshot]

Non-sharrow evaluation: [screenshot]

Sharrow evaluation: [screenshot]

Additional context
Temporary solution: I moved the pandas categorical vs string comparisons to the preprocessors.
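
Concretely, the workaround looks something like the sketch below: the categorical-vs-string comparison is evaluated by pandas in a preprocessor step, and the utility expression only references the resulting boolean column.

```python
# Illustrative sketch of the workaround, in plain pandas (the column name
# `has_school_esc_outbound` is hypothetical, not from the actual spec files).
import pandas as pd

choosers = pd.DataFrame(
    {
        "school_esc_outbound": pd.Series(
            ["ride_share", "no_escort", "pure_escort"], dtype="category"
        )
    }
)

# Preprocessor step: pandas evaluates the categorical comparison correctly
# and stores the result as a plain boolean column.
choosers["has_school_esc_outbound"] = choosers["school_esc_outbound"].isin(
    ["ride_share", "pure_escort"]
)

# The utility spec can then reference `has_school_esc_outbound` directly,
# so the Sharrow-compiled expression never touches the categorical.
```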
