Description
As mentioned here, we are seeing some file system hashes that are floats even though the name records (`report.data["name_records"]`) contain integer values.
I looked into this, and the issue appears to be log-dependent: some logs generate `pandas` dataframes with ids/hashes of type `int64`, while other logs return dataframes with ids/hashes of type `float64`. I traced this back to `darshan.report.mod_read_all_records()`, in the `if dtype == pandas` code block. Each record (pre-concatenation) has ids that are either `int64` or `uint64`, but upon concatenation the `uint64` values get converted to `float64`. This is apparently expected behavior for `pandas`, since `np.int64` cannot hold the largest `uint64` values. I was able to find some discussion about it here.
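
For reference, here is a minimal sketch of the promotion described above. It does not use darshan at all: the two dataframes, the `id` column name, and the values are made up for illustration, standing in for per-record dataframes with mixed integer types.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for two per-record dataframes (not real darshan output).
signed = pd.DataFrame({"id": np.array([1, 2], dtype=np.int64)})
unsigned = pd.DataFrame({"id": np.array([3, 2**63 + 5], dtype=np.uint64)})

# NumPy has no integer type that covers both int64 and uint64,
# so the common type it picks for the pair is float64.
common = np.result_type(np.int64, np.uint64)
print(common)

# pandas follows the same promotion when the columns are concatenated.
combined = pd.concat([signed, unsigned], ignore_index=True)
print(combined["id"].dtype)
```

Both prints should report `float64`, which matches what we see for the affected logs.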
This is unwanted behavior, since we want to be able to make safe comparisons between the file hashes stored in `report.data["name_records"]` and the counter/fcounter dataframes. As far as solutions go, I don't think we can force `pandas` to keep the `uint64` type when it concatenates the dataframes, but we could use `astype()` to change the `float64` values back to `uint64` after the concatenation step.
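
A rough sketch of the `astype()` idea, again with made-up dataframes and an `id` column name that are assumptions rather than the actual darshan structures. One thing to watch: `float64` has a 53-bit mantissa, so a hash above 2**53 may not survive the round trip through `float64`. For comparison, the sketch also shows casting each dataframe to `uint64` before concatenating; that is an alternative I am adding here for illustration, and it assumes the ids are never negative.

```python
import numpy as np
import pandas as pd

# Hypothetical hash value; too large to be represented exactly as float64.
big_hash = 2**63 + 5

frames = [
    pd.DataFrame({"id": np.array([1, 2], dtype=np.int64)}),
    pd.DataFrame({"id": np.array([3, big_hash], dtype=np.uint64)}),
]

# Option A: concatenate first (ids become float64), then cast back.
after = pd.concat(frames, ignore_index=True)
after["id"] = after["id"].astype(np.uint64)

# Option B: cast every frame to uint64 first, so no float64 step occurs.
before = pd.concat(
    [frame.astype({"id": np.uint64}) for frame in frames],
    ignore_index=True,
)

print(after["id"].dtype, before["id"].dtype)
print(int(after["id"].iloc[-1]) == big_hash)   # rounded while it was float64
print(int(before["id"].iloc[-1]) == big_hash)  # value preserved
```

Both options end up with a `uint64` column, but only the cast-before-concatenation variant keeps the large value intact in this example, so it may be worth checking how large the real hashes are before relying on the post-concatenation cast.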