
Numpy casting of list of large python ints to np.float64 in cocoEval.evaluate #330

Open
underchemist opened this issue Oct 16, 2019 · 0 comments

Hi,

This is more of a cautionary tale, and it is unlikely to be fixed in numpy anytime soon (see numpy/numpy#7126, numpy/numpy#5745, and numpy/numpy#12525 for a sample).

I have made a custom dataset in the COCO format, in which each image id is computed as a hash of the image and stored as an integer. These ids tend to be rather large: if you load them into a numpy array and check the dtype, some fit in int64 while others are larger and require uint64. So when I attempt to evaluate my dataset, I might do something like

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

cocoGt = COCO('path/to/dataset.json')
cocoDt = cocoGt.loadRes('path/to/results.json')
cocoEval = COCOeval(cocoGt, cocoDt, 'bbox')
cocoEval.params.imgIds = cocoGt.getImgIds()[:10]
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()
Running per image evaluation...      
DONE (t=0.46s).
Accumulating evaluation results...   
DONE (t=0.38s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = -1
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = -1
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = -1
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = -1
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = -1
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = -1
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1

After some digging I realized that the -1 values were defaults, and I managed to track the problem down to

cocoEval.evaluate()
|
--> cocoeval.py: line 134: p.imgIds = list(np.unique(p.imgIds))

In this case, np.unique casts its input to a numpy array; because the ids are a mix of values that fit in np.int64 and values that only fit in np.uint64, the array is promoted to np.float64. Some examples:

>>> import numpy as np
>>> a = np.iinfo(np.int64).max
>>> b = np.iinfo(np.uint64).max
>>> np.array([a, a]).dtype
dtype('int64')
>>> np.array([b, b]).dtype
dtype('uint64')
>>> np.array([a, b]).dtype
dtype('float64')
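The same promotion happens when np.unique is handed such a list directly. A minimal sketch (behavior as observed with NumPy 1.x; the two values stand in for hashed image ids):

```python
import numpy as np

# a fits in int64; b only fits in uint64 (stand-ins for hashed image ids)
a = np.iinfo(np.int64).max
b = np.iinfo(np.uint64).max

# int64 and uint64 share no common integer type, so NumPy promotes to float64
print(np.result_type(np.int64, np.uint64))  # float64

# This is what happens inside evaluate():
# p.imgIds = list(np.unique(p.imgIds)) silently yields floats
ids = list(np.unique([a, b]))
print([type(x).__name__ for x in ids])
```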

This becomes a problem later on, when p.imgIds is a list of float values being used as dictionary lookup keys. For example, coco.py line 145

lists = [self.imgToAnns[imgId] for imgId in imgIds if imgId in self.imgToAnns]
# lists == []

will return an empty list, since large float64 values fail the membership test in the list comprehension, i.e. bool(1.83426386843345553e+17 in [183426386843345553]) # False.
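To make the failure mode concrete, here is a small sketch of why a promoted id can never match its integer key (the id value is illustrative):

```python
# float64 has 53 bits of mantissa, so an 18-digit odd id cannot be
# represented exactly; Python then compares int and float by exact value
img_id = 183426386843345553
as_float = float(img_id)             # rounds to the nearest representable float

print(as_float == img_id)            # False
print(as_float in {img_id: 'anns'})  # False: the imgToAnns lookup misses
```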

My workaround is to assign a numpy array with the correct dtype to cocoEval.params.imgIds before calling evaluate(). However, I'm wondering whether it would be wise to do some validation on the dtype of the imgIds object, or at least raise a warning. This was a bit tricky to find, as there was no obvious error.
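The workaround can be sketched like this (the ids below are made-up stand-ins for cocoGt.getImgIds(), and it assumes all hashes fit in uint64):

```python
import numpy as np

# Coerce the ids to a single integer dtype *before* evaluate(), so the
# np.unique call inside it has nothing to promote
ids = [9223372036854775807, 18446744073709551615]  # stand-in for getImgIds()
img_ids = np.array(ids, dtype=np.uint64)

unique_ids = np.unique(img_ids)
print(unique_ids.dtype)              # uint64

# uint64 scalars hash like equal Python ints, so dict lookups succeed again
ann_index = {ids[0]: 'anns'}
print(unique_ids[0] in ann_index)    # True
```

With pycocotools this amounts to cocoEval.params.imgIds = np.array(cocoGt.getImgIds(), dtype=np.uint64) before cocoEval.evaluate().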

  • Is it possible to avoid the np.unique call, so we don't cast to a numpy array and then back to a list?
  • If np.unique should be kept, what is the minimal thing that could be done to avoid this situation, or at least warn the user that their imgIds may not behave correctly?
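For the second point, a minimal validation sketch (unique_ids_checked is a hypothetical helper, not part of pycocotools) that warns instead of failing silently:

```python
import warnings
import numpy as np

def unique_ids_checked(img_ids):
    """Hypothetical replacement for list(np.unique(p.imgIds)) that warns
    when NumPy silently promotes the ids to a non-integer dtype."""
    unique = np.unique(img_ids)
    if not np.issubdtype(unique.dtype, np.integer):
        warnings.warn(
            f'imgIds were promoted to {unique.dtype}; very large ids may '
            'no longer match annotation keys',
            RuntimeWarning,
        )
    return list(unique)
```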

I'm happy to submit a PR if this seems important; I would appreciate some input on the above points.
