Conversation

@mnc537 (Contributor) commented Feb 23, 2020

Hello,

This PR allows FasterRCNN to train with negative samples.
Related to: #1598

When defining the dataset, one needs to set the boxes field in the target dict to torch.zeros((0, 4), dtype=torch.float32) for negative images, since boxes is required.

This is how the target should look:

target = {}
target["boxes"] = torch.zeros((0, 4), dtype=torch.float32)
target["labels"] = torch.zeros((1, 1), dtype=torch.int64)
target["image_id"] = image_id
target["area"] = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
target["iscrowd"] = torch.zeros((0,), dtype=torch.int64)

@mnc537 mnc537 changed the title Negative faster rcnn Train Faster R-CNN with negative samples Feb 23, 2020
@mnc537 mnc537 requested a review from fmassa February 23, 2020 18:50
@cpuhrsch cpuhrsch self-requested a review February 25, 2020 00:36
@cpuhrsch (Contributor) commented Mar 2, 2020

I'm out of my depth on this particular piece of code and will leave the review to @fmassa.

@cpuhrsch cpuhrsch removed their request for review March 2, 2020 18:25
@fmassa (Member) left a comment

The PR looks great, Monica, thanks a lot!

I have only a few minor comments; could you look into them? Once they are addressed, this will be good to merge.

We should also make sure that Mask R-CNN and Keypoint R-CNN work with empty targets; could you look into that in a follow-up PR?

Once again thanks a lot!

@@ -730,7 +743,8 @@ def forward(self, features, proposals, image_shapes, targets=None):
for t in targets:
# TODO: https://github.com/pytorch/pytorch/issues/26731
floating_point_types = (torch.float, torch.double, torch.half)
assert t["boxes"].dtype in floating_point_types, 'target boxes must of float type'
if t["boxes"] is not None:

nit: this conditional is not needed anymore


gt_boxes_in_image = gt_boxes[img_id]
if gt_boxes_in_image.numel() == 0:
gt_boxes_in_image = torch.zeros((1, 4), dtype=dtype)

Do we need to take the device of gt_boxes_in_image into account?
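
A minimal sketch of what taking the device into account could look like here (assuming dtype is already defined, as in the surrounding code; an empty tensor still carries a device):

device = gt_boxes_in_image.device
gt_boxes_in_image = torch.zeros((1, 4), dtype=dtype, device=device)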

Comment on lines 580 to 583
clamped_matched_idxs_in_image = torch.zeros(
(proposals_in_image.shape[0],), dtype=torch.int64
)
labels_in_image = torch.zeros((proposals_in_image.shape[0],), dtype=torch.int64)

I think it would be safer if we also take the device of proposals_in_image into account while creating those tensors
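
Along the same lines, a device-aware sketch of the snippet above (assuming proposals_in_image is available, as in the surrounding code):

device = proposals_in_image.device
clamped_matched_idxs_in_image = torch.zeros(
    (proposals_in_image.shape[0],), dtype=torch.int64, device=device
)
labels_in_image = torch.zeros(
    (proposals_in_image.shape[0],), dtype=torch.int64, device=device
)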

@codecov-io commented

Codecov Report

❗ No coverage uploaded for pull request base (master@d45a77d).
The diff coverage is 0%.

@@           Coverage Diff            @@
##             master   #1911   +/-   ##
========================================
  Coverage          ?   0.48%           
========================================
  Files             ?      92           
  Lines             ?    7442           
  Branches          ?    1133           
========================================
  Hits              ?      36           
  Misses            ?    7393           
  Partials          ?      13
Impacted Files Coverage Δ
torchvision/models/detection/roi_heads.py 0% <0%> (ø)
torchvision/models/detection/rpn.py 0% <0%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@fmassa (Member) left a comment

Thanks a lot, Monica!

@fmassa fmassa merged commit e75b497 into pytorch:master Mar 20, 2020
@rronan commented Mar 20, 2020

Thanks for this feature!

This is how the target should look:

target = {}
target["boxes"] = torch.zeros((0, 4), dtype=torch.float32)
target["labels"] = torch.zeros((1, 1), dtype=torch.int64)
target["image_id"] = image_id
target["area"] = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
target["iscrowd"] = torch.zeros((0,), dtype=torch.int64)

One question though: why does target["labels"] look like this? Shouldn't it be of length 0 (like boxes), or at least have only one dimension (not two), as specified here:
https://github.com/pytorch/vision/blob/master/torchvision/models/detection/faster_rcnn.py#L47

@fmassa (Member) commented Mar 20, 2020

@rronan I agree, I was wondering the same thing. @mnc537 is this a typo in the test?

@mnc537 (Contributor, Author) commented Mar 23, 2020

@rronan, right! It should be of length 0. There is a typo in the test, @fmassa. I'll fix it.
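
For later readers, the corrected target for a negative image would then look like this (a sketch reflecting the fix described above; image_id is whatever identifier your dataset uses):

target = {}
target["boxes"] = torch.zeros((0, 4), dtype=torch.float32)
target["labels"] = torch.zeros((0,), dtype=torch.int64)   # length 0, matching boxes
target["image_id"] = image_id
target["area"] = torch.zeros((0,), dtype=torch.float32)   # empty, since there are no boxes
target["iscrowd"] = torch.zeros((0,), dtype=torch.int64)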

@fmassa (Member) commented Mar 23, 2020

Thanks @mnc537 !

@rronan commented Mar 24, 2020

Thank you @mnc537. I've been using this PR and haven't run into any issues yet.

fmassa added a commit to fmassa/vision-1 that referenced this pull request Jun 9, 2020
* modified FasterRCNN to accept negative samples

* remove debug lines

* Change torch.zeros_like to torch.zeros

* Add unit tests

* take the `device` into account

Co-authored-by: Francisco Massa <[email protected]>
@Kirayue commented Nov 11, 2020

Hi, @fmassa, @mnc537

I have a question about target["labels"].

According to the PR,

target = {}
target["boxes"] = torch.zeros((0, 4), dtype=torch.float32)
target["labels"] = torch.zeros((1, 1), dtype=torch.int64)
target["image_id"] = image_id
target["area"] = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
target["iscrowd"] = torch.zeros((0,), dtype=torch.int64)

@mnc537 said there was a typo, so should the target look like

target["labels"] = torch.zeros((1,), dtype=torch.int64), or
target["labels"] = torch.ones((0,), dtype=torch.int64), or
target["labels"] = torch.zeros((1, 1), dtype=torch.int64), or
target["labels"] = torch.zeros((0,), dtype=torch.int64)?

I tried all of the above and none of them raised an error.

Thank you for the feature.

@dccf36 commented Jun 23, 2021

Has anyone encountered the problem of the loss exploding after adding negative (background-only) samples?

@samra-irshad commented

@mnc537 Thanks for this feature. Just wondering, do we need to increase the number of classes once we add the targets for negative samples?

@Kirayue commented Aug 11, 2021

@samra-irshad The number of classes is the number of your object classes + 1 (the extra class is the background, which also covers negative samples, i.e. images containing only background).


https://pytorch.org/vision/stable/models.html#id35
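
Concretely (a minimal sketch; the two object classes are made up for illustration):

import torchvision

# two object classes (e.g. cat, dog) + 1 for the background / negative samples
num_classes = 2 + 1
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=num_classes)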

@samra-irshad commented

@Kirayue So the background class (0) and negative samples (images with no objects) should have the same label? Or should I allocate an additional label to images with no objects?

@Kirayue commented Aug 11, 2021

@samra-irshad, they are the same, so you do not need to add an additional label to indicate no objects.

@ashep29 commented Oct 25, 2021

I get the following error when I try to use negative samples for unlabelled images with:

target["boxes"] = torch.zeros((0, 4), dtype=torch.float32)
target["labels"] = torch.zeros((1, 1), dtype=torch.int64)

ValueError: Expected target boxes to be a tensor of shape [N, 4], got torch.Size([0]).

Any ideas on how to address this?
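
For what it's worth, the message says the boxes tensor arrived with shape [0] rather than [0, 4], i.e. it lost its second dimension somewhere before reaching the model; the two shapes can be compared directly:

import torch

torch.zeros(0).shape       # torch.Size([0])    -- triggers this ValueError
torch.zeros((0, 4)).shape  # torch.Size([0, 4]) -- what the model expects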

@jodumagpi commented

@mnc537 what should the target masks look like?
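
Since Mask R-CNN expects masks of shape [N, H, W], a plausible sketch for a negative image is an empty mask tensor matching the image size (height and width here are assumptions, taken from your image):

target["masks"] = torch.zeros((0, height, width), dtype=torch.uint8)  # assumes height/width of the image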

@carsumptive commented

Hello, this is a useful feature that I wish were covered in a bit more detail in the docs, and I'm trying to get it to work. I believe I can implement it with the Detecto library, but I'm running into a couple of issues, so I thought it best to follow up on this thread in case anyone is watching it.

  1. When I pass 0,0,0,0 box values through the dataloader, the degenerate-box check in torchvision's GeneralizedRCNN rejects them, since such boxes have zero width and height. Any word on working around this? I assume that check is there for a reason; this PR just doesn't seem to address it. Any help is greatly appreciated. See the sketch below.
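
For reference, the convention this PR supports is an empty [0, 4] boxes tensor rather than a single all-zero box; a 0,0,0,0 box has zero width and height, which is exactly what the degenerate-box check rejects:

# A degenerate box (zero width and height) -- rejected by GeneralizedRCNN:
boxes = torch.tensor([[0., 0., 0., 0.]])

# An empty boxes tensor -- the negative-sample convention from this PR:
boxes = torch.zeros((0, 4), dtype=torch.float32)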
