Adds bounding boxes conversion #2710


Merged
merged 23 commits into pytorch:master on Oct 1, 2020

Conversation

oke-aditya
Contributor

@oke-aditya oke-aditya commented Sep 27, 2020

Closes #2687

  • Added code
  • Added documentation
  • Added tests

As per the issue, I have added two utility functions to convert boxes to Pascal VOC format (x1, y1, x2, y2).

The tests convert boxes to the other format and back again, ensuring the two operations are inverses of each other and can be used interchangeably. Tests pass locally.

This is ready for review. Do let me know!

cc @pmeier
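
As a minimal sketch of the round-trip property the tests check (written against box_convert, the single entry-point the discussion below eventually settles on; the box values here are made up):

import torch
from torchvision.ops import box_convert

# Two made-up boxes in Pascal VOC / xyxy format: (x1, y1, x2, y2).
boxes = torch.tensor([[10.0, 15.0, 30.0, 35.0],
                      [23.0, 35.0, 93.0, 95.0]])

# Convert to (x, y, w, h) and back; the round trip should reproduce the input.
roundtrip = box_convert(box_convert(boxes, "xyxy", "xywh"), "xywh", "xyxy")
assert torch.allclose(roundtrip, boxes)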

@codecov

codecov bot commented Sep 27, 2020

Codecov Report

Merging #2710 into master will increase coverage by 0.12%.
The diff coverage is 89.83%.


@@            Coverage Diff             @@
##           master    #2710      +/-   ##
==========================================
+ Coverage   72.93%   73.05%   +0.12%     
==========================================
  Files          95       96       +1     
  Lines        8239     8298      +59     
  Branches     1279     1291      +12     
==========================================
+ Hits         6009     6062      +53     
  Misses       1838     1838              
- Partials      392      398       +6     
Impacted Files Coverage Δ
torchvision/ops/boxes.py 93.25% <78.57%> (-6.75%) ⬇️
torchvision/ops/__init__.py 100.00% <100.00%> (ø)
torchvision/ops/_box_convert.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Collaborator

@pmeier pmeier left a comment


Thanks @oke-aditya for the PR! A few comments below. Additionally:

  1. For completeness: shouldn't we also have box_cxcywh_to_xywh and box_xywh_to_cxcywh? I know you advocated against it, but I think we should discuss this. @fmassa?
  2. Why did you add the pytorch-sphinx-theme as a submodule? I'm pretty sure we shouldn't do that, as the documentation is not built here.

@oke-aditya
Contributor Author

oke-aditya commented Sep 27, 2020

I will make the suggested changes. I just wrote code that works for now; I agree it needs cleaning up as suggested.

Also,

  1. I advised against it as we can simply cascade these two operations: box_xyxy_to_xywh(box_cxcywh_to_xyxy(boxes)) gives the required result if needed. We can either leave it at that or add a function that does exactly this; it can be discussed and added if needed.

  2. Very sorry for the sphinx docs, I committed and pushed them by mistake. I have removed them.

@pmeier
Collaborator

pmeier commented Sep 27, 2020

I advised against it as we can simply cascade these two operations.

True, but in that case I would ask why we went for this particular pair of conversions and not some other. In general, IMO it is always a good idea for conversion functions to have a "core" representation and to perform all other conversions only to and from it. Since we only have 3 different representations here, I think we should simply implement them all.

Very sorry for sphinx docs

Don't be. That is why we have code review 😉

@oke-aditya
Contributor Author

oke-aditya commented Sep 27, 2020

The reason for this choice of representations is that the detection models in torchvision accept the xyxy format.
I added only xywh to xyxy and cxcywh to xyxy because these were the only two conversions I could find being used; do let me know if others are needed!

I agree that we should provide generic functions. Right now we have only 3 different representations (cxcywh, xywh, xyxy) to support.
If these increase later, we would be bound to provide the other conversions for consistency.

Hence the plan was to build the minimal set that gets the job done.

I would be happy to add these functions for interconvertibility, but let's hear what @fmassa thinks.
Both sides have fair points! It is just a design choice we need to make.
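
For illustration, the same box in each of the three formats: xyxy = (10, 15, 30, 35), xywh = (10, 15, 20, 20), cxcywh = (20, 25, 20, 20).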

Member

@fmassa fmassa left a comment


Thanks a lot for the PR @oke-aditya and for the reviews @pmeier and @vfdev-5 !

I have a couple of comments, most notably that we should avoid in-place operations on the input argument; I have already had subtle bugs in the past because of this (imagine your results after the 1st epoch being completely wrong).

I also have a meta question that I would like to discuss here: we originally discussed adding 2 conversion functions (xyxy_to_xywh and the other way around), but we also added the cxcywh_to_xyxy variants as well.
This brings the question of the scalability of the approach, as for each new format it adds at least 2 new functions (or ~ 2 * (n ** 2 - (n - 1) ** 2) if we do the full conversion matrix).
For reference, Detectron2 uses a BoxMode class to represent / implement the conversion types, which lets it handle the conversions as it wishes, with only a single entry-point.

I'm not advocating for using BoxMode (or something like this), but my original idea was that we would only be adding support for xyxy and xywh, which is still manageable.

I'm looking forward to your thoughts

@pmeier
Collaborator

pmeier commented Sep 28, 2020

@fmassa

This brings the question of the scalability of the approach, as for each new format it adds at least 2 new functions (or ~ 2 * (n ** 2 - (n - 1) ** 2) if we do the full conversion matrix).

Either you have a typo in your equation or something is off. It reduces down to 2 * (2*n - 1) and it should grow quadratically. IMO the number of functions for the full conversion matrix should be n * (n - 1).
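
For concreteness: with the n = 3 formats here, the full matrix is n * (n - 1) = 6 one-way functions, and a fourth format would add 2 * 3 = 6 more, whereas always passing through a single core format such as xyxy needs only 2 * (n - 1) of them (4 here).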

I'm not advocating for using BoxMode (or something like this), but my original idea was that we would only be adding support for xyxy and xywh, which is still manageable.

I'm looking forward to your thoughts

Speaking from a position of ignorance, as I have not worked with object detection very often: do we even need to scale this? I mean, are there more than the three representations (bottom left with height and width / center with height and width / bottom left and top right) at all? Sure, any corner might be used as the anchor, just as any two opposite corners might be, but is that common? If that is the case, we should discuss which variants we want to support and how elaborate the support should be.

If not, I think we can implement 2 functions for each representation and be done with it, since we are a general vision library rather than one specialized for object detection.

@oke-aditya
Contributor Author

I'm not highly experienced or qualified, but I would like to share some thoughts (ignore them if they make no sense).

  • It's not wise to provide every conversion, as the matrix grows quadratically and once we provide it we have to keep supporting it.

  • Cascading these operations probably gets the job done for now, so we have 2 conversions for each set. I guess there are not many popular ways of representing boxes; I have seen just these three (correct me if I'm wrong).

  • If users request more such methods, we can think about it in the future! Converting a representation to the xyxy format is something we can easily provide and maintain.

Let me know your thoughts; let us not create a feature that we cannot maintain.

@oke-aditya oke-aditya requested a review from fmassa September 28, 2020 17:14
@fmassa
Member

fmassa commented Sep 29, 2020

Either you have a typo in your equation or something is off. It reduces down to 2 * (2*n - 1) and it should grow quadratically. IMO the number of functions for the full conversion matrix should be n * (n - 1).

My intent was to say how many more functions we would need to add if we were to go from n-1 to n modes.

Speaking from a position of ignorance, as I have not worked with object detection very often: do we even need to scale this? I mean, are there more than the three representations (bottom left with height and width / center with height and width / bottom left and top right) at all? Sure, any corner might be used as the anchor, just as any two opposite corners might be, but is that common? If that is the case, we should discuss which variants we want to support and how elaborate the support should be.

We don't need to add all possible conversion combinations, but the fact that we are already adding 4 new functions makes me think that this approach doesn't scale. I'm OK with always having the conversions pass through xyxy, but even in this case we already have a lot of functions.

Let me illustrate with another example why I think we should follow a different approach here:

Here is my proposal: implement a function called convert_box(boxes, input_fmt, output_fmt) (better names welcome!) which is the single entry-point for performing those conversions. This way, the user only needs to care about one function.
input_fmt and output_fmt can be strings such as xyxy and xywh, so that we don't need to introduce any new abstractions.

Thoughts?

@oke-aditya
Contributor Author

Great thoughts. I guess it makes much more sense and is more generic. Users can simply pass 2 strings and get their bounding boxes converted without thinking much, so on the user side it is less of a headache.

Coming to our side, if we provide such a function we would need to provide all conversions, as the user will not know which combinations are possible; they would simply expect the boxes to be converted!

All conversions can occur through xyxy internally; that is less efficient, but it would reduce our codebase and we would have less to maintain.
We can add more efficient paths later without affecting the API. Internally we might have a lot of functions, but we expose just one to the user.

I completely agree with this opinion, really a good idea (we probably should have discussed this more in the issue, and I hurried into jumping to code, sorry for that).

@fmassa
Member

fmassa commented Sep 29, 2020

Coming to our side, if we provide such a function we would need to provide all conversions, as the user will not know which combinations are possible; they would simply expect the boxes to be converted!

I would go with the approach you mentioned just afterwards -- always go through xyxy as an intermediate representation. As you said, we can always optimize in the future if we want.

Here is some pseudo-code illustrating one potential implementation:

def convert_boxes(boxes, in_fmt, out_fmt):
    allowed_fmts = ...
    assert in_fmt in allowed_fmts
    assert out_fmt in allowed_fmts
    if in_fmt == out_fmt:
        return boxes.clone()  # to ensure always returning a copy
    if in_fmt != 'xyxy' and out_fmt != 'xyxy':
        # convert one to xyxy and change either in_fmt or out_fmt to xyxy
    # dispatch to the existing functions
    ...

Also, I think it might be preferable to spell it as convert_boxes instead of convert_box because it supports multiple boxes at once, but I think we already named it box_area in the past so maybe it's not that much of an issue?
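
For illustration, here is a self-contained sketch of how that single entry-point might look once fleshed out (a hedged reconstruction, not the code that was merged; the _box_* helper names simply follow the naming convention discussed further down):

import torch
from torch import Tensor


def _box_xywh_to_xyxy(boxes: Tensor) -> Tensor:
    # (x, y, w, h) -> (x1, y1, x2, y2): add width/height to the anchor corner.
    x, y, w, h = boxes.unbind(-1)
    return torch.stack((x, y, x + w, y + h), dim=-1)


def _box_xyxy_to_xywh(boxes: Tensor) -> Tensor:
    # (x1, y1, x2, y2) -> (x, y, w, h)
    x1, y1, x2, y2 = boxes.unbind(-1)
    return torch.stack((x1, y1, x2 - x1, y2 - y1), dim=-1)


def _box_cxcywh_to_xyxy(boxes: Tensor) -> Tensor:
    # (cx, cy, w, h) -> (x1, y1, x2, y2): corners sit half a width/height from the center.
    cx, cy, w, h = boxes.unbind(-1)
    return torch.stack((cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h), dim=-1)


def _box_xyxy_to_cxcywh(boxes: Tensor) -> Tensor:
    # (x1, y1, x2, y2) -> (cx, cy, w, h)
    x1, y1, x2, y2 = boxes.unbind(-1)
    return torch.stack(((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1), dim=-1)


def box_convert(boxes: Tensor, in_fmt: str, out_fmt: str) -> Tensor:
    allowed_fmts = ("xyxy", "xywh", "cxcywh")
    assert in_fmt in allowed_fmts and out_fmt in allowed_fmts

    if in_fmt == out_fmt:
        return boxes.clone()  # always hand back a copy, never the input tensor

    # Normalise the input to xyxy first, so every remaining case is xyxy -> out_fmt.
    if in_fmt == "xywh":
        boxes = _box_xywh_to_xyxy(boxes)
    elif in_fmt == "cxcywh":
        boxes = _box_cxcywh_to_xyxy(boxes)

    if out_fmt == "xywh":
        boxes = _box_xyxy_to_xywh(boxes)
    elif out_fmt == "cxcywh":
        boxes = _box_xyxy_to_cxcywh(boxes)
    return boxes

Note that all the helpers in this sketch build new tensors rather than modifying the input, which matches the concern above about in-place operations on the input argument.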

@oke-aditya
Contributor Author

We can name it convert_box; it won't be an issue, as we always expect box to be Tensor[N] (e.g. box_area, box_iou), so it stays consistent.

Right now I think I will refactor the code to work internally through xyxy; the prototype seems great to me.

Let me refactor the code.

Should I do this in a new PR or continue here? This PR will become quite dirty.
The reason being that I will need to change code, docs and tests, though a lot of it is re-use.

I guess all the conversion functions used internally should be named like _box_xyxy_to_xywh, since they will only be used internally; we won't provide docs for them or include them in __init__ and __all__.

This will save a lot of effort in maintaining docs for these internal functions, and we can instead keep clear documentation and usage for the convert_box function above.

@fmassa
Member

fmassa commented Sep 29, 2020

We can continue in this PR, and the proposal of renaming the current functions as _box_xyxy_to_xywh is what I would have suggested as well.

Good point about box_area / box_iou, which makes me think that we should maybe name it box_convert instead?

Don't worry about the history of the commits, it will all get squashed by GitHub before merging.

One thing to keep in mind: can you add a test for torchscriptability as well? Something like

out = box_convert(boxes, 'xyxy', 'xywh')
scripted_fn = torch.jit.script(box_convert)
out_script = scripted_fn(boxes, 'xyxy', 'xywh')
self.assertTrue((out - out_script).abs().max() < TOLERANCE)

This will ensure that our transform is ready to be exported to C++. Let us know if you have issues making the code work with torchscript.

And thanks a lot for your help!

@oke-aditya
Contributor Author

oke-aditya commented Sep 30, 2020

Sorry for the delay.
As per the discussion, I rearranged the code a bit and made the box_convert function in boxes.py.

The other utility conversion functions, which are few for now (but might grow in the future), I moved to a separate file, _box_convert.py, with everything renamed according to the conventions. Let me know if this is fine; keeping them in boxes.py would pollute that file with code that is better abstracted away, hence the move.

I refactored the tests for this new API.

I think I added tests for all the conversions; do let me know if I missed something.

Documentation is generated only for the box_convert function, not for the internal helpers.

Let me know if this works and if it needs changes :-)

@oke-aditya oke-aditya requested review from fmassa and pmeier September 30, 2020 17:35
@oke-aditya
Contributor Author

oke-aditya commented Sep 30, 2020

I added the JIT test as well, but it kept failing for me locally and I'm not sure why. Can someone have a look, please?
I have commented it out for now to avoid a CI failure here.

Member

@fmassa fmassa left a comment


The code looks great, thanks a lot for all your work @oke-aditya !

Do you remember what the test failure was that you were facing with torchscript? From looking at your implementation, I don't see why it should fail.

I only have a couple of documentation suggestions; the other comment can be left for a future PR.

Comment on lines 160 to 168
if in_fmt == "xywh":
boxes_xyxy = _box_xywh_to_xyxy(boxes)
if out_fmt == "cxcywh":
boxes_converted = _box_xyxy_to_cxcywh(boxes_xyxy)

elif in_fmt == "cxcywh":
boxes_xyxy = _box_cxcywh_to_xyxy(boxes)
if out_fmt == "xywh":
boxes_converted = _box_xyxy_to_xywh(boxes_xyxy)
Member


While this is fine, my first thought was to do something like the following

            if in_fmt == "xywh":
                boxes = _box_xywh_to_xyxy(boxes)
                in_fmt = "xyxy"
            elif in_fmt == "cxcywh":
                boxes = _box_cxcywh_to_xyxy(boxes)
                in_fmt = "xyxy"

and let the rest of the dispatch be done in the last branch. This way, we don't need to replicate the output dispatch logic here.

You don't need to change this here, so that we can move forward quickly with this PR, but it would be good to send a follow-up PR improving this part after this one gets merged. Thoughts?

Contributor Author


I guess I will leave this for the next PR; my original idea was that we would support direct conversions at some point, and this layout should be simpler to refactor in the future. But this works fine too.

Comment on lines +730 to +732
# def test_bbox_convert_jit(self):
# box_tensor = torch.tensor([[0, 0, 100, 100], [0, 0, 0, 0],
# [10, 15, 30, 35], [23, 35, 93, 95]], dtype=torch.float)
Member


Two options here:

  • we merge the PR now and try to fix torchscript later
  • we fix torchscript right now.

Do you remember what type of errors you were facing? I'm fine with both approaches, so that we can move forward with this PR (but we should fix torchscript soon if we merge this without torchscript support)

@oke-aditya
Contributor Author

oke-aditya commented Oct 1, 2020

I will add these documentation fixes.
I'm not very experienced with torchscript, so let us fix it in a new follow-up PR. I will open it as soon as this gets merged.

(I guess all the operations in boxes support torchscript, and let's keep it that way for the October release as well.)

IIRC torchscript failed because it could not find an else block for an if (I will post the error stack in the new PR).

The code can be cleaned up as you suggested, but I would like to have torchscript support first and then clean up.

Let's leave those changes to a separate PR
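
If the failure was the common TorchScript complaint that a variable is not defined on every code path, which the missing else block described above suggests, the usual fix is to make sure each branch assigns the variable, or to end the chain with an explicit else. A minimal sketch of that pattern (not the actual follow-up patch):

import torch

@torch.jit.script
def pick(flag: bool) -> torch.Tensor:
    # TorchScript needs `out` to be defined on every path, hence the explicit else.
    if flag:
        out = torch.zeros(1)
    else:
        out = torch.ones(1)
    return out
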
@oke-aditya
Contributor Author

I guess in one of the follow-up PRs I will clean up all the assert statements; there are many places where I can see such inconsistent use.

Member

@fmassa fmassa left a comment


Sounds good, let's fix torchscript and the minor refactorings in a follow-up PR.

Thanks a lot @oke-aditya !

@oke-aditya
Contributor Author

Just let me add the documentation; I'm about to push the changes 😅

Member

@fmassa fmassa left a comment


Thanks a lot! Looking forward to the torchscript improvements!

@fmassa fmassa merged commit e70c91a into pytorch:master Oct 1, 2020
@oke-aditya oke-aditya deleted the bbox_conv branch October 1, 2020 11:26
bryant1410 pushed a commit to bryant1410/vision-1 that referenced this pull request Nov 22, 2020
* adds boxes conversion

* adds documentation

* adds xywh tests

* fixes small typo

* adds tests

* Remove sphinx theme

* corrects assertions

* cleans code as per suggestion

Signed-off-by: Aditya Oke <[email protected]>

* reverts assertion

* fixes to assertEqual

* fixes inplace operations

* Adds docstrings

* added documentation

* changes tests

* moves code to box_convert

* adds more tests

* Apply suggestions from code review

Let's leave those changes to a separate PR

* fixes documentation

Co-authored-by: Francisco Massa <[email protected]>
vfdev-5 pushed a commit to Quansight/vision that referenced this pull request Dec 4, 2020
@pmeier pmeier mentioned this pull request May 17, 2022