-
Notifications
You must be signed in to change notification settings - Fork 7.1k
Adds bounding boxes conversion #2710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2710 +/- ##
==========================================
+ Coverage 72.93% 73.05% +0.12%
==========================================
Files 95 96 +1
Lines 8239 8298 +59
Branches 1279 1291 +12
==========================================
+ Hits 6009 6062 +53
Misses 1838 1838
- Partials 392 398 +6
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @oke-aditya for the PR! Few comments below. Additionally:
- For completeness: shouldn't we also have
box_cxcywh_to_xywh
andbox_xywh_to_cycywh
? I know you advocated against it, but I think we should discuss this. @fmassa? - Why did you add the
pytorch-sphinx-theme
as submodule? I'm pretty sure we shouldn't do that, as the documentation is not build here.
I will do the suggested changes. I just wrote code that just works for now, I guess it needs cleaning as suggested. Also,
|
True, but in that case I would ask, why we went for this set of two representations and not some other way. In general, IMO for conversion functions it is always a good idea to have a "core" representation and perform all other conversions only to and from this. Since we only have 3 different representations here, I think we should simply implement them all.
Don't be. That is why we have code review 😉 |
The reason why this of representations is the detection models in torchvision accept xy xy format. I agree that we should provide generic functions. Right now we have only 3 different representations (cxcywh xywh xyxy) to provide. Hence, building minimal stuff that can do the job was the plan. I would be happy to add these functions for interconvertability but let @fmassa have a thought. |
Signed-off-by: Aditya Oke <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the PR @oke-aditya and for the reviews @pmeier and @vfdev-5 !
I have a couple of comments, most notably that we should avoid in-place operations in the input argument, I already had subtle bugs in the past because of this (imagine your results after the 1st epoch being completely wrong).
I also have a meta question that I would like to discuss here: we originally discussed about adding 2 conversion functions (xyxy_to_xywh
and the other way around), but we also added the cxcywh_to_xyxy
variants as well.
This brings the question of the scalability of the approach, as for each new format it adds at least 2 new functions (or ~ 2 * (n ** 2 - (n - 1) ** 2)
if we do the full conversion matrix).
For reference, Detectron2 uses a BoxMode
class to represent / implement the conversion types, which lets it handle the conversions as it wishes, with only a single entry-point.
I'm not advocating for using BoxMode
(or something like this), but my original idea was that we would only be adding support for xyxy
and xywh
, which is still manageable.
I'm looking forward to your thoughts
Either you have a typo in your equation or something is off. It reduces down to
Speaking form a position of ignorance as I've not worked with object detection very often: do we even need to scale this? I mean are there more than the three representations (bottom left with height and width / center with height and width / bottom left and top right) at all? Sure, it is possible that any corner might be used as anchor as well as any two opposite corners might be used, but is that common? If that is the case, we should discuss which variants we want to support and how elaborate the support should be. If not I think we can implement 2 functions for each representation and be done with it as we are a general vision library rather than specialized for object detection. |
I'm not highly experienced or qualified but would like to share some thoughts (ignore if they make no sense).
Let me know thoughts, let us not create a feature that is not maintainable by us. |
My intent was to say how many more functions we would need to add if we were to go from
We don't need to add all possible conversion combinations, but just by the fact that we are already adding 4 new functions makes me think that this approach doesn't scale. I'm ok to always have the conversions passing through Let me illustrate with another examples on why I think we should follow a different approach here:
Here is my proposal: implement a function called Thoughts? |
Great thoughts. I guess it makes much more sense and generic. User can simply pass 2 strings and get his bounding boxes converted without thinking much. So user side it is less headache. Coming to our side, if we provide such a function. We would need to provide all conversions as the user will not know which methods are possible, he would simply expect that the boxes should be converted! All conversions can occur through I completely agree with this opinion, really a good idea (probably we should have discussed in issue more and I did hurry in jumping to code, sorry for that) |
I would go with the approach you mentioned just afterwards -- always go through Here is some pseudo-code illustrating one potential implementation: def convert_boxes(boxes, in_fmt, out_fmt):
allowed_fmts = ...
assert in_fmt in allowed_fmts
assert out_fmt in allowed_fmts
if in_fmt == out_fmt:
return boxes.clone() # to ensure always returning a copy
if in_fmt != 'xyxy' and out_fmt != 'xyxy':
# convert one to xyxy and change either in_fmt or out_fmt to xyxy
# dispatch to the existing functions
... Also, I think it might be preferable to spell it as |
We can name to Right now I think I will refactor to work internally with Let me refactor the code. Should I do in a new PR or continue here? this PR will become quite dirty. I guess all these conversion functions used internally be named as This will effectively save a lot of efforts in maintaining docs for these functions internally and just maintain clear documentation and usage of the above |
We can continue in this PR, and the proposal of renaming the current functions as Good point about Don't worry about the history of the commits, it will all get squashed by GitHub before merging. One thing to keep in mind: can you add a test for torchscriptability as well? Something like out = box_convert(boxes, 'xyxy', 'xywh')
scripted_fn = torch.jit.script(box_convert)
out_script = scripted_fn(boxes, 'xyxy', 'xywh')
self.assertTrue((out - out_script).abs().max() < TOLERANCE) This will ensure that our transform is ready to be exported to C++. Let us know if you have issues making the code work with torchscript. And thanks a lot for your help! |
Sorry for the delay. The other utility conversion functions which are few for now (might grow in future) I shifted to separate file Simply refactored the tests for this new API. I added tests for all conversions I think, do let me know if I missed something. Documentation is only generated for Let me know if this works and if it needs changes :-) |
I added the JIT test as well, but it somehow kept failing for me locally, I'm not sure about it. Can someone have a look, please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks great, thanks a lot for all your work @oke-aditya !
Do you remember what was the test failure that you were facing with torchscript? From looking at your implementation I don't see why it should fail.
I only have a couple of documentation suggestions, the other comment can be left for a future PR
torchvision/ops/boxes.py
Outdated
if in_fmt == "xywh": | ||
boxes_xyxy = _box_xywh_to_xyxy(boxes) | ||
if out_fmt == "cxcywh": | ||
boxes_converted = _box_xyxy_to_cxcywh(boxes_xyxy) | ||
|
||
elif in_fmt == "cxcywh": | ||
boxes_xyxy = _box_cxcywh_to_xyxy(boxes) | ||
if out_fmt == "xywh": | ||
boxes_converted = _box_xyxy_to_xywh(boxes_xyxy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is fine, my first thought was to so something like the following
if in_fmt == "xywh":
boxes = _box_xywh_to_xyxy(boxes)
in_fmt = "xyxy"
elif in_fmt == "cxcywh":
boxes = _box_cxcywh_to_xyxy(boxes)
in_fmt = "xyxy"
and let the rest of the dispatch to be done in the last branch. This way, we don't need to replicate the out
dispatch logic here.
You don't need to change this here so that we can move forward quickly with this PR, but it would be good to send a follow-up PR improving this part after this PR gets merged. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I will leave this for next PR, my original idea was we support direct conversions at some point of time and it should be simpler to refactor for future. But this too works fine.
# def test_bbox_convert_jit(self): | ||
# box_tensor = torch.tensor([[0, 0, 100, 100], [0, 0, 0, 0], | ||
# [10, 15, 30, 35], [23, 35, 93, 95]], dtype=torch.float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two options here:
- we merge the PR now and try to fix torchscript later
- we fix torchscript right now.
Do you remember what type of errors you were facing? I'm fine with both approaches, so that we can move forward with this PR (but we should fix torchscript soon if we merge this without torchscript support)
I will add these documentation fixes. (I guess all the operations in IIRC torchscript failed due to not finding some else block for if (I will post the error stack in the new PR) The code can be cleaned up as you suggested, but I would like to have torchscript support first and then clean up. |
Let's leave those changes to a separate PR
I guess in one of the follow up PRs, I will clean all the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, let's fix torchscript and the minor refactorings in a follow-up PR.
Thanks a lot @oke-aditya !
Just let me add documentation. I'm about to push changes 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot! Looking forward to the torchscript improvements!
* adds boxes conversion * adds documentation * adds xywh tests * fixes small typo * adds tests * Remove sphinx theme * corrects assertions * cleans code as per suggestion Signed-off-by: Aditya Oke <[email protected]> * reverts assertion * fixes to assertEqual * fixes inplace operations * Adds docstrings * added documentation * changes tests * moves code to box_convert * adds more tests * Apply suggestions from code review Let's leave those changes to a separate PR * fixes documentation Co-authored-by: Francisco Massa <[email protected]>
* adds boxes conversion * adds documentation * adds xywh tests * fixes small typo * adds tests * Remove sphinx theme * corrects assertions * cleans code as per suggestion Signed-off-by: Aditya Oke <[email protected]> * reverts assertion * fixes to assertEqual * fixes inplace operations * Adds docstrings * added documentation * changes tests * moves code to box_convert * adds more tests * Apply suggestions from code review Let's leave those changes to a separate PR * fixes documentation Co-authored-by: Francisco Massa <[email protected]>
Closes #2687
I have added as per the Issue, two utility function to convert boxes to pascal VOC format (x1 y1 x2 y2).
The tests convert boxes to other format and vice-versa. This ensures that both operations are identical and can be interchangeably used. Tests passed locally.
This is ready for review. Do let me know!
cc @pmeier