bpo-44539: Support recognizing JPEG files without JFIF or Exif markers #26964

Merged
merged 6 commits into from
Jul 20, 2021

Conversation

mohamadmansourX
Contributor

@mohamadmansourX mohamadmansourX commented Jun 30, 2021

Previously, imghdr recognized JPG images only through the check h[6:10] in (b'JFIF', b'Exif').
However, that does not always hold: some JPG files start with the header b'\xff\xd8\xff\xdb' and carry neither marker.
Here \xdb is the marker that defines the Quantization Table.
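
A minimal sketch of the two header checks described above. The function name `looks_like_jpeg` and the sample headers are hypothetical and only illustrate the idea; this is not the patch itself:

```python
def looks_like_jpeg(header: bytes) -> bool:
    """Return True if the first bytes of a file look like a JPEG header.

    Hypothetical helper for illustration; it is not part of imghdr.
    """
    # Existing check: a JFIF or Exif identifier at byte offsets 6..9.
    if header[6:10] in (b'JFIF', b'Exif'):
        return True
    # Additional case: start-of-image (ff d8) followed directly by a
    # quantization table marker (ff db), with no JFIF/Exif segment.
    return header[:4] == b'\xff\xd8\xff\xdb'


# Made-up sample headers, truncated to a few bytes.
jfif_header = b'\xff\xd8\xff\xe0\x00\x10JFIF\x00'
raw_header = b'\xff\xd8\xff\xdb\x00\x43\x00'
png_header = b'\x89PNG\r\n\x1a\n'

jfif_result = looks_like_jpeg(jfif_header)
raw_result = looks_like_jpeg(raw_header)
png_result = looks_like_jpeg(png_header)
```

With only the first check, `raw_header` would go unrecognized, which matches the None result reported for the attached image.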

Reference:
https://www.digicamsoft.com/itu/itu-t81-36.html
https://web.archive.org/web/20120403212223/http://class.ee.iastate.edu/ee528/Reading%20material/JPEG_File_Format.pdf

For example, imghdr.what(filename) returns None for the attached image:

https://bugs.python.org/issue44539

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@mohamadmansourX

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

mohamadmansourX added a commit to mohamadmansourX/PaddleOCR that referenced this pull request Jun 30, 2021
`imghdr.what` fails to recognize some JPG images.
`imghdr.what` checks for JPG by inspecting the header with `h[6:10] in (b'JFIF', b'Exif')`.
However, that is not always sufficient: some files (quantized JPG) start with the b'\xff\xd8\xff\xdb' header and include neither `b'JFIF'` nor `b'Exif'`, as described in the references [here](https://www.digicamsoft.com/itu/itu-t81-36.html) and [here](https://web.archive.org/web/20120403212223/http://class.ee.iastate.edu/ee528/Reading%20material/JPEG_File_Format.pdf).

PIL can serve as an alternative until the imghdr bug is fixed: [issue](python/cpython#26964 (comment))
@fbidu
Contributor

fbidu commented Jul 3, 2021

Thanks for the PR, @mohamadmansourX. I was able to test this PR on a Linux Mint 20.1 x86_64 machine.

  1. The file command reports the supplied sample image as JPEG image data, baseline, precision 8, 500x332, components 3.
  2. With the current code, imghdr.what returns None for the supplied sample image.
  3. After applying the patch, imghdr.what positively identifies the image, returning jpeg.

It would be nice to have a test case for this in Lib/test/test_imghdr.py. Some of the tests there only pass a hardcoded header, while others load images from Lib/test/imghdrdata. I think at least a test with a hardcoded header would be welcome.
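
A rough sketch of what a hardcoded-header test could look like. To keep it runnable on its own, it uses a hypothetical stand-in `what_from_header` rather than imghdr; the class name, helper names, and header bytes are assumptions, not the test that was eventually added:

```python
import unittest


def what_from_header(h: bytes):
    """Hypothetical stand-in for imghdr.what(None, h), JPEG case only."""
    if h[6:10] in (b'JFIF', b'Exif') or h[:4] == b'\xff\xd8\xff\xdb':
        return 'jpeg'
    return None


class RawJpegHeaderTest(unittest.TestCase):
    def test_header_without_jfif_or_exif(self):
        # Start-of-image marker followed by a quantization table marker,
        # padded with zero bytes to 32 bytes.
        header = b'\xff\xd8\xff\xdb' + b'\x00' * 28
        self.assertEqual(what_from_header(header), 'jpeg')

    def test_unrelated_header(self):
        # A header with no known signature is not identified.
        self.assertIsNone(what_from_header(b'\x00' * 32))


def run_sketch_tests():
    """Run the two tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RawJpegHeaderTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In Lib/test/test_imghdr.py the assertions would presumably call imghdr.what(None, header) instead of the stand-in.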

Also, please see the message about the CLA. There's also something wrong with the news entry, but I couldn't figure out what.

# Don't start with "- Issue #<n>: " or "- bpo-<n>: " or that sort of stuff.
###########################################################################


Member

Could you remove all the commented lines?

Contributor

@merwok do you have any clues regarding the doc error? I tried removing the commented-out lines and running the sphinx toolchain locally but it all broke in the same way :/

Co-authored-by: Éric Araujo <[email protected]>
@merwok
Member

merwok commented Jul 12, 2021

It seems that some issue with the blurb filename hits a bug in blurb:

https://app.travis-ci.com/github/python/cpython/jobs/523615771

  File "/home/travis/virtualenv/python3.6.10/lib/python3.6/site-packages/blurb.py", line 486, in throw
    raise BlurbError(f("Error in {filename}:{line_number}:\n{s}"))
  File "/home/travis/virtualenv/python3.6.10/lib/python3.6/site-packages/blurb.py", line 138, in f
    return s.format_map(d)
KeyError: 'filename'
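
The traceback ends with s.format_map(d) raising KeyError: 'filename'. A minimal sketch of that failure mode, assuming the mapping simply has no filename entry; the `values` dict is made up for illustration and this is not blurb's actual code:

```python
# Same template string as in the traceback above.
template = "Error in {filename}:{line_number}:\n{s}"

# Hypothetical mapping that is missing the "filename" key.
values = {"line_number": 3, "s": "something went wrong"}

try:
    message = template.format_map(values)
except KeyError as exc:
    # str.format_map raises KeyError for the first placeholder it cannot resolve.
    missing_key = exc.args[0]
    message = f"formatting failed, missing key: {missing_key!r}"
```

The error that blurb meant to report is therefore hidden behind the KeyError raised while building the message.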

@merwok
Member

merwok commented Jul 12, 2021

It’s known and fixed: python/core-workflow#386

Maybe blurb-it needs an update.

@ambv changed the title from "bpo-44539: Imghdr JPG Quantized case added" to "bpo-44539: Support recognizing JPEG files without JFIF or Exif markers" on Jul 20, 2021
@ambv merged commit 3b56b3b into python:main on Jul 20, 2021