-
Notifications
You must be signed in to change notification settings - Fork 674
Rework PNG metadata extraction #2653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
a0e290d to
0275027
Compare
This change implements gaps we observed in the current Metadata extraction for PNG. There are multiple significant changes: - For all metadata types, also search for metadata in the zTXt, tEXT and iTXt chunk. - If we find metadata in the zTXt chunk, handle it like ImageMagick metadata and interpret it as a hex string prefixed by a header, that is separated by newlines. We did significant testing on these methods internally and ran it against all test files we could find. We also ran it against all test images in libpng and pngsuite, and so far did not find a single image that had metadata in the zTXt chunk that was not compressed like the ImageMagick hex string.
197g
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it, the internal test data sounds encouraging. Regarding images, what's the status of attribution to include them here?
| 56, 66, 73, 77, 4, 4, 0, 0, 0, 0, 0, 49, 28, 2, 110, 0, 24, 65, 73, 45, 71, 101, 110, 101, | ||
| 114, 97, 116, 101, 100, 32, 119, 105, 116, 104, 32, 71, 111, 111, 103, 108, 101, 28, 2, 90, | ||
| 0, 8, 75, 105, 110, 103, 115, 116, 111, 110, 28, 2, 0, 0, 2, 0, 4, 0, 56, 66, 73, 77, 4, | ||
| 37, 0, 0, 0, 0, 0, 16, 67, 89, 196, 70, 206, 234, 16, 4, 50, 89, 230, 125, 147, 191, 230, | ||
| 81, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this changed as a result of the hex decoding? At least the original test side looks like mostly hex digits from a glance.
| let parse = || { | ||
| let mut parts = buffer.splitn(4, '\n'); | ||
|
|
||
| // Skip the first two parts, grab the size (3rd part), then the body. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We're now dropping the contents of the second part, what would be contained in it? In the example it only says Optional but if I understand the imagemagick source for it it is any ascii whitespace padding until a non-ws character.
| /// | ||
| /// This method parses such data and returns the decoded payload. | ||
| /// If the structure doesn't match, we return and empty `Vec<u8>`. | ||
| fn parse_raw_profile(buffer: &str, header: Option<&str>) -> Vec<u8> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something that would make sense to expose as a public method from the png crate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, that would really make sense! It should be a method on ZTXtChunk. I will open a PR shortly to add that.
| let size_str = parts.nth(2)?; | ||
| let body = parts.next()?; | ||
|
|
||
| // Parse size and validate a 4MB limit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this 4MB come from? Is it from the underlying standards, or something you're adding to avoid running out of memory?
This change implements gaps we observed in the current Metadata extraction for PNG. There are multiple significant changes:
We did significant testing on these methods internally and ran it against all test files we could find. We also ran it against all test images in libpng and pngsuite, and so far did not find a single image that had metadata in the zTXt chunk that was not compressed like the ImageMagick hex string.