Skip to content

Conversation

@fukudasjp
Copy link
Contributor

@fukudasjp fukudasjp commented Mar 28, 2022

What do these changes do/fix?

This PR improve PDF highlighting.

  1. 9b82647 In some PDF, highlight location was wrong like following. In the image, the highlighted 10h00 should be right below the 18h00, but it isn't. This is because I improved highlighting by giving priority to longer matches in fix: fix and improve PDF highlighting #313.

screenshot 2022-03-28 22 28 49

The text box mapping logic manages text positions where text fragments are mapped recently. The positions are used to generate longer chunk of matches. But with the improvement, the mapping process is repeated. So, the logic tried to generate longer chunk based on the position information form the previous iteration. It made the wrong highlight location.

So, I fixed the logic to reset the position information after each iteration.
screenshot 2022-03-28 22 30 46

  1. a0a3324 Sort TextContentItems from PDF by the order of bboxes in HTML.

In the mapping, longer match between field text and text form PDF makes highlight more accurate. But in some PDF, its text content items may not be ordered by text order. It produces many small mappings instead of a long mapping. So, this is to reorder the text content items using the text order in HTML field.

How do you test/verify these changes?

  • Test case added for 1
  • Use Storybook to check if the change doesn't break the PDF highlighting

Have you documented your changes (if necessary)?

Are there any breaking changes included in this pull request?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants