Add unit tests for new HTML image attributes (data-src, data-full-src, data-lazy-srcset, srcset) Fixes Issue #689 #692
ato left a comment
Looks like this might be missing an import:
Error: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.14.1:testCompile (default-testCompile) on project heritrix-modules: Compilation failure: Compilation failure:
Error: /home/runner/work/heritrix3/heritrix3/modules/src/test/java/org/archive/modules/extractor/ExtractorHTMLTest.java:[685,48] cannot find symbol
Error: symbol: class IOException
Error: location: class org.archive.modules.extractor.ExtractorHTMLTest
Error: /home/runner/work/heritrix3/heritrix3/modules/src/test/java/org/archive/modules/extractor/ExtractorHTMLTest.java:[693,52] cannot find symbol
Error: symbol: class IOException
Error: location: class org.archive.modules.extractor.ExtractorHTMLTest
Error: /home/runner/work/heritrix3/heritrix3/modules/src/test/java/org/archive/modules/extractor/ExtractorHTMLTest.java:[701,55] cannot find symbol
Error: symbol: class IOException
Error: location: class org.archive.modules.extractor.ExtractorHTMLTest
Error: /home/runner/work/heritrix3/heritrix3/modules/src/test/java/org/archive/modules/extractor/ExtractorHTMLTest.java:[710,57] cannot find symbol
Error: symbol: class IOException
Error: location: class org.archive.modules.extractor.ExtractorHTMLTest
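A `cannot find symbol ... class IOException` error at a method signature usually means the method declares `throws IOException` without the file importing `java.io.IOException`. A minimal sketch of the likely fix follows; the class and method names (`MissingImportSketch`, `readFirstLine`) are hypothetical and are not taken from `ExtractorHTMLTest`.

```java
import java.io.BufferedReader;
import java.io.IOException; // the import the compiler is most likely complaining about
import java.io.StringReader;

// Hypothetical stand-in for a test class whose methods declare "throws IOException".
class MissingImportSketch {
    // Without the java.io.IOException import above, this signature fails to
    // compile with "cannot find symbol: class IOException".
    static String readFirstLine(String text) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readFirstLine("first\nsecond"));
    }
}
```

If the test file follows this pattern, adding the single import line to its import block should clear all four errors, since they all report the same missing symbol.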
I'm a bit confused by this PR; it almost seems corrupted in some way. There seem to be a lot of unrelated commits with identical messages that aren't showing in the full diff, and GitHub is still showing it as having test failures. I'm not going to risk merging it in this state, so if you'd like this change merged, please open a new clean PR with just the intended change. :-)
Overview
Fixes #689
This expands the HTML parser test suite by adding new unit tests for modern attributes commonly used for lazy loading and responsive images. These attributes are widely adopted across the web, and ensuring Heritrix extracts URLs correctly is essential for consistent crawling.
What’s Included
This update adds dedicated tests for URL extraction from:
- data-src
- data-full-src
- data-lazy-srcset
- srcset (additional coverage)
Each test verifies that Heritrix’s ExtractorHTML module correctly identifies and normalizes URLs from these attributes.
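To illustrate what the srcset-style cases have to cover, here is a minimal sketch of splitting a `srcset` or `data-lazy-srcset` value into its candidate URLs. This is not Heritrix's ExtractorHTML code: the class and method names (`SrcsetSketch`, `urlsFromSrcset`) are hypothetical, and the naive comma split assumes that the URLs themselves contain no commas.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Heritrix.
class SrcsetSketch {
    // Returns the URL of each comma-separated candidate, dropping any
    // trailing width or density descriptor such as "480w" or "2x".
    static List<String> urlsFromSrcset(String srcset) {
        List<String> urls = new ArrayList<>();
        for (String candidate : srcset.split(",")) {
            String trimmed = candidate.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray or trailing commas
            }
            // The URL is the first whitespace-delimited token of the candidate.
            urls.add(trimmed.split("\\s+")[0]);
        }
        return urls;
    }

    public static void main(String[] args) {
        // prints [/img/small.jpg, /img/large.jpg]
        System.out.println(urlsFromSrcset("/img/small.jpg 480w, /img/large.jpg 2x"));
    }
}
```

A test for these attributes would presumably assert that every candidate URL, not just the first, ends up among the extracted links.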
Related Issue
Fixes #689