Description
JHOVE 1.34 appears to throw unhandled exceptions when batch-processing PDFs that JHOVE 1.30 processed without them. What might be the reason for this?
I ran the JHOVE 1.34 CLI against the pdfCabinetOfHorrors corpus with:
/home/remco/preservation/jhove/jhove -m PDF-hul ~/preservation/format-corpus-master/pdfCabinetOfHorrors/*.pdf
The output contains several exception messages (listed in full further down).
For the affected files, this results in an Unknown status in the RepresentationInformation output, for example:
RepresentationInformation: /home/remco/preservation/format-corpus-master/pdfCabinetOfHorrors/webCapture.pdf
ReportingModule: PDF-hul, Rel. 1.12.8 (2025-03-12)
LastModified: 2023-05-10 12:53:42 CEST
Size: 213342
Format: PDF
Status: Unknown
SignatureMatches:
PDF-hul
ErrorMessage: Validation ended prematurely due to an unhandled exception.
ID: JHOVE-CORE-5
InfoLink: https://github.com/openpreserve/jhove/wiki/PDF-hul-Messages#jhove-core-5
MIMEtype: application/pdf
When I run the same command with JHOVE 1.30, I don't get these unhandled exception messages.
What could be the reason for this? (And is it a step forward or backward that this happens in 1.34 but not in 1.30?)
The issue occurs for these documents:
file:////home/remco/preservation/format-corpus-master/pdfCabinetOfHorrors/embedded_video_quicktime.pdf
file:////home/remco/preservation/format-corpus-master/pdfCabinetOfHorrors/externalLink.pdf
file:////home/remco/preservation/format-corpus-master/pdfCabinetOfHorrors/test_fontArialNotEmbedded.pdf
file:////home/remco/preservation/format-corpus-master/pdfCabinetOfHorrors/text_only_fontsEmbeddedSubset.pdf
file:////home/remco/preservation/format-corpus-master/pdfCabinetOfHorrors/text_only_fontsNotEmbedded.pdf
file:////home/remco/preservation/format-corpus-master/pdfCabinetOfHorrors/text_only_fontsEmbeddedAll.pdf
file:////home/remco/preservation/format-corpus-master/pdfCabinetOfHorrors/webCapture.pdf
And the unhandled exception messages are:
SEVERE: Validation ended prematurely due to an unhandled exception.
java.lang.NullPointerException: Cannot invoke "edu.harvard.hul.ois.jhove.module.pdf.PageTreeNode.startWalk()" because "this._docTreeRoot" is null
at edu.harvard.hul.ois.jhove.module.PdfModule.checkPageTextStreams(PdfModule.java:3521)
at edu.harvard.hul.ois.jhove.module.PdfModule.parse(PdfModule.java:868)
at edu.harvard.hul.ois.jhove.JhoveBase.processFile(JhoveBase.java:831)
at edu.harvard.hul.ois.jhove.JhoveBase.process(JhoveBase.java:603)
at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(JhoveBase.java:479)
at edu.harvard.hul.ois.jhove.Jhove.main(Jhove.java:258)
jul 07, 2025 6:30:18 P.M. edu.harvard.hul.ois.jhove.JhoveBase process
SEVERE: Validation ended prematurely due to an unhandled exception.
java.lang.NullPointerException: Cannot invoke "edu.harvard.hul.ois.jhove.module.pdf.PageTreeNode.startWalk()" because "this._docTreeRoot" is null
at edu.harvard.hul.ois.jhove.module.PdfModule.checkPageTextStreams(PdfModule.java:3521)
at edu.harvard.hul.ois.jhove.module.PdfModule.parse(PdfModule.java:868)
at edu.harvard.hul.ois.jhove.JhoveBase.processFile(JhoveBase.java:831)
at edu.harvard.hul.ois.jhove.JhoveBase.process(JhoveBase.java:603)
at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(JhoveBase.java:479)
at edu.harvard.hul.ois.jhove.Jhove.main(Jhove.java:258)
jul 07, 2025 6:30:18 P.M. edu.harvard.hul.ois.jhove.JhoveBase process
SEVERE: Validation ended prematurely due to an unhandled exception.
java.lang.NullPointerException: Cannot invoke "edu.harvard.hul.ois.jhove.module.pdf.PageTreeNode.startWalk()" because "this._docTreeRoot" is null
at edu.harvard.hul.ois.jhove.module.PdfModule.checkPageTextStreams(PdfModule.java:3521)
at edu.harvard.hul.ois.jhove.module.PdfModule.parse(PdfModule.java:868)
at edu.harvard.hul.ois.jhove.JhoveBase.processFile(JhoveBase.java:831)
at edu.harvard.hul.ois.jhove.JhoveBase.process(JhoveBase.java:603)
at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(JhoveBase.java:479)
at edu.harvard.hul.ois.jhove.Jhove.main(Jhove.java:258)
jul 07, 2025 6:30:18 P.M. edu.harvard.hul.ois.jhove.JhoveBase process
SEVERE: Validation ended prematurely due to an unhandled exception.
java.lang.NullPointerException: Cannot invoke "edu.harvard.hul.ois.jhove.module.pdf.PageTreeNode.startWalk()" because "this._docTreeRoot" is null
at edu.harvard.hul.ois.jhove.module.PdfModule.checkPageTextStreams(PdfModule.java:3521)
at edu.harvard.hul.ois.jhove.module.PdfModule.parse(PdfModule.java:868)
at edu.harvard.hul.ois.jhove.JhoveBase.processFile(JhoveBase.java:831)
at edu.harvard.hul.ois.jhove.JhoveBase.process(JhoveBase.java:603)
at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(JhoveBase.java:479)
at edu.harvard.hul.ois.jhove.Jhove.main(Jhove.java:258)
jul 07, 2025 6:30:18 P.M. edu.harvard.hul.ois.jhove.JhoveBase process
SEVERE: Validation ended prematurely due to an unhandled exception.
java.lang.NullPointerException: Cannot invoke "edu.harvard.hul.ois.jhove.module.pdf.PageTreeNode.startWalk()" because "this._docTreeRoot" is null
at edu.harvard.hul.ois.jhove.module.PdfModule.checkPageTextStreams(PdfModule.java:3521)
at edu.harvard.hul.ois.jhove.module.PdfModule.parse(PdfModule.java:868)
at edu.harvard.hul.ois.jhove.JhoveBase.processFile(JhoveBase.java:831)
at edu.harvard.hul.ois.jhove.JhoveBase.process(JhoveBase.java:603)
at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(JhoveBase.java:479)
at edu.harvard.hul.ois.jhove.Jhove.main(Jhove.java:258)
jul 07, 2025 6:30:19 P.M. edu.harvard.hul.ois.jhove.JhoveBase process
SEVERE: Validation ended prematurely due to an unhandled exception.
java.lang.NullPointerException: Cannot invoke "edu.harvard.hul.ois.jhove.module.pdf.PageTreeNode.startWalk()" because "this._docTreeRoot" is null
at edu.harvard.hul.ois.jhove.module.PdfModule.checkPageTextStreams(PdfModule.java:3521)
at edu.harvard.hul.ois.jhove.module.PdfModule.parse(PdfModule.java:868)
at edu.harvard.hul.ois.jhove.JhoveBase.processFile(JhoveBase.java:831)
at edu.harvard.hul.ois.jhove.JhoveBase.process(JhoveBase.java:603)
at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(JhoveBase.java:479)
at edu.harvard.hul.ois.jhove.Jhove.main(Jhove.java:258)
jul 07, 2025 6:30:19 P.M. edu.harvard.hul.ois.jhove.JhoveBase process
SEVERE: Validation ended prematurely due to an unhandled exception.
java.lang.NullPointerException: Cannot invoke "edu.harvard.hul.ois.jhove.module.pdf.PageTreeNode.startWalk()" because "this._docTreeRoot" is null
at edu.harvard.hul.ois.jhove.module.PdfModule.checkPageTextStreams(PdfModule.java:3521)
at edu.harvard.hul.ois.jhove.module.PdfModule.parse(PdfModule.java:868)
at edu.harvard.hul.ois.jhove.JhoveBase.processFile(JhoveBase.java:831)
at edu.harvard.hul.ois.jhove.JhoveBase.process(JhoveBase.java:603)
at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(JhoveBase.java:479)
at edu.harvard.hul.ois.jhove.Jhove.main(Jhove.java:258)
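To illustrate what the traces above describe: the message says `startWalk()` was invoked on a `_docTreeRoot` field that was null. The sketch below reproduces that pattern in isolation and shows one possible null guard. It is not JHOVE's code; `NullGuardSketch`, `PageTree`, `Checker`, and the message text are hypothetical names invented for this example, and I have not checked how `PdfModule.checkPageTextStreams` is actually written.

```java
import java.util.ArrayList;
import java.util.List;

public class NullGuardSketch {

    /** Hypothetical stand-in for a page tree root; not JHOVE's PageTreeNode. */
    static class PageTree {
        boolean walkStarted = false;

        void startWalk() {
            walkStarted = true;
        }
    }

    /** Hypothetical stand-in for a module whose page tree may never be built. */
    static class Checker {
        // Stays null when an earlier parsing step failed to build the tree.
        PageTree docTreeRoot;
        final List<String> messages = new ArrayList<>();

        // Unguarded: throws NullPointerException when docTreeRoot is null,
        // which is the same shape of failure as in the stack traces.
        void checkUnguarded() {
            docTreeRoot.startWalk();
        }

        // Guarded: records a message and returns false instead of throwing.
        boolean checkGuarded() {
            if (docTreeRoot == null) {
                messages.add("Page tree unavailable; skipping page stream check");
                return false;
            }
            docTreeRoot.startWalk();
            return true;
        }
    }

    public static void main(String[] args) {
        Checker checker = new Checker();
        try {
            checker.checkUnguarded();
        } catch (NullPointerException e) {
            System.out.println("unguarded call threw NullPointerException");
        }
        System.out.println("guarded call returned " + checker.checkGuarded());
    }
}
```

Whether a guard like this would be the right fix depends on why the page tree is missing for these files in 1.34, which the traces alone do not show.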