
Exceptions when JHOVE 1.34 batch-processes pdfCabinetOfHorrors, but not in 1.30 #1057

@RvanVeenendaal

JHOVE 1.34 appears to throw unhandled exceptions when batch-processing PDFs that JHOVE 1.30 processed without them. What might be the reason for this?

I ran the JHOVE 1.34 CLI against the pdfCabinetOfHorrors corpus with:
/home/remco/preservation/jhove/jhove -m PDF-hul ~/preservation/format-corpus-master/pdfCabinetOfHorrors/*.pdf
The output contains several exception messages (listed further below).

These exceptions result in an Unknown status in the RepresentationInformation output, for example:

 RepresentationInformation: /home/remco/preservation/format-corpus-master/pdfCabinetOfHorrors/webCapture.pdf
  ReportingModule: PDF-hul, Rel. 1.12.8 (2025-03-12)
  LastModified: 2023-05-10 12:53:42 CEST
  Size: 213342
  Format: PDF
  Status: Unknown
  SignatureMatches:
   PDF-hul
  ErrorMessage: Validation ended prematurely due to an unhandled exception.
   ID: JHOVE-CORE-5
   InfoLink: https://github.com/openpreserve/jhove/wiki/PDF-hul-Messages#jhove-core-5
  MIMEtype: application/pdf
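When batch-processing a whole corpus, it can help to pull out just the files that ended with an Unknown status. The sketch below does that for JHOVE's plain-text output. It assumes the layout shown above (each entry begins with a `RepresentationInformation: <path>` line followed by a `Status: <value>` line); the class and method names are hypothetical and not part of JHOVE.

```java
import java.util.ArrayList;
import java.util.List;

public class UnknownStatusFinder {
    /**
     * Returns the paths of RepresentationInformation entries whose Status is "Unknown".
     * Assumes the plain-text layout shown above: a "RepresentationInformation: <path>"
     * line followed, within the same entry, by a "Status: <value>" line.
     */
    public static List<String> findUnknown(String jhoveTextOutput) {
        List<String> result = new ArrayList<>();
        String currentPath = null;
        for (String line : jhoveTextOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("RepresentationInformation:")) {
                // Start of a new entry: remember which file it describes.
                currentPath = trimmed.substring("RepresentationInformation:".length()).trim();
            } else if (trimmed.startsWith("Status:") && currentPath != null) {
                String status = trimmed.substring("Status:".length()).trim();
                if (status.equals("Unknown")) {
                    result.add(currentPath);
                }
                // Only the first Status line of each entry is considered.
                currentPath = null;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                " RepresentationInformation: /tmp/a.pdf",
                "  Status: Well-Formed and valid",
                " RepresentationInformation: /tmp/webCapture.pdf",
                "  Status: Unknown");
        System.out.println(findUnknown(sample)); // prints [/tmp/webCapture.pdf]
    }
}
```

Running the same filter over the 1.30 and 1.34 outputs would show which files changed status between the two versions.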

When I do the same with JHOVE 1.30 I don't get these unhandled exception messages.

What could be the reason for this? And is it a step forward or a step backward that this happens in 1.34 but not in 1.30?

The issue occurs for these documents:
/home/remco/preservation/format-corpus-master/pdfCabinetOfHorrors/embedded_video_quicktime.pdf
/home/remco/preservation/format-corpus-master/pdfCabinetOfHorrors/externalLink.pdf
/home/remco/preservation/format-corpus-master/pdfCabinetOfHorrors/test_fontArialNotEmbedded.pdf
/home/remco/preservation/format-corpus-master/pdfCabinetOfHorrors/text_only_fontsEmbeddedSubset.pdf
/home/remco/preservation/format-corpus-master/pdfCabinetOfHorrors/text_only_fontsNotEmbedded.pdf
/home/remco/preservation/format-corpus-master/pdfCabinetOfHorrors/text_only_fontsEmbeddedAll.pdf
/home/remco/preservation/format-corpus-master/pdfCabinetOfHorrors/webCapture.pdf

Each of these files logs the same unhandled exception:

jul 07, 2025 6:30:19 P.M. edu.harvard.hul.ois.jhove.JhoveBase process
SEVERE: Validation ended prematurely due to an unhandled exception.
java.lang.NullPointerException: Cannot invoke "edu.harvard.hul.ois.jhove.module.pdf.PageTreeNode.startWalk()" because "this._docTreeRoot" is null
	at edu.harvard.hul.ois.jhove.module.PdfModule.checkPageTextStreams(PdfModule.java:3521)
	at edu.harvard.hul.ois.jhove.module.PdfModule.parse(PdfModule.java:868)
	at edu.harvard.hul.ois.jhove.JhoveBase.processFile(JhoveBase.java:831)
	at edu.harvard.hul.ois.jhove.JhoveBase.process(JhoveBase.java:603)
	at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(JhoveBase.java:479)
	at edu.harvard.hul.ois.jhove.Jhove.main(Jhove.java:258)
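The trace indicates that `checkPageTextStreams` calls `startWalk()` on `_docTreeRoot` while that field is still null, i.e. the page tree root was never set for these files. The sketch below shows one possible defensive pattern, assuming a null check that records a message instead of throwing is an acceptable outcome. It is not JHOVE's code: apart from `startWalk()` and the method name taken from the trace, all classes, fields, and messages are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

public class PageTreeGuardSketch {
    // Hypothetical stand-in for JHOVE's PageTreeNode; only startWalk() is modelled.
    static class PageTreeNode {
        void startWalk() {
            // A real implementation would begin iterating over the document's pages.
        }
    }

    // May be null if the page tree could not be read from the document.
    private final PageTreeNode docTreeRoot;
    private final List<String> messages = new ArrayList<>();

    PageTreeGuardSketch(PageTreeNode root) {
        this.docTreeRoot = root;
    }

    /** Records a message and returns false instead of throwing when the root is missing. */
    boolean checkPageTextStreams() {
        if (docTreeRoot == null) {
            messages.add("Page tree root missing; page text-stream check skipped");
            return false;
        }
        docTreeRoot.startWalk();
        return true;
    }

    List<String> getMessages() {
        return messages;
    }

    public static void main(String[] args) {
        PageTreeGuardSketch broken = new PageTreeGuardSketch(null);
        System.out.println(broken.checkPageTextStreams()); // prints false
        System.out.println(broken.getMessages().size());   // prints 1
    }
}
```

Whether the right fix is a guard like this or making sure the page tree root is populated earlier for these files depends on why parsing left it null, which the trace alone does not show.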
