Fido identifying some XLSX, PPTX, and DOCX as fido-fmt/{x} #152

@ross-spencer

Dev Effort

1D

Description

Reported via @sromkey: Fido identifies the MS-Office Open XML files in this Archivematica test data zip as fido-fmt/{x}:

ross-spencer@artefactual:~/git/artefactual-labs/am/src/archivematica-sampledata/SampleTransfers/OfficeDocsExtracted/objects$ fido *
FIDO v1.3.12 (formats-v94.xml, container-signature-20180920.xml, format_extensions.xml)
OK,14,fido-fmt/189.ppt,"Microsoft Office Open XML - Powerpoint","Microsoft Office Open XML - Powerpoint",47215,"MS-OfficeOpenXML-samples/samplepptx.pptx","None","signature"
OK,10,fido-fmt/189.word,"Microsoft Office Open XML - Word","Microsoft Office Open XML - Word",14860,"MS-OfficeOpenXML-samples/sampledocx.docx","None","signature"
OK,11,fido-fmt/189.xl,"Microsoft Office Open XML - Excel","Microsoft Office Open XML - Excel",12050,"MS-OfficeOpenXML-samples/samplexlsx.xlsx","None","signature"
FIDO: Processed      9 files in 343.28 msec, 26 files/sec

If the fido-fmt/{x} entries are removed, as described here: #36 (comment), the closest match appears to be generic OOXML (fmt/189):

ross-spencer@artefactual:~/Desktop/temp/ndsa/office-samples-and-skeletons/samples$ fido *
FIDO v1.3.12 (formats-v94.xml, container-signature-20180920.xml, format_extensions.xml)
OK,150,fmt/189,"Microsoft Office Open XML","Microsoft Office Open XML",14860,"sampledocx.docx","None","signature"
OK,8,fmt/189,"Microsoft Office Open XML","Microsoft Office Open XML",47215,"samplepptx.pptx","None","signature"
OK,9,fmt/189,"Microsoft Office Open XML","Microsoft Office Open XML",12050,"samplexlsx.xlsx","None","signature"
FIDO: Processed      3 files in 206.92 msec, 14 files/sec
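
To make runs like the two above easier to compare, here is a minimal Python sketch that parses Fido's CSV rows and flags any match whose identifier starts with `fido-fmt/`. The column names in `FIELDS` are assumptions inferred from the sample output, not taken from Fido's documentation, and `parse_fido_output` and `non_pronom_matches` are hypothetical helper names.

```python
import csv

# Column names are assumed from the sample output above (hypothetical labels).
FIELDS = ["status", "time", "puid", "format_name", "signature_name",
          "file_size", "file_name", "mime_type", "match_type"]


def parse_fido_output(text):
    """Return one dict per 'OK' row; banner, summary and other lines are ignored."""
    rows = []
    for line in text.splitlines():
        if not line.startswith("OK,"):
            continue
        record = next(csv.reader([line]))
        if len(record) == len(FIELDS):
            rows.append(dict(zip(FIELDS, record)))
    return rows


def non_pronom_matches(rows):
    """Keep only rows identified with a fido-fmt/ identifier."""
    return [row for row in rows if row["puid"].startswith("fido-fmt/")]


# Two rows copied from the runs above, one from each.
SAMPLE = '''FIDO v1.3.12 (formats-v94.xml, container-signature-20180920.xml, format_extensions.xml)
OK,14,fido-fmt/189.ppt,"Microsoft Office Open XML - Powerpoint","Microsoft Office Open XML - Powerpoint",47215,"MS-OfficeOpenXML-samples/samplepptx.pptx","None","signature"
OK,150,fmt/189,"Microsoft Office Open XML","Microsoft Office Open XML",14860,"sampledocx.docx","None","signature"
FIDO: Processed      3 files in 206.92 msec, 14 files/sec
'''

if __name__ == "__main__":
    for row in non_pronom_matches(parse_fido_output(SAMPLE)):
        print(row["file_name"], row["puid"])
```

Piping real `fido` output into `parse_fido_output` in place of `SAMPLE` should work the same way, provided the output format matches the rows shown here.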

Unfortunately, the Skeleton Suite does not look like it will help with debugging here, as the extracted samples (three per PUID) all identify correctly.

I have extracted the samples and the skeleton files here for easy access.

NB. Sarah also noted that Siegfried identifies the formats correctly:

ross-spencer@artefactual:~/git/artefactual-labs/am/src/archivematica-sampledata/SampleTransfers/OfficeDocsExtracted/objects$ sf *
---
siegfried   : 1.7.11
scandate    : 2019-02-24T12:22:11+01:00
signature   : default.sig
created     : 2019-02-16T11:10:03+01:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V94.xml; container-signature-20180917.xml'
---
filename : 'MS-OfficeOpenXML-samples/sampledocx.docx'
filesize : 14860
modified : 2007-08-14T23:29:00+02:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/412'
    format  : 'Microsoft Word for Windows'
    version : '2007 onwards'
    mime    : 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
    basis   : 'extension match docx; container name [Content_Types].xml with byte match at 460, 94 (signature 1/3)'
    warning : 
---
filename : 'MS-OfficeOpenXML-samples/samplepptx.pptx'
filesize : 47215
modified : 2007-08-14T23:51:16+02:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/215'
    format  : 'Microsoft Powerpoint for Windows'
    version : '2007 onwards'
    mime    : 'application/vnd.openxmlformats-officedocument.presentationml.presentation'
    basis   : 'extension match pptx; container name [Content_Types].xml with byte match at 2326, 96 (signature 1/3)'
    warning : 
---
filename : 'MS-OfficeOpenXML-samples/samplexlsx.xlsx'
filesize : 12050
modified : 2007-08-14T23:50:24+02:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/214'
    format  : 'Microsoft Excel for Windows'
    version : '2007 onwards'
    mime    : 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
    basis   : 'extension match xlsx; container name [Content_Types].xml with byte match at 676, 88 (signature 1/3)'
    warning : 
---
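
The `basis` lines above show Siegfried matching bytes inside `[Content_Types].xml` in the container. As a rough illustration of that idea, the sketch below opens a file as a ZIP archive and looks for the main-part content type of each OOXML flavour. This is an assumption-laden approximation: the strings checked are the standard OOXML main-part content types, not the actual PRONOM container signature byte sequences, and `guess_ooxml_puid` is a hypothetical helper, not part of Fido or Siegfried.

```python
import io
import zipfile

# Standard OOXML main-part content types mapped to the PUIDs Siegfried reports
# above. Illustrative only; not the real PRONOM container signatures.
MAIN_PART_TYPES = {
    b"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml": "fmt/412",
    b"application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml": "fmt/215",
    b"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml": "fmt/214",
}


def guess_ooxml_puid(path_or_file):
    """Return a PUID guess from [Content_Types].xml, or None if nothing matches."""
    try:
        with zipfile.ZipFile(path_or_file) as archive:
            content_types = archive.read("[Content_Types].xml")
    except (zipfile.BadZipFile, KeyError):
        # Not a ZIP container, or the container has no [Content_Types].xml.
        return None
    for marker, puid in MAIN_PART_TYPES.items():
        if marker in content_types:
            return puid
    return None


def make_demo_container(content_type):
    """Build a tiny in-memory ZIP holding only a [Content_Types].xml (demo data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "[Content_Types].xml",
            '<Types><Override PartName="/main.xml" ContentType="%s"/></Types>' % content_type,
        )
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = make_demo_container(
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"
    )
    print(guess_ooxml_puid(demo))
```

Running `guess_ooxml_puid` against the three sample files could be a quick way to confirm that their containers carry the expected content types, independently of either tool.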

Metadata

Labels

P1: High priority issues to be scheduled in the upcoming release
bug: A product defect that needs fixing
