Open
Labels
P1 (High priority issues to be scheduled in the upcoming release), bug (A product defect that needs fixing)
Dev Effort
1D
Description
Via @sromkey: the MS-Office Open XML files in this Archivematica test data zip are being identified by Fido as fido-fmt/{x}:
ross-spencer@artefactual:~/git/artefactual-labs/am/src/archivematica-sampledata/SampleTransfers/OfficeDocsExtracted/objects$ fido *
FIDO v1.3.12 (formats-v94.xml, container-signature-20180920.xml, format_extensions.xml)
OK,14,fido-fmt/189.ppt,"Microsoft Office Open XML - Powerpoint","Microsoft Office Open XML - Powerpoint",47215,"MS-OfficeOpenXML-samples/samplepptx.pptx","None","signature"
OK,10,fido-fmt/189.word,"Microsoft Office Open XML - Word","Microsoft Office Open XML - Word",14860,"MS-OfficeOpenXML-samples/sampledocx.docx","None","signature"
OK,11,fido-fmt/189.xl,"Microsoft Office Open XML - Excel","Microsoft Office Open XML - Excel",12050,"MS-OfficeOpenXML-samples/samplexlsx.xlsx","None","signature"
FIDO: Processed 9 files in 343.28 msec, 26 files/sec
If the fido-fmt/{x} entries are removed as per here: #36 (comment), then the closest match seems to be generic OOXML:
ross-spencer@artefactual:~/Desktop/temp/ndsa/office-samples-and-skeletons/samples$ fido *
FIDO v1.3.12 (formats-v94.xml, container-signature-20180920.xml, format_extensions.xml)
OK,150,fmt/189,"Microsoft Office Open XML","Microsoft Office Open XML",14860,"sampledocx.docx","None","signature"
OK,8,fmt/189,"Microsoft Office Open XML","Microsoft Office Open XML",47215,"samplepptx.pptx","None","signature"
OK,9,fmt/189,"Microsoft Office Open XML","Microsoft Office Open XML",12050,"samplexlsx.xlsx","None","signature"
FIDO: Processed 3 files in 206.92 msec, 14 files/sec
Unfortunately, the Skeleton Suite looks like it won't help debug here, as the extracted samples (three per PUID) all identify correctly.
I have extracted the samples and the skeleton files here for easy access.
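To make the first Fido run above easier to check at a glance, here is a minimal sketch that parses those CSV lines and lists the files whose PUID is a Fido-only `fido-fmt/` identifier rather than a PRONOM one. The column positions (PUID in the third field, filename in the seventh) are read off the output above, and the helper name `non_pronom_matches` is hypothetical.

```python
import csv
import io

# Result lines copied verbatim from the first Fido run above.
FIDO_OUTPUT = '''\
OK,14,fido-fmt/189.ppt,"Microsoft Office Open XML - Powerpoint","Microsoft Office Open XML - Powerpoint",47215,"MS-OfficeOpenXML-samples/samplepptx.pptx","None","signature"
OK,10,fido-fmt/189.word,"Microsoft Office Open XML - Word","Microsoft Office Open XML - Word",14860,"MS-OfficeOpenXML-samples/sampledocx.docx","None","signature"
OK,11,fido-fmt/189.xl,"Microsoft Office Open XML - Excel","Microsoft Office Open XML - Excel",12050,"MS-OfficeOpenXML-samples/samplexlsx.xlsx","None","signature"
'''


def non_pronom_matches(fido_csv):
    """Return (filename, puid) pairs whose PUID starts with 'fido-fmt/'.

    Assumes the column order seen in the output above:
    index 2 is the PUID and index 6 is the filename.
    """
    rows = csv.reader(io.StringIO(fido_csv))
    return [
        (row[6], row[2])
        for row in rows
        if len(row) > 6 and row[2].startswith("fido-fmt/")
    ]


if __name__ == "__main__":
    for filename, puid in non_pronom_matches(FIDO_OUTPUT):
        print(f"{filename}: {puid}")
```

Feeding it the second run (where every PUID is fmt/189) should return an empty list, since none of those identifiers carry the `fido-fmt/` prefix.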
NB. Sarah also noted that Siegfried will identify the formats correctly:
ross-spencer@artefactual:~/git/artefactual-labs/am/src/archivematica-sampledata/SampleTransfers/OfficeDocsExtracted/objects$ sf *
---
siegfried : 1.7.11
scandate : 2019-02-24T12:22:11+01:00
signature : default.sig
created : 2019-02-16T11:10:03+01:00
identifiers :
  - name : 'pronom'
    details : 'DROID_SignatureFile_V94.xml; container-signature-20180917.xml'
---
filename : 'MS-OfficeOpenXML-samples/sampledocx.docx'
filesize : 14860
modified : 2007-08-14T23:29:00+02:00
errors :
matches :
  - ns : 'pronom'
    id : 'fmt/412'
    format : 'Microsoft Word for Windows'
    version : '2007 onwards'
    mime : 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
    basis : 'extension match docx; container name [Content_Types].xml with byte match at 460, 94 (signature 1/3)'
    warning :
---
filename : 'MS-OfficeOpenXML-samples/samplepptx.pptx'
filesize : 47215
modified : 2007-08-14T23:51:16+02:00
errors :
matches :
  - ns : 'pronom'
    id : 'fmt/215'
    format : 'Microsoft Powerpoint for Windows'
    version : '2007 onwards'
    mime : 'application/vnd.openxmlformats-officedocument.presentationml.presentation'
    basis : 'extension match pptx; container name [Content_Types].xml with byte match at 2326, 96 (signature 1/3)'
    warning :
---
filename : 'MS-OfficeOpenXML-samples/samplexlsx.xlsx'
filesize : 12050
modified : 2007-08-14T23:50:24+02:00
errors :
matches :
  - ns : 'pronom'
    id : 'fmt/214'
    format : 'Microsoft Excel for Windows'
    version : '2007 onwards'
    mime : 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
    basis : 'extension match xlsx; container name [Content_Types].xml with byte match at 676, 88 (signature 1/3)'
    warning :
---
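For reference, the disagreement between the two tools can be summarised with a small sketch like the one below. The PUIDs are copied from the second Fido run (fido-fmt entries removed) and from the Siegfried output above; filenames are reduced to their basenames so the two runs line up, and the helper name `disagreements` is hypothetical.

```python
# PUIDs from the Fido run with the fido-fmt entries removed.
FIDO = {
    "sampledocx.docx": "fmt/189",
    "samplepptx.pptx": "fmt/189",
    "samplexlsx.xlsx": "fmt/189",
}

# PUIDs from the Siegfried run (basenames only).
SIEGFRIED = {
    "sampledocx.docx": "fmt/412",
    "samplepptx.pptx": "fmt/215",
    "samplexlsx.xlsx": "fmt/214",
}


def disagreements(a, b):
    """Map each filename present in both results to its differing (a, b) PUIDs."""
    return {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if a[name] != b[name]
    }


if __name__ == "__main__":
    for name, (fido_puid, sf_puid) in disagreements(FIDO, SIEGFRIED).items():
        print(f"{name}: fido={fido_puid} siegfried={sf_puid}")
```

All three files disagree: Fido falls back to the generic fmt/189, while Siegfried reports the application-specific PUID for each.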