@ccancellieri (Contributor) commented Oct 22, 2021

This will close #258 by adding support for an additional parameter in the CSW harvester JSON config:

"output_schema": "mdb"

mdb is the namespace prefix of the schema to use (in this case ISO 19115-3:2018):

{'mdb':'http://standards.iso.org/iso/19115/-3/mdb/2.0'}

Full example:

{
    "user": "ckan_admin",
    "cql": "dc:identifier = '0-----292--------------------------'",
    "output_schema": "mdb",
    "default_tags": [],
    "default_extras": {},
    "group_mapping": {},
    "read_only": false
}

With this setting, the CSW harvester receives the metadata in the configured output schema (which must be supported by the target CSW server).
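A minimal sketch of how the new option could be read from the harvest source config, assuming it defaults to gmd (ISO 19139), the schema the harvester imposed before this change. The OUTPUT_SCHEMA_NAMESPACES table and the resolve_output_schema function are hypothetical illustrations, not the PR's code.

```python
import json

# Hypothetical prefix -> namespace URI table. 'mdb' comes from the PR
# description; 'gmd' is the ISO 19139 namespace.
OUTPUT_SCHEMA_NAMESPACES = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "mdb": "http://standards.iso.org/iso/19115/-3/mdb/2.0",
}

# Assumed default: ISO 19139, the schema previously imposed.
DEFAULT_OUTPUT_SCHEMA = "gmd"


def resolve_output_schema(source_config):
    """Return (prefix, namespace URI) for the 'output_schema' option.

    source_config is the harvest source config as a JSON string (or
    None/empty). An unknown prefix raises ValueError.
    """
    config = json.loads(source_config) if source_config else {}
    prefix = config.get("output_schema", DEFAULT_OUTPUT_SCHEMA)
    try:
        return prefix, OUTPUT_SCHEMA_NAMESPACES[prefix]
    except KeyError:
        raise ValueError(
            "Unknown output_schema prefix: {!r}".format(prefix)) from None
```

For the full example config, this returns the mdb prefix together with its ISO 19115-3 namespace URI; a config without the option falls back to gmd.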

@ccancellieri (Contributor Author)

This may also help with:
#209
#210
#219

@amercader (Member) left a comment

This looks good and useful, @ccancellieri. I just added some minor comments.



# load config
self._set_source_config(harvest_object.source.config)
Member

Can you document the new output_schema option and its default value here so others are aware of it?

https://github.com/ckan/ckanext-spatial/blob/master/doc/harvesters.rst

Contributor Author

done

Contributor Author

Added a fallback to the default in case the server does not support the iso19139 -> iso19115 transformation.
The fallback logs the problem and switches back to the default, requesting iso19139 -> iso19139.
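A minimal sketch of the fallback described here, assuming the schemas the server supports for the operation are available as a collection of namespace URIs (for example, parsed from GetCapabilities). The function choose_output_schema and the DEFAULT_SCHEMA_URI constant are hypothetical, not the PR's actual code.

```python
import logging

log = logging.getLogger(__name__)

# Assumed default: ISO 19139 (gmd), the schema requested before this PR.
DEFAULT_SCHEMA_URI = "http://www.isotc211.org/2005/gmd"


def choose_output_schema(requested_uri, supported_uris):
    """Return the schema URI to request from the CSW server.

    If the server does not advertise requested_uri, log a warning and
    fall back to the default ISO 19139 schema.
    """
    if requested_uri in supported_uris:
        return requested_uri
    log.warning(
        "Output schema %r not supported by target server, "
        "falling back to %r", requested_uri, DEFAULT_SCHEMA_URI)
    return DEFAULT_SCHEMA_URI
```

Logging a warning rather than raising keeps the harvest running against servers that only offer ISO 19139.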

self.sortby = SortBy([SortProperty('dc:identifier')])
# check capabilities
_cap = self.getcapabilities(endpoint)['response']
self.capabilities=etree.ElementTree(etree.fromstring(_cap))
Member

Please try to follow the PEP 8 guidelines, especially the spacing around = and after , :)

Contributor Author

Sorry, I can't validate the whole project and my code editor isn't helping me. Good catch, I'll try to fix my mistake.

csw = self._ows(**kw)

# fetch target csw server capabilities for requested output schema
output_schemas=self._get_output_schemas('GetRecords')
Member

Can we move this call to the __init__() method to avoid duplication and multiple calls to GetCapabilities?
Something like:

def __init__(self, endpoint=None):
    _cap = self.getcapabilities(endpoint)['response']
    self.capabilities = etree.ElementTree(etree.fromstring(_cap))
    self.output_schemas = {
        'GetRecords': self._get_output_schemas('GetRecords'),
        'GetRecordById': self._get_output_schemas('GetRecordById'),
    }

Contributor Author

done
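The reviewer's suggestion to parse GetCapabilities once and cache the output schemas per operation can be sketched as follows. This uses the standard library's xml.etree.ElementTree instead of lxml and assumes a CSW 2.0.2 capabilities document where each operation lists its schemas under ows:Parameter name="outputSchema" (OWS 1.0.0 namespace); servers may differ. get_output_schemas and CapabilitiesCache are hypothetical stand-ins for the PR's _get_output_schemas(), whose exact return type is not shown (the PR calls .get() on it, while this sketch returns a set).

```python
import xml.etree.ElementTree as ET

# OWS 1.0.0 namespace used by CSW 2.0.2 capabilities documents.
OWS_NS = {"ows": "http://www.opengis.net/ows"}


def get_output_schemas(capabilities_xml, operation):
    """Return the outputSchema values advertised for one CSW operation.

    capabilities_xml is the raw GetCapabilities response (str or bytes).
    """
    root = ET.fromstring(capabilities_xml)
    path = (
        "ows:OperationsMetadata/ows:Operation[@name='{}']"
        "/ows:Parameter[@name='outputSchema']/ows:Value".format(operation)
    )
    return {v.text.strip() for v in root.findall(path, OWS_NS) if v.text}


class CapabilitiesCache:
    """Parse the capabilities once and keep the schemas per operation."""

    def __init__(self, capabilities_xml):
        self.output_schemas = {
            op: get_output_schemas(capabilities_xml, op)
            for op in ("GetRecords", "GetRecordById")
        }
```

Doing the lookup once in the constructor avoids repeating a GetCapabilities request for every GetRecords or GetRecordById call.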

# fetch target csw server capabilities for requested output schema
output_schemas = self.output_schemas['GetRecordById']
if not output_schemas.get(outputschema):
    raise CswError('Output schema \'{}\' not supported by target server'.format(outputschema))
Contributor Author

I should probably be more tolerant here: log an ERROR and return instead of raising.

@frafra (Contributor) commented Feb 18, 2022

This is great :) Do you need any help with this PR?

@frafra (Contributor) commented Mar 1, 2022

I get this generic error after applying this PR (rebased on master): Error contacting the CSW server: can only parse strings. I think there is a problem with the changes made to the __init__ function of CswService.

@ccancellieri (Contributor Author)

Ciao @frafra, thanks for looking into this.
I think something could be going wrong here:

record = self._xmd(etree.fromstring(csw.response))

Would you be able to check the response provided by the server?

I apologize, but I'm not using this plugin anymore because I changed approach, so my help on this will be very limited.
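The error frafra reported, "can only parse strings", is the message lxml's etree.fromstring() gives when it is passed something other than a string or bytes, for example None. One plausible check is therefore whether csw.response is empty before it is parsed. A minimal sketch of such a guard, using the standard library's xml.etree.ElementTree instead of lxml; parse_csw_response is a hypothetical helper, not part of ckanext-spatial.

```python
import xml.etree.ElementTree as ET


def parse_csw_response(response):
    """Parse a raw CSW response, failing loudly if nothing came back.

    Hypothetical helper: checks for a missing, non-text or empty body
    before handing it to the XML parser, so the error names the problem.
    """
    if response is None:
        raise ValueError("CSW server returned no response body")
    if not isinstance(response, (str, bytes)):
        raise TypeError(
            "Expected str or bytes from the CSW server, got {}".format(
                type(response).__name__))
    if not response.strip():
        raise ValueError("CSW server returned an empty response body")
    return ET.fromstring(response)
```

With a guard like this, a missing response surfaces as an explicit message instead of the generic parser error.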

@frafra (Contributor) commented Mar 4, 2022

@ccancellieri I think you are right; I will look into that.
Which approach have you taken, if I may ask? I am interested in harvesting data from GeoNetwork too.

@ccancellieri (Contributor Author) commented Mar 7, 2022 via email

@frafra (Contributor) commented Mar 8, 2022

markstuart added a commit to data-govt-nz/ckanext-spatial that referenced this pull request Jun 2, 2023

Development

Successfully merging this pull request may close these issues.

Using CSW harvester OutputSchema is ignored while gmd is imposed

3 participants