-
Notifications
You must be signed in to change notification settings - Fork 1
Introduce standard InChIs for connected components #10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| - "array" | ||
| - "null" | ||
| x-optimade-dimensions: | ||
| names: ["dim__cheminfo_components_stdinchi"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably can be shortened, as OPTIMADE specification does not seem to impose any restrictions on names of dimensions.
| The InChI identifier is defined by the InChI Trust (https://www.inchi-trust.org). | ||
| Every connected component in the structure MUST be represented by a separate InChI string in the list. | ||
| Values MUST start with `InChI=` and MUST be unique. | ||
| examples: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the somehow add explanations for the examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no structured way to do so, but other properties provide the explanations in the description text. I will follow that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some brief text on examples that describe our intention would be nice too, e.g., co-crystals
|
One other thought - should there be equivalent _cheminfo_components_stdinchikey and _cheminfo_components_stdinchikey_counts too for consistency? |
Excellent point - let's first get the current PR ready and merged, and then I can base the InChIKey one on this! |
|
@vaitkus has pointed it out that |
Not sure which of those is best but is |
|
I agree that _counts is maybe not that clear. I would have preferred ratios (shorter, easier to spell, more obvious to non-chemists) but just have a slight doubt that _ratios may be confused with elements_ratios which are floats and sum to 1 unlike these ratios which are integers and don't sum to 1. So I might be inclined to go for stoichiometry - cocrystal molar stoichiometry seems to be a well accepted thing. Maybe it would be more obvious if the field ended "component_stoichiometry" (or "component_ratios") so if it were called _cheminfo_stdinchikey_components_stoichiometry rather than _cheminfo_components_stdinchikey_stoichiometry (and correspondingly _cheminfo_stdinchikey_components rather than _cheminfo_components_stdinchikey)? I'm not sure about multipliers - it seems less well used in this context... |
|
Looking at Wikipedia and IUPAC pages on stoichiometry, ratios are referred to a lot. A ratio is a part of the stoichiometry, perhaps even integral to defining it. I thus include towards
(although I'm not sure if it should be ratio singular) Note on whether component should come before or after, my concern with e.g. |
|
Regarding Regarding singular vs. plural, I think in OPTIMADE list-valued properties are usually named plural. Not sure this is the rule, but I am just following the precedent. |
|
Happy to go with ratios if Ian and everyone thinks this is best and yes I take your point about putting components before inchi. |
|
I disliked I like |
ml-evs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the discussion here, happy to go with the consensus on either ratio/stoichiometry. I don't think its particularly confusing wrt. elements_ratios, we should just choose the best name for our namespace. As @merkys said the main thing that should decide it is whether we allow querying on this field; as most implementations do not implement the correlated list queries required for elements_ratios anyway, I think we can safely leave it as a "MAY" without designing around it.
Just one further comment below
| description: |- | ||
| The standard InChI identifiers for the connected components of the structure. | ||
| The InChI identifier is defined by the InChI Trust (https://www.inchi-trust.org). | ||
| Every connected component in the structure MUST be represented by a separate InChI string in the list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to be clear that the components MUST NOT (or MAY) overlap, and whether parts of the structure may not be described at all in these components?
e.g., imagining a single caffeine molecule in a (perhaps unimportant or complicated) solvent, am I allowed to just list caffeine as the component inchi?
01c9b8a to
10c3ab4
Compare
This PR introduces two properties
_cheminfo_components_stdinchiand_cheminfo_components_stdinchi_countsto communicate the standard InChIs for all connected components in a structure.