Description
I found many CIF files downloaded from ICSD and COD in which not all the elements of the chemical formula appear in the site list. Such files are parsed into incorrect structures (CifData and StructureData), because only the listed sites are included.
A warning or error should be issued when a CIF file with possibly missing sites is parsed.
Similarly, in some CIF files the elements in the formula differ from those in the site list.
Example 1 - missing sites
Parsing the following CIF file (H3 Li1 O2 Lithium hydroxide hydrate):
ICSD_CollCode35156.cif.txt
_chemical_name_systematic 'Lithium hydroxide hydrate'
_chemical_formula_structural 'Li O H H2 O'
_chemical_formula_sum 'H3 Li1 O2'
_chemical_name_structure_type LiOHH2O
[...]
_atom_type_symbol
_atom_type_oxidation_number
Li1+ 1
O2- -2
H1+ 1
loop_
_atom_site_label
_atom_site_type_symbol
_atom_site_symmetry_multiplicity
_atom_site_Wyckoff_symbol
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_B_iso_or_equiv
_atom_site_occupancy
_atom_site_attached_hydrogens
Li1 Li1+ 4 h 0 0.34781(8) 0.5 . 1. 0
O1 O2- 4 i 0.28610(8) 0 0.39545(33) . 1. 0
O2 O2- 4 g 0 0.20685(4) 0 . 1. 0
#End of TTdata_35156-ICSD
returns a StructureData with formula Li2O4 and no hydrogen, because H is not present in the site list (and no attached hydrogens are reported either).
Missing H atoms are the most common case, but the same seems to happen for some doped structures, where the dopant is absent from the site list.
Example 2 - missing sites
Similarly (I1 Li1 O3 Lithium iodate(V) - alpha):
ICSD_CollCode20928.cif.txt
_chemical_name_systematic 'Lithium iodate(V) - alpha'
_chemical_formula_structural 'Li (I O3)'
_chemical_formula_sum 'I1 Li1 O3'
[...]
_atom_type_symbol
_atom_type_oxidation_number
Li1+ 1
I5+ 5
O2- -2
loop_
_atom_site_label
_atom_site_type_symbol
_atom_site_symmetry_multiplicity
_atom_site_Wyckoff_symbol
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_B_iso_or_equiv
_atom_site_occupancy
_atom_site_attached_hydrogens
Li1 Li1+ 2 b 0.3333 0.6667 0.9270(67) 1.27(43) 1. 0
#End of TTdata_20928-ICSD
returns a StructureData with formula Li2.
Example 3 - inconsistent elements
In some CIF files the elements in the formula differ from those in the site list.
In the following file, for example, Na in the formula "becomes" K in the site list:
ICSD_CollCode163048.cif.txt
_chemical_name_systematic 'Dilithium dipotassium bis(di-t-butyl-bis(dimethylsilyl)oxadiazane)'
_chemical_formula_structural 'Li2 Na2 ((C4 H9)2 N2 (Si (C H3)2)2 O)2'
_chemical_formula_sum 'C24 H60 Li2 N4 Na2 O2 Si4'
[...]
The parsed StructureData has the formula K4 Li4 Si8 H120 C48 N8 O4.
Straightforward check
A straightforward check is to compare the elements in the CIF's chemical formula with the elements in the parsed StructureData:
def check_formulas(cif, structure):
    """Print the elements found in only one of the CIF formula and the parsed structure."""
    import re
    from CifFile import StarError

    formula_s = structure.get_formula('count_compact', ' ')
    formula_c = None
    try:
        # list() is needed because dict.keys() is not indexable in Python 3
        block_names = list(cif.values.keys())
        assert len(block_names) == 1, 'More than one CIF key.'
        cif_block = cif.values[block_names[0]]
        for key in ('_chemical_formula_sum',):  # , '_chemical_formula_structure', '_chemical_formula'):
            if key in cif_block.keys():
                formula_c = cif_block[key]
                break
    except StarError:
        # the CIF file could not be parsed: fall back to the stored 'formulae' attribute
        formula_c = cif.get_attribute('formulae')
    if formula_c is not None:
        elements_s = set(structure.get_kind_names())
        elements_c = set(el for el in re.split("[^a-zA-Z]+", formula_c) if el)
        # symmetric difference: elements that appear in only one of the two sets
        missing_elements = elements_s ^ elements_c
        if missing_elements:
            print(' structure: {} -- cif: {} -- MISSING ELEMENTS: {}'.format(formula_s, formula_c, missing_elements))
A more advanced implementation should also check whether the ratios between the elements of the parsed structure are consistent with the CIF's chemical formula.
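A minimal sketch of such a ratio check follows. It works on plain formula strings rather than on CifData/StructureData objects, and the names `parse_formula`, `normalized`, `ratios_consistent` and the `tolerance` parameter are hypothetical, introduced only for illustration. It assumes a flat formula such as `_chemical_formula_sum` (no parentheses, so `_chemical_formula_structural` is not supported) and compares the atomic fraction of each element in the two formulas.

```python
import re
from fractions import Fraction


def parse_formula(formula):
    """Parse a flat formula such as 'H3 Li1 O2' or 'Li2O4' into {element: count}."""
    counts = {}
    # An element symbol is one capital letter plus an optional lowercase letter,
    # followed by an optional (possibly decimal) count; a missing count means 1.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        count = Fraction(number) if number else Fraction(1)
        counts[symbol] = counts.get(symbol, Fraction(0)) + count
    return counts


def normalized(counts):
    """Convert absolute counts into atomic fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {element: count / total for element, count in counts.items()}


def ratios_consistent(formula_cif, formula_structure, tolerance=Fraction(1, 100)):
    """Return True if both formulas contain the same elements in the same ratios."""
    fractions_cif = normalized(parse_formula(formula_cif))
    fractions_structure = normalized(parse_formula(formula_structure))
    # Different element sets are already inconsistent (missing or swapped elements).
    if set(fractions_cif) != set(fractions_structure):
        return False
    return all(
        abs(fractions_cif[element] - fractions_structure[element]) <= tolerance
        for element in fractions_cif
    )


# Formulas taken from the three examples above: all are flagged as inconsistent.
print(ratios_consistent('H3 Li1 O2', 'Li2O4'))
print(ratios_consistent('I1 Li1 O3', 'Li2'))
print(ratios_consistent('C24 H60 Li2 N4 Na2 O2 Si4', 'K4 Li4 Si8 H120 C48 N8 O4'))
```

Comparing atomic fractions rather than absolute counts makes the check independent of the number of formula units in the cell. The tolerance is an arbitrary choice here and would need tuning for structures with partial occupancies.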