Chemical JSON
Latest revision as of 13:18, 3 May 2017
We are moving development of the Chemical JSON format to a GitHub repository (https://github.com/OpenChemistry/chemicaljson) for better coordination.
The example presented below encodes a molecule in JSON. The intent is to be expressive, and to allow most properties to be optional and encoded as arrays where appropriate. I took a CML file for ethane and attempted to translate it to a JSON representation. This forms the basis of our storage in MongoDB, as well as an on-disk format. Looking at some recent work on memory-mapped binary JSON data structures in C++, I am encouraged by the flexibility and efficiency of this representation, along with its broad support across programming languages.
A major challenge will be establishing a list of accepted names and a convention for adding new ones. I think adding a version number, and using a discoverable structure such as JSON, will make this approachable.
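The current examples on this page carry a version number under the `"chemical json"` key. The earlier revisions further down use `"version"` instead. Either way, a reader can find out which layout it is dealing with before parsing anything else. The JavaScript sketch below shows one way to do this. The function names, the `MAX_SUPPORTED_VERSION` constant and the policy of rejecting newer versions are illustrative assumptions. They are not part of the format.

```javascript
// Hypothetical helper: return the version a parsed document declares.
// Current examples use "chemical json"; earlier revisions used "version".
function chemicalJsonVersion(doc) {
  if (typeof doc["chemical json"] === "number") return doc["chemical json"];
  if (typeof doc.version === "number") return doc.version;
  return undefined; // no version declared
}

// Assumed policy: accept only versions this reader knows about.
const MAX_SUPPORTED_VERSION = 0; // illustrative value

function checkVersion(doc) {
  const v = chemicalJsonVersion(doc);
  if (v === undefined) {
    throw new Error("no Chemical JSON version found");
  }
  if (v > MAX_SUPPORTED_VERSION) {
    throw new Error(`unsupported Chemical JSON version ${v}`);
  }
  return v;
}

console.log(checkVersion(JSON.parse('{"chemical json": 0, "name": "ethane"}'))); // 0
```

Checking the version first lets a reader fail with a clear message, rather than misreading fields whose names changed between revisions.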
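A minimal example of ethane (https://github.com/OpenChemistry/avogadrodata/blob/master/data/ethane.cjson), containing just what a computer needs, is outlined below. A C++ implementation of a reader and writer is available in Avogadro (https://github.com/OpenChemistry/avogadrolibs/blob/master/avogadro/io/cjsonformat.cpp).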
```json
{
  "chemical json": 0,
  "name": "ethane",
  "inchi": "1/C2H6/c1-2/h1-2H3",
  "formula": "C 2 H 6",
  "atoms": {
    "elements": {
      "number": [ 1, 6, 1, 1, 6, 1, 1, 1 ]
    },
    "coords": {
      "3d": [  1.185080, -0.003838,  0.987524,
               0.751621, -0.022441, -0.020839,
               1.166929,  0.833015, -0.569312,
               1.115519, -0.932892, -0.514525,
              -0.751587,  0.022496,  0.020891,
              -1.166882, -0.833372,  0.568699,
              -1.115691,  0.932608,  0.515082,
              -1.184988,  0.004424, -0.987522 ]
    }
  },
  "bonds": {
    "connections": {
      "index": [ 0, 1,
                 1, 2,
                 1, 3,
                 1, 4,
                 4, 5,
                 4, 6,
                 4, 7 ]
    },
    "order": [ 1, 1, 1, 1, 1, 1, 1 ]
  },
  "properties": {
    "molecular mass": 30.0690,
    "melting point": -172,
    "boiling point": -88
  }
}
```
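The arrays above are parallel and flat. `atoms.coords["3d"]` holds three values for each entry in `atoms.elements.number`. `bonds.connections.index` holds two 0-based atom indices for each entry in `bonds.order`.

The JavaScript sketch below unpacks them into per-atom and per-bond records. It is not the Avogadro reader, and the helper names are hypothetical. As a consistency check, it also recomputes the molecular mass. This assumes the conventional atomic weights H = 1.00794 and C = 12.0107, which are the only elements the small weight table covers.

```javascript
// Minimal ethane document from above (properties omitted).
const ethane = {
  "chemical json": 0,
  atoms: {
    elements: { number: [1, 6, 1, 1, 6, 1, 1, 1] },
    coords: {
      "3d": [
         1.185080, -0.003838,  0.987524,   0.751621, -0.022441, -0.020839,
         1.166929,  0.833015, -0.569312,   1.115519, -0.932892, -0.514525,
        -0.751587,  0.022496,  0.020891,  -1.166882, -0.833372,  0.568699,
        -1.115691,  0.932608,  0.515082,  -1.184988,  0.004424, -0.987522,
      ],
    },
  },
  bonds: {
    connections: { index: [0, 1, 1, 2, 1, 3, 1, 4, 4, 5, 4, 6, 4, 7] },
    order: [1, 1, 1, 1, 1, 1, 1],
  },
};

// Pair each atomic number with its (x, y, z) triple from the flat 3d array.
function readAtoms(doc) {
  const numbers = doc.atoms.elements.number;
  const xyz = doc.atoms.coords["3d"];
  if (xyz.length !== 3 * numbers.length) {
    throw new Error("coords.3d must hold exactly three values per atom");
  }
  return numbers.map((z, i) => ({
    number: z,
    position: [xyz[3 * i], xyz[3 * i + 1], xyz[3 * i + 2]],
  }));
}

// Pair each bond order with its two 0-based atom indices, checking the range.
function readBonds(doc, atomCount) {
  const index = doc.bonds.connections.index;
  const order = doc.bonds.order;
  if (index.length !== 2 * order.length) {
    throw new Error("connections.index must hold two atom indices per bond");
  }
  return order.map((o, i) => {
    const a = index[2 * i];
    const b = index[2 * i + 1];
    if (a < 0 || b < 0 || a >= atomCount || b >= atomCount) {
      throw new Error(`bond ${i} refers to a missing atom`);
    }
    return { atoms: [a, b], order: o };
  });
}

// Assumed conventional atomic weights, limited to the elements in ethane.
const ATOMIC_WEIGHT = { 1: 1.00794, 6: 12.0107 };

function molecularMass(atoms) {
  return atoms.reduce((sum, atom) => {
    const w = ATOMIC_WEIGHT[atom.number];
    if (w === undefined) throw new Error(`no weight for element ${atom.number}`);
    return sum + w;
  }, 0);
}

const atoms = readAtoms(ethane);
const bonds = readBonds(ethane, atoms.length);
console.log(atoms.length, bonds.length);      // 8 7
console.log(molecularMass(atoms).toFixed(4)); // 30.0690
```

Recomputing a stored property like `"molecular mass"` is one cheap way to catch an element or index array that has drifted out of step with the rest of the document.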
Crystal structures are specified using cell parameters and fractional (lattice) coordinates:
```json
{
  "chemical json": 0,
  "name": "TiO2 rutile",
  "formula": "Ti 2 O 4",
  "unit cell": {
    "a": 2.95812,
    "b": 4.59373,
    "c": 4.59373,
    "alpha": 90.0,
    "beta": 90.0,
    "gamma": 90.0
  },
  "atoms": {
    "elements": {
      "number": [ 22, 22, 8, 8, 8, 8 ]
    },
    "coords": {
      "3d fractional": [ 0.00000, 0.00000, 0.00000,
                         0.50000, 0.50000, 0.50000,
                         0.00000, 0.30530, 0.30530,
                         0.00000, 0.69470, 0.69470,
                         0.50000, 0.19470, 0.80530,
                         0.50000, 0.80530, 0.19470 ]
    }
  }
}
```
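Fractional coordinates become Cartesian positions only once the cell vectors are fixed in space. The sketch below uses a common convention: a lies along x, and b lies in the xy plane. That orientation, the assumption that angles are given in degrees, and the function names are all choices made for illustration. Positions come out in the same length unit as a, b and c (presumably ångströms).

```javascript
// Rutile cell and fractional coordinates from the example above.
const rutile = {
  "unit cell": { a: 2.95812, b: 4.59373, c: 4.59373, alpha: 90.0, beta: 90.0, gamma: 90.0 },
  atoms: {
    elements: { number: [22, 22, 8, 8, 8, 8] },
    coords: {
      "3d fractional": [
        0.0, 0.0, 0.0,         0.5, 0.5, 0.5,
        0.0, 0.3053, 0.3053,   0.0, 0.6947, 0.6947,
        0.5, 0.1947, 0.8053,   0.5, 0.8053, 0.1947,
      ],
    },
  },
};

// Cell vectors as rows: a along x, b in the xy plane (assumed convention).
function cellVectors({ a, b, c, alpha, beta, gamma }) {
  const rad = Math.PI / 180; // angles assumed to be in degrees
  const ca = Math.cos(alpha * rad);
  const cb = Math.cos(beta * rad);
  const cg = Math.cos(gamma * rad);
  const sg = Math.sin(gamma * rad);
  const cx = c * cb;
  const cy = (c * (ca - cb * cg)) / sg;
  const czSquared = c * c - cx * cx - cy * cy;
  if (czSquared <= 0) throw new Error("cell angles do not describe a valid cell");
  return [
    [a, 0, 0],
    [b * cg, b * sg, 0],
    [cx, cy, Math.sqrt(czSquared)],
  ];
}

// Convert a flat [u, v, w, u, v, w, ...] array into [x, y, z] triples.
function fractionalToCartesian(cell, frac) {
  const [va, vb, vc] = cellVectors(cell);
  const out = [];
  for (let i = 0; i < frac.length; i += 3) {
    const u = frac[i], v = frac[i + 1], w = frac[i + 2];
    out.push([0, 1, 2].map((k) => u * va[k] + v * vb[k] + w * vc[k]));
  }
  return out;
}

const cart = fractionalToCartesian(rutile["unit cell"], rutile.atoms.coords["3d fractional"]);
const [ti, , o] = cart; // first Ti (index 0) and first O (index 2)
console.log(Math.hypot(o[0] - ti[0], o[1] - ti[1], o[2] - ti[2]).toFixed(3)); // 1.983
```

For the orthogonal rutile cell this reduces to scaling u, v and w by a, b and c. The general form is shown because nothing in the format restricts the cell angles to 90°.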
Earlier revisions
Some more verbose representations are presented below, with additional fields that are not strictly required; they formed part of the initial design. The first iteration was less nested, for example:
```json
{
  "version": 0,
  "name": "ethane",
  "inchi": "1/C2H6/c1-2/h1-2H3",
  "inchikey": "WETWJCDKMRHUPV-UHFFFAOYSA-N",
  "formula": {
    "concise": "C 2 H 6"
  },
  "atoms": {
    "ids": [ "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8" ],
    "elementType": [ "H", "C", "H", "H", "C", "H", "H", "H" ],
    "elementNumber": [ 1, 6, 1, 1, 6, 1, 1, 1 ],
    "coords": {
      "3d": [  1.185080, -0.003838,  0.987524,
               0.751621, -0.022441, -0.020839,
               1.166929,  0.833015, -0.569312,
               1.115519, -0.932892, -0.514525,
              -0.751587,  0.022496,  0.020891,
              -1.166882, -0.833372,  0.568699,
              -1.115691,  0.932608,  0.515082,
              -1.184988,  0.004424, -0.987522 ]
    }
  },
  "bonds": {
    "connectionIds": [ "a1", "a2",
                       "a2", "a3",
                       "a2", "a4",
                       "a2", "a5",
                       "a5", "a6",
                       "a5", "a7",
                       "a5", "a8" ],
    "connectionIndex": [ 1, 2,
                         2, 3,
                         2, 4,
                         2, 5,
                         5, 6,
                         5, 7,
                         5, 8 ],
    "order": [ 1, 1, 1, 1, 1, 1, 1 ]
  },
  "properties": {
    "molecular mass": 30.0690,
    "melting point": -172,
    "boiling point": -88
  }
}
```
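Moving from this first iteration to the nested layout of the minimal example is mostly a matter of renaming fields. There is one subtlety: `connectionIndex` here is 1-based, while `bonds.connections.index` in the minimal example is 0-based.

The sketch below makes the following assumptions, which are read off the two examples rather than taken from a specification:
- indices are shifted by one;
- `"version"` corresponds to the later `"chemical json"` key;
- the human-readable arrays (`ids`, `elementType`, `connectionIds`) can be dropped.

The function name is hypothetical.

```javascript
// Trimmed copy of the first-iteration ethane example (coordinates omitted).
const oldEthane = {
  version: 0,
  name: "ethane",
  inchi: "1/C2H6/c1-2/h1-2H3",
  formula: { concise: "C 2 H 6" },
  atoms: {
    ids: ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"],
    elementType: ["H", "C", "H", "H", "C", "H", "H", "H"],
    elementNumber: [1, 6, 1, 1, 6, 1, 1, 1],
  },
  bonds: {
    connectionIndex: [1, 2, 2, 3, 2, 4, 2, 5, 5, 6, 5, 7, 5, 8],
    order: [1, 1, 1, 1, 1, 1, 1],
  },
  properties: { "molecular mass": 30.0690, "melting point": -172, "boiling point": -88 },
};

// Hypothetical converter from the flat first-iteration layout to the nested one.
function upgradeFirstIteration(old) {
  // Assumption: "version" maps to "chemical json"; default to 0 if absent.
  const doc = { "chemical json": old.version ?? 0 };
  for (const key of ["name", "inchi", "inchikey"]) {
    if (old[key] !== undefined) doc[key] = old[key];
  }
  if (old.formula?.concise !== undefined) doc.formula = old.formula.concise;
  doc.atoms = { elements: { number: [...old.atoms.elementNumber] } };
  if (old.atoms.coords !== undefined) doc.atoms.coords = { ...old.atoms.coords };
  doc.bonds = {
    // 1-based atom numbers become 0-based array indices.
    connections: { index: old.bonds.connectionIndex.map((i) => i - 1) },
    order: [...old.bonds.order],
  };
  if (old.properties !== undefined) doc.properties = { ...old.properties };
  return doc;
}

console.log(JSON.stringify(upgradeFirstIteration(oldEthane).bonds.connections.index));
// [0,1,1,2,1,3,1,4,4,5,4,6,4,7]
```

The shifted indices match the `bonds.connections.index` array of the minimal ethane example at the top of the page.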
The JSON above validates with JSONLint (http://jsonlint.com/). It has a few extra arrays that I think are only really necessary for human readers (atom string IDs, and element type in addition to number). Its bond indices are also 1-based, whereas the minimal example at the top of the page uses 0-based indices. An alternative, with somewhat more standardized naming and some extra nesting, might look as follows. Code can check whether the expected fields are present and act accordingly. This is close to the minimal example presented at the top of the page, but has arrays such as atoms.elements.type in addition to atoms.elements.number.
```json
{
  "version": 0,
  "name": "ethane",
  "inchi": "1/C2H6/c1-2/h1-2H3",
  "formula": {
    "concise": "C 2 H 6"
  },
  "atoms": {
    "ids": [ "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8" ],
    "elements": {
      "type": [ "H", "C", "H", "H", "C", "H", "H", "H" ],
      "number": [ 1, 6, 1, 1, 6, 1, 1, 1 ]
    },
    "coords": {
      "3d": [  1.185080, -0.003838,  0.987524,
               0.751621, -0.022441, -0.020839,
               1.166929,  0.833015, -0.569312,
               1.115519, -0.932892, -0.514525,
              -0.751587,  0.022496,  0.020891,
              -1.166882, -0.833372,  0.568699,
              -1.115691,  0.932608,  0.515082,
              -1.184988,  0.004424, -0.987522 ]
    }
  },
  "bonds": {
    "connections": {
      "ids": [ "a1", "a2",
               "a2", "a3",
               "a2", "a4",
               "a2", "a5",
               "a5", "a6",
               "a5", "a7",
               "a5", "a8" ],
      "index": [ 1, 2,
                 2, 3,
                 2, 4,
                 2, 5,
                 5, 6,
                 5, 7,
                 5, 8 ]
    },
    "order": [ 1, 1, 1, 1, 1, 1, 1 ]
  },
  "properties": {
    "molecular mass": 30.0690,
    "melting point": -172,
    "boiling point": -88
  }
}
```
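Checking for expected fields can be as simple as trying each known location in turn. The sketch below accepts any of the element layouts shown on this page:
- `atoms.elements.number`;
- the first iteration's `atoms.elementNumber`;
- symbol arrays in `atoms.elements.type` or `atoms.elementType`.

The lookup order, the function name and the deliberately tiny symbol table (covering only the elements used on this page) are assumptions made for illustration.

```javascript
// Illustrative symbol table: only the elements that appear on this page.
const SYMBOL_TO_NUMBER = { H: 1, C: 6, O: 8, Ti: 22 };

// Return atomic numbers from whichever supported field the document carries.
function atomicNumbers(doc) {
  const atoms = doc.atoms ?? {};
  // Nested layout (minimal example and the later alternative).
  if (Array.isArray(atoms.elements?.number)) return atoms.elements.number;
  // Flat first-iteration layout.
  if (Array.isArray(atoms.elementNumber)) return atoms.elementNumber;
  // Fall back to element symbols, in either layout.
  const symbols = atoms.elements?.type ?? atoms.elementType;
  if (Array.isArray(symbols)) {
    return symbols.map((s) => {
      const z = SYMBOL_TO_NUMBER[s];
      if (z === undefined) throw new Error(`unknown element symbol ${s}`);
      return z;
    });
  }
  throw new Error("no element information found");
}

console.log(JSON.stringify(atomicNumbers({ atoms: { elements: { type: ["H", "C", "H"] } } })));
// [1,6,1]
```

Preferring the numeric arrays over symbols keeps the common case free of any lookup table. The symbols then serve only as a fallback for documents written mainly for human readers.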