Chemical JSON
This is my first attempt to outline a schema for encoding molecule objects in JSON. The intent is to be expressive, and to allow many properties to be optional and encoded as arrays where appropriate. I took a CML file for ethane (present in the Avogadro source tree) and attempted to translate it to a JSON representation. This would form the basis of our storage in MongoDB, as well as a possible on-disk format. Having looked at some recent work on memory-mapped binary JSON data structures in C++, I am encouraged by the flexibility and efficiency of this representation, along with its broad support across programming languages.
{
  "version": 0,
  "name": "ethane",
  "inchi": "1/C2H6/c1-2/h1-2H3",
  "formula": {
    "concise": "C 2 H 6"
  },
  "atoms": {
    "ids": [ "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8" ],
    "elementType": [ "H", "C", "H", "H", "C", "H", "H", "H" ],
    "elementNumber": [ 1, 6, 1, 1, 6, 1, 1, 1 ],
    "coords": {
      "3d": [  1.185080, -0.003838,  0.987524,
               0.751621, -0.022441, -0.020839,
               1.166929,  0.833015, -0.569312,
               1.115519, -0.932892, -0.514525,
              -0.751587,  0.022496,  0.020891,
              -1.166882, -0.833372,  0.568699,
              -1.115691,  0.932608,  0.515082,
              -1.184988,  0.004424, -0.987522 ]
    }
  },
  "bonds": {
    "connectionIds": [ "a1", "a2",
                       "a2", "a3",
                       "a2", "a4",
                       "a2", "a5",
                       "a5", "a6",
                       "a5", "a7",
                       "a5", "a8" ],
    "connectionIndex": [ 1, 2,
                         2, 3,
                         2, 4,
                         2, 5,
                         5, 6,
                         5, 7,
                         5, 8 ],
    "order": [ 1, 1, 1, 1, 1, 1, 1 ]
  },
  "properties": {
    "molecular weight": 30.0690,
    "melting point": -172,
    "boiling point": -88
  }
}
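Because the coordinates and bond connections are stored as flat parallel arrays rather than as one object per atom or bond, a reader has to regroup them. Below is a minimal Python sketch that uses only the standard json and math modules. It works on an abbreviated copy of the ethane document, keeping only the fields it uses. It assumes that connectionIndex holds 1-based atom positions (as the example suggests, with a1 at position 1). The helper names chunk and bond_lengths are hypothetical and not part of any existing API.

```python
import json
import math

# Abbreviated copy of the ethane document: only the fields this sketch reads.
ETHANE = """
{
  "atoms": {
    "elementType": [ "H", "C", "H", "H", "C", "H", "H", "H" ],
    "coords": {
      "3d": [  1.185080, -0.003838,  0.987524,
               0.751621, -0.022441, -0.020839,
               1.166929,  0.833015, -0.569312,
               1.115519, -0.932892, -0.514525,
              -0.751587,  0.022496,  0.020891,
              -1.166882, -0.833372,  0.568699,
              -1.115691,  0.932608,  0.515082,
              -1.184988,  0.004424, -0.987522 ]
    }
  },
  "bonds": {
    "connectionIndex": [ 1, 2, 2, 3, 2, 4, 2, 5, 5, 6, 5, 7, 5, 8 ],
    "order": [ 1, 1, 1, 1, 1, 1, 1 ]
  }
}
"""

def chunk(flat, size):
    """Split a flat list into consecutive tuples of `size` items."""
    if len(flat) % size != 0:
        raise ValueError(f"length {len(flat)} is not a multiple of {size}")
    return [tuple(flat[i:i + size]) for i in range(0, len(flat), size)]

def bond_lengths(mol):
    """Return (symbol_a, symbol_b, order, length) for every bond."""
    symbols = mol["atoms"]["elementType"]
    xyz = chunk(mol["atoms"]["coords"]["3d"], 3)       # one (x, y, z) per atom
    pairs = chunk(mol["bonds"]["connectionIndex"], 2)   # one (i, j) per bond, assumed 1-based
    result = []
    for (i, j), order in zip(pairs, mol["bonds"]["order"]):
        a, b = i - 1, j - 1                             # convert to 0-based list indices
        result.append((symbols[a], symbols[b], order, math.dist(xyz[a], xyz[b])))
    return result

mol = json.loads(ETHANE)
for sym_a, sym_b, order, length in bond_lengths(mol):
    print(f"{sym_a}-{sym_b} order {order}: {length:.3f}")
```

The carbon-carbon distance comes out at about 1.50 and each carbon-hydrogen distance at about 1.10. These are typical single-bond lengths if the coordinates are in ångströms, although the document does not state the units. The flat layout keeps the coordinates compact and easy to map onto a contiguous numeric buffer. The cost is that every reader needs a regrouping step like chunk.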
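The JSON above validates. It also has a few extra arrays that I think are only really necessary for human readers: the string atom IDs (including their use in connectionIds), and the element symbols in elementType, which duplicate the atomic numbers in elementNumber.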
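If those human-oriented arrays were dropped for storage, they could be regenerated on load from the numeric ones. The following is a minimal sketch of that round trip. It assumes that atom IDs always follow the "a1", "a2", … pattern used in the ethane example and that connectionIndex holds 1-based positions into the ID list. SYMBOLS is a hypothetical lookup table covering only hydrogen and carbon, which is enough for this molecule. The functions strip_readable and add_readable are likewise illustrative names.

```python
import json

# Hypothetical lookup table: only the elements needed for ethane.
SYMBOLS = {1: "H", 6: "C"}

def strip_readable(mol):
    """Return a copy of the molecule without the human-oriented arrays."""
    compact = json.loads(json.dumps(mol))  # deep copy of plain JSON data
    del compact["atoms"]["ids"]
    del compact["atoms"]["elementType"]
    del compact["bonds"]["connectionIds"]
    return compact

def add_readable(mol):
    """Regenerate ids, elementType and connectionIds in place from numeric arrays."""
    atoms, bonds = mol["atoms"], mol["bonds"]
    atoms["elementType"] = [SYMBOLS[z] for z in atoms["elementNumber"]]
    # Assumes the "a<position>" naming convention from the example.
    atoms["ids"] = [f"a{i}" for i in range(1, len(atoms["elementNumber"]) + 1)]
    # connectionIndex holds 1-based positions, so position n maps to ids[n - 1].
    bonds["connectionIds"] = [atoms["ids"][n - 1] for n in bonds["connectionIndex"]]
    return mol

# Abbreviated ethane document (coordinates and properties omitted).
original = {
    "atoms": {
        "ids": ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"],
        "elementType": ["H", "C", "H", "H", "C", "H", "H", "H"],
        "elementNumber": [1, 6, 1, 1, 6, 1, 1, 1],
    },
    "bonds": {
        "connectionIds": ["a1", "a2", "a2", "a3", "a2", "a4", "a2", "a5",
                          "a5", "a6", "a5", "a7", "a5", "a8"],
        "connectionIndex": [1, 2, 2, 3, 2, 4, 2, 5, 5, 6, 5, 7, 5, 8],
        "order": [1, 1, 1, 1, 1, 1, 1],
    },
}

compact = strip_readable(original)
restored = add_readable(json.loads(json.dumps(compact)))
print(restored == original)
print(len(json.dumps(compact)), "bytes compact vs", len(json.dumps(original)), "bytes full")
```

This only works when the string IDs carry no information beyond an atom's position. If IDs from a source format such as CML are arbitrary, they would have to be stored rather than regenerated.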