Chemical JSON: Difference between revisions
(Added my first draft of a JSON molecular representation.)
(Added a second variant with a little more nesting, and perhaps more consistent names.)
Revision as of 23:12, 31 December 2011
This is my first attempt to outline a schema for encoding molecule objects in JSON. The intent is to be expressive, to allow many properties to be optional, and to encode them as arrays where appropriate. I took a CML file for ethane (present in the Avogadro source tree) and translated it to a JSON representation. This would form the basis of our storage in MongoDB, as well as a possible on-disk format. Looking at some recent work on memory-mapped binary JSON data structures in C++, I am encouraged by the flexibility and efficiency of this representation, along with its strong programming language coverage.
{
  "version": 0,
  "name": "ethane",
  "inchi": "1/C2H6/c1-2/h1-2H3",
  "formula": {
    "concise": "C 2 H 6"
  },
  "atoms": {
    "ids": [ "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8" ],
    "elementType": [ "H", "C", "H", "H", "C", "H", "H", "H" ],
    "elementNumber": [ 1, 6, 1, 1, 6, 1, 1, 1 ],
    "coords": {
      "3d": [  1.185080, -0.003838,  0.987524,
               0.751621, -0.022441, -0.020839,
               1.166929,  0.833015, -0.569312,
               1.115519, -0.932892, -0.514525,
              -0.751587,  0.022496,  0.020891,
              -1.166882, -0.833372,  0.568699,
              -1.115691,  0.932608,  0.515082,
              -1.184988,  0.004424, -0.987522 ]
    }
  },
  "bonds": {
    "connectionIds": [ "a1", "a2",
                       "a2", "a3",
                       "a2", "a4",
                       "a2", "a5",
                       "a5", "a6",
                       "a5", "a7",
                       "a5", "a8" ],
    "connectionIndex": [ 1, 2,
                         2, 3,
                         2, 4,
                         2, 5,
                         5, 6,
                         5, 7,
                         5, 8 ],
    "order": [ 1, 1, 1, 1, 1, 1, 1 ]
  },
  "properties": {
    "molecular weight": 30.0690,
    "melting point": -172,
    "boiling point": -88
  }
}
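The coordinate and connection arrays above are stored flat, so a reader has to regroup them into per-atom triples and per-bond pairs. The following is a minimal Python sketch of that step, not part of the proposal itself. The helper name `chunk` is hypothetical, and the assumption that `connectionIndex` is 1-based is inferred from the way the indices line up with `a1` to `a8` in `connectionIds`.

```python
import json

# A trimmed copy of the ethane document above; only the fields used here.
DOC = json.loads("""
{
  "atoms": {
    "elementType": [ "H", "C", "H", "H", "C", "H", "H", "H" ],
    "coords": {
      "3d": [  1.185080, -0.003838,  0.987524,
               0.751621, -0.022441, -0.020839,
               1.166929,  0.833015, -0.569312,
               1.115519, -0.932892, -0.514525,
              -0.751587,  0.022496,  0.020891,
              -1.166882, -0.833372,  0.568699,
              -1.115691,  0.932608,  0.515082,
              -1.184988,  0.004424, -0.987522 ]
    }
  },
  "bonds": {
    "connectionIndex": [ 1, 2, 2, 3, 2, 4, 2, 5, 5, 6, 5, 7, 5, 8 ],
    "order": [ 1, 1, 1, 1, 1, 1, 1 ]
  }
}
""")


def chunk(flat, size):
    """Split a flat list into tuples of `size` items (hypothetical helper)."""
    if len(flat) % size != 0:
        raise ValueError("array length is not a multiple of %d" % size)
    return [tuple(flat[i:i + size]) for i in range(0, len(flat), size)]


# One (x, y, z) triple per atom, one (begin, end) pair per bond.
positions = chunk(DOC["atoms"]["coords"]["3d"], 3)
bond_pairs = chunk(DOC["bonds"]["connectionIndex"], 2)

symbols = DOC["atoms"]["elementType"]
for (begin, end), order in zip(bond_pairs, DOC["bonds"]["order"]):
    # Assumption: indices are 1-based, so subtract one for Python lists.
    print(symbols[begin - 1], symbols[end - 1], "order", order)
```

Keeping the arrays flat makes the document compact; the cost is that every consumer needs a regrouping step like this and has to agree on the index base.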
The JSON above validates with JSONLint (http://jsonlint.com/), and has a few extra arrays that I think are only really necessary for human readers (atom string IDs, element type rather than number). An alternative, with perhaps a little more standardization in the naming and some extra nesting, might be as follows. Code can check whether expected fields are present and act accordingly.
{
  "version": 0,
  "name": "ethane",
  "inchi": "1/C2H6/c1-2/h1-2H3",
  "formula": {
    "concise": "C 2 H 6"
  },
  "atoms": {
    "ids": [ "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8" ],
    "elements": {
      "type": [ "H", "C", "H", "H", "C", "H", "H", "H" ],
      "number": [ 1, 6, 1, 1, 6, 1, 1, 1 ]
    },
    "coords": {
      "3d": [  1.185080, -0.003838,  0.987524,
               0.751621, -0.022441, -0.020839,
               1.166929,  0.833015, -0.569312,
               1.115519, -0.932892, -0.514525,
              -0.751587,  0.022496,  0.020891,
              -1.166882, -0.833372,  0.568699,
              -1.115691,  0.932608,  0.515082,
              -1.184988,  0.004424, -0.987522 ]
    }
  },
  "bonds": {
    "connections": {
      "ids": [ "a1", "a2",
               "a2", "a3",
               "a2", "a4",
               "a2", "a5",
               "a5", "a6",
               "a5", "a7",
               "a5", "a8" ],
      "index": [ 1, 2,
                 2, 3,
                 2, 4,
                 2, 5,
                 5, 6,
                 5, 7,
                 5, 8 ]
    },
    "order": [ 1, 1, 1, 1, 1, 1, 1 ]
  },
  "properties": {
    "molecular weight": 30.0690,
    "melting point": -172,
    "boiling point": -88
  }
}
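As a rough illustration of checking for expected fields and acting accordingly, the Python sketch below reads atomic numbers from the nested layout above. It prefers `atoms.elements.number` and falls back to `atoms.elements.type` when only symbols are present. The function name `atomic_numbers` and the small symbol table are assumptions made for this example, not part of the proposed format.

```python
# Deliberately tiny lookup table, enough for the example molecules here.
SYMBOL_TO_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}


def atomic_numbers(molecule):
    """Return atomic numbers for a molecule dict, or None if unavailable.

    Hypothetical helper: uses the numeric array when it exists, otherwise
    derives the numbers from the element symbols.
    """
    elements = molecule.get("atoms", {}).get("elements", {})
    if "number" in elements:
        return list(elements["number"])
    if "type" in elements:
        return [SYMBOL_TO_NUMBER[symbol] for symbol in elements["type"]]
    return None  # neither field is present; the caller decides what to do


full = {"atoms": {"elements": {"type": ["H", "C"], "number": [1, 6]}}}
symbols_only = {"atoms": {"elements": {"type": ["C", "H", "H"]}}}
empty = {"name": "unknown"}

print(atomic_numbers(full))
print(atomic_numbers(symbols_only))
print(atomic_numbers(empty))
```

Returning `None` rather than raising keeps optional fields genuinely optional; a stricter reader could raise an error instead.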
A major challenge will be establishing a list of accepted names and a convention for adding new ones. I think that adding a version number, and using a discoverable structure like JSON, will make this approachable.
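One way to picture how a version number and a discoverable structure could help is the Python sketch below. It walks a document, lists every key path it finds, and reports the ones that are not on an accepted list. The names `key_paths`, `unknown_names`, `ACCEPTED_NAMES` and `SUPPORTED_VERSION` are all hypothetical, and the accepted list shown is only a fragment invented for illustration.

```python
SUPPORTED_VERSION = 0

# Hypothetical fragment of an accepted-names list, written as dotted paths.
ACCEPTED_NAMES = {
    "version",
    "name",
    "atoms",
    "atoms.elements",
    "atoms.elements.number",
    "properties",
}


def key_paths(node, prefix=""):
    """Collect dotted paths for every key in nested dicts."""
    paths = []
    if isinstance(node, dict):
        for key, value in node.items():
            path = prefix + "." + key if prefix else key
            paths.append(path)
            paths.extend(key_paths(value, path))
    return paths


def unknown_names(molecule):
    """Return the key paths that are not on the accepted list."""
    if molecule.get("version") != SUPPORTED_VERSION:
        raise ValueError("unsupported version: %r" % molecule.get("version"))
    return sorted(set(key_paths(molecule)) - ACCEPTED_NAMES)


molecule = {
    "version": 0,
    "name": "ethane",
    "atoms": {"elements": {"number": [1, 6]}},
    "properties": {"dipole moment": 0.0},
}
print(unknown_names(molecule))
```

A reader could ignore unknown names, warn about them, or collect them as candidates for the next version of the list.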