Chemical JSON: Difference between revisions
(Use molecular mass rather than just mass)
(Added a link to the corresponding C++ implementation of Chemical JSON in the Avogadro source tree.)
Revision as of 18:54, 30 December 2012
This is my first attempt to outline a schema for encoding molecule objects in JSON. The intent is to be expressive, and to allow many properties to be optional and encoded as arrays where appropriate. I took a CML file for ethane (present in the Avogadro source tree) and attempted to translate it to a JSON representation. This would form the basis of our storage in MongoDB, as well as a possible on-disk format. Looking at some recent work on memory-mapped binary JSON data structures in C++, I am encouraged by the flexibility and efficiency of this representation, along with its strong programming language coverage.
 {
   "version": 0,
   "name": "ethane",
   "inchi": "1/C2H6/c1-2/h1-2H3",
   "formula": {
     "concise": "C 2 H 6"
   },
   "atoms": {
     "ids": [ "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8" ],
     "elementType": [ "H", "C", "H", "H", "C", "H", "H", "H" ],
     "elementNumber": [ 1, 6, 1, 1, 6, 1, 1, 1 ],
     "coords": {
       "3d": [  1.185080, -0.003838,  0.987524,
                0.751621, -0.022441, -0.020839,
                1.166929,  0.833015, -0.569312,
                1.115519, -0.932892, -0.514525,
               -0.751587,  0.022496,  0.020891,
               -1.166882, -0.833372,  0.568699,
               -1.115691,  0.932608,  0.515082,
               -1.184988,  0.004424, -0.987522 ]
     }
   },
   "bonds": {
     "connectionIds": [ "a1", "a2",
                        "a2", "a3",
                        "a2", "a4",
                        "a2", "a5",
                        "a5", "a6",
                        "a5", "a7",
                        "a5", "a8" ],
     "connectionIndex": [ 1, 2,
                          2, 3,
                          2, 4,
                          2, 5,
                          5, 6,
                          5, 7,
                          5, 8 ],
     "order": [ 1, 1, 1, 1, 1, 1, 1 ]
   },
   "properties": {
     "molecular mass": 30.0690,
     "melting point": -172,
     "boiling point": -88
   }
 }
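The flat arrays in this layout have to be regrouped before most code can use them. The following is a minimal Python sketch of that step, not part of any existing reader: it assumes three coordinates per atom in "3d" and two entries per bond in "connectionIndex", and the helper names `chunk` and `unpack` are made up for illustration.

```python
import json

# Copy of the ethane example above, reduced to the fields used here.
ETHANE = """
{
  "atoms": {
    "elementNumber": [1, 6, 1, 1, 6, 1, 1, 1],
    "coords": {
      "3d": [ 1.185080, -0.003838,  0.987524,
              0.751621, -0.022441, -0.020839,
              1.166929,  0.833015, -0.569312,
              1.115519, -0.932892, -0.514525,
             -0.751587,  0.022496,  0.020891,
             -1.166882, -0.833372,  0.568699,
             -1.115691,  0.932608,  0.515082,
             -1.184988,  0.004424, -0.987522 ]
    }
  },
  "bonds": {
    "connectionIndex": [1, 2, 2, 3, 2, 4, 2, 5, 5, 6, 5, 7, 5, 8],
    "order": [1, 1, 1, 1, 1, 1, 1]
  }
}
"""


def chunk(flat, size):
    """Split a flat list into consecutive tuples of `size` items."""
    if len(flat) % size != 0:
        raise ValueError(f"array length {len(flat)} is not a multiple of {size}")
    return [tuple(flat[i:i + size]) for i in range(0, len(flat), size)]


def unpack(doc):
    """Return (atoms, bonds) as lists of per-atom and per-bond tuples."""
    atoms = doc["atoms"]
    numbers = atoms["elementNumber"]
    positions = chunk(atoms["coords"]["3d"], 3)   # assumption: x, y, z per atom
    if len(numbers) != len(positions):
        raise ValueError("element and coordinate arrays disagree in length")

    bonds = doc["bonds"]
    pairs = chunk(bonds["connectionIndex"], 2)    # assumption: two atoms per bond
    orders = bonds["order"]
    if len(pairs) != len(orders):
        raise ValueError("connection and order arrays disagree in length")

    return list(zip(numbers, positions)), list(zip(pairs, orders))


atom_list, bond_list = unpack(json.loads(ETHANE))
```

Each entry of `atom_list` is an (element number, (x, y, z)) pair, and each entry of `bond_list` is an ((atom, atom), order) pair that keeps the 1-based atom indices used in the example.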
The JSON above validates here, and has a few extra arrays that I think are only really necessary for human readers (atom string IDs, element type rather than number). An alternative, with perhaps a little more standardization in the naming and some extra nesting, might be as follows. Code can check whether the expected fields are there, and act accordingly.
 {
   "version": 0,
   "name": "ethane",
   "inchi": "1/C2H6/c1-2/h1-2H3",
   "formula": {
     "concise": "C 2 H 6"
   },
   "atoms": {
     "ids": [ "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8" ],
     "elements": {
       "type": [ "H", "C", "H", "H", "C", "H", "H", "H" ],
       "number": [ 1, 6, 1, 1, 6, 1, 1, 1 ]
     },
     "coords": {
       "3d": [  1.185080, -0.003838,  0.987524,
                0.751621, -0.022441, -0.020839,
                1.166929,  0.833015, -0.569312,
                1.115519, -0.932892, -0.514525,
               -0.751587,  0.022496,  0.020891,
               -1.166882, -0.833372,  0.568699,
               -1.115691,  0.932608,  0.515082,
               -1.184988,  0.004424, -0.987522 ]
     }
   },
   "bonds": {
     "connections": {
       "ids": [ "a1", "a2",
                "a2", "a3",
                "a2", "a4",
                "a2", "a5",
                "a5", "a6",
                "a5", "a7",
                "a5", "a8" ],
       "index": [ 1, 2,
                  2, 3,
                  2, 4,
                  2, 5,
                  5, 6,
                  5, 7,
                  5, 8 ]
     },
     "order": [ 1, 1, 1, 1, 1, 1, 1 ]
   },
   "properties": {
     "molecular mass": 30.0690,
     "melting point": -172,
     "boiling point": -88
   }
 }
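As an illustration of code checking which fields are present and acting accordingly, here is a rough Python sketch for the nested layout above. It is only one possible approach: the function names and the four-entry `SYMBOL_TO_NUMBER` table are assumptions made for this example, and a real reader would need a complete element table.

```python
# Hypothetical, deliberately tiny symbol table; just enough for the example.
SYMBOL_TO_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}


def element_numbers(doc):
    """Prefer atoms.elements.number, fall back to atoms.elements.type."""
    elements = doc.get("atoms", {}).get("elements", {})
    if "number" in elements:
        return list(elements["number"])
    if "type" in elements:
        return [SYMBOL_TO_NUMBER[symbol] for symbol in elements["type"]]
    raise KeyError("no element data found under atoms.elements")


def bond_pairs(doc):
    """Prefer bonds.connections.index, fall back to bonds.connections.ids."""
    connections = doc.get("bonds", {}).get("connections", {})
    if "index" in connections:
        flat = connections["index"]
    elif "ids" in connections:
        # Map string IDs to 1-based positions, matching the example above.
        position = {atom_id: i + 1 for i, atom_id in enumerate(doc["atoms"]["ids"])}
        flat = [position[atom_id] for atom_id in connections["ids"]]
    else:
        return []  # a molecule without bond data is still acceptable here
    return list(zip(flat[0::2], flat[1::2]))


# A document that carries only the human-readable variants of each field.
readable_only = {
    "atoms": {
        "ids": ["a1", "a2", "a3"],
        "elements": {"type": ["H", "C", "H"]},
    },
    "bonds": {"connections": {"ids": ["a1", "a2", "a2", "a3"]}},
}

numbers = element_numbers(readable_only)
pairs = bond_pairs(readable_only)
```

The same two functions accept a document that provides "number" and "index" directly, in which case the fallbacks are never used.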
A major challenge will be establishing a list of accepted names, and a convention for adding new names. I think adding a version number, and having a discoverable structure like JSON, will make this approachable.
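To make the idea of a versioned list of accepted names concrete, here is a small Python sketch. No such list has been agreed on: the `ACCEPTED_NAMES` table below is purely hypothetical and simply collects the top-level names used in the examples on this page under version 0.

```python
# Hypothetical registry: accepted top-level names for each schema version.
ACCEPTED_NAMES = {
    0: {"version", "name", "inchi", "formula", "atoms", "bonds", "properties"},
}


def unknown_names(doc):
    """Return the top-level names in `doc` that its version does not list."""
    version = doc.get("version")
    if version not in ACCEPTED_NAMES:
        raise ValueError(f"unsupported version: {version!r}")
    return sorted(set(doc) - ACCEPTED_NAMES[version])


# "colour" is not in the version 0 list, so it is reported rather than rejected.
extras = unknown_names({"version": 0, "name": "ethane", "colour": "none"})
```

Reporting unknown names instead of failing on them keeps old readers usable when a later version adds a field.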
So, a more minimal example, with just what a computer needs, is outlined below. A C++ implementation of a reader and writer is [https://github.com/OpenChemistry/avogadrolibs/blob/master/avogadro/io/cjsonformat.cpp available in Avogadro here].
 {
   "chemical json": 0,
   "name": "ethane",
   "inchi": "1/C2H6/c1-2/h1-2H3",
   "formula": "C 2 H 6",
   "atoms": {
     "elements": {
       "number": [ 1, 6, 1, 1, 6, 1, 1, 1 ]
     },
     "coords": {
       "3d": [  1.185080, -0.003838,  0.987524,
                0.751621, -0.022441, -0.020839,
                1.166929,  0.833015, -0.569312,
                1.115519, -0.932892, -0.514525,
               -0.751587,  0.022496,  0.020891,
               -1.166882, -0.833372,  0.568699,
               -1.115691,  0.932608,  0.515082,
               -1.184988,  0.004424, -0.987522 ]
     }
   },
   "bonds": {
     "connections": {
       "index": [ 0, 1,
                  1, 2,
                  1, 3,
                  1, 4,
                  4, 5,
                  4, 6,
                  4, 7 ]
     },
     "order": [ 1, 1, 1, 1, 1, 1, 1 ]
   },
   "properties": {
     "molecular mass": 30.0690,
     "melting point": -172,
     "boiling point": -88
   }
 }
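Comparing this minimal form with the nested example before it, the differences appear to be: the "version" key becomes "chemical json", "formula" is a plain string, the string IDs and element symbols are dropped, and the bond indices start at 0 rather than 1. The Python sketch below converts one form to the other under exactly those assumptions, which are read off the examples rather than stated anywhere; `to_minimal` is a made-up name and is not related to the C++ reader and writer in Avogadro.

```python
def to_minimal(doc):
    """Convert the nested, human-friendly layout to the minimal layout."""
    atoms = doc["atoms"]
    bonds = doc["bonds"]
    return {
        "chemical json": doc.get("version", 0),
        "name": doc["name"],
        "inchi": doc["inchi"],
        "formula": doc["formula"]["concise"],
        "atoms": {
            # Keep only the numeric element data; drop "ids" and "type".
            "elements": {"number": list(atoms["elements"]["number"])},
            "coords": {"3d": list(atoms["coords"]["3d"])},
        },
        "bonds": {
            # Assumption: nested form is 1-based, minimal form is 0-based.
            "connections": {"index": [i - 1 for i in bonds["connections"]["index"]]},
            "order": list(bonds["order"]),
        },
        "properties": dict(doc["properties"]),
    }


# The nested ethane example from earlier on this page, written as a Python dict.
nested = {
    "version": 0,
    "name": "ethane",
    "inchi": "1/C2H6/c1-2/h1-2H3",
    "formula": {"concise": "C 2 H 6"},
    "atoms": {
        "ids": ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"],
        "elements": {
            "type": ["H", "C", "H", "H", "C", "H", "H", "H"],
            "number": [1, 6, 1, 1, 6, 1, 1, 1],
        },
        "coords": {"3d": [1.185080, -0.003838, 0.987524,
                          0.751621, -0.022441, -0.020839,
                          1.166929, 0.833015, -0.569312,
                          1.115519, -0.932892, -0.514525,
                          -0.751587, 0.022496, 0.020891,
                          -1.166882, -0.833372, 0.568699,
                          -1.115691, 0.932608, 0.515082,
                          -1.184988, 0.004424, -0.987522]},
    },
    "bonds": {
        "connections": {
            "ids": ["a1", "a2", "a2", "a3", "a2", "a4", "a2", "a5",
                    "a5", "a6", "a5", "a7", "a5", "a8"],
            "index": [1, 2, 2, 3, 2, 4, 2, 5, 5, 6, 5, 7, 5, 8],
        },
        "order": [1, 1, 1, 1, 1, 1, 1],
    },
    "properties": {"molecular mass": 30.0690, "melting point": -172, "boiling point": -88},
}

minimal = to_minimal(nested)
```

The input dictionary is left unmodified; every list and mapping in the result is a fresh copy.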