Chemical JSON

From wiki.openchemistry.org
Revision as of 00:01, 1 January 2012 by Marcus.hanwell (talk | contribs) (Added my first draft of a JSON molecular representation.)

This is my first attempt to outline a schema for encoding molecule objects in JSON. The intent is to be expressive, allowing many properties to be optional and encoded as arrays where appropriate. I took a CML file for ethane (present in the Avogadro source tree) and attempted to translate it to a JSON representation. This would form the basis of our storage in MongoDB, as well as a possible on-disk format. Looking at some recent work on memory-mapped binary JSON data structures in C++, I am encouraged by the flexibility and efficiency of this representation, along with its strong programming language coverage.

{
  "version": 0,
  "name": "ethane",
  "inchi": "1/C2H6/c1-2/h1-2H3",
  "formula": {
    "concise": "C 2 H 6"
  },
  "atoms": {
    "ids": [ "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8" ],
    "elementType":   [ "H", "C", "H", "H", "C", "H", "H", "H" ],
    "elementNumber": [  1,   6,   1,   1,   6,   1,   1,   1 ],
    "coords": {
      "3d": [  1.185080, -0.003838,  0.987524,
               0.751621, -0.022441, -0.020839,
               1.166929,  0.833015, -0.569312,
               1.115519, -0.932892, -0.514525,
              -0.751587,  0.022496,  0.020891,
              -1.166882, -0.833372,  0.568699,
              -1.115691,  0.932608,  0.515082,
              -1.184988,  0.004424, -0.987522 ]
    }
  },
  "bonds": {
    "connectionIds": [ "a1", "a2",
                       "a2", "a3",
                       "a2", "a4",
                       "a2", "a5",
                       "a5", "a6",
                       "a5", "a7",
                       "a5", "a8" ],
    "connectionIndex": [ 1, 2,
                         2, 3,
                         2, 4,
                         2, 5,
                         5, 6,
                         5, 7,
                         5, 8 ],
    "order": [ 1, 1, 1, 1, 1, 1, 1 ]
  },
  "properties": {
    "molecular weight": 30.0690,
    "melting point": -172,
    "boiling point": -88
  }
}

The JSON above validates, and it has a few extra arrays that I think are only really necessary for human readers (the atom string IDs, and the element type in addition to the element number).
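To illustrate why those arrays are redundant, here is a minimal Python sketch that rebuilds them from the numeric data alone. It works on a trimmed copy of the ethane example (coordinates and properties omitted). The helper names derive_redundant and bond_pairs, the SYMBOLS lookup table, and the "a" + 1-based-index ID convention are assumptions inferred from the example, not part of any agreed schema.

```python
import json

# Trimmed copy of the ethane document above (coords and properties omitted).
doc = json.loads("""
{
  "atoms": {
    "ids": ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8"],
    "elementType": ["H", "C", "H", "H", "C", "H", "H", "H"],
    "elementNumber": [1, 6, 1, 1, 6, 1, 1, 1]
  },
  "bonds": {
    "connectionIds": ["a1", "a2", "a2", "a3", "a2", "a4", "a2", "a5",
                      "a5", "a6", "a5", "a7", "a5", "a8"],
    "connectionIndex": [1, 2, 2, 3, 2, 4, 2, 5, 5, 6, 5, 7, 5, 8],
    "order": [1, 1, 1, 1, 1, 1, 1]
  }
}
""")

# Hypothetical lookup table; only the elements needed for ethane.
SYMBOLS = {1: "H", 6: "C"}


def derive_redundant(doc):
    """Rebuild the human-readable arrays from the numeric ones.

    Assumes atom IDs follow the "a<1-based index>" pattern seen in the example.
    """
    numbers = doc["atoms"]["elementNumber"]
    ids = ["a%d" % (i + 1) for i in range(len(numbers))]
    element_type = [SYMBOLS[z] for z in numbers]
    # connectionIndex is 1-based in the example, so subtract 1 to index ids.
    connection_ids = [ids[i - 1] for i in doc["bonds"]["connectionIndex"]]
    return ids, element_type, connection_ids


def bond_pairs(doc):
    """Group the flat connectionIndex array into (begin, end) atom pairs."""
    flat = doc["bonds"]["connectionIndex"]
    return list(zip(flat[0::2], flat[1::2]))


ids, element_type, connection_ids = derive_redundant(doc)
pairs = bond_pairs(doc)
```

Under these assumptions the derived lists match the stored "ids", "elementType" and "connectionIds" arrays exactly, and bond_pairs yields one pair per entry in "order". A reader could therefore regenerate the string arrays on load rather than storing them.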