Chemical JSON

From wiki.openchemistry.org
Revision as of 14:18, 3 May 2017 by Marcus.hanwell (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

We are moving development of the Chemical JSON format to a GitHub repository for better coordination.

The example presented below encodes a chemical molecule in JSON. The intent is be be expressive, and allow for most properties to be optional and encoded as arrays where appropriate. I took a CML file for ethane and attempted to translate it to a JSON representation. This forms the basis of our storage in Mongo DB, as well as an on disk format. Looking at some recent work on memory mapped binary JSON data structures in C++, I am encouraged by the flexibility and efficiency of this representation along with its strong programming language coverage.

A major challenge will be in establishing a list of accepted names, and a convention for adding new names. I think adding in a version number, and having a structure like JSON that is discoverable will make this approachable.

A minimal example of ethane, with just what a computer needs is outlined below, with a C++ implementation of a reader and writer available in Avogadro here.

{
  "chemical json": 0,
  "name": "ethane",
  "inchi": "1/C2H6/c1-2/h1-2H3",
  "formula": "C 2 H 6",
  "atoms": {
    "elements": {
      "number": [  1,   6,   1,   1,   6,   1,   1,   1 ]
    },
    "coords": {
      "3d": [  1.185080, -0.003838,  0.987524,
               0.751621, -0.022441, -0.020839,
               1.166929,  0.833015, -0.569312,
               1.115519, -0.932892, -0.514525,
              -0.751587,  0.022496,  0.020891,
              -1.166882, -0.833372,  0.568699,
              -1.115691,  0.932608,  0.515082,
              -1.184988,  0.004424, -0.987522 ]
    }
  },
  "bonds": {
    "connections": {
      "index": [ 0, 1,
                 1, 2,
                 1, 3,
                 1, 4,
                 4, 5,
                 4, 6,
                 4, 7 ]
    },
    "order": [ 1, 1, 1, 1, 1, 1, 1 ]
  },
  "properties": {
    "molecular mass": 30.0690,
    "melting point": -172,
    "boiling point": -88
  }
}

Crystal structures are specified using cell parameters and fractional (lattice) coordinates:

{
  "chemical json": 0,
  "name": "TiO2 rutile",
  "formula": "Ti 2 O 4",
  "unit cell": {
    "a": 2.95812,
    "b": 4.59373,
    "c": 4.59373,
    "alpha": 90.0,
    "beta":  90.0,
    "gamma": 90.0
  },
  "atoms": {
    "elements": {
      "number": [ 22, 22, 8, 8, 8, 8 ]
    },
    "coords": {
      "3d fractional": [ 0.00000, 0.00000, 0.00000,
                         0.50000, 0.50000, 0.50000,
                         0.00000, 0.30530, 0.30530,
                         0.00000, 0.69470, 0.69470,
                         0.50000, 0.19470, 0.80530,
                         0.50000, 0.80530, 0.19470 ]
    }
  }
}

Earlier revisions

Some more verbose representations, with additional fields that are not necessarily required are presented below (and formed part of the initial design). The first iteration was not as nested for example,

{
  "version": 0,
  "name": "ethane",
  "inchi": "1/C2H6/c1-2/h1-2H3",
  "inchikey": "WETWJCDKMRHUPV-UHFFFAOYSA-N"
  "formula": {
    "concise": "C 2 H 6"
  },
  "atoms": {
    "ids": [ "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8" ],
    "elementType":   [ "H", "C", "H", "H", "C", "H", "H", "H" ],
    "elementNumber": [  1,   6,   1,   1,   6,   1,   1,   1 ],
    "coords": {
      "3d": [  1.185080, -0.003838,  0.987524,
               0.751621, -0.022441, -0.020839,
               1.166929,  0.833015, -0.569312,
               1.115519, -0.932892, -0.514525,
              -0.751587,  0.022496,  0.020891,
              -1.166882, -0.833372,  0.568699,
              -1.115691,  0.932608,  0.515082,
              -1.184988,  0.004424, -0.987522 ]
    }
  },
  "bonds": {
    "connectionIds": [ "a1", "a2",
                       "a2", "a3",
                       "a2", "a4",
                       "a2", "a5",
                       "a5", "a6",
                       "a5", "a7",
                       "a5", "a8" ],
    "connectionIndex": [ 1, 2,
                         2, 3,
                         2, 4,
                         2, 5,
                         5, 6,
                         5, 7,
                         5, 8 ],
    "order": [ 1, 1, 1, 1, 1, 1, 1 ]
  },
  "properties": {
    "molecular mass": 30.0690,
    "melting point": -172,
    "boiling point": -88
  }
}

The JSON above validates here, and has a few extra arrays that I think are only really necessary for human readers (atom string IDs, element type rather than number). An alternative, with perhaps a little more standardization in the naming and some extra nesting might be as follows. Code can check whether expected fields are there, and act accordingly. This is close to the minimal example presented at the top of the page, but has arrays such as atoms.elements.type in addition to atoms.elements.number.

{
  "version": 0,
  "name": "ethane",
  "inchi": "1/C2H6/c1-2/h1-2H3",
  "formula": {
    "concise": "C 2 H 6"
  },
  "atoms": {
    "ids": [ "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8" ],
    "elements": {
      "type":   [ "H", "C", "H", "H", "C", "H", "H", "H" ],
      "number": [  1,   6,   1,   1,   6,   1,   1,   1 ]
    },
    "coords": {
      "3d": [  1.185080, -0.003838,  0.987524,
               0.751621, -0.022441, -0.020839,
               1.166929,  0.833015, -0.569312,
               1.115519, -0.932892, -0.514525,
              -0.751587,  0.022496,  0.020891,
              -1.166882, -0.833372,  0.568699,
              -1.115691,  0.932608,  0.515082,
              -1.184988,  0.004424, -0.987522 ]
    }
  },
  "bonds": {
    "connections": {
      "ids": [ "a1", "a2",
               "a2", "a3",
               "a2", "a4",
               "a2", "a5",
               "a5", "a6",
               "a5", "a7",
               "a5", "a8" ],
      "index": [ 1, 2,
                 2, 3,
                 2, 4,
                 2, 5,
                 5, 6,
                 5, 7,
                 5, 8 ]
    },
    "order": [ 1, 1, 1, 1, 1, 1, 1 ]
  },
  "properties": {
    "molecular mass": 30.0690,
    "melting point": -172,
    "boiling point": -88
  }
}