MongoChem PubChem Import

From wiki.openchemistry.org
Revision as of 12:13, 10 April 2013 by Kyle.lutz

Importing Data

PubChem data in SDF format can be imported with MongoChem's SDF importer. From the File->Import menu, select SDF to open the import dialog, then navigate to the SDF file and click "Import". The data is then loaded into the MongoDB database automatically.

The following SDF data fields are extracted and inserted into the database under the names shown (SDF field -> database field):

  • PUBCHEM_IUPAC_TRADITIONAL_NAME -> name
  • PUBCHEM_IUPAC_INCHI -> inchi
  • PUBCHEM_IUPAC_INCHIKEY -> inchikey
  • PUBCHEM_MOLECULAR_WEIGHT -> mass, descriptors.mass
  • PUBCHEM_CACTVS_TPSA -> descriptors.tpsa
  • PUBCHEM_XLOGP3_AA -> descriptors.xlogp3
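
The mapping above can be sketched in Python as follows. This is an illustrative sketch only, not MongoChem's importer code: the function names (parse_data_fields, build_document, set_nested) are hypothetical, and storing the numeric fields as floats is an assumption, since this page does not say which types the importer uses.

```python
import re

# SDF field -> database key(s), copied from the list above.
FIELD_MAP = {
    "PUBCHEM_IUPAC_TRADITIONAL_NAME": ["name"],
    "PUBCHEM_IUPAC_INCHI": ["inchi"],
    "PUBCHEM_IUPAC_INCHIKEY": ["inchikey"],
    "PUBCHEM_MOLECULAR_WEIGHT": ["mass", "descriptors.mass"],
    "PUBCHEM_CACTVS_TPSA": ["descriptors.tpsa"],
    "PUBCHEM_XLOGP3_AA": ["descriptors.xlogp3"],
}

# Assumption: these values are stored as numbers rather than strings.
NUMERIC_FIELDS = {
    "PUBCHEM_MOLECULAR_WEIGHT",
    "PUBCHEM_CACTVS_TPSA",
    "PUBCHEM_XLOGP3_AA",
}


def parse_data_fields(record):
    """Collect the '> <NAME>' data items of one SDF record into a dict."""
    fields = {}
    lines = record.splitlines()
    i = 0
    while i < len(lines):
        header = re.match(r">\s*<([^>]+)>", lines[i])
        if header:
            i += 1
            values = []
            # A data item's value runs until the next blank line.
            while i < len(lines) and lines[i].strip():
                values.append(lines[i])
                i += 1
            fields[header.group(1)] = "\n".join(values)
        else:
            i += 1
    return fields


def set_nested(doc, dotted_key, value):
    """Store value under a dotted key such as 'descriptors.tpsa'."""
    parts = dotted_key.split(".")
    for part in parts[:-1]:
        doc = doc.setdefault(part, {})
    doc[parts[-1]] = value


def build_document(fields):
    """Apply FIELD_MAP to the parsed SDF fields of one record."""
    doc = {}
    for sdf_name, keys in FIELD_MAP.items():
        if sdf_name not in fields:
            continue
        value = fields[sdf_name]
        if sdf_name in NUMERIC_FIELDS:
            value = float(value)
        for key in keys:
            set_nested(doc, key, value)
    return doc
```

Calling build_document(parse_data_fields(record)) on the text of one SDF record returns a nested dictionary in which the descriptors.* keys sit under a descriptors sub-document.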

The following fields will be calculated from the molecular structure:

  • formula
  • atomCount
  • heavyAtomCount
  • vabc
  • mass (if PUBCHEM_MOLECULAR_WEIGHT is not present)
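
As a rough illustration of what the structure-derived fields mean, the sketch below computes formula, atomCount and heavyAtomCount from the atom block of a V2000 molfile. It is not how MongoChem performs the calculation, which this page does not describe. The function names are hypothetical, only explicitly listed atoms are counted, and vabc and the fallback mass are omitted because they need element data beyond the atom symbols.

```python
from collections import Counter


def explicit_atom_symbols(molblock):
    """Return the element symbols listed in a V2000 atom block."""
    lines = molblock.splitlines()
    # Line 4 is the counts line; its first three characters hold the atom count.
    n_atoms = int(lines[3][0:3])
    # In each atom line the element symbol occupies columns 32-34.
    return [line[31:34].strip() for line in lines[4:4 + n_atoms]]


def hill_formula(symbols):
    """Build a formula in Hill order: C, then H, then the rest alphabetically."""
    counts = Counter(symbols)
    if "C" in counts:
        order = ["C"]
        if "H" in counts:
            order.append("H")
        order += sorted(s for s in counts if s not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in order)


def calculated_fields(molblock):
    """Derive a subset of the calculated fields from explicit atoms only."""
    symbols = explicit_atom_symbols(molblock)
    return {
        "formula": hill_formula(symbols),
        "atomCount": len(symbols),
        # Heavy atoms are taken here to mean every atom that is not hydrogen.
        "heavyAtomCount": sum(1 for s in symbols if s != "H"),
    }
```

Because only atoms present in the atom block are counted, the results are meaningful only when hydrogens are written out explicitly in the record.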

Sample Data

A sample dataset containing the first 2,500 molecules in PubChem is available here: pubchem2500.sdf.gz.
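
To sanity-check a downloaded file before importing it, the records can be counted with a few lines of Python. This is a sketch that assumes the file has been saved locally under the name pubchem2500.sdf.gz. The helper name iter_sdf_records is hypothetical. It relies only on the SDF convention that each record ends with a line containing $$$$.

```python
import gzip


def iter_sdf_records(path):
    """Yield the text of each record in a gzip-compressed SDF file."""
    record = []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip() == "$$$$":
                # '$$$$' terminates the current record.
                yield "".join(record)
                record = []
            else:
                record.append(line)
    # Tolerate a final record that lacks the terminator.
    if any(line.strip() for line in record):
        yield "".join(record)


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the download was saved.
    total = sum(1 for _ in iter_sdf_records("pubchem2500.sdf.gz"))
    print("records:", total)
```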