MongoChem PubChem Import

From wiki.openchemistry.org

Importing Data

PubChem data in SDF format can be imported with MongoChem's SDF importer. From the File->Import menu, select SDF to open the import dialog, then navigate to the SDF file and click "Import". The data is loaded into the MongoDB database automatically.

The following SDF data fields are extracted and stored under the database keys shown:

  • PUBCHEM_IUPAC_TRADITIONAL_NAME -> name
  • PUBCHEM_IUPAC_INCHI -> inchi
  • PUBCHEM_IUPAC_INCHIKEY -> inchikey
  • PUBCHEM_MOLECULAR_WEIGHT -> mass, descriptors.mass
  • PUBCHEM_CACTVS_TPSA -> descriptors.tpsa
  • PUBCHEM_XLOGP3_AA -> descriptors.xlogp3
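
The mapping above can be illustrated with a short Python sketch. This is not MongoChem's importer code: the helper names (parse_data_fields, build_document, set_nested) are hypothetical, and two details are assumptions rather than facts from this page, namely that dotted keys such as descriptors.tpsa denote nested sub-documents and that the numeric fields are stored as floating-point values.

```python
import re

# SDF data field -> target document key(s), copied from the list above.
FIELD_MAP = {
    "PUBCHEM_IUPAC_TRADITIONAL_NAME": ["name"],
    "PUBCHEM_IUPAC_INCHI": ["inchi"],
    "PUBCHEM_IUPAC_INCHIKEY": ["inchikey"],
    "PUBCHEM_MOLECULAR_WEIGHT": ["mass", "descriptors.mass"],
    "PUBCHEM_CACTVS_TPSA": ["descriptors.tpsa"],
    "PUBCHEM_XLOGP3_AA": ["descriptors.xlogp3"],
}

# Assumption: these fields are converted to floats before storage.
NUMERIC_FIELDS = {
    "PUBCHEM_MOLECULAR_WEIGHT",
    "PUBCHEM_CACTVS_TPSA",
    "PUBCHEM_XLOGP3_AA",
}


def parse_data_fields(record):
    """Collect the '> <NAME>' data items of one SDF record into a dict."""
    fields = {}
    name = None
    for line in record.splitlines():
        header = re.match(r"^>\s*<([^>]+)>", line)
        if header:
            name = header.group(1)
            fields[name] = []
        elif name is not None:
            if line.strip() == "":
                name = None  # a blank line ends the current data item
            else:
                fields[name].append(line)
    return {key: "\n".join(lines) for key, lines in fields.items()}


def set_nested(doc, dotted_key, value):
    """Store value under a dotted key, creating sub-documents as needed."""
    parts = dotted_key.split(".")
    for part in parts[:-1]:
        doc = doc.setdefault(part, {})
    doc[parts[-1]] = value


def build_document(fields):
    """Apply FIELD_MAP to parsed SDF data fields; absent fields are skipped."""
    doc = {}
    for sdf_key, targets in FIELD_MAP.items():
        if sdf_key not in fields:
            continue
        value = fields[sdf_key]
        if sdf_key in NUMERIC_FIELDS:
            value = float(value)
        for target in targets:
            set_nested(doc, target, value)
    return doc
```

Calling build_document(parse_data_fields(record)) on the text of one SDF record returns a plain dictionary shaped like the target document. Fields missing from the record are simply left out.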

The following fields will be calculated from the molecular structure:

  • formula
  • atomCount
  • heavyAtomCount
  • vabc
  • mass (if PUBCHEM_MOLECULAR_WEIGHT is not present)
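
To give a feel for what "calculated from the molecular structure" involves, the sketch below derives three of these values from the atom block of a V2000 molfile, the structure format used inside SDF records. It is an illustration only: this page does not say how MongoChem computes these fields, the function names are hypothetical, and the sketch assumes hydrogens are listed explicitly in the atom block. The vabc and mass values are not covered here.

```python
from collections import Counter


def atom_symbols(molblock):
    """Return the element symbols from the atom block of a V2000 molfile."""
    lines = molblock.splitlines()
    # Line 4 is the counts line; its first three columns hold the atom count.
    n_atoms = int(lines[3][0:3])
    # Each atom line stores x, y, z in three 10-column fields, a space,
    # then the element symbol in columns 32-34 (1-based).
    return [lines[4 + i][31:34].strip() for i in range(n_atoms)]


def hill_formula(symbols):
    """Build a formula in Hill order: C, then H, then the rest alphabetically.

    Without carbon, all elements (including H) are sorted alphabetically.
    """
    counts = Counter(symbols)
    if "C" in counts:
        order = ["C"]
        if "H" in counts:
            order.append("H")
        order += sorted(s for s in counts if s not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in order)


def structure_fields(molblock):
    """Compute formula, atomCount and heavyAtomCount for one molfile."""
    symbols = atom_symbols(molblock)
    return {
        "formula": hill_formula(symbols),
        "atomCount": len(symbols),
        # Heavy atoms are taken here to mean every atom other than hydrogen.
        "heavyAtomCount": sum(1 for s in symbols if s != "H"),
    }
```

For a molfile whose hydrogens are implicit, these counts would cover only the listed atoms, so a real implementation would need a chemistry toolkit to add the missing hydrogens first.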

Sample Data

A sample dataset containing the first 2,500 molecules in PubChem is available as pubchem2500.sdf.gz.
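
Before importing, the sample file can be inspected without unpacking it. The following is a minimal sketch using only the Python standard library; iter_sdf_records is a hypothetical helper name, and the sketch relies on the SDF convention that each record ends with a line containing only "$$$$".

```python
import gzip


def iter_sdf_records(path):
    """Yield the text of each record in a gzip-compressed SDF file."""
    record = []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip() == "$$$$":
                # The delimiter closes the current record.
                yield "".join(record)
                record = []
            else:
                record.append(line)
```

With the sample file in the working directory, sum(1 for _ in iter_sdf_records("pubchem2500.sdf.gz")) counts its records, which should match the molecule count given above if the file is complete.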