GSoC Ideas 2022: Difference between revisions

From wiki.openchemistry.org
(9 intermediate revisions by 4 users not shown)
Line 3: Line 3:
Open Chemistry is an umbrella for projects in chemistry, materials science, biochemistry, and related areas. While we have participated in the last few Google Summer of Code programs and will apply again in 2022, there is '''no guarantee''' that we will be selected again for GSoC in 2022.
Open Chemistry is an umbrella for projects in chemistry, materials science, biochemistry, and related areas. While we have participated in the last few Google Summer of Code programs and will apply again in 2022, there is '''no guarantee''' that we will be selected again for GSoC in 2022.


One important factor is that GSoC in 2022 will include both shorter projects and longer projects. You should consider the appropriate timeline for your project proposal.
One important factor is that GSoC in 2022 will include both shorter projects (~175 hours) and longer projects (~350 hours). You should consider the appropriate timeline for your project proposal. We have indicated in the project titles where we suggest particular lengths.
 
If you are unsure of the scope of a project, please reach out and discuss BEFORE the proposal deadline.
 
When possible, submitting drafts a week or more in advance of the proposal deadline is preferred because we can then make suggestions on your proposal.


We have gathered a pool of interested mentors together who are seasoned developers in each of these projects. We welcome original ideas in addition to what's listed here - please suggest something interesting for open source chemistry!
We have gathered a pool of interested mentors together who are seasoned developers in each of these projects. We welcome original ideas in addition to what's listed here - please suggest something interesting for open source chemistry!
Line 11: Line 15:
When adding a new idea to this page, please try to include the following information:
When adding a new idea to this page, please try to include the following information:


* Size of the project (Medium = ~175 hours of work, ~6 weeks) or (Large = ~350 hours of work, ~12 weeks)
* Size of the project (~175 hours of work, ~6 weeks) or (~350 hours of work, ~12 weeks)
* A brief explanation of the idea.
* A brief explanation of the idea.
* Expected results/feature additions.
* Expected results/feature additions.
Line 27: Line 31:
[http://two.avogadro.cc/ Avogadro 2] is a chemical editor and visualization application, as well as a set of reusable software libraries written in C++ using principles of modularity for maximum reuse. We offer permissively licensed, open source, cross-platform software components in the Avogadro 2 libraries, along with an end-user application with full source code and binaries.
[http://two.avogadro.cc/ Avogadro 2] is a chemical editor and visualization application, as well as a set of reusable software libraries written in C++ using principles of modularity for maximum reuse. We offer permissively licensed, open source, cross-platform software components in the Avogadro 2 libraries, along with an end-user application with full source code and binaries.


=== Project [Large]: Scripting Bindings ===
=== Project [350 hours]: Scripting Bindings ===


'''Brief explanation:''' Implement an embedded scripting language (e.g., Python) in Avogadro 2
'''Brief explanation:''' Implement an embedded scripting language (e.g., Python) in Avogadro 2
Line 39: Line 43:
'''Mentor:''' Geoff Hutchison (geoffh at pitt dot edu) or Marcus D. Hanwell (mhanwell at bnl dot gov)
'''Mentor:''' Geoff Hutchison (geoffh at pitt dot edu) or Marcus D. Hanwell (mhanwell at bnl dot gov)


=== Project [Medium]: Integrate with RDKit ===
=== Project [175 hours]: Integrate with RDKit ===


'''Brief explanation:''' Integrate the RDKit toolkit into Avogadro for conformer sampling and force field optimization
'''Brief explanation:''' Integrate the RDKit toolkit into Avogadro for conformer sampling and force field optimization
Line 49: Line 53:
'''Mentor:''' Geoff Hutchison (geoffh at pitt dot edu)
'''Mentor:''' Geoff Hutchison (geoffh at pitt dot edu)


=== Project [Medium or Large]: Tools for Interactive Molecular Dynamics ===
=== Project [175 or 350 hours]: Tools for Interactive Molecular Dynamics ===


'''Brief explanation:''' Building solvent boxes and implementing standard molecular dynamics using the in-progress optimization framework.
'''Brief explanation:''' Building solvent boxes and implementing standard molecular dynamics using the in-progress optimization framework. The scope could be 175 or 350 hours; please discuss with the mentors what scale of project you have in mind.


'''Expected results:''' Avogadro (v1) has interactive force field optimization allowing building and manipulation (e.g., push-pull atoms into position). Some users call this 'video game mode' ;-) A new optimization framework is in progress, including calling external programs for energies and forces. The project would enable building out MD simulations, including tools to add water or solvent boxes, build larger systems (e.g., via PackMol integration) and implement simple MD integration and thermostats.
'''Expected results:''' Avogadro (v1) has interactive force field optimization allowing building and manipulation (e.g., push-pull atoms into position). Some users call this 'video game mode' ;-) A new optimization framework is in progress, including calling external programs for energies and forces. The project would enable building out MD simulations, including tools to add water or solvent boxes, build larger systems (e.g., via PackMol integration) and implement simple MD integration and thermostats.
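
As a rough illustration of what "simple MD integration" involves, the sketch below implements the velocity Verlet scheme in plain Python for a set of one-dimensional coordinates. It is not Avogadro code: the function names, the flat-list data layout, and the harmonic test force are assumptions made for this example, and a real implementation would be written in C++ against the optimization framework and would add a thermostat.

```python
import math


def velocity_verlet(positions, velocities, masses, force_fn, dt, n_steps):
    """Advance the system n_steps with velocity Verlet; return (positions, velocities)."""
    x = list(positions)
    v = list(velocities)
    f = force_fn(x)
    for _ in range(n_steps):
        # Half-step velocity update using the current forces
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]
        # Full-step position update using the half-step velocities
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # Recompute forces at the new positions, then finish the velocity update
        f = force_fn(x)
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]
    return x, v


def harmonic_force(x, k=1.0):
    """Illustrative force: independent harmonic springs, F = -k x."""
    return [-k * xi for xi in x]


def total_energy(x, v, masses, k=1.0):
    """Kinetic plus harmonic potential energy, used to check energy conservation."""
    kinetic = sum(0.5 * m * vi * vi for m, vi in zip(masses, v))
    potential = sum(0.5 * k * xi * xi for xi in x)
    return kinetic + potential
```

With a small time step, the total energy of the harmonic test system should stay close to its initial value, which is a common first sanity check for an integrator.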
Line 59: Line 63:
'''Mentor:''' Geoff Hutchison (geoffh at pitt dot edu)
'''Mentor:''' Geoff Hutchison (geoffh at pitt dot edu)


=== Project [Large]: Efficient Molecular Surfaces / Orbitals ===
=== Project [350 hours]: Efficient Molecular Surfaces / Orbitals ===


'''Brief explanation:''' Generating and rendering molecular surfaces is a common task, from solvent-accessible and solvent-excluded surfaces to molecular orbitals, electron density, spin density, etc.
'''Brief explanation:''' Generating and rendering molecular surfaces is a common task, from solvent-accessible and solvent-excluded surfaces to molecular orbitals, electron density, spin density, etc.
Line 74: Line 78:
[http://openbabel.org Open Babel] is an open toolbox for chemistry, designed to speak the many languages of chemical data. It's an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas.
[http://openbabel.org Open Babel] is an open toolbox for chemistry, designed to speak the many languages of chemical data. It's an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas.


=== Project [Medium]: Integrate CoordGen library ===
=== Project [175 hours]: Integrate CoordGen library ===


'''Expected results:''' Schrodinger has released a BSD-licensed library for 2D chemical structure layout (https://github.com/schrodinger/coordgenlibs) and it has been successfully integrated into RDKit. The student will be responsible for integrating CoordGen into Open Babel. Code will be written in C++.
'''Expected results:''' Schrodinger has released a BSD-licensed library for 2D chemical structure layout (https://github.com/schrodinger/coordgenlibs) and it has been successfully integrated into RDKit. The student will be responsible for integrating CoordGen into Open Babel. Code will be written in C++.
Line 80: Line 84:
'''Mentor:''' Geoff Hutchison (geoffh at pitt dot edu)
'''Mentor:''' Geoff Hutchison (geoffh at pitt dot edu)


=== Project [Medium]: Implement MMTF format ===
=== Project [175 hours]: Implement MMTF format ===


'''Brief explanation:''' Implementation of MMTF file format in OpenBabel.   
'''Brief explanation:''' Implementation of MMTF file format in OpenBabel.   
Line 88: Line 92:
'''Mentor:''' Geoff Hutchison (geoffh at pitt dot edu) or David Koes (dkoes at pitt dot edu)
'''Mentor:''' Geoff Hutchison (geoffh at pitt dot edu) or David Koes (dkoes at pitt dot edu)


=== Project [Medium]: Test Framework Overhaul  ===
=== Project [175 hours]: Test Framework Overhaul  ===


'''Brief explanation:''' Automated testing is an important part of maintaining code quality.  This project will improve the current testing regime of openbabel.
'''Brief explanation:''' Automated testing is an important part of maintaining code quality.  This project will improve the current testing regime of openbabel.
Line 98: Line 102:
'''Mentor:''' Geoff Hutchison (geoffh at pitt dot edu), David Koes (dkoes at pitt dot edu), the OpenBabel development community.
'''Mentor:''' Geoff Hutchison (geoffh at pitt dot edu), David Koes (dkoes at pitt dot edu), the OpenBabel development community.


=== Project [Large]: Develop a JavaScript version of Open Babel ===
=== Project [350 hours]: Develop a JavaScript version of Open Babel ===


'''Brief explanation:''' Building on existing work, you will use Emscripten to compile the C++ codebase of Open Babel to JavaScript. This will make it easy to write in-browser applications that need cheminformatics functionality.
'''Brief explanation:''' Building on existing work, you will use Emscripten to compile the C++ codebase of Open Babel to JavaScript. This will make it easy to write in-browser applications that need cheminformatics functionality.
Line 110: Line 114:
'''Mentor''': Noel O'Boyle (baoilleach at gmail dot com)
'''Mentor''': Noel O'Boyle (baoilleach at gmail dot com)


=== Project [Large]: Develop a validation and standardization filter ===
=== Project [350 hours]: Develop a validation and standardization filter ===


'''Brief explanation''': Given a particular molecular structure, can we say how chemically plausible it is, and use this to filter or warn about problems (e.g., undefined stereo centers)?
'''Brief explanation''': Given a particular molecular structure, can we say how chemically plausible it is, and use this to filter or warn about problems (e.g., undefined stereo centers)?
Line 128: Line 132:
[http://cclib.github.io cclib] is an open source library, written in Python, for parsing and interpreting the results of computational chemistry packages. The goals of cclib are centered around the reuse of data obtained from these programs when stored in program-specific output files.
[http://cclib.github.io cclib] is an open source library, written in Python, for parsing and interpreting the results of computational chemistry packages. The goals of cclib are centered around the reuse of data obtained from these programs when stored in program-specific output files.


===Project: Implement new parsers===
===Project [175 or 350 hours]: Implement new parsers===


'''Brief explanation''': There are outstanding issues on GitHub for supporting more programs (e.g. CFOUR, xtb, NBO, GAMESS dat, MRCC, DIRAC), and parsing binary files for various QM programs (e.g. Gaussian, NWChem, and ORCA). There may also be more programs missing that haven't been considered.
'''Brief explanation''': There are outstanding issues on GitHub for supporting more programs (e.g. CFOUR, xtb, NBO, GAMESS dat, MRCC, DIRAC), and parsing binary files for various QM programs (e.g. Gaussian, NWChem, and ORCA). There may also be more programs missing that haven't been considered.
Line 138: Line 142:
'''Mentors''': Eric Berquist (eric.john.berquist at gmail dot com) and/or Shiv Upadhyay (shivnupadhyay at gmail dot com) and/or Adam Tenderholt (atenderholt at gmail dot com) and/or Karol Langner (karol.langner at gmail dot com)
'''Mentors''': Eric Berquist (eric.john.berquist at gmail dot com) and/or Shiv Upadhyay (shivnupadhyay at gmail dot com) and/or Adam Tenderholt (atenderholt at gmail dot com) and/or Karol Langner (karol.langner at gmail dot com)


===Project: Implement new bridges===
===Project [175 or 350 hours]: Implement new bridges===


'''Brief explanation''': There are outstanding issues on GitHub for more integrations with external programs (e.g. chemfiles, RDKit) via their Python bindings. There may also be more programs missing that haven't been considered.
'''Brief explanation''': There are outstanding issues on GitHub for more integrations with external programs (e.g. chemfiles, RDKit) via their Python bindings. There may also be more programs missing that haven't been considered.
Line 148: Line 152:
'''Mentors''': Eric Berquist (eric.john.berquist at gmail dot com) and/or Shiv Upadhyay (shivnupadhyay at gmail dot com) and/or Adam Tenderholt (atenderholt at gmail dot com) and/or Karol Langner (karol.langner at gmail dot com)
'''Mentors''': Eric Berquist (eric.john.berquist at gmail dot com) and/or Shiv Upadhyay (shivnupadhyay at gmail dot com) and/or Adam Tenderholt (atenderholt at gmail dot com) and/or Karol Langner (karol.langner at gmail dot com)


===Project: Implement new methods===
===Project [350 hours]: Implement new methods===


'''Brief explanation''': There are outstanding issues on GitHub for more analysis methods being added directly to cclib (e.g. calculating geometric parameters). There may also be other methods that are desirable to include which haven't been considered.
'''Brief explanation''': There are outstanding issues on GitHub for more analysis methods being added directly to cclib (e.g. calculating geometric parameters). There may also be other methods that are desirable to include which haven't been considered.
Line 158: Line 162:
'''Mentors''': Eric Berquist (eric.john.berquist at gmail dot com) and/or Shiv Upadhyay (shivnupadhyay at gmail dot com) and/or Adam Tenderholt (atenderholt at gmail dot com) and/or Karol Langner (karol.langner at gmail dot com)
'''Mentors''': Eric Berquist (eric.john.berquist at gmail dot com) and/or Shiv Upadhyay (shivnupadhyay at gmail dot com) and/or Adam Tenderholt (atenderholt at gmail dot com) and/or Karol Langner (karol.langner at gmail dot com)


===Project: Julia bindings===
===Project [350 hours]: Julia bindings===


'''Brief explanation''': The Julia programming language (https://julialang.org/) is growing in popularity for computational chemistry as a language in which both production-level computation and analysis can be performed seamlessly. To analyze computational chemistry outputs from traditional programs in Julia without reimplementing all cclib functionality, we should be able to call cclib from Julia directly and reuse its core functionality.
'''Brief explanation''': The Julia programming language (https://julialang.org/) is growing in popularity for computational chemistry as a language in which both production-level computation and analysis can be performed seamlessly. To analyze computational chemistry outputs from traditional programs in Julia without reimplementing all cclib functionality, we should be able to call cclib from Julia directly and reuse its core functionality.
Line 168: Line 172:
'''Mentors''': Eric Berquist (eric.john.berquist at gmail dot com)
'''Mentors''': Eric Berquist (eric.john.berquist at gmail dot com)


===Project: Additional visualization for OpenChemVault===
===Project [350 hours]: Additional visualization for OpenChemVault===


'''Brief explanation''': OpenChemVault (https://github.com/cclib/openchemvault) is capable of parsing output files, storing them, and displaying geometries, but any sort of additional visualization (such as plotting molecular orbitals or spectra) is missing. The capabilities of GaussSum (http://gausssum.sourceforge.net/) are a possible starting point.
'''Brief explanation''': OpenChemVault (https://github.com/cclib/openchemvault) is capable of parsing output files, storing them, and displaying geometries, but any sort of additional visualization (such as plotting molecular orbitals or spectra) is missing. The capabilities of GaussSum (http://gausssum.sourceforge.net/) are a possible starting point.
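
One ingredient of spectrum plotting is turning a list of discrete transitions into a smooth curve. The sketch below shows one common way to do this, Gaussian broadening on a grid, in plain Python. It is not taken from GaussSum or OpenChemVault: the function name, the height-normalized line shape, and the choice of a full width at half maximum (FWHM) parameter are assumptions made for this example.

```python
import math


def broaden(peaks, grid, fwhm):
    """Sum height-normalized Gaussians centered on each peak.

    peaks: list of (position, intensity) pairs
    grid:  x values at which to evaluate the spectrum
    fwhm:  full width at half maximum of each Gaussian, in the units of the grid
    """
    # Convert FWHM to the Gaussian standard deviation
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    spectrum = []
    for x in grid:
        total = 0.0
        for position, intensity in peaks:
            total += intensity * math.exp(-((x - position) ** 2) / (2.0 * sigma ** 2))
        spectrum.append(total)
    return spectrum
```

With this normalization, an isolated peak reaches its stated intensity at its center and half of that value at a distance of FWHM/2 on either side.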
Line 180: Line 184:
== RDKit Project Ideas ==
== RDKit Project Ideas ==


[http://www.rdkit.org The RDKit] is a BSD licensed open source cheminformatics toolkit written in C++ with wrappers for use from Python, Java, and C#. The RDKit also provides "cartridge" functionality that allows chemical searching in the open-source relational database PostgreSQL.
[http://www.rdkit.org The RDKit] is a BSD licensed open source cheminformatics toolkit written in C++ with wrappers for use from Python, Java, C#, and JavaScript. The RDKit also provides "cartridge" functionality that allows chemical searching in the open-source relational database PostgreSQL.
 
=== Project [175 hours]: Port xyz2mol to the RDKit core ===
 
'''Brief explanation:''' Assignment of bond orders to molecules where only atomic coordinates are available is a challenging problem. The xyz2mol package from Prof. Jan H. Jensen's research group in Denmark, https://github.com/jensengroup/xyz2mol, is a robust and well-tested solution to the problem. The goal of this project is to port the xyz2mol code from Python to C++ and integrate it into the core RDKit. Jan Jensen will help us on this project by answering questions and providing advice on the re-implementation.
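
To make the problem concrete, the sketch below shows only a typical first step that precedes bond-order assignment: guessing which atoms are connected from interatomic distances. It is not the xyz2mol algorithm and not RDKit code. The function name, the small table of approximate covalent radii, and the 0.45 Å tolerance are assumptions chosen for illustration; assigning bond orders and formal charges on top of this connectivity is the hard part that the project addresses.

```python
import math

# Approximate single-bond covalent radii in angstroms (illustrative subset)
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def guess_connectivity(atoms, tolerance=0.45):
    """Return index pairs (i, j), i < j, for atoms close enough to be bonded.

    atoms: list of (symbol, (x, y, z)) tuples with coordinates in angstroms.
    Two atoms count as bonded when their distance does not exceed the sum
    of their covalent radii plus the tolerance.
    """
    bonds = []
    for i in range(len(atoms)):
        sym_i, pos_i = atoms[i]
        for j in range(i + 1, len(atoms)):
            sym_j, pos_j = atoms[j]
            cutoff = COVALENT_RADII[sym_i] + COVALENT_RADII[sym_j] + tolerance
            if math.dist(pos_i, pos_j) <= cutoff:
                bonds.append((i, j))
    return bonds
```

For a water molecule, this should connect the oxygen to each hydrogen and leave the two hydrogens unconnected.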
 
'''Expected results:''' A C++ implementation of the xyz2mol code along with a robust set of test cases. Wrappers for the calculator so that it is accessible from within the Python and SWIG (Java and C#) wrappers.
 
'''Prerequisites:''' C++
 
'''Mentor:''' Joey Storer (JWStorer at dow.com)
 


=== Project: Implement Molecular Interaction Fields calculations in the RDKit ===
=== Project [350 hours]: Implement Molecular Interaction Fields calculations in the RDKit ===


'''Brief explanation:''' There is an old PR for the RDKit that implements molecular interaction fields: https://github.com/rdkit/rdkit/pull/318. This was never merged because the author ran out of time. At this point a lot of work would be required to update and finish this PR, but the results would be super useful for the RDKit community.
'''Brief explanation:''' There is an old PR for the RDKit that implements molecular interaction fields: https://github.com/rdkit/rdkit/pull/318. This was never merged because the author ran out of time. At this point a lot of work would be required to update and finish this PR, but the results would be super useful for the RDKit community.


'''Expected results:''' A C++ implementation of the GRID calculator code along with a robust set of test cases. Wrappers for the reader so that it is accessible from within the Python and SWIG (Java and C#) wrappers.
'''Expected results:''' A C++ implementation of the GRID calculator code along with a robust set of test cases. Wrappers for the calculator so that it is accessible from within the Python and SWIG (Java and C#) wrappers.


'''Prerequisites:''' C++
'''Prerequisites:''' C++
Line 192: Line 207:
'''Mentor:''' Greg Landrum (greg.landrum at t5informatics dot com)
'''Mentor:''' Greg Landrum (greg.landrum at t5informatics dot com)


=== Project: RDKit+OpenMM GPU Molecular Force Fields ===
=== Project [350 hours]: RDKit+OpenMM GPU Molecular Force Fields ===


'''Brief explanation:''' OpenMM (http://openmm.org/) is a high-performance toolkit for force-field based molecular simulation that includes GPU and CPU support. The goal of this project is to make it easy to use OpenMM force fields to minimize the energies of or perform molecular dynamics calculations on RDKit molecules.
'''Brief explanation:''' OpenMM (http://openmm.org/) is a high-performance toolkit for force-field based molecular simulation that includes GPU and CPU support. The goal of this project is to make it easy to use OpenMM force fields to minimize the energies of or perform molecular dynamics calculations on RDKit molecules.


'''Expected results:''' OpenMM supports a wide range of force fields, but not the classical MMFF94 or UFF methods implemented in RDKit. Needed is C++ functionality allowing RDKit molecules to be sent to OpenMM for minimization and/or to perform molecular dynamics. A robust set of regression tests for this functionality. Python wrappers around the new functionality. The work would likely involve completing the MMFF94 implementation described by Paolo Tosco at the 2017 RDKit UGM (https://github.com/rdkit/UGM_2017/blob/master/Presentations/Tosco_RDKit_OpenMM_integration.pdf) and extending to other force fields like UFF.
'''Expected results:''' OpenMM supports a wide range of force fields, but not the classical MMFF94 or UFF methods implemented in RDKit. Needed is C++ functionality allowing RDKit molecules to be sent to OpenMM for minimization and/or to perform molecular dynamics. A robust set of regression tests for this functionality. Python wrappers around the new functionality. The work would likely involve completing the MMFF94 implementation described by Paolo Tosco at the 2017 RDKit UGM (https://github.com/rdkit/UGM_2017/blob/master/Presentations/Tosco_RDKit_OpenMM_integration.pdf) and extending to other force fields like UFF. Another approach is the small-molecule support used by OpenFF: https://github.com/openmm/openmmforcefields


'''Prerequisites:''' C++ and some Python
'''Prerequisites:''' C++ and some Python
Line 205: Line 220:


QC-Devs (https://qcdevs.org/) develops various free, open-source, and cross-platform libraries for scientific computing, especially theoretical and computational chemistry. Our goal is to make programming accessible to chemists and promote precepts of sustainable software development. The two main pieces of the QC-Devs ecosystem are:
QC-Devs (https://qcdevs.org/) develops various free, open-source, and cross-platform libraries for scientific computing, especially theoretical and computational chemistry. Our goal is to make programming accessible to chemists and promote precepts of sustainable software development. The two main pieces of the QC-Devs ecosystem are:
HORTON (electronic structure theory): https://quantumelephant.org/
ChemTools (molecular structure and reactivity): https://chemtools.org/
All our repositories are hosted on Theochem organization (https://github.com/theochem) on GitHub.


=== Project: Visualization of Molecular Structure and Reactivity ===
<ul>
'''Brief Explanation:''' ChemTools (https://github.com/theochem/chemtools) is a post-processing library for extracting chemical insight from quantum chemistry calculations. Currently, ChemTools relies on Visual Molecular Dynamics (VMD) and Matplotlib for visualization. ChemTools has the functionality to generate visualization scripts for VMD, so the user can easily generate informative plots like iso-surface of electron density colored by electrostatic potential.
<li><blockquote><p>HORTON (electronic structure theory): [https://quantumelephant.org/ <u>https://quantumelephant.org/</u>]</p></blockquote></li>
<li><blockquote><p>ChemTools (molecular structure and reactivity): [https://chemtools.org/ <u>https://chemtools.org/</u>]</p></blockquote></li></ul>
 
All our repositories are hosted under the Theochem organization ([https://github.com/theochem <u>https://github.com/theochem</u>]) on GitHub.
 
=== Project [175 hours or 350 hours]: Visualization of Molecular Structure and Reactivity ===
 


'''Expected Results:''' Add functionality to ChemTools to generate visualization scripts for PyMol, IQMol, and Avogadro. The current functionality for VMD can be used as a template.
'''Brief Explanation:''' ChemTools ([https://github.com/theochem/chemtools <u>https://github.com/theochem/chemtools</u>]) is a post-processing library for extracting chemical insight from quantum chemistry calculations. Currently, ChemTools relies on Visual Molecular Dynamics (VMD) and Matplotlib for visualization. ChemTools has the functionality to generate visualization scripts for VMD, so the user can easily generate informative plots like iso-surface of electron density colored by electrostatic potential. Visualization of (annotated) molecular structures and molecular structure changes along reaction pathways are also of interest, but the implementations are unpolished.
Difficulty Level: Intermediate


'''Relevant Skills:''' Experience with Python and visualization
'''Expected Results:'''


'''Mentor:''' Farnaz Heidar-Zadeh (farnaz.heidarzadeh at queensu dot ca), Gabriela Sánchez Díaz (sanchezg at mcmaster dot ca), and Esteban Vohringer-Martinez (estebanvohringer at qcmmlab dot com)
<blockquote>'''175 hours:''' Add functionality to ChemTools to generate visualization scripts for ChimeraX ([https://www.cgl.ucsf.edu/chimerax/ <u>https://www.cgl.ucsf.edu/chimerax/</u>]). The current functionality for VMD can be used as a template.


=== Project: Visualize Chemical Reactions ===
'''350 hours:''' Add ChemTools as a back-end for SEQCROW ([https://cxtoolshed.rbvi.ucsf.edu/apps/seqcrow <u>https://cxtoolshed.rbvi.ucsf.edu/apps/seqcrow</u>]), a free and open-source bundle ([https://github.com/QChASM/SEQCROW <u>https://github.com/QChASM/SEQCROW</u>]) in the ChimeraX toolshed ([https://cxtoolshed.rbvi.ucsf.edu/ <u>https://cxtoolshed.rbvi.ucsf.edu/</u>]) for building molecules and interacting with the output of quantum chemistry calculations.
'''Brief Explanation:''' GOpt (https://github.com/theochem/gopt) is a Python library for optimizing molecular structures and determining chemical reaction pathways. Currently, GOpt can output a series of chemically relevant numerical structures (e.g., structures along the intrinsic reaction coordinate; optimization trajectories), but there is no interface to visualize these structures or perform structural or chemical analysis of them. The goal of this project is to generate visualization scripts for Avogadro, PyMol and/or IQMol, all of which can provide animations of reaction pathways and optimization trajectories. A stretch goal is to provide a workflow linking GOpt to ChemTools (https://github.com/theochem/chemtools), so that structural and reactivity indicators can be computed and visualized along reaction pathways.


'''Expected Results:''' Add functionality to GOpt to generate visualization scripts for Avogadro, PyMol and/or IQMol. (Stretch goal: Interface Gopt and ChemTools to facilitate chemical reaction path analysis.)
'''Difficulty Level:''' Intermediate (175) to High-Intermediate (350)
</blockquote>
'''Relevant Skills:''' Experience with Python, visualization, and software interfacing (350)


'''Difficulty Level:''' Easy
'''Mentors:''' Ali Tehrani (alirezatehrani24 at gmail dot com), Gabriela Sanchez Diaz (sanchezg at mcmaster dot ca), Farnaz Heidar-Zadeh (farnaz.heidarzadeh at queensu dot ca), and Esteban Vohringer-Martinez (estebanvohringer at qcmmlab dot com).


'''Relevant Skills:''' Experience with Python
=== Project [175 or 350 hours]: Extended interoperability of GOpt and Quantum Chemistry Software ===


'''Mentor:''' Derrick Yang (yxt1991 at gmail dot com) and Paul Ayers (ayers at mcmaster dot ca)
'''Brief Explanation:''' ChemTools ([https://github.com/theochem/chemtools <u>https://github.com/theochem/chemtools</u>]) is a post-processing library for extracting chemical insight from quantum chemistry calculations. Currently, ChemTools relies on modules of the HORTON library to compute the basic quantities required for its analysis. The goal of this project is to extend the interoperability of ChemTools, so that it can use the [https://github.com/psi4 <u>Psi4</u>] ([https://github.com/psi4 <u>https://github.com/psi4</u>]) &amp; [https://github.com/pyscf/pyscf <u>PySCF</u>] ([https://github.com/pyscf/pyscf <u>https://github.com/pyscf/pyscf</u>]) packages and take advantage of their features.


=== Project: Extended interoperability of GOpt and Quantum Chemistry Software ===
'''Expected Results:'''
'''Brief Explanation:''' GOpt (https://github.com/theochem/gopt) is a Python library for optimizing molecular structures and determining chemical reaction pathways.  Currently, it obtains the required information (e.g. atomic forces and Hessian matrix) for optimization from the Gaussian quantum chemistry package. The goal of this project is to make it possible for GOpt to use Psi4, PySCF, ORCA, and NWChem at every step of the optimization.


'''Expected Results:''' Expanding the scope of the GOpt library by increasing the number of quantum chemistry packages it can use for studying chemical reactions. You are expected to use IOData (https://github.com/theochem/iodata) which is a Python library for parsing, storing, and writing various quantum chemistry file formats and generating input files for quantum chemistry packages. This involves:
<blockquote>'''175 hours:''' Writing wrappers for Psi4 or PySCF to compute various quantum mechanical properties and provide those properties to ChemTools for further analysis. The current wrappers for HORTON can be used as a template. Both Psi4 and PySCF have Python interfaces.
GOpt using IOData to write an appropriate input file for the above-mentioned quantum chemistry package.  
GOpt using IOData to parse the (formatted) output files from these quantum chemistry packages to extract the necessary information (energy, gradient, Hessian, etc.) required.


'''350 hours:''' Writing wrappers for Psi4 and PySCF.
</blockquote>
'''Difficulty Level:''' Intermediate
'''Difficulty Level:''' Intermediate


'''Relevant Skills:''' Experience with Python.
'''Relevant Skills:''' Experience with scientific Python, advanced NumPy, object-oriented programming, and knowledge of quantum chemistry software
 
'''Mentors:''' Gabriela Sanchez Diaz (sanchezg at mcmaster dot ca), Ali Tehrani (alirezatehrani24 at gmail dot com), and Farnaz Heidar-Zadeh (farnaz.heidarzadeh at queensu dot ca).
 
=== Project [175 hours]: Extended Periodic Table of Atomic Properties ===


'''Mentor:''' Derrick Yang (yxt1991 at gmail dot com), Farnaz Heidar-Zadeh (farnaz.heidarzadeh at queensu dot ca), and Paul Ayers (ayers at mcmaster dot ca)
'''Brief Explanation:''' A database of atomic properties is often used to estimate molecular properties via group additivity rules. While several "periodic table" databases exist, most of them lack data about excited states and charged atoms, and certainly about exotic species (e.g., excited states of highly charged species). Most also lack information about local properties (e.g., the electron density and local property densities).


=== Project: Implement Workflows for Calculation and Usage of Databases of Isolated Atom Densities ===
AtomDB ([https://github.com/theochem/AtomDB <u>https://github.com/theochem/AtomDB</u>]) provides data for global and local properties, from experiment, theory, and computation. As there are many ways to calculate these properties, and not all properties are accessible from all calculations, it is important that the database be adaptable, so that data can easily be added from new computational and theoretical models. Setting up and processing such calculations by hand (for different elements, ions, spin states, ...) is extremely tedious and error-prone.
'''Brief Explanation:''' A database of atomic electron densities is often used to analyze electron densities of gas-phase molecules or condensed phases. In practice, there are many ways to calculate the electron density, using different theoretical models and computational tools. As a consequence, such a database is not a one-time effort, but rather a procedure that is regularly repeated with different computational settings and theoretical models. Setting up and processing such calculations by hand (for different elements, ions, spin states, ...) is extremely tedious and error-prone. The implementation of an easy-to-use workflow would heavily reduce the burden of researchers who make use of such databases. This project also aims to facilitate the exchange and archival of atomic density databases.


'''Expected Results:'''
'''Expected Results:'''
Extension of Denspart (https://github.com/theochem/denspart) with a database that can store (spherical) atomic electron densities together with atomic metadata. This program currently uses a hard-coded database.
 
Development and implementation of a JSON specification for archival and exchange of atomic density databases.
<ol style="list-style-type: decimal;">
Implementation of a workflow for setting up new databases. This involves (i) the generation of input files for existing quantum chemistry codes together with a suitable job script to execute the calculations on an HPC and (ii) processing the outputs of these calculations. This workflow will be implemented using other packages in the HORTON project, such as IOData, Grid, and GBasis. (See https://github.com/theochem)
<li><blockquote><p>Extension of AtomDB with a workflow for generating/storing atomic properties from new calculations, including (spherically-averaged) atomic property densities. This entails (i) generating input files for existing quantum chemistry codes together with a suitable job script to execute the calculations on an HPC and (ii) processing the outputs of these calculations. This workflow will be implemented using other packages in the HORTON project, such as IOData, Grid, and GBasis. (See [https://github.com/theochem <u>https://github.com/theochem</u>])</p></blockquote></li>
<li><blockquote><p>Utilities for generating molecular properties and property densities using various atom/group additivity rules.</p></blockquote></li></ol>


'''Difficulty Level:''' Intermediate
'''Difficulty Level:''' Intermediate


'''Relevant Skills:''' Experience with Python, NumPy  
'''Relevant Skills:''' Experience with Python, NumPy, object-oriented programming, and quantum chemistry on high-performance computing (HPC) systems.
 
'''Mentors:''' Gabriela Sanchez Diaz (sanchezg at mcmaster dot ca), Ali Tehrani (alirezatehrani24 at gmail dot com), Paul Ayers (ayers at mcmaster dot ca), and Farnaz Heidar-Zadeh (farnaz.heidarzadeh at queensu dot ca).
 
=== Project [175 hours]: Orthogonal Procrustes for Rectangular Matrices ===
 
'''Brief Explanation:''' Procrustes ([https://github.com/theochem/procrustes <u>https://github.com/theochem/procrustes</u>]) is a library for finding the optimal transformation that makes two matrices as close as possible to each other. Procrustes analysis has numerous applications in object recognition, though our primary interest pertains to its utility for quantifying chemical and physical (dis)similarity of molecular structures. Currently, when two input matrices have different numbers of columns, the smaller matrix is augmented by columns of zeros (zero-padding). An alternative to this artificial approach was recently proposed for the special case of orthogonal transformations [<nowiki/>[https://epubs.siam.org/doi/10.1137/19M1270872 <u>SIAM Journal on Matrix Analysis and Applications, vol. 41, pp. 957-983 (2020)</u>]]. The goal of this project is to implement the SCFRTR method (algorithm 5.1) from this reference in the Procrustes library.
 
'''Expected Results:'''
 
<blockquote>'''175 hours:''' Extension of Procrustes to include the SCFRTR algorithm as an alternative to zero-padding for unbalanced orthogonal Procrustes problems.
</blockquote>
'''Difficulty Level:''' Advanced
 
'''Relevant Skills:''' Experience with scientific Python, advanced Numpy, object-oriented programming, and numerical analysis.
 
'''Mentor:''' Fanwang Meng (fwmeng88 at gmail dot com) and Paul Ayers (ayers at mcmaster dot ca).
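
The zero-padding approach described above can be sketched in a few lines of NumPy. This is only an illustration of the baseline that the SCFRTR method is meant to replace, not the Procrustes library's own code: the function names are hypothetical, and the convention of minimizing ||'''AQ''' - '''B'''|| over orthogonal '''Q''' is an assumption made for the example.

```python
import numpy as np


def zero_pad_columns(a, b):
    """Pad the matrix with fewer columns with zero columns (hypothetical helper)."""
    n_cols = max(a.shape[1], b.shape[1])

    def pad(m):
        return np.pad(m, ((0, 0), (0, n_cols - m.shape[1])))

    return pad(a), pad(b)


def orthogonal_procrustes(a, b):
    """Return an orthogonal Q minimizing the Frobenius norm of A Q - B.

    Matrices with different numbers of columns are zero-padded first.
    """
    a, b = zero_pad_columns(a, b)
    # Classic SVD solution: if A^T B = U S V^T, then Q = U V^T.
    u, _, vt = np.linalg.svd(a.T @ b)
    return u @ vt
```

For balanced inputs this recovers the exact transformation when '''B''' is an orthogonally transformed copy of '''A'''; for unbalanced inputs the padded columns carry no information, which is the artificial aspect the project aims to remove.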
 
=== Project [175 or 350 hours]: Positive Semi-definite Procrustes (PSDP) Problem===
 
'''Brief Explanation:''' Procrustes ([https://github.com/theochem/procrustes <u>https://github.com/theochem/procrustes</u>]) is a library for finding the optimal transformation that makes two matrices as close as possible to each other. The main idea of this project is to solve the problem of minimization of ||'''PX'''-'''Y''' || where '''P''' is a positive semidefinite matrix and '''X''' and '''Y''' are the input matrices. A basic solution is provided by [<nowiki/>[https://ieeexplore.ieee.org/document/325890 <u>A new algorithm for the positive semi-definite Procrustes problem</u>]].
 
'''Expected Results:'''
 
<blockquote>'''175 hours:''' Extension of Procrustes to include the positive semi-definite algorithm as a new class of Procrustes problems.
 
'''350 hours:''' Consider additional extensions, where the trace, diagonal elements, or other linear constraints of '''P''' are imposed. One useful example of this type of problem is the "closest covariance matrix."
</blockquote>
'''Difficulty Level:''' Advanced
 
'''Relevant Skills:''' Experience with scientific Python, advanced Numpy, object-oriented programming, and numerical analysis.
 
'''Mentors:''' Fanwang Meng (fwmeng88 at gmail dot com) and Paul Ayers (ayers at mcmaster dot ca).
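
As a rough baseline for the minimization of ||'''PX'''-'''Y''' || described above, the following sketch uses plain projected gradient descent onto the cone of positive semidefinite matrices. It is not the algorithm from the cited reference and not part of the Procrustes library; the function names, the Frobenius norm, and the fixed iteration count are assumptions made for illustration.

```python
import numpy as np


def project_psd(a):
    """Project a square matrix onto the symmetric positive semidefinite cone."""
    sym = 0.5 * (a + a.T)
    w, v = np.linalg.eigh(sym)
    # Clip negative eigenvalues to zero and rebuild the matrix.
    return (v * np.clip(w, 0.0, None)) @ v.T


def psd_procrustes(x, y, n_iter=5000):
    """Minimize ||P X - Y||_F^2 over PSD matrices P by projected gradient descent."""
    p = np.eye(x.shape[0])
    # Step 1/L, where L = 2 * lambda_max(X X^T) is the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.norm(x @ x.T, 2))
    for _ in range(n_iter):
        grad = 2.0 * (p @ x - y) @ x.T
        p = project_psd(p - step * grad)
    return p
```

A simple sanity check is to build '''Y''' = '''P'''<sub>0</sub>'''X''' from a known positive semidefinite '''P'''<sub>0</sub> and a full-row-rank '''X''', and confirm that the iteration recovers '''P'''<sub>0</sub>. Linear constraints on '''P''' (trace, diagonal) would need a different projection step.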
 
=== Project [175 hours] Molecule Alignment with Procrustes Algorithm ===


'''Mentor:''' Toon Verstraelen (Toon.Verstraelen at ugent dot be) and Farnaz Heidar-Zadeh (farnaz.heidarzadeh at queensu dot ca)
'''Brief Explanation:''' Procrustes ([https://github.com/theochem/procrustes <u>https://github.com/theochem/procrustes</u>]) is a library for finding the optimal transformation that makes two matrices as close as possible to each other. Permutation Procrustes methods can be used for molecular alignment ([https://link.springer.com/article/10.1007/s10910-012-0119-2 ''<u>J Math Chem</u>'' <u>(2013) 51:927–936</u>]). The goal of this project is to develop a utility that uses the Procrustes package to perform molecular alignment.


=== Project: Orthogonal Procrustes for Rectangular Matrices ===
'''Expected Results:''' An open-source Python software.
'''Brief Explanation:''' Procrustes (https://github.com/theochem/procrustes) is a library for finding the optimal transformation that makes two matrices as close as possible to each other. Procrustes analysis has numerous applications in object recognition, though our primary interest pertains to its utility for quantifying chemical and physical (dis)similarity of molecular structures. Currently, when two input matrices have different numbers of columns, the smaller matrix is augmented by columns of zeros (zero-padding). An alternative to this artificial approach was recently proposed for the special case of orthogonal transformations [SIAM Journal on Matrix Analysis and Applications, vol. 41, pp. 957-983 (2020)]. The goal of this project is to implement the SCFRTR method (algorithm 5.1) from this reference into the Procrustes library.
Expected Results: Extension of Procrustes to include the SCFRTR algorithm as an alternative to zero-padding for unbalanced orthogonal Procrustes problems.  


<blockquote>'''175 hours:''' Using the Procrustes package, write a utility that takes two molecular structures and optimizes their alignment. In addition to simply optimizing the structural alignment, provide atom-atom mapping and extensions to more general problems (e.g., multi-molecule alignment).
</blockquote>
'''Difficulty Level:''' Advanced
'''Difficulty Level:''' Advanced


'''Relevant Skills:''' Experience with Python, NumPy, and numerical analysis
'''Relevant Skills:''' Experience with scientific Python, advanced Numpy, and object-oriented programming.
 
'''Mentors:''' Fanwang Meng (fwmeng88 at gmail dot com) and Paul Ayers (ayers at mcmaster dot ca).
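
To give a sense of the simplest version of the alignment task, here is a minimal sketch of rigid-body superposition (the Kabsch rotation) for two structures whose atom ordering already matches. It uses plain NumPy rather than the Procrustes package, the function name is hypothetical, and it does not address atom-atom mapping, which is where the permutation Procrustes methods mentioned above would come in.

```python
import numpy as np


def kabsch_align(coords_a, coords_b):
    """Rotate and translate coords_a onto coords_b (both N x 3, same atom order).

    Returns the aligned copy of coords_a and the resulting RMSD.
    """
    cen_a = coords_a.mean(axis=0)
    cen_b = coords_b.mean(axis=0)
    a = coords_a - cen_a
    b = coords_b - cen_b
    # SVD of the 3 x 3 covariance matrix between the centered structures.
    u, _, vt = np.linalg.svd(a.T @ b)
    # Flip the last axis if needed so that the result is a proper rotation.
    d = np.sign(np.linalg.det(u @ vt))
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    aligned = a @ rot + cen_b
    rmsd = np.sqrt(np.mean(np.sum((aligned - coords_b) ** 2, axis=1)))
    return aligned, rmsd
```

A full utility would combine a step like this with a search over atom permutations, since the correspondence between atoms is generally not known in advance.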


'''Mentor:''' Ali Tehrani (19at27 at queensu dot ca), David Kim (david.kim.91 at gmail dot com), Paul Ayers (ayers at mcmaster dot ca)
=== Project [350 hours]: Faster Molecular Integrals with Density-Fitting ===


=== Project: Faster Molecular Integrals with Density-Fitting ===
'''Brief Explanation:''' [https://github.com/theochem/gbasis <u>GBasis</u>] ([https://github.com/theochem/gbasis <u>https://github.com/theochem/gbasis</u>]) is a library for evaluating and analytically integrating Gaussian-type orbitals and their related quantities, especially molecular integrals. In many applications, the computational bottleneck is the evaluation of two-electron integrals, as the number of two-electron integrals grows as the fourth power of the basis-set size. By introducing an auxiliary, density-fitting, basis, this power is reduced to the third power of the basis-set size, which in many cases eliminates the computational bottleneck, since there are often other facets of the computation that scale more severely than this. The goal of this project is to implement density-fitting methods into GBasis.
'''Brief Explanation:''' GBasis (https://github.com/theochem/gbasis) is a library for evaluating and analytically integrating Gaussian-type orbitals and their related quantities, especially molecular integrals. In many applications, the computational bottleneck is the evaluation of two-electron integrals, as the number of two-electron integrals grows as the fourth power of the basis-set size. By introducing an auxiliary, density-fitting, basis, this power is reduced to the third power of the basis-set size, which in many cases eliminates the computational bottleneck, since there are often other facets of the computation that scale more severely than this. The goal of this project is to implement density-fitting methods into GBasis.  


'''Expected Results:''' Extension of GBasis to support density fitting. This involves expanding products of basis functions in the auxiliary basis, evaluating 2-electron integrals in the auxiliary basis, and using these two entities to construct molecular integrals more efficiently.
'''Expected Results:'''


<blockquote>'''350 hours:''' Extension of GBasis to support density fitting. This involves expanding products of basis functions in the auxiliary basis, evaluating 2-electron integrals in the auxiliary basis, and using these two entities to construct molecular integrals more efficiently.
</blockquote>
'''Difficulty Level:''' Intermediate to Advanced
'''Difficulty Level:''' Intermediate to Advanced


'''Relevant Skills:''' Experience with Python, NumPy
'''Relevant Skills:''' Experience with scientific Python, advanced Numpy, and object-oriented programming.
 
'''Mentors:''' Ali Tehrani (alirezatehrani24 at gmail dot com), Gabriela Sanchez Diaz (sanchezg at mcmaster dot ca), and Paul Ayers (ayers at mcmaster dot ca).
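
The final assembly step of density fitting can be sketched with NumPy alone. The example below assumes the three-index integrals (P|ab) and the auxiliary-basis Coulomb metric (P|Q) are already available as arrays; it does not call GBasis, and the function name and array layout are hypothetical.

```python
import numpy as np


def df_eri(three_center, metric):
    """Assemble approximate two-electron integrals from density-fitting pieces.

    three_center: array of shape (n_aux, n, n) holding (P|ab).
    metric: symmetric positive definite array of shape (n_aux, n_aux) holding (P|Q).
    Returns (ab|cd) ~ sum_PQ (ab|P) [J^-1]_PQ (Q|cd) as an (n, n, n, n) array.
    """
    n_aux, n, _ = three_center.shape
    chol = np.linalg.cholesky(metric)  # J = L L^T
    # B = L^-1 (P|ab) is the only large intermediate; it has n_aux * n^2 entries.
    b = np.linalg.solve(chol, three_center.reshape(n_aux, n * n))
    b = b.reshape(n_aux, n, n)
    return np.einsum("Pab,Pcd->abcd", b, b)
```

In practice the four-index array is rarely formed explicitly; the three-index intermediate is contracted directly with densities or orbital coefficients, which is where the reduction from fourth-power to third-power scaling in the basis-set size comes from.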
 
=== Project [350 hours]: Quantum Theory of Atoms in Molecules (QTAIM) ===
 
'''Brief Explanation:''' QTAIM ([https://www.chemistry.mcmaster.ca/aim/aim_0.html <u>https://www.chemistry.mcmaster.ca/aim/aim_0.html</u>]) uses the molecular electron density to obtain chemical insight by assigning every point in space to an atom in a molecule and identifying line segments with maximum density (bond paths) between atoms. Once a partitioning of space is assigned, one can easily integrate various functions over each partition. The goal of this project is to (a) create functionality that partitions the points on a numerical integration grid, (b) offer the ability to integrate various functions over each partition, and (c) locate bond paths between atoms. The existing code for this task needs to be modified, optimized, and extended (based on available pseudo- and prototype-code).
 
'''Expected Results:'''
 
<blockquote>'''350 hours:''' Implement QTAIM using the QC-Devs software ecosystem (especially IOData, grid, and gbasis; see [https://github.com/theochem/chemtools <u>https://github.com/theochem</u>]). Key deliverables include:
</blockquote>
<ol style="list-style-type: decimal;">
<li><blockquote><p>partition the electron density into atomic regions in two different ways.</p></blockquote></li>
<li><blockquote><p>perform numerical integration over each partition.</p></blockquote></li>
<li><blockquote><p>find the bond paths between each partition.</p></blockquote></li></ol>
 
'''Difficulty Level:''' High.
 
'''Relevant Skills:''' Experience with scientific Python, advanced Numpy, and object-oriented programming. Prior knowledge of QTAIM, quantum chemical topology, or quantum chemistry software is helpful.
 
'''Mentors:''' Ali Tehrani (alirezatehrani24 at gmail dot com) and Paul Ayers (ayers at mcmaster dot ca).
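
The idea of assigning points in space to atoms can be illustrated with a toy model. The sketch below follows the density gradient uphill from each point and labels the point by the nucleus it ends up closest to. The density is a hypothetical sum of identical Gaussians rather than a real molecular density, the fixed-step ascent is a deliberately crude assumption, and none of the names correspond to existing QC-Devs functions.

```python
import numpy as np


def model_density_gradient(point, centers, alpha=1.0):
    """Gradient of a toy density: a sum of Gaussians exp(-alpha |r - R_A|^2)."""
    diff = point - centers  # shape (n_atoms, 3)
    weights = np.exp(-alpha * np.sum(diff**2, axis=1))
    return -2.0 * alpha * (weights[:, None] * diff).sum(axis=0)


def assign_basins(points, centers, step=0.02, n_steps=400):
    """Label each point with the index of the nucleus its gradient path reaches."""
    labels = np.empty(len(points), dtype=int)
    for k, point in enumerate(points):
        pos = np.array(point, dtype=float)
        for _ in range(n_steps):
            grad = model_density_gradient(pos, centers)
            norm = np.linalg.norm(grad)
            if norm < 1e-12:
                break
            # Fixed-length step along the normalized gradient direction.
            pos += step * grad / norm
        labels[k] = np.argmin(np.linalg.norm(centers - pos, axis=1))
    return labels
```

Once every grid point carries a label, integrating a function over an atomic region amounts to summing the quadrature weights times the function values for the points with that label. A production implementation would need adaptive step sizes and careful handling of points near the boundaries between atoms.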
 
=== Project [350 hours]: Computing The Pair Density From Wave-function ===
 
'''Brief Explanation:''' The electron pair density represents the probability of observing two electrons at two points in space. It provides key quantitative and qualitative information about electron correlation, as well as qualitative information about chemical bonding and, in particular, about how Lewis structures emerge from quantum mechanics.
 
'''Expected Results:'''
 
<blockquote>'''350 hours:''' Provide a Python function that computes the pair density using GBasis ([https://github.com/theochem/gbasis <u>https://github.com/theochem/gbasis</u>]), starting from wave-function information that is read with IOData ([https://github.com/theochem/iodata <u>https://github.com/theochem/iodata</u>]). Key indicators like the intracule and extracule should be supported.
</blockquote>
'''Difficulty Level:''' Intermediate
 
'''Relevant Skills:''' Experience with scientific Python, advanced Numpy, and object-oriented programming.


'''Mentor:''' Ali Tehrani (19at27 at queensu dot ca), David Kim (david.kim.91 at gmail dot com), Paul Ayers (ayers at mcmaster dot ca)
'''Mentors:''' Ali Tehrani (alirezatehrani24 at gmail dot com), Gabriela Sanchez Diaz (sanchezg at mcmaster dot ca), and Paul Ayers (ayers at mcmaster dot ca).
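
For the special case of a closed-shell single-determinant wave-function with real orbitals, the pair density has a simple closed form in terms of the occupied orbitals, which makes a useful first test case. The sketch below assumes the orbital values on the two sets of points have already been evaluated (for example with GBasis, although no GBasis call is shown); the function name and array layout are hypothetical, and correlated wave-functions would require the full two-electron reduced density matrix instead.

```python
import numpy as np


def pair_density_closed_shell(orb_r1, orb_r2):
    """Pair density for a closed-shell single determinant with real orbitals.

    orb_r1: (n_occ, n1) values of the doubly occupied spatial orbitals at points r1.
    orb_r2: (n_occ, n2) values of the same orbitals at points r2.
    Returns an (n1, n2) array normalized to N (N - 1) electron pairs.
    """
    rho_1 = 2.0 * np.sum(orb_r1**2, axis=0)
    rho_2 = 2.0 * np.sum(orb_r2**2, axis=0)
    # Spin-summed one-electron reduced density matrix gamma(r1, r2).
    gamma = 2.0 * orb_r1.T @ orb_r2
    # Uncorrelated product minus the exchange term.
    return np.outer(rho_1, rho_2) - 0.5 * gamma**2
```

Integrating the result over both coordinates should give N(N-1), which is a convenient check on any implementation.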


== CalcUS Project Ideas ==
== CalcUS Project Ideas ==
Line 282: Line 373:
[https://github.com/cyllab/CalcUS CalcUS] is a platform aiming to democratize access to quantum chemistry by providing a user-friendly web-based interface to simplify running and analyzing quantum mechanical calculations.
[https://github.com/cyllab/CalcUS CalcUS] is a platform aiming to democratize access to quantum chemistry by providing a user-friendly web-based interface to simplify running and analyzing quantum mechanical calculations.


=== Project [Medium]: Improving the web frontend ===
=== Project [175 hours]: Improving the web frontend ===


'''Brief Explanation:''' CalcUS aims to provide all the relevant information from the calculations directly in the web interface, as well as tools to analyze those results. However, some useful elements of the interface are missing or suboptimal. In particular, [https://github.com/jspreadsheet/ce Jspreadsheet] should be implemented to allow data analysis in the browser. Multiple other aspects of the interface could be improved, in terms of either style or functionality.
'''Brief Explanation:''' CalcUS aims to provide all the relevant information from the calculations directly in the web interface, as well as tools to analyze those results. However, some useful elements of the interface are missing or suboptimal. In particular, [https://github.com/jspreadsheet/ce Jspreadsheet] should be implemented to allow data analysis in the browser. Multiple other aspects of the interface could be improved, in terms of either style or functionality.
Line 290: Line 381:
'''Prerequisites:''' Knowledge of HTML and Javascript and at least some knowledge of Python. Familiarity with JQuery, Django and PostgreSQL is helpful.
'''Prerequisites:''' Knowledge of HTML and Javascript and at least some knowledge of Python. Familiarity with JQuery, Django and PostgreSQL is helpful.


'''Mentor:''' Raphaël Robidas (raphael dot robidas at usherbrooke dot com)
'''Mentor:''' Raphaël Robidas (raphael dot robidas at usherbrooke dot ca)




=== Project [Medium]: Develop large-scale calculation management tools ===
=== Project [175 hours]: Develop large-scale calculation management tools ===


'''Brief Explanation:''' Quantum chemistry projects can involve performing calculations on a large number of structures (10-100) with different parameters. CalcUS should have features to make this process seamless and highly automated, from launching the calculations to reporting the results.
'''Brief Explanation:''' Quantum chemistry projects can involve performing calculations on a large number of structures (10-100) with different parameters. CalcUS should have features to make this process seamless and highly automated, from launching the calculations to reporting the results.
Line 301: Line 392:
'''Prerequisites:''' Knowledge of HTML, Javascript and Python. Familiarity with JQuery, Django and PostgreSQL is helpful.
'''Prerequisites:''' Knowledge of HTML, Javascript and Python. Familiarity with JQuery, Django and PostgreSQL is helpful.


'''Mentor:''' Raphaël Robidas (raphael dot robidas at usherbrooke dot com)
'''Mentor:''' Raphaël Robidas (raphael dot robidas at usherbrooke dot ca)




=== Project [Large]: Implement multi-step calculation protocols ===
=== Project [350 hours]: Implement multi-step calculation protocols ===


'''Brief Explanation:''' Quantum chemistry projects often involve the same series of sequential calculations. Currently, each calculation has to be launched manually, which is often not necessary. This project aims to add the feature to create custom multi-step calculation protocols as well as the underlying mechanics which make the protocols run smoothly.
'''Brief Explanation:''' Quantum chemistry projects often involve the same series of sequential calculations. Currently, each calculation has to be launched manually, which is often not necessary. This project aims to add the feature to create custom multi-step calculation protocols as well as the underlying mechanics which make the protocols run smoothly.
Line 312: Line 403:
'''Prerequisites:''' Knowledge of HTML, Javascript and Python. Familiarity with JQuery, Django and PostgreSQL is helpful.
'''Prerequisites:''' Knowledge of HTML, Javascript and Python. Familiarity with JQuery, Django and PostgreSQL is helpful.


'''Mentor:''' Raphaël Robidas (raphael dot robidas at usherbrooke dot com)
'''Mentor:''' Raphaël Robidas (raphael dot robidas at usherbrooke dot ca)


== ccinput Project Ideas ==
== ccinput Project Ideas ==
Line 318: Line 409:
[https://github.com/cyllab/ccinput ccinput] is a library and standalone tool to create computational chemistry input files.
[https://github.com/cyllab/ccinput ccinput] is a library and standalone tool to create computational chemistry input files.


=== Project [Large]: Add support for NWChem ===
=== Project [350 hours]: Add support for NWChem ===


'''Brief Explanation:''' Implementing the creation of NWChem input files for most of its features.
'''Brief Explanation:''' Implementing the creation of NWChem input files for most of its features.
Line 326: Line 417:
'''Prerequisites:''' Knowledge of Python. Familiarity with quantum chemistry is helpful, but not required.
'''Prerequisites:''' Knowledge of Python. Familiarity with quantum chemistry is helpful, but not required.


'''Mentor:''' Raphaël Robidas (raphael dot robidas at usherbrooke dot com)
'''Mentor:''' Raphaël Robidas (raphael dot robidas at usherbrooke dot ca)


==3Dmol.js Project Ideas==
==3Dmol.js Project Ideas==
Line 332: Line 423:
[http://3dmol.csb.pitt.edu 3Dmol.js] is a modern, object-oriented JavaScript library for visualizing molecular data that is forked from GLmol.  A particular emphasis is placed on performance.
[http://3dmol.csb.pitt.edu 3Dmol.js] is a modern, object-oriented JavaScript library for visualizing molecular data that is forked from GLmol.  A particular emphasis is placed on performance.


=== Project: Improve 3Dmol.js ===
=== Project [175 hours]: More cartoon options for nucleic acids. ===
 
'''Brief explanation:''' Implement additional visualizations of nucleic acids.
 
'''Expected results:''' See https://github.com/3dmol/3Dmol.js/issues/559 
 
'''Prerequisites:''' Experience with JavaScript and client-server programming, some experience with OpenGL/WebGL ideal, but not necessary.
 
'''Mentor:''' David Koes  (dkoes@pitt.edu)
 
 
=== Project [175 or 350 hours]: Improve 3Dmol.js ===


'''Brief explanation:''' Make significant improvements to 3Dmol.js functionality or performance.
'''Brief explanation:''' Make significant improvements to 3Dmol.js functionality or performance.
Line 340: Line 442:
'''Prerequisites:''' Experience with JavaScript and client-server programming, some experience with OpenGL/WebGL ideal, but not necessary.
'''Prerequisites:''' Experience with JavaScript and client-server programming, some experience with OpenGL/WebGL ideal, but not necessary.


'''Mentor:''' David Koes l (dkoes@pitt.edu)
'''Mentor:''' David Koes (dkoes@pitt.edu)


==gnina Project Ideas==
==gnina Project Ideas==
Line 346: Line 448:
[https://github.com/gnina gnina] is a C/C++ framework for applying deep learning to molecular docking.
[https://github.com/gnina gnina] is a C/C++ framework for applying deep learning to molecular docking.


=== Project: Improve gnina ===
=== Project [175 or 350 hours]: Improve gnina ===


'''Brief explanation:'''  Make significant improvements to gnina functionality or performance.
'''Brief explanation:'''  Make significant improvements to gnina functionality or performance.
Line 357: Line 459:


== DeepChem Project Ideas ==
== DeepChem Project Ideas ==
[https://deepchem.io DeepChem] aims to provide a high quality open-source toolchain that democratizes the use of deep-learning in drug discovery, materials science, quantum chemistry, and biology. Additional project ideas are discussed at https://forum.deepchem.io/t/google-summer-of-code-ideas/356.
[https://deepchem.io DeepChem] aims to provide a high quality open-source toolchain that democratizes the use of deep-learning in drug discovery, materials science, quantum chemistry, and biology. Additional project ideas are discussed at https://forum.deepchem.io/t/brainstorming-gsoc-2022-topics/658.


=== Project: PyTorch Lightning Implementation ===
=== Project [350 hours]: PyTorch Lightning Implementation ===
   
   
'''Brief explanation:''' Allow for implementation of DeepChem models in PyTorch Lightning.
'''Brief explanation:''' Allow for implementation of DeepChem models in PyTorch Lightning.
Line 369: Line 471:
'''Mentor:''' Bharath Ramsundar (bharath at deepforestsci dot com)
'''Mentor:''' Bharath Ramsundar (bharath at deepforestsci dot com)


=== Project: Semiconductor Modeling Support ===
=== Project [350 hours]: Layer Documentation ===
'''Brief explanation:''' DeepChem is moving towards a concept of first class layers. Improving the documentation for existing layers will help us make our current collection of layers more useful for the community.
'''Expected results:''' This project should also add a tutorial for using the layers to the DeepChem tutorial series, and should plan to add a few new layers as well.
'''Prerequisites:''' PyTorch/TensorFlow, Python
'''Mentor:''' Bharath Ramsundar (bharath at deepforestsci dot com)
 
=== Project [350 hours]: PyTorch Porting ===
'''Brief explanation:'''  DeepChem is shifting towards using PyTorch as its primary backend, but many models are still implemented in TensorFlow. A good project could be to pick a TensorFlow model or two, then port its layers and model into PyTorch along with suitable unit tests. 
'''Expected results:''' At least one model should be ported from TensorFlow to PyTorch successfully with associated unit tests. See https://github.com/deepchem/deepchem/issues/2863
'''Prerequisites:''' PyTorch/TensorFlow, Python
'''Mentor:''' Bharath Ramsundar (bharath at deepforestsci dot com)
 
=== Project [350 hours]: HuggingFace Integration ===
'''Brief explanation:'''  HuggingFace Integration: Last year, we had a few student projects explore HuggingFace/DeepChem integration, but these projects were not able to merge HuggingFace models into DeepChem.
'''Expected results:''' This project would create a working HuggingFace model in DeepChem along with tutorials on how to use HuggingFace with DeepChem. 
'''Prerequisites:''' PyTorch/TensorFlow, Python
'''Mentor:''' Bharath Ramsundar (bharath at deepforestsci dot com)
 
=== Project [350 hours]: Improved PINNs Support ===
'''Brief explanation:''' Improving our PINNs Support: One of the exciting new features in DeepChem 2.6.0 is support for PINNs, a class of techniques to solve PDEs with neural networks. The API for this class is still rudimentary and supports only a limited class of models and requires handcoding the loss.
'''Expected results:''' Extend the API to allow for a broader class of PDEs to be implemented. I’d suggest using the Schrödinger equation as a test case, since it can be solved in 1D as a toy problem and extended to arbitrarily high dimensions for larger molecules.
'''Prerequisites:''' PyTorch/TensorFlow, Python
'''Mentor:''' Bharath Ramsundar (bharath at deepforestsci dot com)
 
=== Project [350 hours]: Improved Equivariance Support ===
   
   
'''Brief explanation:''' Add support for semiconductor modeling deep learning tools.
'''Brief explanation:''' Improve Equivariance Support: DeepChem has no support for equivariant models. Given the increasing importance of equivariance for scientific machine learning, this is a major oversight.
   
   
'''Expected results:''' This project would involve implementing semiconductor models from https://arxiv.org/ftp/arxiv/papers/2101/2101.04383.pdf. These models should be added to DeepChem along with suitable tests, and a suitable jupyter notebook usage tutorial.
'''Expected results:''' This project would aim to add a tutorial about equivariant modeling and add an equivariant model to DeepChem. You may want to use e3nn or another library to facilitate implementation.
   
   
'''Prerequisites:''' PyTorch/TensorFlow, Python
'''Prerequisites:''' PyTorch/TensorFlow, Python
Line 379: Line 521:
'''Mentor:''' Bharath Ramsundar (bharath at deepforestsci dot com)
'''Mentor:''' Bharath Ramsundar (bharath at deepforestsci dot com)


=== Project [350 hours]: Improved Antibody Support ===
'''Brief explanation:''' Improving Antibody Support: DeepChem at present doesn’t have much tooling or support for working with antibodies.
'''Expected results:''' This project would add suitable antibody datasets to MoleculeNet and create a tutorial walking users through antibody design and modeling with DeepChem. If necessary, students may add antibody-specific models as well.
'''Prerequisites:''' PyTorch/TensorFlow, Python
'''Mentor:''' Bharath Ramsundar (bharath at deepforestsci dot com)


== Miscellaneous Project Ideas ==
== Miscellaneous Project Ideas ==
Line 385: Line 536:




===Project: OneMol: Google Docs & YouTube for Molecules ===
===Project [350 hours]: OneMol: Google Docs & YouTube for Molecules ===
[[File:OneMolsm.png|right]]
[[File:OneMolsm.png|right]]
'''Brief explanation:''' There is a huge need in the research community for improved collaboration tools on web and desktop. OneMol will provide an open API for collaborating on molecular data that both Avogadro and 3Dmol.js will support as reference implementations. OneMol compliant applications will be able to manipulate and view molecular data in real time so that changes made by one client will be propagated to other clients.
'''Brief explanation:''' There is a huge need in the research community for improved collaboration tools on web and desktop. OneMol will provide an open API for collaborating on molecular data that both Avogadro and 3Dmol.js will support as reference implementations. OneMol compliant applications will be able to manipulate and view molecular data in real time so that changes made by one client will be propagated to other clients.

Latest revision as of 16:20, 10 March 2022

Guidelines

Open Chemistry is an umbrella for projects in chemistry, materials science, biochemistry, and related areas. While we have participated in the last few Google Summer of Code programs and will apply again in 2022, there is no guarantee that we will be selected again for GSoC in 2022.

One important factor is that GSoC in 2022 will include both shorter projects (~175 hours) and longer projects (~350 hours). You should consider the appropriate timeline for your project proposal. We have indicated in the project titles where we suggest particular lengths.

If you are unsure of the scope of a project, please reach out and discuss BEFORE the proposal deadline.

When possible, please submit drafts a week or more in advance of the proposal deadline so that we can make suggestions on your proposal.

We have gathered a pool of interested mentors together who are seasoned developers in each of these projects. We welcome original ideas in addition to what's listed here - please suggest something interesting for open source chemistry!

Adding Ideas

When adding a new idea to this page, please try to include the following information:

  • Size of the project (~175 hours of work, ~6 weeks) or (~350 hours of work, ~12 weeks)
  • A brief explanation of the idea.
  • Expected results/feature additions.
  • Any prerequisites for working on the project.
  • Links to any further information, discussions, bug reports etc.
  • Any special mailing lists if not the standard mailing list for the project
  • Your name and email address for contact (if willing to mentor, or nominated mentor).

Proposal Guidelines

Students need to write and submit a proposal. We have added the applying to GSoC page to help guide students on what we would like to see in those proposals.

Avogadro 2 Project Ideas

Avogadro 2 is a chemical editor and visualization application; it is also a set of reusable software libraries written in C++ using principles of modularity for maximum reuse. We offer permissively licensed, open source, cross platform software components in the Avogadro 2 libraries, along with an end-user application with full source code, and binaries.

Project [350 hours]: Scripting Bindings

Brief explanation: Implement an embedded scripting language (i.e., Python) in Avogadro 2

Expected results: Enable an embedded scripting console as well as support for implementing modular extensions (tools, rendering, etc.) in Python. Python bindings exist, using PyBind11 with the new codebase, and the Avogadro 2 core libraries are pip installable. Extending the coverage of the API from the rudimentary parts of core/io would be a good starting point. An ideal solution would connect to PySide, to allow scripting to add UI like menu items, windows, etc. and provide documentation and example scripts. The interface should be maintainable as new classes and methods are added.

Example scripts and documentation are highly encouraged.

Prerequisites: Experience in C++ and Python, some experience with PyBind11, Qt for Python, PySide suggested.

Mentor: Geoff Hutchison (geoffh at pitt dot edu) or Marcus D. Hanwell (mhanwell at bnl dot gov)

Project [175 hours]: Integrate with RDKit

Brief explanation: Integrate the RDKit toolkit into Avogadro for conformer sampling and force field optimization

Expected results: RDKit is a BSD-licensed cheminformatics toolkit with a wide range of features useful for Avogadro 2. Most notably, RDKit offers efficient and accurate 3D coordinate generation, conformer sampling, and force field optimization. Implement a connection between Avogadro objects (molecules and atoms) and RDKit objects and implement conformer sampling and force field optimization code.

Prerequisites: Experience in C++, some experience with Python will be helpful.

Mentor: Geoff Hutchison (geoffh at pitt dot edu)

Project [175 or 350 hours]: Tools for Interactive Molecular Dynamics

Brief explanation: Building solvent boxes, implementing standard molecular dynamics using in-progress optimization framework. The scope could be 175 or 350 hours - please discuss what scale project you have in mind.

Expected results: Avogadro (v1) has interactive force field optimization allowing building and manipulation (e.g., push-pull atoms into position). Some users call this 'video game mode' ;-) A new optimization framework is in progress, including calling external programs for energies and forces. The project would enable building out MD simulations, including tools to add water or solvent boxes, build larger systems (e.g., via PackMol integration) and implement simple MD integration and thermostats.

Prerequisites: Experience in C++, ideally with knowledge of molecular dynamics methods and tools. Some Python would be helpful.

Mentor: Geoff Hutchison (geoffh at pitt dot edu)

Project [350 hours]: Efficient Molecular Surfaces / Orbitals

Brief explanation: Generating and rendering molecular surfaces is a common task, from solvent-accessible and solvent-excluded surfaces to molecular orbitals, electron density, spin density, etc.

Expected results: An efficient multi-threaded or GPU-enabled surface generation and rendering framework for Avogadro, including mapping properties as color maps onto the surface. Ideally, this would include integration with features of QC-Devs and other packages for calculating various properties or surfaces and/or rendering them for animations.

Prerequisites: Experience in C++, ideally with knowledge of OpenGL shaders. Some understanding of quantum chemistry would be helpful.

Mentor: Geoff Hutchison (geoffh at pitt dot edu)


Open Babel Project Ideas

Open Babel is an open toolbox for chemistry, designed to speak the many languages of chemical data. It's an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas.

Project [175 hours]: Integrate CoordGen library

Expected results: Schrodinger has released a BSD-licensed library for 2D chemical structure layout (https://github.com/schrodinger/coordgenlibs) and it has been successfully integrated into RDKit. The student will be responsible for integrating CoordGen into Open Babel. Code will be written in C++.

Mentor: Geoff Hutchison (geoffh at pitt dot edu)

Project [175 hours]: Implement MMTF format

Brief explanation: Implementation of MMTF file format in OpenBabel.

Expected results: Macromolecular Transmission Format (MMTF) is a new compact binary format to transmit and store biomolecular structural data quickly and accurately (http://mmtf.rcsb.org). Your task is to implement support for this format in the OpenBabel open-source cheminformatics toolkit (http://openbabel.org). Code will be written in C++.

Mentor: Geoff Hutchison (geoffh at pitt dot edu) or David Koes (dkoes at pitt dot edu)

Project [175 hours]: Test Framework Overhaul

Brief explanation: Automated testing is an important part of maintaining code quality. This project will improve the current testing regime of openbabel.

Expected results: A comprehensive test framework that automates the generation of unit tests for all supported languages and simplifies the creation of new test cases will be implemented. The student will be responsible for choosing the most appropriate framework, porting existing test cases, and expanding the test suite to enhance code coverage.

Prerequisites: Experience in C++. Knowledge of modern software engineering practices or test frameworks is ideal.

Mentor: Geoff Hutchison (geoffh at pitt dot edu), David Koes (dkoes at pitt dot edu), the OpenBabel development community.

Project [350 hours]: Develop a JavaScript version of Open Babel

Brief explanation: Building on existing work, you will use Emscripten to compile the C++ codebase of Open Babel to JavaScript. This will make it easy to write in-browser applications that need cheminformatics functionality.

Expected results: Following from work described in a recent paper (https://pubs.acs.org/doi/abs/10.1021/acs.jcim.7b00434), a JavaScript version of the Open Babel toolkit will be created. The generation of any necessary wrappers should be automated to allow it to track changes in the Open Babel API.

Ideally, the project will adapt a core JavaScript library openbabel.js that allows modules, such as file formats to be imported separately (e.g., smilesformat.js, pdbformat.js, xyzformat.js, etc.)

Prerequisites: Some experience in C++, and also with JavaScript.

Mentor: Noel O'Boyle (baoilleach at gmail dot com)

Project [350 hours]: Develop a validation and standardization filter

Brief explanation: Given a particular molecular structure, can we say how chemically plausible it is, and use this to filter or warn about problems (e.g., undefined stereo centers)?

Expected results: Given a set of reference structures (e.g. ChEMBL), it should be possible to build a model that can say how normal/unusual a query structure is. For example, given a set of drug-like molecules, a molecule with a ruthenium atom might be considered unusual; or given any set of molecules, a 5-coordinate carbon is unusual.

Such a model could be used as a filter, or as a warning to flag up problematic structures.
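A deliberately tiny sketch of the idea, assuming molecules are represented only as lists of element symbols: the "model" is the fraction of reference molecules containing each element, and a query is flagged when it contains an element rarer than a threshold. The function names, the 5% threshold, and the three-molecule reference set are all hypothetical; a real model would use richer features (valence, ring systems, stereo).

```python
from collections import Counter

def build_element_model(reference_molecules):
    """Fraction of reference molecules that contain each element."""
    counts = Counter()
    for atoms in reference_molecules:
        counts.update(set(atoms))  # count each element once per molecule
    n = len(reference_molecules)
    return {element: c / n for element, c in counts.items()}

def unusual_elements(atoms, model, threshold=0.05):
    """Elements in the query that are rare or absent in the reference set."""
    return sorted({e for e in atoms if model.get(e, 0.0) < threshold})

# Toy reference set (element symbols only), standing in for a real database.
reference = [
    ["C", "C", "O", "H", "H", "H", "H", "H", "H"],
    ["C", "N", "H", "H", "H", "H", "H"],
    ["C", "C", "O", "O", "H", "H", "H", "H"],
]
model = build_element_model(reference)
flags = unusual_elements(["C", "Ru", "H"], model)
print(flags)
```

The same function can act as a hard filter (reject when the list is non-empty) or as a warning (report the list to the user).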

Code could be modeled on MolVS using RDKit.

Prerequisites: Experience in C++ or Python, and an interest in data science or statistics.

Mentor: Noel O'Boyle (baoilleach at gmail dot com) or Geoff Hutchison (geoffh at pitt dot edu)

cclib Project Ideas

cclib is an open source library, written in Python, for parsing and interpreting the results of computational chemistry packages. The goals of cclib are centered around the reuse of data obtained from these programs when stored in program-specific output files.

Project [175 or 350 hours]: Implement new parsers

Brief explanation: There are outstanding issues on GitHub for supporting more programs (e.g. CFOUR, xtb, NBO, GAMESS dat, MRCC, DIRAC), and parsing binary files for various QM programs (e.g. Gaussian, NWChem, and ORCA). There may also be more programs missing that haven't been considered.

Expected results: Implement parsers for one or more new programs/formats, generate test data, and write unit and regression tests for each parser.

Prerequisites: Experience with Python, basic familiarity with computational chemistry programs, and access to the program(s) needed to generate the test data.

Mentors: Eric Berquist (eric.john.berquist at gmail dot com) and/or Shiv Upadhyay (shivnupadhyay at gmail dot com) and/or Adam Tenderholt (atenderholt at gmail dot com) and/or Karol Langner (karol.langner at gmail dot com)

Project [175 or 350 hours]: Implement new bridges

Brief explanation: There are outstanding issues on GitHub for more integrations with external programs (e.g. chemfiles, RDKit) via their Python bindings. There may also be more programs missing that haven't been considered.

Expected results: Implement bridges for one or more new programs, along with writing unit tests and documentation for each bridge.

Prerequisites: Experience with Python and ideally familiarity with the program that is being bridged.

Mentors: Eric Berquist (eric.john.berquist at gmail dot com) and/or Shiv Upadhyay (shivnupadhyay at gmail dot com) and/or Adam Tenderholt (atenderholt at gmail dot com) and/or Karol Langner (karol.langner at gmail dot com)

Project [350 hours]: Implement new methods

Brief explanation: There are outstanding issues on GitHub for more analysis methods being added directly to cclib (e.g. calculating geometric parameters). There may also be other methods that are desirable to include which haven't been considered.

Expected results: Implement one or more new methods, along with writing unit tests and documentation for each method.

Prerequisites: Experience with Python and familiarity with the method(s) being added, depending on the complexity of the method.

Mentors: Eric Berquist (eric.john.berquist at gmail dot com) and/or Shiv Upadhyay (shivnupadhyay at gmail dot com) and/or Adam Tenderholt (atenderholt at gmail dot com) and/or Karol Langner (karol.langner at gmail dot com)

Project [350 hours]: Julia bindings

Brief explanation: The Julia programming language (https://julialang.org/) is growing in popularity for computational chemistry as a language that both production-level computation and analysis can be performed in seamlessly. In order to analyze computational chemistry outputs from traditional programs in Julia, rather than reimplement all cclib functionality in Julia, we should be able to call cclib from Julia directly and reuse its core functionality.

Expected results: Julia bindings to cclib IO functionality and a Julia-native representation of cclib data objects, with each cclib attribute accessible as a native Julia type. The bindings should be available on the default Julia package registry. The remainder of the project is more open-ended, but an example application of using the bindings would be ideal.

Prerequisites: Experience with Python and/or Julia, and ideally some familiarity with important quantities from computational chemistry outputs.

Mentors: Eric Berquist (eric.john.berquist at gmail dot com)

Project [350 hours]: Additional visualization for OpenChemVault

Brief explanation: OpenChemVault (https://github.com/cclib/openchemvault) is capable of parsing output files, storing them, and displaying geometries, but any sort of additional visualization (such as plotting molecular orbitals or spectra) is missing. The capabilities of GaussSum (http://gausssum.sourceforge.net/) are a possible starting point.

Expected results: Implement one or more new visualizations for the OpenChemVault web interface.

Prerequisites: Experience with Python and familiarity with common visualizations that are desirable for computational chemistry outputs. No previous experience with JavaScript is necessary.

Mentors: Eric Berquist (eric.john.berquist at gmail dot com) and/or Shiv Upadhyay (shivnupadhyay at gmail dot com) and/or Adam Tenderholt (atenderholt at gmail dot com) and/or Karol Langner (karol.langner at gmail dot com)

RDKit Project Ideas

The RDKit is a BSD licensed open source cheminformatics toolkit written in C++ with wrappers for use from Python, Java, C#, and JavaScript. The RDKit also provides "cartridge" functionality that allows chemical searching in the open-source relational database PostgreSQL.

Project [175 hours]: Port xyz2mol to the RDKit core

Brief explanation: Assignment of bond orders to molecules where only atomic coordinates are available is a challenging problem. The xyz2mol package from Prof. Jan H. Jensen's research group in Denmark, https://github.com/jensengroup/xyz2mol, is a robust and well-tested solution to the problem. The goal of this project is to port the xyz2mol code from Python to C++ and integrate it into the core RDKit. Jan Jensen will help us on this project by answering questions and providing advice on the re-implementation.
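To make the problem concrete, the sketch below covers only a typical first step, guessing which atoms are connected from interatomic distances. It does not assign bond orders and is not taken from the xyz2mol code; the covalent radii, the 1.3 scale factor, and the function name are illustrative assumptions.

```python
import math
from itertools import combinations

# Approximate single-bond covalent radii in angstroms (illustrative values).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}

def guess_bonds(symbols, coords, scale=1.3):
    """Return index pairs whose distance is within scaled covalent radii."""
    bonds = []
    for i, j in combinations(range(len(symbols)), 2):
        cutoff = scale * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]])
        if math.dist(coords[i], coords[j]) <= cutoff:
            bonds.append((i, j))
    return bonds

# Water molecule, coordinates in angstroms.
symbols = ["O", "H", "H"]
coords = [(0.000, 0.000, 0.000), (0.757, 0.586, 0.000), (-0.757, 0.586, 0.000)]
bonds = guess_bonds(symbols, coords)
print(bonds)
```

The hard part of the project starts after this step: choosing bond orders and formal charges that are consistent with the connectivity and the total charge.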

Expected results: A C++ implementation of the xyz2mol code along with a robust set of test cases. Wrappers for the calculator so that it is accessible from within the Python and SWIG (Java and C#) wrappers.

Prerequisites: C++

Mentor: Joey Storer (JWStorer at dow.com)


Project [350 hours]: Implement Molecular Interaction Fields calculations in the RDKit

Brief explanation: There is an old PR for the RDKit that implements molecular interaction fields: https://github.com/rdkit/rdkit/pull/318. This was never merged because the author ran out of time. At this point a lot of work would be required to update and finish this PR, but the results would be super useful for the RDKit community.

Expected results: A C++ implementation of the GRID calculator code along with a robust set of test cases. Wrappers for the calculator so that it is accessible from within the Python and SWIG (Java and C#) wrappers.

Prerequisites: C++

Mentor: Greg Landrum (greg.landrum at t5informatics dot com)

Project [350 hours]: RDKit+OpenMM GPU Molecular Force Fields

Brief explanation: OpenMM (http://openmm.org/) is a high-performance toolkit for force-field based molecular simulation that includes GPU and CPU support. The goal of this project is to make it easy to use OpenMM force fields to minimize the energies of or perform molecular dynamics calculations on RDKit molecules.

Expected results: OpenMM supports a wide range of force fields, but not the classical MMFF94 or UFF methods implemented in RDKit. Needed is C++ functionality allowing RDKit molecules to be sent to OpenMM for minimization and/or to perform molecular dynamics. A robust set of regression tests for this functionality. Python wrappers around the new functionality. The work would likely involve completing the MMFF94 implementation described by Paolo Tosco at the 2017 RDKit UGM (https://github.com/rdkit/UGM_2017/blob/master/Presentations/Tosco_RDKit_OpenMM_integration.pdf) and extending to other force fields like UFF. Another approach is the small-molecule support used by OpenFF: https://github.com/openmm/openmmforcefields

Prerequisites: C++ and some Python

Mentor: TBA, likely Geoff Hutchison (geoffh at pitt.edu) and others

QC-Devs Project Ideas

QC-Devs (https://qcdevs.org/) develops various free, open-source, and cross-platform libraries for scientific computing, especially theoretical and computational chemistry. Our goal is to make programming accessible to chemists and promote precepts of sustainable software development. The two main pieces of the QC-Devs ecosystem are:

All our repositories are hosted on Theochem organization (https://github.com/theochem) on GitHub.

Project [175 hours or 350 hours]: Visualization of Molecular Structure and Reactivity

Brief Explanation: ChemTools (https://github.com/theochem/chemtools) is a post-processing library for extracting chemical insight from quantum chemistry calculations. Currently, ChemTools relies on Visual Molecular Dynamics (VMD) and Matplotlib for visualization. ChemTools has the functionality to generate visualization scripts for VMD, so the user can easily generate informative plots like iso-surface of electron density colored by electrostatic potential. Visualization of (annotated) molecular structures and molecular structure changes along reaction pathways are also of interest, but the implementations are unpolished.

Expected Results:

175 hours: Add functionality to ChemTools to generate visualization scripts for ChimeraX (https://www.cgl.ucsf.edu/chimerax/). The current functionality for VMD can be used as a template.

350 hours: Add ChemTools as a back-end for SEQCROW (https://cxtoolshed.rbvi.ucsf.edu/apps/seqcrow), a free and open-source bundle (https://github.com/QChASM/SEQCROW) in the ChimeraX toolshed (https://cxtoolshed.rbvi.ucsf.edu/) for building molecules and interacting with the output of quantum chemistry calculations.

Difficulty Level: Intermediate (175) to High-Intermediate (350)

Relevant Skills: Experience with Python, visualization, and software interfacing (350)

Mentors: Ali Tehrani (alirezatehrani24 at gmail dot com), Gabriela Sanchez Diaz (sanchezg at mcmaster dot ca), Farnaz Heidar-Zadeh (farnaz.heidarzadeh at queensu dot ca), and Esteban Vohringer-Martinez (estebanvohringer at qcmmlab dot com).

Project [175 or 350 hours]: Extended interoperability of GOpt and Quantum Chemistry Software

Brief Explanation: ChemTools (https://github.com/theochem/chemtools) is a post-processing library for extracting chemical insight from quantum chemistry calculations. Currently, ChemTools relies on modules of the HORTON library to compute the basic quantities required for its analysis. The goal of this project is to extend the interoperability of ChemTools, so that it can use the Psi4 (https://github.com/psi4) & PySCF (https://github.com/pyscf/pyscf) packages and take advantage of their features.

Expected Results:

175 hours: Writing wrappers for Psi4 or PySCF to compute various quantum mechanical properties and provide those properties to ChemTools for further analysis. The current wrappers for HORTON can be used as a template. Both Psi4 & PySCF have Python interfaces.

350 hours: Writing wrappers for Psi4 and PySCF.

Difficulty Level: Intermediate

Relevant Skills: Experience with scientific Python, advanced Numpy, object-oriented programming, and knowledge of quantum chemistry software

Mentors: Gabriela Sanchez Diaz (sanchezg at mcmaster dot ca), Ali Tehrani (alirezatehrani24 at gmail dot com), and Farnaz Heidar-Zadeh (farnaz.heidarzadeh at queensu dot ca).

Project [175 hours]: Extended Periodic Table of Atomic Properties

Brief Explanation: A database of atomic properties is often used to estimate molecular properties via group additivity rules. While several "periodic table" databases exist, most of them lack data about excited states and charged atoms, and certainly about exotic species (e.g., excited states of highly charged species). Most also lack information about local properties (e.g., the electron density and local property densities).

AtomDB (https://github.com/theochem/AtomDB) provides data for global and local properties, from experiment, theory, and computation. As there are many ways to calculate these properties, and not all properties are accessible from all calculations, it is important that the database be adaptable, so that data can easily be added from new computational and theoretical models. Setting up and processing such calculations by hand (for different elements, ions, spin states, ...) is extremely tedious and error-prone.

Expected Results:

  1. Extension of AtomDB with a workflow for generating/storing atomic properties from new calculations, including (spherically-averaged) atomic property densities. This entails (i) generating input files for existing quantum chemistry codes together with a suitable job script to execute the calculations on an HPC and (ii) processing the outputs of these calculations. This workflow will be implemented using other packages in the HORTON project, such as IOData, Grid, and GBasis. (See https://github.com/theochem)

  2. Utilities for generating molecular properties and property densities using various atom/group additivity rules.

Difficulty Level: Intermediate

Relevant Skills: Experience with Python, NumPy, object-oriented programming, and quantum chemistry on high-performance computing (HPC) systems.

Mentors: Gabriela Sanchez Diaz (sanchezg at mcmaster dot ca), Ali Tehrani (alirezatehrani24 at gmail dot com), Paul Ayers (ayers at mcmaster dot ca), and Farnaz Heidar-Zadeh (farnaz.heidarzadeh at queensu dot ca).

Project [175 hours]: Orthogonal Procrustes for Rectangular Matrices

Brief Explanation: Procrustes (https://github.com/theochem/procrustes) is a library for finding the optimal transformation that makes two matrices as close as possible to each other. Procrustes analysis has numerous applications in object recognition, though our primary interest pertains to its utility for quantifying chemical and physical (dis)similarity of molecular structures. Currently, when two input matrices have different numbers of columns, the smaller matrix is augmented by columns of zeros (zero-padding). An alternative to this artificial approach was recently proposed for the special case of orthogonal transformations [SIAM Journal on Matrix Analysis and Applications vol. 41, pp. 957-983 (2020)]. The goal of this project is to implement the SCFRTR method (algorithm 5.1) into the Procrustes library.
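For orientation, the balanced orthogonal Procrustes problem and the zero-padding workaround can be written as follows. The symbols A, B, Q, m, n_A, and n_B are generic notation chosen here, not necessarily the conventions of the Procrustes library; the SCFRTR method itself is not reproduced.

```latex
% Balanced case: A, B \in \mathbb{R}^{m \times n}
\min_{Q \in \mathbb{R}^{n \times n},\; Q^{\mathsf T} Q = I} \; \lVert A Q - B \rVert_F ,
\qquad
Q^{\star} = U V^{\mathsf T}
\quad \text{where} \quad
A^{\mathsf T} B = U \Sigma V^{\mathsf T} \ \text{(SVD)} .

% Unbalanced case handled by zero-padding:
% A \in \mathbb{R}^{m \times n_A},\; B \in \mathbb{R}^{m \times n_B},\; n_A < n_B
\tilde{A} = \begin{bmatrix} A & 0_{m \times (n_B - n_A)} \end{bmatrix}
\in \mathbb{R}^{m \times n_B}
```

After padding, the balanced formula is applied to the padded matrix and B, which is the step the project aims to replace with a method that works on the rectangular problem directly.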

Expected Results:

175 hours: Extension of Procrustes to include the SCFRTR algorithm as an alternative to zero-padding for unbalanced orthogonal Procrustes problems.

Difficulty Level: Advanced

Relevant Skills: Experience with scientific Python, advanced Numpy, object-oriented programming, and numerical analysis.

Mentor: Fanwang Meng (fwmeng88 at gmail dot com) and Paul Ayers (ayers at mcmaster dot ca).

Project [175 or 350 hours]: Positive Semi-definite Procrustes (PSDP) Problem

Brief Explanation: Procrustes (https://github.com/theochem/procrustes) is a library for finding the optimal transformation that makes two matrices as close as possible to each other. The main idea of this project is to solve the problem of minimizing ||PX - Y||, where P is a positive semi-definite matrix and X and Y are the input matrices. A basic solution is provided by [A new algorithm for the positive semi-definite Procrustes problem].

Expected Results:

175 hours: Extension of Procrustes to include the positive semi-definite algorithm as a new class of Procrustes problems.

350 hours: Consider additional extensions, where the trace, diagonal elements, or other linear constraints of P are imposed. One useful example of this type of problem is the "closest covariance matrix."

Difficulty Level: Advanced

Relevant Skills: Experience with scientific Python, advanced Numpy, object-oriented programming, and numerical analysis.

Mentors: Fanwang Meng (fwmeng88 at gmail dot com) and Paul Ayers (ayers at mcmaster dot ca).

Project [175 hours]: Molecule Alignment with Procrustes Algorithm

Brief Explanation: Procrustes (https://github.com/theochem/procrustes) is a library for finding the optimal transformation that makes two matrices as close as possible to each other. Permutation Procrustes methods can be used for molecular alignment (J Math Chem (2013) 51:927–936). The goal of this project is to develop a utility that uses the Procrustes package to perform molecular alignment.

Expected Results: An open-source Python utility.

175 hours: Using the Procrustes package, write a utility that takes two molecular structures and optimizes their alignment. In addition to simply optimizing the structural alignment, provide atom-atom mapping and extensions to more general problems (e.g., multi-molecule alignment).

Difficulty Level: Advanced

Relevant Skills: Experience with scientific Python, advanced Numpy, and object-oriented programming.

Mentors: Fanwang Meng (fwmeng88 at gmail dot com) and Paul Ayers (ayers at mcmaster dot ca).

Project [350 hours]: Faster Molecular Integrals with Density-Fitting

Brief Explanation: GBasis (https://github.com/theochem/gbasis) is a library for evaluating and analytically integrating Gaussian-type orbitals and their related quantities, especially molecular integrals. In many applications, the computational bottleneck is the evaluation of two-electron integrals, as the number of two-electron integrals grows as the fourth power of the basis-set size. By introducing an auxiliary, density-fitting, basis, this power is reduced to the third power of the basis-set size, which in many cases eliminates the computational bottleneck, since there are often other facets of the computation that scale more severely than this. The goal of this project is to implement density-fitting methods into GBasis.

Expected Results:

350 hours: Extension of GBasis to support density fitting. This involves expanding products of basis functions in the auxiliary basis, evaluating 2-electron integrals in the auxiliary basis, and using these two entities to construct molecular integrals more efficiently.
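The three ingredients named above correspond to the standard Coulomb-metric density-fitting approximation, sketched below. The index names (a, b, c, d for orbital basis functions; P, Q for auxiliary functions) are notation chosen here, and other fitting metrics are possible.

```latex
% Two-electron integral approximated through an auxiliary basis {P}
(ab|cd) \;\approx\; \sum_{P,Q} (ab|P)\, \bigl[\mathbf{J}^{-1}\bigr]_{PQ}\, (Q|cd),
\qquad
J_{PQ} = (P|Q)

% (ab|P): three-index integrals between a product of basis functions
%         and one auxiliary function
% (P|Q):  two-index Coulomb integrals within the auxiliary basis
```

Only three-index and two-index quantities need to be computed and stored, which is where the reduction from fourth-power to third-power scaling comes from.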

Difficulty Level: Intermediate to Advanced

Relevant Skills: Experience with scientific Python, advanced Numpy, and object-oriented programming.

Mentors: Ali Tehrani (alirezatehrani24 at gmail dot com), Gabriela Sanchez Diaz (sanchezg at mcmaster dot ca), and Paul Ayers (ayers at mcmaster dot ca).

Project [350 hours]: Quantum Theory of Atoms in Molecules (QTAIM)

Brief Explanation: QTAIM (https://www.chemistry.mcmaster.ca/aim/aim_0.html) uses the molecular electron density to obtain chemical insight by assigning every point in space to an atom in a molecule and identifying line segments with maximum density (bond paths) between atoms. Once a partitioning of space is assigned, one can easily integrate various functions over each partition. The goal of this project is to (a) create functionality that partitions the points on a numerical integration grid, (b) offer the ability to integrate various functions over each partition, and (c) locate bond paths between atoms. The existing code for this task needs to be modified, optimized, and extended (based on available pseudo- and prototype-code).

Expected Results:

350 hours: Implement QTAIM using the QC-Devs software ecosystem (especially IOData, Grid, and GBasis; https://github.com/theochem). Key deliverables include:

  1. partition the electron density into atomic regions in two different ways.

  2. perform numerical integration over each partition.

  3. find the bond paths between each partition.
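The first two deliverables can be illustrated with a one-dimensional toy, under heavy assumptions: the "density" is a sum of two Gaussians, each grid point is assigned to the local maximum it reaches by always stepping to its higher neighbor, and the density is then summed over each basin. All names and parameters are invented for this sketch; a real implementation works on 3D integration grids and follows the density gradient.

```python
import math

def density(x, centers=(-1.0, 1.0), exponent=2.0):
    """Toy 1D 'electron density': one Gaussian per atom."""
    return sum(math.exp(-exponent * (x - c) ** 2) for c in centers)

def assign_basins(rho):
    """Map each grid index to the index of the local maximum it climbs to."""
    def climb(i):
        while True:
            neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(rho)]
            best = max(neighbors, key=lambda j: rho[j])
            if rho[best] <= rho[i]:
                return i          # no higher neighbor: local maximum
            i = best
    return [climb(i) for i in range(len(rho))]

step = 0.1
grid = [(i - 29.5) * step for i in range(60)]   # points from -2.95 to 2.95
rho = [density(x) for x in grid]
basins = assign_basins(rho)

# Integrate the density over each basin with a simple Riemann sum.
populations = {}
for peak, value in zip(basins, rho):
    populations[peak] = populations.get(peak, 0.0) + value * step

for peak in sorted(populations):
    print(f"basin around x = {grid[peak]:+.2f}: {populations[peak]:.4f}")
```

Because the two toy atoms are identical, the two basin populations should come out equal, which makes a convenient check on the partitioning.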

Difficulty Level: High.

Relevant Skills: Experience with scientific Python, advanced Numpy, and object-oriented programming. Prior knowledge of QTAIM, quantum chemical topology, or quantum chemistry software is helpful.

Mentors: Ali Tehrani (alirezatehrani24 at gmail dot com) and Paul Ayers (ayers at mcmaster dot ca).

Project [350 hours]: Computing The Pair Density From Wave-function

Brief Explanation: The electron pair density represents the probability of observing two electrons at two points in space. It provides key quantitative and qualitative information about electron correlation, as well as qualitative information about chemical bonding and, in particular, about how Lewis structures emerge from quantum mechanics.

Expected Results:

350 hours: Provide a Python function that computes the pair density using GBasis (https://github.com/theochem/gbasis), starting from wave-function information that is read with IOData (https://github.com/theochem/iodata). Key indicators like the intracule and extracule should be supported.

Difficulty Level: Intermediate

Relevant Skills: Experience with scientific Python, advanced Numpy, and object-oriented programming.

Mentors: Ali Tehrani (alirezatehrani24 at gmail dot com), Gabriela Sanchez Diaz (sanchezg at mcmaster dot ca), and Paul Ayers (ayers at mcmaster dot ca).

CalcUS Project Ideas

CalcUS is a platform aiming to democratize access to quantum chemistry by providing a user-friendly web-based interface to simplify running and analyzing quantum mechanical calculations.

Project [175 hours]: Improving the web frontend

Brief Explanation: CalcUS aims to provide all the relevant information from the calculations directly in the web interface, as well as tools to analyze those results. However, some useful elements of the interface are missing or suboptimal. In particular, Jspreadsheet should be implemented to allow data analysis in the browser. Multiple other aspects of the interface could be improved, related either to style or to functionality.

Expected Results: Replace the current spreadsheet with Jspreadsheet and configure it; implement data loading from the database (PostgreSQL) and saving/download of the spreadsheet; customize elements of the UI such as alerts and error pages; keep the web pages as responsive as possible; generally improve the code and fix encountered bugs.

Prerequisites: Knowledge of HTML and Javascript and at least some knowledge of Python. Familiarity with JQuery, Django and PostgreSQL is helpful.

Mentor: Raphaël Robidas (raphael dot robidas at usherbrooke dot ca)


Project [175 hours]: Develop large-scale calculation management tools

Brief Explanation: Quantum chemistry projects can involve performing calculations on a large number of structures (10-100) with different parameters. CalcUS should have features to make this process seamless and highly automated, from launching the calculations to reporting the results.

Expected Results: Create a variation of the calculation web UI, aimed specifically at batch calculations with variable parameters, design and implement the workflow to handle these batch calculations, implement results gathering and reporting in a convenient format, write relevant unit and/or integration tests.
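One way to picture "batch calculations with variable parameters" is as the Cartesian product of structures and parameter values. The sketch below is plain Python and does not use any CalcUS API; the function name, dictionary keys, file names, and parameter values are hypothetical.

```python
from itertools import product

def expand_batch(structures, parameter_grid):
    """Return one job description per structure and parameter combination."""
    names = sorted(parameter_grid)           # fixed order for reproducibility
    jobs = []
    for structure in structures:
        for values in product(*(parameter_grid[name] for name in names)):
            jobs.append({"structure": structure, **dict(zip(names, values))})
    return jobs

jobs = expand_batch(
    structures=["mol_a.xyz", "mol_b.xyz"],
    parameter_grid={
        "functional": ["B3LYP", "PBE0"],
        "basis": ["def2-SVP", "def2-TZVP"],
    },
)
print(len(jobs), "jobs; first:", jobs[0])
```

Keeping each job as a flat dictionary makes it straightforward to store the batch in a database table and to tabulate results per parameter afterwards.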

Prerequisites: Knowledge of HTML, Javascript and Python. Familiarity with JQuery, Django and PostgreSQL is helpful.

Mentor: Raphaël Robidas (raphael dot robidas at usherbrooke dot ca)


Project [350 hours]: Implement multi-step calculation protocols

Brief Explanation: Quantum chemistry projects often involve the same series of sequential calculations. Currently, each calculation has to be launched manually, which is often not necessary. This project aims to add the feature to create custom multi-step calculation protocols as well as the underlying mechanics which make the protocols run smoothly.

Expected Results: Add an interface to create multi-step protocols, create the data structures to store these protocols and their progress, integrate the automated launch of subsequent steps using the current calculation handling code, add simple verifications after each step completion, write relevant unit and/or integration tests.
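The control flow of such a protocol can be sketched in a few lines, assuming each step is a triple of a name, a function that runs the step, and a function that verifies its output. Nothing here is CalcUS code: the step names, the stand-in functions, and the placeholder values are invented for illustration.

```python
def run_protocol(steps, initial_input):
    """Run steps in order; stop at the first step whose check fails."""
    data = initial_input
    log = []
    for name, run, check in steps:
        data = run(data)                      # output feeds the next step
        ok = check(data)                      # simple verification
        log.append((name, "done" if ok else "failed"))
        if not ok:
            break
    return data, log

# Stand-in steps; real steps would launch actual calculations.
def optimize(d):
    return {**d, "optimized": True}

def frequencies(d):
    return {**d, "n_imaginary": 0}            # placeholder result

def single_point(d):
    return {**d, "energy": 0.0}               # placeholder value

steps = [
    ("optimize", optimize, lambda d: d["optimized"]),
    ("frequencies", frequencies, lambda d: d["n_imaginary"] == 0),
    ("single_point", single_point, lambda d: "energy" in d),
]
result, log = run_protocol(steps, {"structure": "water.xyz"})
print(log)
```

Recording the outcome of every step in a log gives the interface something to display as protocol progress and a clear point at which to stop when a verification fails.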

Prerequisites: Knowledge of HTML, Javascript and Python. Familiarity with JQuery, Django and PostgreSQL is helpful.

Mentor: Raphaël Robidas (raphael dot robidas at usherbrooke dot ca)

ccinput Project Ideas

ccinput is a library and standalone tool to create computational chemistry input files.

Project [350 hours]: Add support for NWChem

Brief Explanation: Implementing the creation of NWChem input files for most of its features.

Expected Results: Implementing the creation of NWChem input files which follow the correct structures, implementing support of various keywords and modifiers, allowing the use of the Basis Set Exchange data, adding the relevant static data about NWChem (supported methods, solvents, etc.), creation of extensive unit tests for all features, writing the documentation.

Prerequisites: Knowledge of Python. Familiarity with quantum chemistry is helpful, but not required.

Mentor: Raphaël Robidas (raphael dot robidas at usherbrooke dot ca)

3Dmol.js Project Ideas

3Dmol.js is a modern, object-oriented JavaScript library for visualizing molecular data that is forked from GLmol. A particular emphasis is placed on performance.

Project [175 hours]: More cartoon options for nucleic acids.

Brief explanation: Implement additional visualizations of nucleic acids.

Expected results: See https://github.com/3dmol/3Dmol.js/issues/559

Prerequisites: Experience with JavaScript and client-server programming, some experience with OpenGL/WebGL ideal, but not necessary.

Mentor: David Koes (dkoes@pitt.edu)


Project [175 or 350 hours]: Improve 3Dmol.js

Brief explanation: Make significant improvements to 3Dmol.js functionality or performance.

Expected results: This is an open-ended project that must be driven by the applicant. A strong proposal will identify significant shortcomings in the current code and explain how they will be addressed. The GitHub Issues page may provide some ideas. A proposal must include a significant initial pull request.

Prerequisites: Experience with JavaScript and client-server programming, some experience with OpenGL/WebGL ideal, but not necessary.

Mentor: David Koes (dkoes@pitt.edu)

gnina Project Ideas

gnina is a C/C++ framework for applying deep learning to molecular docking.

Project [175 or 350 hours]: Improve gnina

Brief explanation: Make significant improvements to gnina functionality or performance.

Expected results: This is an open-ended project that must be driven by the applicant. A strong proposal will identify significant shortcomings in the current code and explain how they will be addressed. The GitHub Issues page may provide some ideas. A proposal must include a significant initial pull request.

Prerequisites: Experience with CUDA/C/C++ programming and the basics of deep learning.

Mentor: David Koes (dkoes@pitt.edu)

DeepChem Project Ideas

DeepChem aims to provide a high quality open-source toolchain that democratizes the use of deep-learning in drug discovery, materials science, quantum chemistry, and biology. Additional project ideas are discussed at https://forum.deepchem.io/t/brainstorming-gsoc-2022-topics/658.

Project [350 hours]: PyTorch Lightning Implementation

Brief explanation: Allow for implementation of DeepChem models in PyTorch Lightning.

Expected results: PyTorch Lightning is a popular framework for PyTorch. This project would look into enabling the easy construction of PyTorch Lightning based models for DeepChem. Completion of this project should require the implementation of a good test suite and a Jupyter notebook tutorial for implementing PyTorch Lightning models in DeepChem.

Prerequisites: PyTorch Lightning, Python

Mentor: Bharath Ramsundar (bharath at deepforestsci dot com)

Project [350 hours]: Layer Documentation

Brief explanation: DeepChem is moving towards a concept of first class layers. Improving the documentation for existing layers will help us make our current collection of layers more useful for the community.

Expected results: Improved documentation for the existing layers. This project should also add a tutorial on using the layers to the DeepChem tutorial series, and should plan to add a few new layers as well.

Prerequisites: PyTorch/TensorFlow, Python

Mentor: Bharath Ramsundar (bharath at deepforestsci dot com)

Project [350 hours]: PyTorch Porting

Brief explanation: DeepChem is shifting towards using PyTorch as its primary backend, but many models are still implemented in TensorFlow. A good project could be to pick a TensorFlow model or two, then port its layers and model into PyTorch along with suitable unit tests.

Expected results: At least one model should be ported from TensorFlow to PyTorch successfully with associated unit tests. See https://github.com/deepchem/deepchem/issues/2863

Prerequisites: PyTorch/TensorFlow, Python

Mentor: Bharath Ramsundar (bharath at deepforestsci dot com)

Project [350 hours]: HuggingFace Integration

Brief explanation: Last year, a few student projects explored HuggingFace/DeepChem integration, but these projects were not able to merge HuggingFace models into DeepChem.

Expected results: This project would create a working HuggingFace model in DeepChem along with tutorials on how to use HuggingFace with DeepChem.

Prerequisites: PyTorch/TensorFlow, Python

Mentor: Bharath Ramsundar (bharath at deepforestsci dot com)

Project [350 hours]: Improved PINNs Support

Brief explanation: One of the exciting new features in DeepChem 2.6.0 is support for PINNs (physics-informed neural networks), a class of techniques to solve PDEs with neural networks. The API for this class is still rudimentary: it supports only a limited class of models and requires hand-coding the loss.

Expected results: Extend the API to allow for a broader class of PDEs to be implemented. I’d suggest using the Schrödinger equation as a test case, since it can be solved in 1D as a toy problem and extended to arbitrarily high dimensions for larger molecules.
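To give a feel for what hand-coding the loss means for the 1D toy case, here is a minimal sketch that uses only the Python standard library. It is not DeepChem's PINN API. It assumes a particle in a box on [0, 1] with units where ħ = m = 1, so the equation is -½ψ'' = Eψ with ψ(0) = ψ(1) = 0. The function names are hypothetical, the trial function is the known analytic solution standing in for a neural network, and finite differences stand in for automatic differentiation.

```python
import math


def psi_trial(x, n=1):
    """Trial wavefunction. In a real PINN this would be a neural network."""
    return math.sqrt(2.0) * math.sin(n * math.pi * x)


def second_derivative(f, x, h=1e-4):
    """Central finite difference. A PINN would use automatic differentiation."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def pde_residual_loss(f, energy, points):
    """Mean squared residual of -1/2 f'' - E f over the collocation points."""
    total = 0.0
    for x in points:
        residual = -0.5 * second_derivative(f, x) - energy * f(x)
        total += residual * residual
    return total / len(points)


def boundary_loss(f):
    """Penalty for violating psi(0) = psi(1) = 0."""
    return f(0.0) ** 2 + f(1.0) ** 2


# Interior collocation points on (0, 1).
points = [i / 20 for i in range(1, 20)]

# Ground-state energy of the box in these units: E_1 = pi^2 / 2.
exact_energy = math.pi ** 2 / 2

loss_exact = pde_residual_loss(psi_trial, exact_energy, points) + boundary_loss(psi_trial)
loss_wrong = pde_residual_loss(psi_trial, 1.0, points) + boundary_loss(psi_trial)

print(f"loss with correct energy: {loss_exact:.3e}")
print(f"loss with wrong energy:   {loss_wrong:.3e}")
```

The loss is close to zero only when the trial function and the energy satisfy the equation together. A broader API would ideally let users state the PDE and boundary conditions and build this kind of loss for them.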

Prerequisites: PyTorch/TensorFlow, Python

Mentor: Bharath Ramsundar (bharath at deepforestsci dot com)

Project [350 hours]: Improved Equivariance Support

Brief explanation: DeepChem has no support for equivariant models. Given the increasing importance of equivariance for scientific machine learning, this is a major oversight.

Expected results: This project would aim to add a tutorial about equivariant modeling and add an equivariant model to DeepChem. You may want to use e3nn or another library to facilitate implementation.
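For applicants new to the term, the following sketch illustrates the property itself using only the Python standard library. It does not use e3nn or DeepChem, and the function names and coordinates are purely illustrative. A map f is rotation-equivariant if f(Rx) = Rf(x) for every rotation R, and rotation-invariant if f(Rx) = f(x).

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def centroid_displacements(points):
    """Toy equivariant map: the vector from the centroid to each atom."""
    n = len(points)
    center = tuple(sum(p[i] for p in points) / n for i in range(3))
    return [tuple(p[i] - center[i] for i in range(3)) for p in points]


def pairwise_distances(points):
    """Toy invariant features: all interatomic distances."""
    return [math.dist(a, b) for i, a in enumerate(points) for b in points[i + 1:]]


def close(u, v, tol=1e-9):
    """Elementwise comparison of two equal-length sequences."""
    return all(abs(a - b) <= tol for a, b in zip(u, v))


# Arbitrary illustrative coordinates for three atoms.
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
angle = 0.7
rotated = [rotate_z(p, angle) for p in coords]

# Equivariance: applying f after the rotation equals rotating the output of f.
lhs = centroid_displacements(rotated)
rhs = [rotate_z(v, angle) for v in centroid_displacements(coords)]
is_equivariant = all(close(u, v) for u, v in zip(lhs, rhs))

# Invariance: distances do not change under rotation.
is_invariant = close(pairwise_distances(rotated), pairwise_distances(coords))

print(is_equivariant, is_invariant)
```

An equivariant model builds this property into every layer, so a check of this form is a natural unit test for any model added to DeepChem.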

Prerequisites: PyTorch/TensorFlow, Python

Mentor: Bharath Ramsundar (bharath at deepforestsci dot com)

Project [350 hours]: Improved Antibody Support

Brief explanation: DeepChem at present doesn’t have much tooling or support for working with antibodies.

Expected results: This project would add suitable antibody datasets to MoleculeNet and create a tutorial walking users through antibody design and modeling with DeepChem. If necessary, students may add antibody-specific models as well.

Prerequisites: PyTorch/TensorFlow, Python

Mentor: Bharath Ramsundar (bharath at deepforestsci dot com)

Miscellaneous Project Ideas

These ideas would likely benefit two or more projects.


Project [350 hours]: OneMol: Google Docs & YouTube for Molecules

Brief explanation: There is a huge need in the research community for improved collaboration tools on web and desktop. OneMol will provide an open API for collaborating on molecular data that both Avogadro and 3Dmol.js will support as reference implementations. OneMol compliant applications will be able to manipulate and view molecular data in real time so that changes made by one client will be propagated to other clients.

File-sharing is a means for sharing data, but it does not share real-time interactions; each user’s data exists in its own isolated environment. Screen-sharing provides a common viewpoint for all participants, but allowing others to interact with the data requires granting access to the host workstation. This approach is needlessly inefficient for the task of collaborating on molecular data, and this inefficiency introduces scalability issues. For example, a simple rotation necessitates a full screen update when the fundamental change in state was a simple change in viewing angles.

The OneMol framework consists of three main components: a client module, embedded in a molecular viewer; a facilitator module that enforces a consistent viewer state between all the clients; and a storage module that stores the raw molecular data. All three modules may coexist on the same machine within the same application. However, we anticipate a more common modality will be to use a publicly hosted facilitator server, since this simplifies network connectivity in the face of firewalls and network address translation.
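The client and facilitator roles described above can be sketched in a few lines. This is a hypothetical in-process illustration, not the OneMol API: the class names, the JSON message shape, and the state fields are all assumptions, and a real facilitator would sit behind a network transport. The point is that a rotation travels as a small state update rather than a full screen update.

```python
import json


class Facilitator:
    """Hypothetical facilitator: holds the shared view state and relays updates."""

    def __init__(self):
        self.state = {"rotation": [0.0, 0.0, 0.0], "zoom": 1.0}
        self.clients = []

    def join(self, client):
        """Register a client and send it the full state once."""
        self.clients.append(client)
        client.receive(json.loads(json.dumps(self.state)))

    def submit(self, sender, delta):
        """Apply a small update and relay it to every other client."""
        self.state.update(delta)
        for client in self.clients:
            if client is not sender:
                client.receive(delta)
        # Size of the serialized message, to show how little data moves.
        return len(json.dumps(delta))


class Client:
    """Hypothetical client module embedded in a molecular viewer."""

    def __init__(self, name):
        self.name = name
        self.view = {}

    def receive(self, delta):
        self.view.update(delta)

    def rotate(self, facilitator, angles):
        delta = {"rotation": list(angles)}
        self.view.update(delta)
        return facilitator.submit(self, delta)


hub = Facilitator()
alice, bob = Client("alice"), Client("bob")
hub.join(alice)
hub.join(bob)

message_size = alice.rotate(hub, (0.0, 90.0, 0.0))
print(bob.view, message_size)
```

After the call, both clients hold the same view state, and the only data exchanged for the rotation was a short JSON message.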

Expected results: Prototype web services to allow web and/or desktop collaboration using 3Dmol.js as a viewer, likely integrating with existing storage systems (e.g., MongoChem or PQR).

Prerequisites: Experience with scripting and web services. Interest in and experience with databases like MongoDB or DSpace would be very helpful.

Mentor: David Koes (dkoes@pitt.edu) or Geoffrey Hutchison (geoffh at pitt.edu)