GSoC Ideas 2015

From wiki.openchemistry.org
Latest revision as of 11:35, 13 February 2015

Guidelines

The Open Chemistry project is putting together a proposal for this year's Google Summer of Code. Open Chemistry is an umbrella for projects in chemistry, materials science, biochemistry, and related areas. We intend to concentrate mainly on projects to improve Avogadro 2, cclib, and Open Babel. We have gathered a pool of interested mentors who are seasoned developers in each of these projects, and we welcome original ideas in addition to those presented here.

Adding Ideas

When adding a new idea to this page, please try to include the following information:

  • A brief explanation of the idea.
  • Expected results/feature additions.
  • Any prerequisites for working on the project.
  • Links to any further information, discussions, bug reports etc.
  • Any special mailing lists, if not the standard mailing list for the project.
  • Your name and email address for contact (if willing to mentor, or nominated mentor).

Avogadro 2 Project Ideas

Project: Biological Data Visualization

Brief explanation: Support for biological data, representations, and visualization

Expected results: Add support for molecular fragments on top of the molecule model, extending this to residues, and supporting reading/writing this secondary structure (e.g., PDB format). Additional rendering modes for secondary biological structures (e.g., ribbons and cartoons), building up a biomolecule from residues, and adding residue labels.

Prerequisites: Experience in C++. Some experience with OpenGL and biochemistry is ideal, but not necessary.

Mentor: Marcus D. Hanwell (marcus dot hanwell at kitware dot com).

Project: Molecular Dynamics

Brief explanation: Improve support for molecular dynamics simulations in Avogadro 2

Expected results: Initial support is already present, with support for reading in basic trajectories from XYZ files, and static .gro files for GROMACS. Extend this to more fully support the needs of molecular dynamics, reading in trajectory files, ideally loading in time steps on demand for large files rather than loading the entire file in up front. Investigate whether compression techniques (e.g., delta compression) can improve reading and rendering performance. Investigate ways to support generating input, and dealing with extremely large systems (over one million particles). Add support for characterizing particle movement (e.g., pair-wise distribution functions), rare events, and visualizing these in addition to simple trajectory animations.
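One way to picture on-demand loading of time steps is to scan a multi-frame XYZ file once, remember where each frame starts, and parse a frame only when it is requested. The following is a minimal Python sketch of that idea, not Avogadro 2 code; the function names `index_xyz_frames` and `read_frame` are hypothetical, and it assumes a well-formed XYZ trajectory opened in binary mode.

```python
import io


def index_xyz_frames(stream):
    """Return the byte offset at which each frame of a multi-frame XYZ stream starts."""
    offsets = []
    while True:
        start = stream.tell()
        header = stream.readline()
        if not header.strip():
            break  # end of file or trailing blank line
        natoms = int(header.decode())
        # Skip the comment line and the atom lines without parsing them.
        for _ in range(natoms + 1):
            stream.readline()
        offsets.append(start)
    return offsets


def read_frame(stream, offset):
    """Seek to one frame and parse only that frame."""
    stream.seek(offset)
    natoms = int(stream.readline().decode())
    comment = stream.readline().decode().strip()
    atoms = []
    for _ in range(natoms):
        symbol, x, y, z = stream.readline().decode().split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    return comment, atoms


# Hypothetical two-frame trajectory held in memory for illustration.
demo = io.BytesIO(
    b"2\nframe 0\nH 0.0 0.0 0.0\nH 0.0 0.0 0.74\n"
    b"2\nframe 1\nH 0.0 0.0 0.0\nH 0.0 0.0 0.80\n"
)
offsets = index_xyz_frames(demo)
comment, atoms = read_frame(demo, offsets[1])
```

Only the list of offsets stays in memory, so the cost of opening a large file is one sequential pass, and any time step can then be fetched with a single seek.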

Prerequisites: Experience in C++. Some experience with OpenGL and an MD code is ideal, but not necessary.

Mentor: Marcus D. Hanwell (marcus dot hanwell at kitware dot com).

Project: Scripting Bindings

Brief explanation: Implement an embedded scripting language (e.g., Python or JavaScript) in Avogadro 2

Expected results: Create bindings for the C++ libraries in Python or JavaScript / QtScript. This should allow an embedded scripting console as well as support for implementing modular extensions (tools, rendering, etc.) in Python or JavaScript. A Boost.Python implementation existed in Avogadro v1, but has not been re-implemented with the new code base. An ideal solution would connect to QML and Qt to allow scripting to add menu items, windows, etc. and provide documentation and example scripts. The interface should be maintainable as new classes and methods are added.

Prerequisites: Experience in C++ and Python or JavaScript, and some experience with SWIG or Boost.Python.

Mentor: Geoff Hutchison (geoffh at pitt dot edu)

Project: Point Group Symmetry

Brief explanation: Ability to identify and impose point group symmetry.

Expected results: Add support to Avogadro for identifying point groups and for imposing symmetry while building a molecule. Preparing quantum chemistry input geometries with symmetry can be a difficult task because input formats and expectations differ between many of the widely used packages. This development will interface symmetry detection and building with the quantum chemistry input generators already present in Avogadro to allow for more sophisticated calculations across all currently supported packages, for example, GAMESS, Gaussian, Molpro and Q-Chem. (A link to published symmetry detection algorithms would be nice.)
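The basic building block of point group identification is testing whether a candidate operation maps the molecule onto itself. The sketch below illustrates that test in Python under stated assumptions: the helper names are hypothetical, the symmetry elements are assumed to pass through the origin, and the water geometry is approximate. It is not a published detection algorithm, only the check such an algorithm would repeat for each candidate axis or plane.

```python
import math


def apply(matrix, point):
    """Multiply a 3x3 matrix (nested lists) by a 3-vector."""
    return tuple(sum(matrix[i][j] * point[j] for j in range(3)) for i in range(3))


def is_symmetry_operation(atoms, matrix, tol=1e-3):
    """Return True if the operation maps every atom onto a distinct atom of the same element.

    atoms is a list of (symbol, (x, y, z)) tuples.
    """
    remaining = list(atoms)
    for symbol, position in atoms:
        image = apply(matrix, position)
        for k, (other_symbol, other_position) in enumerate(remaining):
            if other_symbol == symbol and math.dist(image, other_position) < tol:
                del remaining[k]  # each target atom may be used only once
                break
        else:
            return False
    return True


# Approximate water geometry with the C2 axis along z (illustrative values).
water = [
    ("O", (0.0, 0.0, 0.117)),
    ("H", (0.0, 0.757, -0.469)),
    ("H", (0.0, -0.757, -0.469)),
]
C2_Z = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]
INVERSION = [[-1, 0, 0], [0, -1, 0], [0, 0, -1]]

has_c2 = is_symmetry_operation(water, C2_Z)
has_inversion = is_symmetry_operation(water, INVERSION)
```

The tolerance matters in practice: geometries drawn by hand are rarely exactly symmetric, which is why imposing symmetry after detection is part of the project.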

Prerequisites: Experience in C++. Some experience with OpenGL and molecular point group theory is ideal, but not necessary.

Mentor: Albert DeFusco (defusco AT pitt DOT edu)

cclib Project Ideas

cclib (http://cclib.github.io) is an open source library, written in Python, for parsing and interpreting the results of computational chemistry packages. The goals of cclib are centered around the reuse of data obtained from these programs and contained in output files.

Project: Data Export (No longer appropriate?)

Brief explanation: Support for exporting the results from parsing a log file to a standard format such as CML.

Expected results: Add an export module to save the ccData object as an external file.

Prerequisites: Experience in Python, some experience with XML and chemistry ideal, but not necessary.

Mentor: Adam Tenderholt (atenderholt at gmail dot com), possibly Karol Langner (karol.langner at gmail dot com).

Notes:

Eric Berquist has already done a fair amount of work on this project. See https://github.com/cclib/cclib/tree/writer/src/cclib/writer.

This project could be tied into the repository/tracker in Miscellaneous if a database export would be useful.

Project: Integrate with Avogadro

Brief explanation: Allow Avogadro to parse cclib-supported formats.

Expected results: Call Python scripts to attempt parsing QM file formats with cclib, handle calls into cclib, and convert from Python objects to C++ objects or to a format supported by Avogadro (e.g., CML or Chemical JSON).
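As an illustration of the conversion step, the following Python sketch turns cclib-style data into a Chemical JSON-like dictionary. The argument names mirror cclib's `atomnos` and `atomcoords` attributes, but the input here is plain lists rather than a parsed file, the function name is hypothetical, and the key layout is an assumption that should be checked against the Chemical JSON format Avogadro 2 actually reads.

```python
import json


def to_chemical_json(atomnos, atomcoords, name=""):
    """Build a Chemical JSON-like dictionary from atomic numbers and geometries.

    atomcoords is indexed as [geometry][atom][xyz]; the last geometry is exported.
    The key names below are an assumed layout, not a verified specification.
    """
    last_geometry = atomcoords[-1]
    flat_coords = [float(value) for atom in last_geometry for value in atom]
    return {
        "chemical json": 0,
        "name": name,
        "atoms": {
            "elements": {"number": [int(z) for z in atomnos]},
            "coords": {"3d": flat_coords},
        },
    }


# Stand-in for parsed output: one water geometry (illustrative values).
atomnos = [8, 1, 1]
atomcoords = [[[0.0, 0.0, 0.117], [0.0, 0.757, -0.469], [0.0, -0.757, -0.469]]]
document = json.dumps(to_chemical_json(atomnos, atomcoords, name="water"), indent=2)
```

Going through a text format like this keeps the Python and C++ sides decoupled, at the cost of serializing the data once; converting Python objects directly to C++ objects avoids that step but ties the two code bases together.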

Prerequisites: Experience in C++, some experience with the Python C-bindings ideal, but not necessary.

Mentor: Adam Tenderholt (atenderholt at gmail dot com) and Marcus D. Hanwell (marcus dot hanwell at kitware dot com), possibly Karol Langner (karol.langner at gmail dot com).

Open Babel Project Ideas

Open Babel (http://openbabel.org) is an open toolbox for chemistry, designed to speak the many languages of chemical data. It's an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas.

Project: Efficient Parallel Maximum Weighted Matching Graphs

Brief explanation: Improve performance of bond perception (and aromaticity detection) using maximum weighted matching algorithms

Expected results: Currently the performance of ring perception and aromaticity detection in Open Babel is extremely poor, particularly on structures with many fused rings. The current implementation can be exponential in the number of fused rings. Improved implementations exist, mostly using high-performance or parallel implementations of maximum weighted matching graph algorithms from combinatorial optimization. Implementing an improved chemical graph library would dramatically benefit multiple areas of Open Babel.
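To show why matching algorithms help, the sketch below assigns alternating double bonds to an aromatic system by finding a perfect matching with augmenting paths, which runs in polynomial time. It is a deliberately simplified Python illustration, not Open Babel code: it is unweighted, serial, and restricted to bipartite graphs (systems with only even-membered rings), whereas the project calls for weighted matching on general graphs, which needs something like Edmonds' blossom algorithm. All function names are hypothetical.

```python
from collections import deque


def build_adjacency(bonds):
    """Turn a list of (atom, atom) bonds into an adjacency dictionary."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def two_color(adjacency):
    """Split atoms into two sets by breadth-first search; fail on odd rings."""
    color = {}
    for start in adjacency:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in color:
                    color[neighbor] = 1 - color[node]
                    queue.append(neighbor)
                elif color[neighbor] == color[node]:
                    raise ValueError("odd ring found; a general matching algorithm is needed")
    return color


def kekulize(adjacency):
    """Return a sorted list of double bonds, or None if no perfect matching exists."""
    color = two_color(adjacency)
    partner = {}  # maps each matched color-1 atom to its color-0 partner

    def augment(node, visited):
        for neighbor in adjacency[node]:
            if neighbor in visited:
                continue
            visited.add(neighbor)
            # Take a free neighbor, or re-route the atom that currently holds it.
            if neighbor not in partner or augment(partner[neighbor], visited):
                partner[neighbor] = node
                return True
        return False

    for node in adjacency:
        if color[node] == 0 and not augment(node, set()):
            return None
    if 2 * len(partner) != len(adjacency):
        return None  # some atom is left without a double bond
    return sorted(tuple(sorted(pair)) for pair in partner.items())


benzene = build_adjacency([(i, (i + 1) % 6) for i in range(6)])
double_bonds = kekulize(benzene)
```

Each atom triggers at most one search over the bonds, so the cost grows roughly with atoms times bonds rather than exponentially with the number of fused rings.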

Prerequisites: Experience in C++, some experience with OpenMP or OpenCL ideally.

Mentor: Geoff Hutchison (geoffh at pitt dot edu).

Project: Fragment-Based Coordinate Generation

Brief explanation: A key problem is generating 3D coordinates for a known molecule. Implement a fragment-based generator to supplement the rule-based algorithm.

Expected results: Currently Open Babel uses a rule-based approach (i.e., expected geometries) to generate the 3D coordinates of molecules atom by atom. Fragments are only used for some ring-based structures. For inorganic and organometallic molecules, the rules may fail. Importantly, the atom-by-atom approach is highly inefficient, whereas a fragment can set many atoms at once. The project should generate a fragment library reflecting a balance between efficiency (i.e., many common fragments) and size, as well as an efficient, parallel algorithm for connecting fragments. A knowledge-based fragment approach can also supplement and minimize the need for conformer sampling.

Prerequisites: Experience in C++ and linear algebra. Knowledge of statistics (e.g., Bayesian inference, data mining), OpenMP or OpenCL ideal.

Mentor: Geoff Hutchison (geoffh at pitt dot edu) or David Koes (dkoes at pitt dot edu)

3Dmol.js Project Ideas

3Dmol.js (http://3dmol.csb.pitt.edu) is a modern, object-oriented JavaScript library for visualizing molecular data that is forked from GLmol. A particular emphasis is placed on performance.

Project: Add support for imposters

Brief explanation: WebGL 2.0 provides the functionality needed to implement imposters (http://www.arcsynthesis.org/gltut/illumination/tutorial%2013.html), which can be used to dramatically accelerate the rendering of molecular data.

Expected results: Spheres and cylinders will be implemented as imposters within 3Dmol.js. This code will be used whenever a browser properly supports writing fragment depth from the shader (gl_FragDepth, exposed in WebGL 1 through the EXT_frag_depth extension).
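The per-fragment work of a sphere imposter is small: a flat quad is drawn, and for each pixel the shader decides whether it lies inside the sphere's silhouette and, if so, what normal and depth the real sphere would have there. The Python sketch below mirrors that arithmetic for illustration only. It assumes an orthographic projection and a depth value that grows away from the viewer; the function name and parameters are hypothetical, and a real implementation would live in a GLSL fragment shader.

```python
import math


def shade_sphere_imposter(u, v, center_depth, radius):
    """Evaluate one fragment of a sphere imposter.

    u and v are the fragment's position on the quad, each in [-1, 1].
    Returns None when the fragment falls outside the sphere (a shader would
    discard it), otherwise the unit surface normal and the corrected depth.
    """
    r2 = u * u + v * v
    if r2 > 1.0:
        return None
    w = math.sqrt(1.0 - r2)  # height of the sphere surface above the quad
    normal = (u, v, w)
    depth = center_depth - radius * w  # the surface is nearer than the center
    return normal, depth


center_hit = shade_sphere_imposter(0.0, 0.0, center_depth=10.0, radius=1.5)
corner_miss = shade_sphere_imposter(1.0, 1.0, center_depth=10.0, radius=1.5)
```

The depth correction is the step that requires writing fragment depth: without it, overlapping spheres would intersect as flat discs instead of as spheres.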

Prerequisites: Familiarity with JavaScript, WebGL and/or OpenGL, and basic matrix algebra.

Mentor: David Koes (dkoes@pitt.edu)

Project: Implement volumetric rendering in 3Dmol.js

Brief explanation: Volumetric rendering (http://http.developer.nvidia.com/GPUGems/gpugems_ch39.html) provides a way to visualize volumetric data in more detail than simple isosurfaces.

Expected results: A number of different volumetric rendering techniques will be implemented and evaluated for a variety of molecular data types.

Prerequisites: Familiarity with JavaScript, WebGL and/or OpenGL, and basic matrix algebra.

Mentor: David Koes (dkoes@pitt.edu)

Miscellaneous Project Ideas

These ideas would likely benefit two or more projects.

Project: High Performance Force Field Calculations

Brief explanation: Add integrated molecular mechanics force field simulations in Avogadro 2

Expected results: Currently, Avogadro 2 relies on command-line calls to Open Babel to optimize geometries or perform conformer searching. The Open Babel code supports multiple force fields, but has poor performance. A modern implementation of a force field library would be welcome, including OpenMP and/or OpenCL support for highly parallel calculations. The architecture should support constrained geometry optimizations and multiple optimization techniques (e.g., steepest descent, conjugate gradients, and quasi-Newton methods like L-BFGS) and be modular enough to allow new force field implementations as plugins.

Ideally the code would be implemented in a new library so it can be used by Avogadro, Open Babel, and other codes.
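To make the optimization techniques concrete, here is a minimal Python sketch of steepest descent on a single harmonic bond term, E = 0.5 * k * (r - r0)^2. The force constant, step size, and function names are illustrative assumptions and do not come from any existing force field; a real library would sum many such terms, evaluate them in parallel, and offer better optimizers.

```python
import math


def bond_energy_and_gradient(p1, p2, k, r0):
    """Energy of a harmonic bond and its gradient with respect to both atoms."""
    delta = [a - b for a, b in zip(p1, p2)]
    r = math.sqrt(sum(c * c for c in delta))
    energy = 0.5 * k * (r - r0) ** 2
    scale = k * (r - r0) / r
    grad1 = [scale * c for c in delta]
    grad2 = [-g for g in grad1]
    return energy, grad1, grad2


def steepest_descent(p1, p2, k=300.0, r0=1.0, step=1e-3, max_steps=10000, tol=1e-8):
    """Move both atoms against the gradient until the squared gradient norm is below tol."""
    p1, p2 = list(p1), list(p2)
    for _ in range(max_steps):
        energy, grad1, grad2 = bond_energy_and_gradient(p1, p2, k, r0)
        if sum(g * g for g in grad1 + grad2) < tol:
            break
        p1 = [x - step * g for x, g in zip(p1, grad1)]
        p2 = [x - step * g for x, g in zip(p2, grad2)]
    return p1, p2, energy


# Start from a stretched bond of length 1.5 along the x axis.
final1, final2, final_energy = steepest_descent((0.0, 0.0, 0.0), (1.5, 0.0, 0.0))
bond_length = math.dist(final1, final2)
```

Keeping the energy and gradient evaluation separate from the optimizer loop is one way to get the modularity the project asks for: a new force field only supplies the first function, and a new optimizer only replaces the second.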

Prerequisites: Experience in C++, some experience with OpenMP or OpenCL ideally.

Mentor: Geoff Hutchison (geoffh at pitt dot edu).

Project: GPU Accelerated Calculation of Molecular Surfaces and QM Data

Brief explanation: Leverage a generic GPU/CPU language (e.g., OpenCL) to generate surface data such as molecular orbitals or electron (spin) density. (Note: OpenCube already seems to leverage multiple cores effectively, but a former colleague of mine developed Lumo using OpenCL: http://www.kieber-emmons.com/Lumo/. I remember near-instantaneous rendering of MOs without any pre-calculation tricks.) Similar code exists in VMD.

Additional performance improvements may come through efficient surface generation techniques used in other work (e.g., using the Euclidean Distance Transform).

Expected results: Generate appropriate kernels that can be used in any language that supports OpenCL (C, C++, Python, etc.) across multiple platforms.

Prerequisites: General programming experience, and ideally experience in chemistry and matrix manipulations.

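Suggested Readings:

  • http://www.ks.uiuc.edu/Publications/Papers/paper.cgi?tbcode=STON2009
  • http://zhanglab.ccmb.med.umich.edu/EDTSurf/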

Mentor: Adam Tenderholt (atenderholt at gmail dot com) or Geoffrey Hutchison (geoffh at pitt dot edu)

Project: repository/tracker of computational chemistry results

Brief explanation: Combine a number of existing tools (Open Babel, cclib, MongoChem, JUMBO/Quixote, crawlers, databases) in order to index and/or track computational chemistry results. There are already thousands of raw logfiles available online, and it is not an unreasonable idea today to gather many more for domain-specific applications (for example, for drug candidates, metabolites, or materials). Having such a resource, especially with search capabilities, would be valuable. It would enhance reproducibility and data re-use, and at the appropriate scale it would enable new kinds of analyses.

Expected results: Prototype services that would discover, parse, track, index and search computational chemistry results on the web. All the pieces for this exist in some form, and the majority of the work would involve integrating into a system that combines them into a working whole and producing a registry and server component.

Prerequisites: Experience with scripting and web services. Interest in and experience with databases like MongoDB or DSpace would be very helpful.

Mentor: Adam Tenderholt (atenderholt at gmail dot com), possibly Karol Langner (karol.langner at gmail dot com) or Geoffrey Hutchison (geoffh at pitt dot edu)