
E.L. Willighagen - Distributing molecular information over the Internet

This poster introduces methods developed over the last 10 years for distributing molecular content over the Internet, starting with Chemical MIME types and ending with chemistry-enriched blogs and blog aggregators. During the conference I will add more detail to this overview, so questions are most welcome!

E.L. Willighagen, CUBIC, Cologne, Germany


Distributing molecular information over the Internet has faced a steep curve in reaching maturity: it took more than 10 years to reach the current situation, in which molecular data on the Internet has only just started to become machine interpretable. In this poster I will highlight a number of important technological breakthroughs of the past decade. I am deliberately skipping technologies that predate the World Wide Web (WWW), such as Gopher. Additionally, I do not feel that dynamic generation of web pages via CGI-BIN, PHP, or even Ajax changed the experience of molecular data on the Internet significantly, and will skip those too.

Chemical MIME types

The introduction of the WWW was an important step in making the Internet practically useful for non-IT specialists, providing a uniform, graphical interface to other sites. These well-known web pages were, and still are, written in HTML, which has seen a long evolution, but whose loose syntax allowed people to quickly hack up pages with information. Periodic tables were among the first signs of chemistry on the Internet. However, HTML only allowed bitmap pictures to be placed on those pages, making chemical diagrams as useless as in print. Because this problem did not apply to chemistry alone, MIME types were used. The web browser used the MIME type to decide whether it could show the content in the browser itself, or whether it had to start an external viewer. Rzepa picked this up and defined MIME types for chemistry [Rzepa1994]. It has taken until this year, however, for the first operating systems to provide these out of the box; in the past decade people had to add them manually. Rasmol was often used as the external viewer, though in Nijmegen we hooked up CACTVS in a similar way for 2D molecular diagrams. All this worked for small molecules as well as protein structures.
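The dispatch step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular browser implements it: the four chemical MIME types are taken from the Chemical MIME project, while the viewer table, the function names and the file names are hypothetical.

```python
import mimetypes

# A few chemical MIME types with their usual file extensions.
CHEMICAL_TYPES = {
    ".pdb": "chemical/x-pdb",
    ".xyz": "chemical/x-xyz",
    ".cml": "chemical/x-cml",
    ".mol": "chemical/x-mdl-molfile",
}

# Hypothetical viewer table: which external program handles which type.
EXTERNAL_VIEWERS = {
    "chemical/x-pdb": "rasmol",
    "chemical/x-xyz": "rasmol",
}


def build_registry():
    """Return a MIME registry that also knows the chemical types above."""
    registry = mimetypes.MimeTypes()
    for extension, mime_type in CHEMICAL_TYPES.items():
        registry.add_type(mime_type, extension)
    return registry


def dispatch(filename, registry):
    """Decide, like a browser would, what to do with a downloaded file."""
    mime_type, _encoding = registry.guess_type(filename)
    if mime_type is None:
        return ("unknown", None)
    # Content the browser can render itself is shown inline.
    if mime_type.startswith(("text/", "image/")):
        return ("inline", mime_type)
    # Otherwise look for a registered external viewer.
    viewer = EXTERNAL_VIEWERS.get(mime_type)
    if viewer is not None:
        return ("external:" + viewer, mime_type)
    return ("save-to-disk", mime_type)
```

Calling `dispatch("1crn.pdb", build_registry())` would select the external viewer, whereas a CML file, for which this sketch registers no viewer, would simply be offered for saving.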

Applets

When Sun released Java, one of the demos in the distribution was an applet that was able to render molecules. The applet could read XYZ files and render the atoms in them. The user could rotate the molecule and look at it from any angle. At that time, in 1995, I was a chemoinformatics student with a keen interest in chemistry on the Internet, and I was working on a web page containing a (Dutch) dictionary of organic chemistry; this applet would give its users a much better view of molecular data. Two years later Gezelter started an open source molecular viewer applet called Jmol [Gezelter1997]. At about the same time, alternatives were developed to provide interactive components inside HTML pages. For molecular content Chime has been quite popular; it has a rich scripting language, which extends the possibilities for user interaction considerably. For a full comparison of Chime and Jmol I refer to [Herraez2006].
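To give an impression of how little such a viewer needs as input, here is a minimal Python sketch that reads the XYZ format (an atom count, a comment line, then one line per atom with element symbol and Cartesian coordinates) and rotates the result around the z axis. It is not the code of the Java demo applet or of Jmol; the names `Atom`, `parse_xyz` and `rotate_z`, and the approximate water coordinates, are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    element: str
    x: float
    y: float
    z: float


def parse_xyz(text):
    """Parse one XYZ frame into (comment, list of Atom)."""
    lines = text.strip().splitlines()
    count = int(lines[0])          # line 1: number of atoms
    comment = lines[1].strip()     # line 2: free-form comment
    atom_lines = lines[2:2 + count]
    if len(atom_lines) != count:
        raise ValueError("XYZ file is shorter than its atom count claims")
    atoms = []
    for line in atom_lines:
        element, x, y, z = line.split()[:4]
        atoms.append(Atom(element, float(x), float(y), float(z)))
    return comment, atoms


def rotate_z(atoms, degrees):
    """Return a copy of the atoms rotated around the z axis."""
    angle = math.radians(degrees)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [
        Atom(a.element, a.x * cos_a - a.y * sin_a, a.x * sin_a + a.y * cos_a, a.z)
        for a in atoms
    ]


# Approximate geometry of a water molecule, for illustration only.
WATER_XYZ = """3
water
O  0.000  0.000  0.117
H  0.000  0.757 -0.471
H  0.000 -0.757 -0.471
"""

comment, atoms = parse_xyz(WATER_XYZ)
rotated = rotate_z(atoms, 90.0)
```

A real viewer would add perspective projection and drawing on top of this, but the parsing and the rotation are essentially all the data handling that is required.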

Chemical Markup Language

However, it took large websites years to pick up these technologies, and the last five years have not seen many changes. Meanwhile, Rzepa and Murray-Rust worked on the Chemical Markup Language (CML), an XML-based language for marking up molecular data [Murray-Rust1998], which was later extended to cover, for example, reactions and pathways [Holliday2006]. Because XML Namespaces allow XML languages to be combined, CML content can be embedded in other XML formats such as XHTML.
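The role of namespaces can be illustrated with a small Python sketch using only the standard library. The XHTML page below is made up for this example, the molecule lists heavy atoms only, and the CML namespace URI is the one I assume here; other CML versions may use a different URI.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
CML_NS = "http://www.xml-cml.org/schema"  # assumed CML namespace URI

# A made-up XHTML page with a CML molecule embedded in it
# (heavy atoms only; hydrogens are left implicit).
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:cml="http://www.xml-cml.org/schema">
  <body>
    <p>Methanol is the simplest alcohol.</p>
    <cml:molecule id="methanol">
      <cml:atomArray>
        <cml:atom id="a1" elementType="C"/>
        <cml:atom id="a2" elementType="O"/>
      </cml:atomArray>
      <cml:bondArray>
        <cml:bond atomRefs2="a1 a2" order="1"/>
      </cml:bondArray>
    </cml:molecule>
  </body>
</html>"""


def extract_molecules(xhtml_text):
    """Map each embedded molecule id to the list of its element symbols."""
    root = ET.fromstring(xhtml_text)
    prefixes = {"cml": CML_NS}
    molecules = {}
    for molecule in root.iterfind(".//cml:molecule", prefixes):
        molecules[molecule.get("id")] = [
            atom.get("elementType")
            for atom in molecule.iterfind(".//cml:atom", prefixes)
        ]
    return molecules


def count_paragraphs(xhtml_text):
    """Count ordinary XHTML paragraphs, ignoring the CML content."""
    root = ET.fromstring(xhtml_text)
    return len(root.findall(".//{%s}p" % XHTML_NS))


molecules = extract_molecules(PAGE)
```

Software that only knows XHTML sees the paragraph and can skip the elements in the CML namespace, while chemistry-aware software can pick out exactly those elements.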

Blogs

A much more recent and interesting change is the uptake of blog technologies in chemistry. Blogs are published as RSS or Atom feeds, which feed clients download at regular intervals. Dedicated clients are available, like Akregator or RSSOwl, but most web browsers now have built-in RSS clients too. News sites often use RSS feeds to send around headlines, but people quickly started blogging personal stories too, sharing their experiences via such feeds. RSS and Atom are XML-based, however, and can therefore be extended with molecular content by embedding CML [Murray-Rust2004]. General RSS clients do not understand embedded XML yet, though dedicated feed clients do. Examples are Jmol (version 9), JChemPaint and Bioclipse.



The above screenshot shows an RSS feed enriched with a molecular structure and the carbon and proton NMR spectra for that compound, both using CML. The PDB.org database could, for example, enhance its 'New structures this week' feed by embedding the actual new PDB entries. The feed would increase in size, of course, but this would greatly enrich the user experience.

Additionally, small computer programs could read such feeds, detect interesting chemistry, and provide the user with just those items from blogs and other feeds that are of specific interest to a certain scientist. This enables unprecedentedly powerful methods for dissecting the scientific literature. However, publishers are not yet fully aware of the power of these technologies.
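Such a small program could look roughly like the following Python sketch. The feed content, the item titles and the function name are invented for illustration, the CML namespace URI is again an assumption, and the molecules list heavy atoms only. The selection criterion, "contains a given element", merely stands in for whatever chemistry a reader considers interesting.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"  # assumed CML namespace URI

# An invented RSS 2.0 feed in which two of the three items embed CML.
FEED = """<rss version="2.0" xmlns:cml="http://www.xml-cml.org/schema">
  <channel>
    <title>Hypothetical chemistry blog</title>
    <item>
      <title>Notes on methylamine</title>
      <cml:molecule id="methylamine">
        <cml:atomArray>
          <cml:atom id="a1" elementType="C"/>
          <cml:atom id="a2" elementType="N"/>
        </cml:atomArray>
      </cml:molecule>
    </item>
    <item>
      <title>Notes on ethane</title>
      <cml:molecule id="ethane">
        <cml:atomArray>
          <cml:atom id="a1" elementType="C"/>
          <cml:atom id="a2" elementType="C"/>
        </cml:atomArray>
      </cml:molecule>
    </item>
    <item>
      <title>Conference report without chemistry</title>
    </item>
  </channel>
</rss>"""


def items_containing_element(feed_text, element):
    """Return the titles of feed items whose embedded CML has the element."""
    root = ET.fromstring(feed_text)
    prefixes = {"cml": CML_NS}
    titles = []
    for item in root.iterfind("./channel/item"):
        elements = {
            atom.get("elementType")
            for atom in item.iterfind(".//cml:atom", prefixes)
        }
        if element in elements:
            titles.append(item.findtext("title"))
    return titles


nitrogen_items = items_containing_element(FEED, "N")
```

The same loop could test for a substructure or a spectrum instead of a single element; the point is that the chemistry in the feed is available as data rather than as a picture.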

Chemists and biologists, however, are also keen bloggers who write up their experiences. PostGenomic.com is a powerful blog aggregator which acts like such a small computer program, and summarizes what literature and conferences bloggers are talking about. The website provides a Bioinformatics and a Chemistry section so that readers can tune the content to their interests.

IUPAC International Chemical Identifier (InChI)

The InChI is a unique identifier for molecules, primarily aimed at small molecules such as metabolites, but it actually works for single-strand proteins too. If full molecular details cannot be embedded in a web page using CML, the InChI at least makes it possible to specify in detail what chemistry that web page is about. This enables googling for specific metabolites, and thereby the aggregation of Internet content about a specific molecule. This content might be regular web content, but could just as well be literature. The latter, however, requires that publishers publish the InChIs of the molecules in papers; while this is not yet commonplace, publishers are starting to warm to the idea.
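The aggregation idea can be sketched in Python as a tiny index from InChI to the pages that mention it. The URLs and page texts are invented, the regular expression is a deliberately loose approximation rather than a validator for the InChI syntax, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Loose pattern: "InChI=1/" or "InChI=1S/" followed by everything up to
# whitespace, a quote or an angle bracket. Not a full InChI validator.
INCHI_PATTERN = re.compile(r'InChI=1S?/[^\s"<>]+')

# Invented pages, keyed by hypothetical URLs.
PAGES = {
    "http://example.org/blog/fermentation":
        '<p>Yeast produces <span title="InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3">'
        "ethanol</span> from sugar.</p>",
    "http://example.org/papers/biogas":
        "<p>The main component is methane, InChI=1/CH4/h1H4 in the "
        "supporting information.</p>",
    "http://example.org/wiki/solvents":
        "<li>ethanol: InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3</li>",
}


def index_by_inchi(pages):
    """Map every InChI found in the pages to the set of URLs that use it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for match in INCHI_PATTERN.findall(text):
            # Drop sentence punctuation that may trail the identifier.
            index[match.rstrip(".,;")].add(url)
    return dict(index)


index = index_by_inchi(PAGES)
ethanol_pages = index["InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3"]
```

Because the identifier is a plain string, a general-purpose search engine can do the same thing at Internet scale without knowing any chemistry.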

Conclusion

This poster gives a brief overview of World Wide Web technologies that are changing the distribution of molecular data over the Internet. It shows how Chemical MIME types, applets, CML, (CML-enriched) blogs, and the InChI are changing the way we receive and experience such molecular content.

Literature

[Gezelter1997] D. Gezelter, Jmol 1997

[Herraez2006] A. Herráez, Biomolecules in the computer: Jmol to the rescue, Biochemistry and Molecular Biology Education, 2006, 34(4): 255-261

[Holliday2006] Holliday GL, Murray-Rust P, Rzepa HS, Chemical markup, XML, and the world wide web. 6. CMLReact, an XML vocabulary for chemical reactions, J Chem Inf Model. 2006;46(1):145-57. PMID: 16426051

[Murray-Rust2004] Murray-Rust P, Rzepa HS, Williamson MJ, Willighagen EL, Chemical markup, XML, and the World Wide Web. 5. Applications of chemical metadata in RSS aggregators. J Chem Inf Comput Sci. 2004;44(2):462-9. PMID: 15032525

[Murray-Rust1998] Murray-Rust P, Rzepa HS, Chemical markup, XML, and the world wide web, J Chem Inf Model, 1998

[Rzepa1994] H. Rzepa, The Chemical MIME Homepage 1994


Changelog

1. first draft - 2006-12-05

