The problem
Bioinformaticians have some very powerful toolsets, such as BioPerl, Biopython, Biojava or the NCBI web services, but lack a flexible software solution for the integrated management of biosequence data. We think that Plone can be the right platform to build it on, and that it has great potential to become the software of choice for the management of biological data.
Sequences as first-class citizens
Plone4bio is a set of Plone products developed to bridge this gap: they let people add sequence records as first-class citizens to the Plone CMS, stored either in the ZODB or in any BioSQL-compliant database. Sequences can be quite complex objects, with lots of metadata that need to be displayed pictorially, but all the complexity is hidden from the end user.
The management of biological data is a complex task that faces many integration issues. Until now, researchers have had only two choices: use public services such as Ensembl (which have lots of features but lack powerful content management capabilities), or store their data in local databases and access them with programs (probably written using BioPerl, Biopython or Biojava). Worse still, some end up storing data and partial results in text files, spreadsheets, and so on. Moreover, for several institutions, the privacy and secrecy of research mean that public services cannot be used at all.
Use BioSQL to organize your data ...
A standard called BioSQL has been developed to define a database-independent schema for storing biological sequence data, and all the major Bio* libraries have modules to access data stored in a BioSQL database. What researchers still lack is a way of accessing these sequences that does not require writing SQL queries, and that does not require programming expertise just to get a plot of the sequence features.
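To give an idea of the kind of SQL a researcher currently has to write just to read one sequence, here is a minimal sketch in Python using the standard sqlite3 module. The three tables are a heavily simplified, illustrative subset of the BioSQL schema (the real schema has more tables, columns and constraints), and the helper names fetch_sequence and build_demo, as well as the sample record, are made up for this example; this is not Plone4bio code.

```python
import sqlite3

# Simplified, illustrative subset of the BioSQL schema (assumption: the real
# schema defines more tables, columns and constraints than shown here).
SCHEMA = """
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase(biodatabase_id),
    name TEXT NOT NULL,
    accession TEXT NOT NULL,
    description TEXT
);
CREATE TABLE biosequence (
    bioentry_id INTEGER PRIMARY KEY REFERENCES bioentry(bioentry_id),
    length INTEGER,
    alphabet TEXT,
    seq TEXT
);
"""


def fetch_sequence(conn, database_name, accession):
    """Return (name, description, seq) for one accession, or None if absent."""
    return conn.execute(
        """
        SELECT e.name, e.description, s.seq
        FROM bioentry AS e
        JOIN biodatabase AS d ON d.biodatabase_id = e.biodatabase_id
        JOIN biosequence AS s ON s.bioentry_id = e.bioentry_id
        WHERE d.name = ? AND e.accession = ?
        """,
        (database_name, accession),
    ).fetchone()


def build_demo():
    """Create an in-memory database holding one made-up sequence record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO biodatabase VALUES (1, 'demo')")
    conn.execute(
        "INSERT INTO bioentry VALUES (1, 1, 'toy_gene', 'X00001', 'A made-up record')"
    )
    conn.execute("INSERT INTO biosequence VALUES (1, 12, 'dna', 'ATGGCCATTGTA')")
    return conn


if __name__ == "__main__":
    print(fetch_sequence(build_demo(), "demo", "X00001"))
```

Even in this reduced form, reading a single record takes a three-table join, and the sequence features (not modelled here) would need further joins before anything could be plotted.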
... and Plone to explore them!
Plone4bio lets people browse BioSQL databases from within Plone. It integrates data stored in a database system (showing their metadata, indexing them and drawing pictures of their features) with the native content of a state-of-the-art CMS. For the first time, researchers have a place where all the relevant information about their research can be stored, organized, searched and managed, regardless of whether it consists of PDF or Office files, images, sequence records or predictions.
In the future
At BioDec we are expanding the capabilities of Plone4bio with further specialized content types, like those required to manage antibodies and PCRs. Of course, the really cool feature is their integration with sequence data in BioSQL format. If we succeed, we could have the first open-source Bioinformation Management System used by academia and companies, and it will be based on Plone, too!
