Step Changes in Life Sciences
The SIMDAT Pharma Grid Business Paradigm
- Fig. 1: Drug Discovery Pipeline. The figure provides a high-level view of the drug discovery pipeline. This R&D pipeline requires the collaboration of many scientists, including biologists, chemists and clinical researchers. It can be grouped into four key stages: Gene-to-Target, Target-to-Lead, Candidate Selection and FTIM (First Time in Man).
- Fig. 2: GSK Master Sequence Annotation Pipeline (MSAP) connects services from internal and external groups.
- Fig. 3: Overview of top-level workflow for MSAP. The particular workflow shown in this figure is a dataflow using a Gene Service, a Polymorphism Service and an Expression Service deployed internally at different GSK sites, together with a Resource Service deployed at ULB. At this level, the workflow builder interface does not reveal that the tools are implemented as GRIA services, which simplifies the role of the laboratory scientist or informatician.
Within the European SIMDAT project, substantial progress has been achieved at GlaxoSmithKline (GSK) in the analysis of drug targets, enabling pharmaceutical companies to virtualize and globalize their R&D chain, lowering costs and considerably improving knowledge exchange between industrial and academic partners.
Traversing the Drug Discovery Pipeline
In pharmaceutical Research and Development (R&D) it takes an average of 5 to 10 years of laboratory experiments or computational analysis to traverse the drug discovery pipeline, and only a small number of projects actually reach the market. The cost of generating a successful therapeutic compound can run to more than US$1 billion. Shortening these long cycles and catching failures earlier would make more treatments available, at lower cost, for more conditions. It is therefore essential to collect as much data as possible before proceeding with the development of a particular gene or molecular entity.
Only a few large pharmaceutical companies have the human, financial and intellectual capital to conduct all activities related to the discovery of a particular drug internally, and even they typically focus only on particular diseases or conditions. It is therefore not surprising that pharmaceutical companies seek partnerships with smaller biotechnology companies and contract research organizations (CROs) that provide specialized expertise in specific areas. Tapping into external expertise can significantly improve efficiency, with a network of external alliances sharing the risk and reward.
The collaborative model enables pharmaceutical companies to focus on their core expertise while outsourcing non-essential aspects of their work, as stated in GSK's R&D Strategy Plan 2008-2010: "Our immediate priority is the short term delivery of new medicines. The next objectives are to rebalance drug discovery towards areas with greatest scientific promise for increased novelty and to ensure that R&D is able to access the wealth of ideas and approaches across the world."
Building Data Analysis Applications in a Virtual Environment - the Virtual Protein Annotation Pipeline
The comparison of disease and non-disease groups to identify possible targets often results in a set of gene sequences that need to be further investigated and annotated using specific algorithms, predictive models and biological content.
GSK had traditionally performed these operations from a central resource centre and it was decided that building a virtual Master Sequence Annotation Pipeline (MSAP) would be a challenging test of the SIMDAT technology.
To test these capabilities, a demonstration system was developed at GSK for the analysis of biological data. It involved setting up a secure Grid-enabled environment to support the outsourcing of data analysis and annotation tasks to third parties.
At GSK the InforSense data analytics platform was chosen because it allows analytical applications to be constructed visually, without the need for coding. This type of visual programming enables people who are not software developers to build and deliver analytical applications. The applications can then be deployed to a wide user base via interactive web pages.
Workflow Deployment at GSK
Workflows play an important role within a service-oriented paradigm. They provide the languages and execution mechanisms that enable users to orchestrate the execution of available services and to develop new aggregated services.
An InforSense workflow was developed to retrieve sequence annotation, SNP data, antigenic and expression information from GSK sites in Stevenage (UK), Research Triangle Park (NC, USA), and Upper Merrion (PA, USA).
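The fan-out pattern behind such a workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four service functions and the `annotate` helper are hypothetical stand-ins that return placeholder data, not the InforSense or GSK interfaces, and the sketch does not model which site hosts which service.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the remote services; a real deployment
# would call services hosted at the different sites.
def annotation_service(seq_id):
    return {"source": "annotation", "sequence_id": seq_id}

def snp_service(seq_id):
    return {"source": "snp", "sequence_id": seq_id}

def antigenic_service(seq_id):
    return {"source": "antigenic", "sequence_id": seq_id}

def expression_service(seq_id):
    return {"source": "expression", "sequence_id": seq_id}

SERVICES = {
    "annotation": annotation_service,
    "snp": snp_service,
    "antigenic": antigenic_service,
    "expression": expression_service,
}

def annotate(seq_id, services=SERVICES):
    """Query every service for one sequence and merge the results."""
    record = {"sequence_id": seq_id}
    with ThreadPoolExecutor() as pool:
        # Submit all requests first so they run concurrently.
        futures = {name: pool.submit(fn, seq_id) for name, fn in services.items()}
        for name, future in futures.items():
            record[name] = future.result()
    return record

record = annotate("SEQ1")
```

The merged `record` holds one entry per service, keyed by service name, which is the kind of combined view a dataflow of this shape would hand to the scientist.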
Various test applications use the InforSense platform, which also incorporates external data services provided by Université Libre de Bruxelles (ULB) and BioFocus DPI (a Galapagos company). In the workflow builder environment, users can construct applications that draw on data from data warehouses and data marts as well as from internal and external web services, thanks to developments made as part of the SIMDAT project.
The InforSense system also provides layering of workflows, i.e. a component in a high-level workflow can itself be implemented as another workflow. The layering allows changes and upgrades to be made to each layer with minimal impact on the other layers.
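The idea of layering can be illustrated with a minimal Python sketch. The `Workflow` class and the step functions below are hypothetical and return placeholder data; they are not the InforSense API. The point is only that a workflow exposing the same call interface as a single step can be used as a component of a higher-level workflow.

```python
class Workflow:
    """A named sequence of steps; each step maps a dict to a dict."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = list(steps)

    def __call__(self, data):
        # A workflow is callable like a single step, so it can be nested.
        for step in self.steps:
            data = step(data)
        return data

# Hypothetical leaf steps with placeholder results.
def load_sequence(data):
    return {**data, "sequence": "ATGC"}

def find_snps(data):
    return {**data, "snps": ["placeholder-snp"]}

def summarise(data):
    return {**data, "summary": f"{len(data['snps'])} SNP(s)"}

# Lower layer: a workflow in its own right.
polymorphism = Workflow("polymorphism", [find_snps, summarise])

# Upper layer: uses the lower workflow as one component.
top_level = Workflow("msap", [load_sequence, polymorphism])

result = top_level({"id": "SEQ1"})
```

Replacing `polymorphism` with a different implementation leaves `top_level` unchanged, as long as the replacement still accepts and returns the same kind of record.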
Within SIMDAT Pharma, GRIA from IT Innovation has been integrated into the InforSense data analytics environment so that external data analysis and annotation services can easily be used in workflows. In the GSK pilot project, GRIA ensures that all data communication is stable and fail-safe, and can recover from data loss or interruptions to communication.
Concerns over intellectual property and security are among the reasons it takes so long to set up partnerships in the pharmaceutical sector. The SIMDAT portfolio of tools carefully manages access to sensitive data, overcoming one of the major obstacles to rapid Grid deployment. For the security-conscious pharmaceutical industry in particular, enterprise-strength security is paramount. NEC's E2E Security provides internet security services including encryption, integrity protection, and authentication. GRIA and E2E were used to make services accessible from ULB (Antigenic), BioFocus DPI (Drugability), and EMBL (Drugability).
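How GRIA achieves this recovery is not described here. As a generic illustration of recovering from an interrupted transfer, the following Python sketch retries a failed call a limited number of times. All names (`TransferInterrupted`, `call_with_retry`, `make_flaky_sender`) are hypothetical and are not part of GRIA.

```python
import time

class TransferInterrupted(Exception):
    """Raised by a sender when the connection drops mid-transfer."""

def call_with_retry(send, payload, attempts=3, delay=0.0):
    """Call send(payload), retrying after an interruption.

    Waits delay * attempt_number seconds between attempts and raises
    RuntimeError once all attempts have failed.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return send(payload)
        except TransferInterrupted as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay * attempt)
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error

def make_flaky_sender(failures):
    """Build a test sender that fails `failures` times, then succeeds."""
    state = {"remaining": failures}

    def send(payload):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TransferInterrupted("connection dropped")
        return {"received": payload}

    return send

reply = call_with_retry(make_flaky_sender(2), "annotation-request")
```

Here the first two attempts are interrupted and the third succeeds, so the caller sees only the final reply. A sender that keeps failing beyond the allowed number of attempts surfaces as a `RuntimeError` instead of a silent loss.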
The demonstrator showed that the virtualized MSAP, and other applications similar to it, could be constructed easily by end-users to access data and resources that are geographically distributed across different organizations. At GSK more than 30 personnel were involved in the evaluation - scientists, informaticians and IT developers. The qualitative studies conducted led to the projection that every Drug Discovery functional group using this sort of Grid technology, instead of exchanging data by FTP, email or discs sent by post, would reduce its average requirement for operational support personnel by a factor of 10. The workflow demonstrator activity has also been extremely successful in taking the technology to end-user scientists, informaticians and business users.
With the successful completion of the SIMDAT project, GSK is now looking at a number of follow-on projects to further validate the use of the technologies, including further Business to Business, Business to Academia and Business to Vendor opportunities. These would allow skills and technologies to be implemented more quickly and effectively, without the need to bring large amounts of code, development and external knowledge into the current GSK system. This virtualization aligns directly with the goals of the organization as it moves forward.
SIMDAT has received research funding from the European Commission under the Information Society Technologies Program (IST), contract number IST-2004-511438.
 GSK spends more than US-$ 4 billion annually on R&D and employs more than 15,000 scientists.
 GSK focuses on medicines to treat six major disease areas - asthma, virus control, infections, mental health, diabetes and digestive conditions. GSK is also a leader in the important area of vaccines and develops new treatments for cancer.
Yvonne Havertz, Fraunhofer Institute for Algorithms and Scientific Computing SCAI, Sankt Augustin, Germany
Dr. Li Du, Enterprise Architect with R&D IT, GlaxoSmithKline, Stevenage, UK
Dr. Moustafa Ghanem, Research Director, InforSense Ltd., London, UK
Dr. Robert Gill, Director of Bio-IT at Pronota nv, Zwijnaarde, Belgium. Formerly GSK Director & project lead for SIMDAT
Simon Beaulah, Life Science and Healthcare Marketing, InforSense Ltd., London, UK