Data Analysis and BioInformatics in real-time qPCR (3)



How to do successful gene expression analysis using real-time PCR
Stefaan Derveaux, Jo Vandesompele, Jan Hellemans

Methods Vol 50, Issue 4, April 2010,  in The ongoing Evolution of qPCR edited by Michael W. Pfaffl,  Pages 227-230

Reverse transcription quantitative PCR (RT-qPCR) is considered today as the gold standard for accurate, sensitive and fast measurement of gene expression. Unfortunately, what many users fail to appreciate is that numerous critical issues in the workflow need to be addressed before biologically meaningful and trustworthy conclusions can be drawn. Here, we review the entire workflow from the planning and preparation phase, over the actual real-time PCR cycling experiments to data-analysis and reporting steps. This process can be captured with the appropriate acronym PCR: plan/prepare, cycle and report. The key message is that quality assurance and quality control are essential throughout the entire RT-qPCR workflow; from living cells, over extraction of nucleic acids, storage, various enzymatic steps such as DNase treatment, reverse transcription and PCR amplification, to data-analysis and finally reporting.

Quantitative real-time RT-PCR data analysis: current concepts and the novel "gene expression's CT difference" formula.
Schefe JH, Lehmann KE, Buschmann IR, Unger T, Funke-Kaiser H.
Center for Cardiovascular Research (CCR)/Institute of Pharmacology and Toxicology, Charité-Universitätsmedizin Berlin, Hessische Strasse 3-4, 10115, Berlin, Germany.
J Mol Med. 2006 84(11):901-10. Epub 2006 Sep 14.

For quantification of gene-specific mRNA, quantitative real-time RT-PCR has become one of the most frequently used methods over the last few years. This article focuses on the issue of real-time PCR data analysis and its mathematical background, offering a general concept for efficient, fast and precise data analysis superior to the commonly used comparative CT (DeltaDeltaCT) and the standard curve method, as it considers individual amplification efficiencies for every PCR. This concept is based on a novel formula for the calculation of relative gene expression ratios, termed GED (Gene Expression's CT Difference) formula. Prerequisites for this formula, such as real-time PCR kinetics, the concept of PCR efficiency and its determination, are discussed. Additionally, this article offers some technical considerations and information on statistical analysis of real-time PCR data.
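
The abstract does not reproduce the GED formula, so the following is only a minimal sketch of the general idea it builds on: a relative expression ratio that uses an individual amplification efficiency for each assay instead of assuming perfect doubling. It uses the widely known efficiency-corrected ratio form, not the GED formula itself. The function name, parameter names and example CT values are hypothetical.

```python
def efficiency_corrected_ratio(amp_target, amp_ref,
                               ct_target_control, ct_target_sample,
                               ct_ref_control, ct_ref_sample):
    """Relative expression of a target gene in a sample versus a control.

    amp_target and amp_ref are amplification factors per cycle
    (2.0 means perfect doubling, i.e. 100% efficiency).
    NOTE: this is the generic efficiency-corrected ratio, used here as an
    illustration; it is not the GED formula from the paper.
    """
    delta_ct_target = ct_target_control - ct_target_sample
    delta_ct_ref = ct_ref_control - ct_ref_sample
    return (amp_target ** delta_ct_target) / (amp_ref ** delta_ct_ref)


# Hypothetical CT values: the target appears two cycles earlier in the
# sample, while the reference gene is unchanged.
ratio = efficiency_corrected_ratio(2.0, 2.0, 25.0, 23.0, 20.0, 20.0)
print(ratio)  # 4.0 when both assays double perfectly
```

With both amplification factors set to 2.0 the expression reduces to the comparative CT (DeltaDeltaCT) result; setting them to measured values is what makes the ratio sensitive to assay-specific efficiency.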

Multiway real-time PCR gene expression profiling in yeast Saccharomyces cerevisiae reveals altered transcriptional response of ADH-genes to glucose stimuli.
Ståhlberg A, Elbing K, Andrade-Garda JM, Sjögreen B, Forootan A, Kubista M.
TATAA Biocenter, Odinsgatan 28, 411 03 Göteborg, Sweden.
BMC Genomics. 2008 9:170.

BACKGROUND: The large sensitivity, high reproducibility and essentially unlimited dynamic range of real-time PCR to measure gene expression in complex samples provides the opportunity for powerful multivariate and multiway studies of biological phenomena. In multiway studies samples are characterized by their expression profiles to monitor changes over time, effect of treatment, drug dosage etc. Here we perform a multiway study of the temporal response of four yeast Saccharomyces cerevisiae strains with different glucose uptake rates upon altered metabolic conditions.
RESULTS: We measured the expression of 18 genes as function of time after addition of glucose to four strains of yeast grown in ethanol. The data are analyzed by matrix-augmented PCA, which is a generalization of PCA for 3-way data, and the results are confirmed by hierarchical clustering and clustering by Kohonen self-organizing map. Our approach identifies gene groups that respond similarly to the change of nutrient, and genes that behave differently in mutant strains. Of particular interest is our finding that ADH4 and ADH6 show a behavior typical of glucose-induced genes, while ADH3 and ADH5 are repressed after glucose addition.
CONCLUSION: Multiway real-time PCR gene expression profiling is a powerful technique which can be utilized to characterize functions of new genes by, for example, comparing their temporal response after perturbation in different genetic variants of the studied subject. The technique also identifies genes that show perturbed expression in specific strains.

Statistical aspects of quantitative real-time PCR experiment design
Robert R. Kitchen, Mikael Kubista, Ales Tichopad
Methods Vol 50, Issue 4, April 2010,  in The ongoing Evolution of qPCR edited by Michael W. Pfaffl, Pages 231-236

Experiments using quantitative real-time PCR to test hypotheses are limited by technical and biological variability; we seek to minimise sources of confounding variability through optimum use of biological and technical replicates. The quality of an experiment design is commonly assessed by calculating its prospective power. Such calculations rely on knowledge of the expected variances of the measurements of each group of samples and the magnitude of the treatment effect; the estimation of which is often uninformed and unreliable. Here we introduce a method that exploits a small pilot study to estimate the biological and technical variances in order to improve the design of a subsequent large experiment. We measure the variance contributions at several 'levels' of the experiment design and provide a means of using this information to predict both the total variance and the prospective power of the assay. A validation of the method is provided through a variance analysis of representative genes in several bovine tissue-types. We also discuss the effect of normalisation to a reference gene in terms of the measured variance components of the gene of interest. Finally, we describe a software implementation of these methods, powerNest, that gives the user the opportunity to input data from a pilot study and interactively modify the design of the assay. The software automatically calculates expected variances, statistical power, and optimal design of the larger experiment. powerNest enables the researcher to minimise the total confounding variance and maximise prospective power for a specified maximum cost for the large study.
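
How variance components from a pilot study feed into a power estimate can be sketched as follows. This is not the powerNest implementation, which the abstract does not detail; it assumes a balanced, fully nested design and a two-sided z-test approximation for comparing two group means. All function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def variance_of_group_mean(levels):
    """Variance of a group mean in a balanced nested design.

    levels: list of (variance, replicates) pairs ordered from the outermost
    level (e.g. subjects) to the innermost (e.g. qPCR replicates).
    Each variance component is divided by the total number of units at
    that level, i.e. the product of replicate counts down to that level.
    """
    total = 0.0
    units = 1
    for variance, replicates in levels:
        units *= replicates
        total += variance / units
    return total


def approximate_power(effect, var_mean, alpha=0.05):
    """Approximate power of a two-sided z-test between two group means.

    Assumes both groups share the same variance of the mean (var_mean);
    this normal approximation is an illustrative assumption.
    """
    z = NormalDist()
    se_diff = sqrt(2.0 * var_mean)
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(effect) / se_diff
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical pilot estimates: biological variance 0.4, technical 0.1 (Cq^2)
design_small = [(0.4, 4), (0.1, 2)]   # 4 subjects, 2 qPCR replicates each
design_large = [(0.4, 8), (0.1, 2)]   # 8 subjects, 2 qPCR replicates each
for design in (design_small, design_large):
    v = variance_of_group_mean(design)
    print(design, round(approximate_power(1.0, v), 3))
```

Because the biological variance is divided only by the number of subjects, adding subjects reduces the total variance far more than adding technical replicates does, which is the trade-off such a design tool explores under a cost limit.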

The Prime Technique - Real-time PCR Data Analysis
Mikael Kubista, Institute of Molecular Genetics and TATAA Biocenter, Sweden
Radek Sindelka, Institute of Molecular Genetics, Czech Republic
G.I.T. Laboratory Journal 9-10/2007, pp 33-35, GIT VERLAG GmbH & Co. KG, Darmstadt

For measuring gene expression there is only one technique: PCR. But how can it be used with maximum efficiency? This article tries to answer that question.

Gene expression profiling – Clusters of possibilities
Anders Bergkvist, Vendula Rusnakova, Radek Sindelka, Jose Manuel Andrade Garda, Björn Sjögreen, Daniel Lindh, Amin Forootan, Mikael Kubista

Methods Vol 50, Issue 4, April 2010,  in The ongoing Evolution of qPCR edited by Michael W. Pfaffl, Pages 323-335

Advances in qPCR technology allow studies of increasingly large systems comprising many genes and samples. The increasing data sizes allow expression profiling both in the gene and the samples dimension while also putting higher demands on sound statistical analysis and expertise to handle and interpret its results. We distinguish between exploratory and confirmatory statistical studies. In this paper we demonstrate several techniques available for exploratory studies on a system of Xenopus laevis development from egg to tadpole. Techniques include hierarchical clustering, heatmap, principal component analysis and self-organizing maps. We stress that even though exploratory studies are excellent for generating hypotheses, results have not been proven statistically significant until an independent confirmatory study has been performed. An exploratory study may certainly be valuable in its own right, and there are often not enough resources to report both an exploratory and a confirmatory study at the same time. However, exploratory and confirmatory studies are intimately connected and we would like to raise that awareness among qPCR practitioners. We suggest that scientific reports should always have a hypothesis focus. Reports are either hypothesis generating, from an exploratory study, or hypothesis validating, from a confirmatory study, or both. In either case, we suggest the generated or validated hypotheses be specifically stated.

Validation of differential gene expression algorithms: application comparing fold-change estimation to hypothesis testing.
Yanofsky CM, Bickel DR.
Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology, and Immunology, University of Ottawa, Ottawa, Ontario, Canada.
BMC Bioinformatics. 2010 Jan 28;11:63.

BACKGROUND: Sustained research on the problem of determining which genes are differentially expressed on the basis of microarray data has yielded a plethora of statistical algorithms, each justified by theory, simulation, or ad hoc validation and yet differing in practical results from equally justified algorithms. Recently, a concordance method that measures agreement among gene lists has been introduced to assess various aspects of differential gene expression detection. This method has the advantage of basing its assessment solely on the results of real data analyses, but as it requires examining gene lists of given sizes, it may be unstable.
RESULTS: Two methodologies for assessing predictive error are described: a cross-validation method and a posterior predictive method. As a nonparametric method of estimating prediction error from observed expression levels, cross validation provides an empirical approach to assessing algorithms for detecting differential gene expression that is fully justified for large numbers of biological replicates. Because it leverages the knowledge that only a small portion of genes are differentially expressed, the posterior predictive method is expected to provide more reliable estimates of algorithm performance, allaying concerns about limited biological replication. In practice, the posterior predictive method can assess when its approximations are valid and when they are inaccurate. Under conditions in which its approximations are valid, it corroborates the results of cross validation. Both comparison methodologies are applicable to both single-channel and dual-channel microarrays. For the data sets considered, estimating prediction error by cross validation demonstrates that empirical Bayes methods based on hierarchical models tend to outperform algorithms based on selecting genes by their fold changes or by non-hierarchical model-selection criteria. (The latter two approaches have comparable performance.) The posterior predictive assessment corroborates these findings.
CONCLUSIONS: Algorithms for detecting differential gene expression may be compared by estimating each algorithm's error in predicting expression ratios, whether such ratios are defined across microarray channels or between two independent groups. According to two distinct estimators of prediction error, algorithms using hierarchical models outperform the other algorithms of the study. The fact that fold-change shrinkage performed as well as conventional model selection criteria calls for investigating algorithms that combine the strengths of significance testing and fold-change estimation.

Automated validation of polymerase chain reaction amplicon melting curves.
Mann TP, Humbert R, Stamatoyannopolous JA, Noble WS.
Department of Genome Sciences, University of Washington, Seattle, WA, USA
J Bioinform Comput Biol. 2006 4(2):299-315.

The polymerase chain reaction (PCR) is a fundamental tool of molecular biology. Quantitative PCR is the gold-standard methodology for determination of DNA copy numbers, quantitating transcription, and numerous other applications. A major barrier to large-scale application of PCR for quantitative genomic analyses is the current requirement for manual validation of individual PCRs to ensure generation of a single product. This typically requires visual inspection either of gel electrophoreses or temperature dissociation ("melting") curves of individual PCRs--a time-consuming and costly process. Here we describe a robust computational solution to this fundamental problem. Using a training set of 10 080 reactions comprising multiple quantitative PCRs from each of 1728 unique human genomic amplicons, we developed a support vector machine classifier capable of discriminating single-product PCRs with better than 99% accuracy. This approach has broad utility, and eliminates a major bottleneck to widespread application of PCR for high-throughput genomic applications.

Statistical models in assessing fold change of gene expression in real-time RT-PCR experiments
Fu WJ, Hu J, Spencer T, Carroll R, Wu G.
Department of Epidemiology, Michigan State University, East Lansing, MI 48824, USA.
Comput Biol Chem. 2006 30(1): 21-6.

Real-time RT-PCR has been frequently used in quantitative research in molecular biology and bioinformatics. It provides remarkably useful technology to assess expression of genes. Although mathematical models for the gene amplification process have been studied, statistical models and methods for data analysis in real-time RT-PCR have received little attention. In this paper, we briefly introduce current mathematical models, and study statistical models for real-time RT-PCR data. We propose a generalized estimation equations (GEE) model that properly reflects the structure of repeated data in RT-PCR experiments for both cross-sectional and longitudinal data. The GEE model takes the correlation between observations within the same subject into consideration, and thereby guards against false positives and false negatives. We further demonstrate with a set of actual real-time RT-PCR data that different statistical models yield different estimates of fold change and confidence interval. The SAS program for data analysis using the GEE model is provided to facilitate easy computation for non-statistical professionals.

The Importance of Quality Control During qPCR Data Analysis
Barbara D’haene, Ph.D. & Jan Hellemans, Ph.D.
Biogazelle & Ghent University
Drug Discovery - August/September 2010

Introduction: Since its introduction in 1993, qPCR has become one of the most popular techniques in modern molecular biology [1]. Despite its apparent simplicity, which makes qPCR such an attractive technology for many researchers, final results are often compromised due to unsound experimental design, a lack of quality control, improper data analysis, or a combination of these. To address the concerns that have been raised about the quality of published qPCR-based research, specialists in the qPCR field have introduced the MIQE guidelines for publication of qPCR-based results [2]. The main purpose of this initiative is to make qPCR-based research transparent, but the MIQE guidelines may also serve as a practical framework to obtain high-quality results. Within the guidelines, quality control at each step of the qPCR workflow, from experimental design to data analysis, is highlighted as necessary to ensure trustworthy results. Numerous papers have been written about assay and sample quality control [3], but less attention has been paid to quality control of post-qPCR data. This article summarizes recommendations for this latter type of quality control including: detection of abnormal amplification, inspection of melting curves, control of PCR replicate variation, assessment of positive and negative control samples, determination of reference gene expression stability, and evaluation of deviating sample normalization factors.

Error bars in experimental biology
Geoff Cumming (1), Fiona Fidler (1), and David L. Vaux (2)
(1) School of Psychological Science and (2) Department of Biochemistry, La Trobe University, Melbourne, Victoria, Australia 3086

Error bars commonly appear in figures in publications, but experimental biologists are often unsure how they should be used and interpreted. In this article we illustrate some basic features of error bars and explain how they can help communicate data and assist correct interpretation. Error bars may show confidence intervals, standard errors, standard deviations, or other quantities. Different types of error bars give quite different information, and so figure legends must make clear what error bars represent. We suggest eight simple rules to assist with effective use and interpretation of error bars.
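
To make the difference between these quantities concrete, the sketch below computes the standard deviation, the standard error and a 95% confidence interval for one small set of replicates. It does not reproduce the article's eight rules. The function name and data are hypothetical, and the critical values are standard two-sided Student t table values for 2 to 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def error_bar_summary(values):
    """Return mean, SD, SE and 95% CI for 3 to 10 independent replicates."""
    n = len(values)
    if n - 1 not in T_CRIT_95:
        raise ValueError("this sketch only tabulates n = 3 to 10")
    m = mean(values)
    sd = stdev(values)             # sample standard deviation (n - 1)
    se = sd / sqrt(n)              # standard error of the mean
    half_width = T_CRIT_95[n - 1] * se
    return {"n": n, "mean": m, "sd": sd, "se": se,
            "ci95": (m - half_width, m + half_width)}


# Hypothetical normalized expression values from three biological replicates
summary = error_bar_summary([1.0, 2.0, 3.0])
print(summary)
```

For n = 3 the 95% confidence interval is more than four standard errors wide on each side, so the same data drawn with SE bars and with CI bars look very different, which is why the legend has to state which one is shown.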

Automatic Genomics: a user-friendly program for the automatic designing and plate loading of medium-throughput qPCR experiments
Callejas S, Alvarez R, Dopazo A.
Genomics Unit, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain.
Biotechniques. 2011 50(1):46-50.

Quantitative PCR (qPCR) remains the method of choice for gene and microRNA (miRNA) expression studies. Many laboratories wish to automate some or all of the steps of medium-throughput qPCR experiments through the use of various types of liquid handling robots. However, it is not uncommon to find cases in which scripts provided by the robot supplier are too rigid for user-specific applications, do not include all the desired options, or are too complicated to be modified by a nonprofessional programmer. Here, we present Automatic Genomics, a program that allows users with a limited programming background to automate medium-throughput qPCR experiments by using commercially available liquid-handling robots. The user is able to optimize the plate design in terms of number of genes, number of samples, and controls.

Interactive analysis of systems biology molecular expression data
Zhang M, Ouyang Q, Stephenson A, Kane MD, Salt DE, Prabhakar S, Burgner J, Buck C, Zhang X.
Bindley Bioscience Center, Purdue University, West Lafayette, IN 47907, USA.
BMC Syst Biol. 2008 2:23.

BACKGROUND: Systems biology aims to understand biological systems on a comprehensive scale, such that the components that make up the whole are connected to one another and work through dependent interactions. Molecular correlations and comparative studies of molecular expression are crucial to establishing interdependent connections in systems biology. The existing software packages provide limited data mining capability. The user must first generate visualization data with a preferred data mining algorithm and then upload the resulting data into the visualization package for graphic visualization of molecular relations.
RESULTS: Presented is a novel interactive visual data mining application, SysNet, that provides an interactive environment for the analysis of high-volume molecular expression data of almost any type from biological systems. It integrates interactive graphic visualization and statistical data mining into a single package. SysNet interactively presents intermolecular correlation information with circular and heatmap layouts. It is also applicable to comparative analysis of molecular expression data, such as time course data.
CONCLUSION: The SysNet program has been utilized to analyze elemental profile changes in response to an increasing concentration of iron (Fe) in growth media (an ionomics dataset). This study case demonstrates that the SysNet software is an effective platform for interactive analysis of molecular expression information in systems biology.

Roadmap for developing and validating therapeutically relevant genomic classifiers
Simon R.
National Cancer Institute, 9000 Rockville Pike, MSC 7434, Bethesda, MD 20892, USA
J Clin Oncol. 2005 23(29):7332-41. Epub 2005 Sep 6.

Oncologists need improved tools for selecting treatments for individual patients. The development of therapeutically relevant prognostic markers has traditionally been slowed by poor study design, inconsistent findings, and lack of proper validation studies. Microarray expression profiling provides an exciting new technology for relating tumor gene expression to patient outcome, but it also provides increased challenges for translating initial research findings into robust diagnostics that benefit patients and physicians in therapeutic decision making. This article attempts to clarify some of the misconceptions about the development and validation of multigene expression signature classifiers and highlights the steps needed to move genomic signatures into clinical application as therapeutically relevant and robust diagnostics.

Cluster analysis and display of genome-wide expression patterns
Eisen MB, Spellman PT, Brown PO, Botstein D.
Department of Genetics, Stanford University School of Medicine, 300 Pasteur Avenue, Stanford, CA 94305, USA.
Proc Natl Acad Sci U S A. 1998 95(25):14863-8.

A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and the underlying expression data simultaneously in a form intuitive for biologists. We have found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function, and we find a similar tendency in human data. Thus patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. Also, coexpression of genes of known function with poorly characterized or novel genes may provide a simple means of gaining leads to the functions of many genes for which information is not available currently.

Biomarker Discovery via RT-qPCR and Bioinformatical Validation
Christiane Becker, Irmgard Riedmaier, and Michael W. Pfaffl
Book chapter 18 in PCR Technology - Current Innovations, Third Edition, Pages 259–270
Editors: Tania Nolan and Stephen A. Bustin; CRC Press 2013, Print ISBN: 978-1-4398-4805-0

There is a growing interest in life science research in the use of expressed transcripts that form the basis of biological markers (biomarkers) and in addressing some of the challenging statistical issues that arise when attempting to validate them. Biomarkers have extensively been used across diagnostic and therapeutic areas of many life science disciplines, including clinical, physiological, biochemical, developmental, morphological, and molecular applications. Biomarkers have been defined as “cellular, biochemical or molecular alterations that are measurable in biological media such as human tissues, cells, or fluids.” The official definition, developed by the “Biomarkers definitions working group” of the NIH is: “A biomarker is a characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention.” More recently the definition has been broadened to include more biological characteristics that can be objectively measured and evaluated as a biological indicator. A biomarker can refer to any measurable molecular, biochemical, cellular, or morphological alterations in biological media such as human tissues, cells, or fluids.

How to perform RT-qPCR accurately in plant species? A case study on flower colour gene expression in an azalea (Rhododendron simsii hybrids) mapping population.
De Keyser E, Desmet L, Van Bockstaele E, De Riek J.
Institute for Agricultural and Fisheries Research (ILVO)-Plant Sciences Unit, Caritasstraat 21, 9090, Melle, Belgium.
BMC Mol Biol. 2013 Jun 24;14(1): 13

BACKGROUND: Flower colour variation is one of the most crucial selection criteria in the breeding of a flowering pot plant, as is also the case for azalea (Rhododendron simsii hybrids). Flavonoid biosynthesis has been studied intensively in several species. In azalea, flower colour can be described by means of a 3-gene model. However, this model does not clarify pink coloration. Over the last decade, gene expression studies have been widely used to study flower colour. However, the methods used were often only semi-quantitative or quantification was not done according to the MIQE guidelines. We aimed to develop an accurate protocol for RT-qPCR and to validate the protocol to study flower colour in an azalea mapping population.
RESULTS: An accurate RT-qPCR protocol had to be established. RNA quality was evaluated in a combined approach by means of different techniques, e.g. the SPUD assay and Experion analysis. We demonstrated the importance of testing noRT-samples for all genes under study to detect contaminating DNA. In spite of the limited sequence information available, we prepared a set of 11 reference genes which was validated in flower petals; a combination of three reference genes was optimal. Finally, we also used plasmids for the construction of standard curves. This allowed us to calculate gene-specific PCR efficiencies for every gene to assure an accurate quantification. The validity of the protocol was demonstrated by means of the study of six genes of the flavonoid biosynthesis pathway. No correlations were found between flower colour and the individual expression profiles. However, the combination of early pathway genes (CHS, F3H, F3'H and FLS) is clearly related to co-pigmentation with flavonols. The late pathway genes DFR and ANS are to a minor extent involved in differentiating between coloured and white flowers. Concerning pink coloration, we could demonstrate that the lower intensity in this type of flowers is correlated to the expression of F3'H.
CONCLUSIONS: Currently in plant research, validated, high-quality RT-qPCR protocols are still rare. The protocol in this study can be implemented on all plant species to assure accurate quantification of gene expression. We have been able to correlate flower colour to the combined regulation of structural genes, both in the early and late branch of the pathway. This allowed us to differentiate between flower colours in a broader genetic background than has been done so far in flower colour studies. These data will now be used for eQTL mapping to better understand the regulation of this pathway.
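
The step of deriving a gene-specific PCR efficiency from a standard curve can be sketched as follows, assuming the common approach of regressing Cq on log10 of the template quantity and converting the slope to an efficiency. The abstract does not state the authors' exact calculation, so the function name and the dilution series below are hypothetical.

```python
from math import log10, log2


def standard_curve_efficiency(quantities, cq_values):
    """Fit Cq = intercept + slope * log10(quantity) by least squares.

    Returns (slope, intercept, efficiency), where efficiency is
    10 ** (-1 / slope) - 1, so 1.0 corresponds to perfect doubling.
    """
    x = [log10(q) for q in quantities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(cq_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, intercept, efficiency


# Hypothetical ten-fold plasmid dilution series with ideal doubling:
# each ten-fold dilution delays Cq by log2(10), about 3.32 cycles.
copies = [1e5, 1e4, 1e3, 1e2]
cqs = [20.0 + k * log2(10) for k in range(4)]
slope, intercept, eff = standard_curve_efficiency(copies, cqs)
print(round(slope, 3), round(eff, 3))
```

A slope close to -3.32 therefore indicates an efficiency close to 100%, and shallower or steeper slopes translate directly into the gene-specific efficiencies used for quantification.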

External oligonucleotide standards enable cross laboratory comparison and exchange of real-time quantitative PCR data.
Vermeulen J, Pattyn F, De Preter K, Vercruysse L, Derveaux S, Mestdagh P, Lefever S, Hellemans J, Speleman F, Vandesompele J.
Center for Medical Genetics, Ghent University Hospital, Belgium.
Nucleic Acids Res. 2009 Nov;37(21):e138

The quantitative polymerase chain reaction (qPCR) is widely utilized for gene expression analysis. However, the lack of robust strategies for cross laboratory data comparison hinders the ability to collaborate or perform large multicentre studies conducted at different sites. In this study we introduced and validated a workflow that employs universally applicable, quantifiable external oligonucleotide standards to address this question. Using the proposed standards and data-analysis procedure, we obtained a perfect concordance between expression values from eight different genes in 366 patient samples measured on three different qPCR instruments and matching software, reagents, plates and seals, demonstrating the power of this strategy to detect and correct inter-run variation and to enable exchange of data between different laboratories, even when not using the same qPCR platform.

SPUD qPCR assay confirms PREXCEL-Q software's ability to avoid qPCR inhibition.
Gallup JM, Sow FB, Van Geelen A, Ackermann MR.
Department of Veterinary Pathology, Iowa State University, Ames, 50011-1250, USA.
Curr Issues Mol Biol. 2010; 12(3): 129-134

Real-time quantitative polymerase chain reaction is subject to inhibition by substances that co-purify with nucleic acids during isolation and preparation of samples. Such materials alter the activity of reverse transcriptase (RT) and thermostable DNA polymerase enzymes on which the assay depends. When removal of inhibitory substances by column or reagent-based methods fails or is incomplete, the remaining option of appropriately, precisely and differentially diluting samples and standards to non-inhibitory concentrations is often avoided due to the logistic problem it poses. To address this, we invented the PREXCEL-Q software program to automate the process of calculating the non-inhibitory dilutions for all samples and standards after a preliminary test plate has been performed on an experimental sample mixture. The SPUD assay was used to check for inhibition in each PREXCEL-Q-designed qPCR reaction. When SPUD amplicons or SPUD amplicon-containing plasmids were spiked equally into each qPCR reaction, all reactions demonstrated complete absence of qPCR inhibition. Reactions spiked with about 15,500 SPUD amplicons yielded a Cq of 27.39 ± 0.28 (at about 80.8% efficiency), while reactions spiked with about 7,750 SPUD plasmids yielded a Cq of 23.82 ± 0.15 (at about 97.85% efficiency). This work demonstrates that PREXCEL-Q sample and standard dilution calculations ensure avoidance of qPCR inhibition.

Mathematical analysis of the Real Time Array PCR (RTA PCR) process.
J. Frits Dijksman and Anke Pierik
Chemical Engineering Science (2012) vol. 71 March 26, 2012. p. 496-506

Real Time Array PCR is a recently developed biochemical technique that measures amplification curves (like quantitative real time Polymerase Chain Reaction (qPCR)) of a multitude of different templates in a sample. It combines two different techniques to profit from the advantages of both techniques, namely qPCR (real time quantitative detection) with microarrays (high multiplex capability). This enables the quantitative detection of many more target sequences than can be done by qPCR. Thereby, the concentration of the many different target molecules originally present in a sample can be measured. Labeled primers are used that are first elongated to form labeled amplicons in the bulk and these can hybridize to capture probes immobilized on the surface of the microarray. During each PCR cycle, there is a time window available during which the formed labeled amplicons can hybridize to the target sequences on the microarray surface. By detection of the fluorescence of the spots on the microarray, amplification curves comparable to real time PCR can be obtained, which can be used to deduce the information needed on the presence and the amount of targets originally present in the sample. We present a mathematical model that provides fundamental insights in the different steps of Real Time Array PCR and that can be used to optimize the different biochemical processes taking place. At the microarray surface specific molecules are captured and taken away from the solution, causing a concentration gradient that powers a material flow towards the microarray surface. Only the labeled strand of the amplicon is captured by the probes on the microarray surface and as a result locally the PCR process is not symmetric anymore. Moreover, in the course of the process more and more ssDNA renatures, leaving relatively fewer strands and complexes available for hybridization. We found that to a large extent, however, the surface fluorescence scales with the bulk concentration.
Important parameters to optimize are the enzyme concentration and degradation, the primer concentration and the capture probe decay rate. Also the surface hybridization time is critical since the time to reach a steady state is at least one order of magnitude longer compared to the timing of the bulk processes in qPCR.

Selecting control genes for RT-QPCR using public microarray data.
Popovici V, Goldstein DR, Antonov J, Jaggi R, Delorenzi M, Wirapati P.
Bioinformatics Core Facility, Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland.
BMC Bioinformatics. 2009 Feb 2;10:42

BACKGROUND: Gene expression analysis has emerged as a major biological research area, with real-time quantitative reverse transcription PCR (RT-QPCR) being one of the most accurate and widely used techniques for expression profiling of selected genes. In order to obtain results that are comparable across assays, a stable normalization strategy is required. In general, the normalization of PCR measurements between different samples uses one to several control genes (e.g. housekeeping genes), from which a baseline reference level is constructed. Thus, the choice of the control genes is of utmost importance, yet there is not a generally accepted standard technique for screening a large number of candidates and identifying the best ones.
RESULTS: We propose a novel approach for scoring and ranking candidate genes for their suitability as control genes. Our approach relies on publicly available microarray data and allows the combination of multiple data sets originating from different platforms and/or representing different pathologies. The use of microarray data allows the screening of tens of thousands of genes, producing very comprehensive lists of candidates. We also provide two lists of candidate control genes: one which is breast cancer-specific and one with more general applicability. Two genes from the breast cancer list which had not been previously used as control genes are identified and validated by RT-QPCR. Open source R functions are available at
CONCLUSION: We proposed a new method for identifying candidate control genes for RT-QPCR which was able to rank thousands of genes according to some predefined suitability criteria and we applied it to the case of breast cancer. We also empirically showed that translating the results from microarray to PCR platform was achievable.
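
The way a baseline reference level is built from several control genes can be illustrated with a short sketch. It assumes the common choice of a geometric mean of control-gene quantities and a fixed amplification factor of 2 per cycle. The paper's scoring and ranking method is not reproduced here, and all names and Cq values are hypothetical.

```python
from math import prod


def relative_quantity(cq, amplification_factor=2.0):
    """Convert a Cq value to a relative quantity (arbitrary units)."""
    return amplification_factor ** (-cq)


def normalization_factor(control_cqs):
    """Geometric mean of the control-gene quantities for one sample."""
    quantities = [relative_quantity(cq) for cq in control_cqs]
    return prod(quantities) ** (1.0 / len(quantities))


def normalized_expression(target_cq, control_cqs):
    """Target quantity divided by the control-gene normalization factor."""
    return relative_quantity(target_cq) / normalization_factor(control_cqs)


# Hypothetical sample: target gene at Cq 24, two control genes at Cq 20 and 22
print(normalized_expression(24.0, [20.0, 22.0]))  # 0.125
```

Because the baseline is shared by every target measured in a sample, an unstable control gene shifts all normalized values at once, which is why the choice of control genes matters so much.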