Sunday, May 06, 2012

Inferring the Regulatory Interaction Models of Transcription Factors in Transcriptional Regulatory Networks

Living cells are realized by complex gene expression programs that are moderated by regulatory proteins called transcription factors (TFs). The TFs control the differential expression of target genes in the context of transcriptional regulatory networks (TRNs), either individually or in groups. To decipher the mechanisms of how the TFs control the differential expression of a target gene in a TRN is challenging, especially when multiple TFs collaboratively participate in the transcriptional regulation. To unravel the roles of the TFs in the regulatory networks, we model the underlying regulatory interactions in terms of the TF-target interactions' directions (activation or repression) and their corresponding logical roles (necessary and/or sufficient). We design a set of constraints that relate gene expression patterns to regulatory interaction models, and develop TRIM (Transcriptional Regulatory Interaction Model Inference), a new hidden Markov model, to infer the models of TF-target interactions in large-scale TRNs of complex organisms. Besides, by training TRIM with wild-type time-series gene expression data, the activation timepoints of each regulatory module can be obtained. To demonstrate the advantages of TRIM, we applied it on yeast TRN to infer the TF-target interaction models for individual TFs as well as pairs of TFs in collaborative regulatory modules. By comparing with TF knockout and other gene expression data, we were able to show that the performance of TRIM is clearly higher than DREM (the best existing algorithm). In addition, on an individual Arabidopsis binding network, we showed that the target genes' expression correlations can be significantly improved by incorporating the TF-target regulatory interaction models inferred by TRIM into the expression data analysis, which may introduce new knowledge in transcriptional dynamics and bioactivation.

by Sherine Awad, Nicholas Panchy, See-Kiong Ng, Jin Chen. Journal of Bioinformatics and Computational Biology. 2012. In Press

Frequency-based time-series gene expression recomposition using PRIISM

Circadian rhythm pathways influence the expression patterns of as much as 31% of the Arabidopsis genome through complicated interaction pathways, and have been found to be significantly disrupted by biotic and abiotic stress treatments, complicating treatment-response gene discovery methods due to clock pattern mismatches in the fold change-based statistics. The PRIISM (Pattern Recomposition for the Isolation of Independent Signals in Microarray data) algorithm outlined in this paper is designed to separate pattern changes induced by different forces, including treatment-response pathways and circadian clock rhythm disruptions. Using the Fourier transform, high-resolution time-series microarray data is projected to the frequency domain. By identifying the clock frequency range from the core circadian clock genes, we separate the frequency spectrum to different sections containing treatment-frequency (representing up- or down-regulation by an adaptive treatment response), clock-frequency (representing the circadian clock-disruption response) and noise-frequency components. Then, we project the components’ spectra back to the expression domain to reconstruct isolated, independent gene expression patterns representing the effects of the different influences. By applying PRIISM on a high-resolution time-series Arabidopsis microarray dataset under a cold treatment, we systematically evaluated our method using maximum fold change and principal component analyses. The results of this study showed that the ranked treatment frequency fold change results produce fewer false positives than the original methodology, and the 26-hour timepoint in our dataset was the best statistic for distinguishing the most known cold-response genes. In addition, six novel cold-response genes were discovered. PRIISM also provides gene expression data which represents only circadian clock influences, and may be useful for circadian clock studies. 
PRIISM is a novel approach for overcoming the problem of circadian disruptions from stress treatments on plants. PRIISM can be integrated with any existing analysis approach on gene expression data to separate circadian-influenced changes in gene expression, and it can be extended to apply to any organism with regular oscillations in gene expression patterns across a large portion of the genome.

by Bruce A. Rosa, Yuhua Jiao, Sookyung Oh, Beronda L. Montgomery, Wensheng Qin, Jin Chen. BMC Systems Biology. 2012. In Press
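
The frequency-domain idea in the PRIISM abstract above can be illustrated with a short sketch. This is not the published PRIISM implementation: the function name, the fixed clock band of 20 to 28 hours, and the assignment of everything slower than the clock band to "treatment" and everything faster to "noise" are all assumptions made here for illustration. In the paper the clock frequency range is identified from the core circadian clock genes.

```python
import numpy as np

def split_by_frequency(signal, dt_hours, clock_band=(1 / 28, 1 / 20)):
    """Split one expression time series into three additive components.

    clock_band is a hypothetical (low, high) frequency range in cycles
    per hour; it stands in for the range PRIISM derives from clock genes.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    spectrum = np.fft.rfft(signal)            # project to the frequency domain
    freqs = np.fft.rfftfreq(n, d=dt_hours)    # frequency of each bin, cycles/hour
    low, high = clock_band
    # Assumption: the three masks partition the spectrum by frequency alone.
    masks = {
        "treatment": freqs < low,
        "clock": (freqs >= low) & (freqs <= high),
        "noise": freqs > high,
    }
    # Zero out the other bins, then project each part back to the expression domain.
    return {
        name: np.fft.irfft(np.where(mask, spectrum, 0), n=n)
        for name, mask in masks.items()
    }
```

Because the masks do not overlap and together cover every frequency bin, the three returned series add back up to the input, so any downstream fold-change statistic can be computed on the "treatment" series alone without losing the rest of the signal.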

Draft genome sequence of Rubrivivax gelatinosus CBS

Rubrivivax gelatinosus CBS, a purple nonsulfur photosynthetic bacterium, can grow photosynthetically using CO and N2 as the sole carbon and nitrogen nutrients, respectively. R. gelatinosus CBS is of particular interest due to its ability to metabolize CO and yield H2. We present the 5-Mb draft genome sequence of R. gelatinosus CBS with the goal of providing genetic insight into the metabolic properties of this bacterium.

by Pingsha Hu, Juan Lang, Karen Wawrousek, Jianping Yu, Pin-Ching Maness, Jin Chen. Journal of Bacteriology. 2012. In Press

Wireless Spectrum Occupancy Prediction Based on Partial Periodic Pattern Mining

Cognitive radio appears as a promising technology to allocate wireless spectrum between licensed and unlicensed users in an efficient way. The availability of spectrum holes vastly affects the throughput and delay of unlicensed users. Predictive methods for inferring the availability of spectrum holes can help to improve channel utilization and reduce collision rate. In this paper, a spectrum occupancy prediction method based on Partial Periodic Pattern Mining (PPPM) is introduced. The mining aims to identify frequent occupancy patterns that are hidden in the spectrum usage of a channel, and then the mined frequent patterns are used to predict future channel states. By further extending our three-state PPPM to N-state PPPM, the duration of high/low utilization on a channel is also predicted. The frequent patterns of channel utilization duration are critical in optimizing channel switching strategies. PPPM outperforms traditional Frequent Pattern Mining (FPM) by considering patterns that may not repeat perfectly due to noise, sensing errors, and irregular behaviors. Using real-life network activities, we show a significant reduction in miss rate. In addition, we observed that distinguishing low utilization periods from high utilization periods and mining rules in the corresponding utilization periods significantly improves the prediction performance. With the prediction mechanism, we show that the performance of dynamic spectrum access is substantially improved. The high accuracy of duration prediction is also validated with data collected in the paging bands.

by Pei Huang, Chin-Jung Liu, Li Xiao, Jin Chen. Proceedings of IEEE/ACM 20th Intl Workshop on Quality of Service (IWQoS), Coimbra, Portugal, Jun. 2012
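
To make the notion of a partial periodic pattern in the abstract above concrete, here is a minimal sketch. It is a simplified illustration and not the authors' PPPM algorithm: it assumes the period is already known, looks at each position in the period independently, and uses two hypothetical state labels ("B" for busy, "I" for idle) rather than the three states mentioned in the paper. The function names and the `min_conf` threshold are likewise assumptions.

```python
from collections import Counter

def mine_partial_periodic(states, period, min_conf=0.8):
    """Return one entry per offset in the period.

    An entry is the state seen at that offset in at least min_conf of the
    observed periods, or None (a "don't care" position) when no state
    repeats often enough. Allowing None is what makes the pattern partial.
    """
    if len(states) < period:
        raise ValueError("need at least one full period of observations")
    pattern = []
    for offset in range(period):
        column = states[offset::period]      # same offset in every period
        state, count = Counter(column).most_common(1)[0]
        pattern.append(state if count / len(column) >= min_conf else None)
    return pattern

def predict(pattern, t):
    """Predict the channel state at time index t, or None if unknown."""
    return pattern[t % len(pattern)]
```

For example, with the history "BIIB BIBB BIIB BIIB BIIB" (spaces added for readability) and a period of 4, the third position is idle in four of five periods, so a threshold of 0.8 still yields a prediction there even though the pattern does not repeat perfectly, while a stricter threshold leaves that position undecided.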

A membrane protein/signaling protein interaction network for Arabidopsis version AMPv2

Interactions between membrane proteins and the soluble fraction are essential for signal transduction and for regulating nutrient transport. To gain insights into the membrane-based interactome, 3,852 open reading frames (ORFs) out of a target list of 8,383 representing membrane and signaling proteins from Arabidopsis thaliana were cloned into a Gateway-compatible vector. The mating-based split ubiquitin system was used to screen for potential protein–protein interactions (pPPIs) among 490 Arabidopsis ORFs. A binary robotic screen between 142 receptor-like kinases (RLKs), 72 transporters, 57 soluble protein kinases and phosphatases, 40 glycosyltransferases, 95 proteins of various functions, and 89 proteins with unknown function detected 387 out of 90,370 possible PPIs. A secondary screen confirmed 343 (of 386) pPPIs between 179 proteins, yielding a scale-free network (r² = 0.863). Eighty of 142 transmembrane RLKs tested positive, identifying 3 homomers, 63 heteromers, and 80 pPPIs with other proteins. Thirty-one out of 142 RLK interactors (including RLKs) had previously been found to be phosphorylated; thus interactors may be substrates for respective RLKs. None of the pPPIs described here had been reported in the major interactome databases, including potential interactors of G-protein-coupled receptors, phospholipase C, and AMT ammonium transporters. Two RLKs found as putative interactors of AMT1;1 were independently confirmed using a split luciferase assay in Arabidopsis protoplasts. These RLKs may be involved in ammonium-dependent phosphorylation of the C-terminus and regulation of ammonium uptake activity. The robotic screening method established here will enable a systematic analysis of membrane protein interactions in fungi, plants and metazoa.

by Lalonde S, Sero A, Pratelli R, Pilot G, Chen J, Sardi M, Parsa S, Kim DY, Acharya B, Stein E, Hu HC, Villiers F, Takeda K, Yang Y, Han Y, Schwacke R, Chiang W, Kato N, Loqué D, Assmann S, Kwak J, Schroeder J, Rhee S, Frommer W., Frontiers Plant Physiol, Vol.1 Num 24, pp. 1-14, 2010