Scientists create program to help model gene expression

Credit: Michael Setzer/SciTech Editor

In recent decades, scientists have taken an interdisciplinary approach that integrates computer science and biology to develop a deeper understanding of the intricate details that contribute to a healthy organism. Scientists at Carnegie Mellon’s School of Computer Science and the Lane Center for Computational Biology have been actively developing tools at the forefront of this revolution in biology. Kriti Puniyani, a Ph.D. student in the School of Computer Science working under the guidance of professor Eric Xing, developed GINI, a tool that analyzes in situ hybridization images of biological organisms at the molecular level and uses the mRNA expression they reveal to build gene regulation networks.

President Subra Suresh mentioned in his lecture last week that the combination of biology with engineering, math, and computer science, as in the modeling of blood flow through the heart, is not a novel advancement. Modern biological research, however, has focused on understanding biological complexity at the molecular level, involving cells, DNA, and proteins. A new, growing field of research applies computational modeling and engineering to this analysis to help answer questions biologists have about the minuscule interactions inside a living organism.

In a multicellular organism, every cell contains an identical copy of the organism's DNA. DNA is like a cookbook of genes that give the cell instructions on how to build a variety of proteins that perform different functions inside the cell. However, at any given time, each cell produces only the specific set of proteins that it needs. Different cell types, such as heart cells and hair cells, each produce a unique set of proteins specific to the function of the cell.

A large portion of biological research involves understanding the functions of each protein and, more importantly, the interactions between proteins and other molecules such as DNA. Cells have a complex system by which one protein can regulate the expression of other proteins. Biologists want to model this system, but the task is challenging because the regulatory networks are very complex.

Modern computational techniques have attempted to address this issue by utilizing mRNA expression data. When building a protein, a cell converts the gene in the form of DNA into an intermediate molecule of mRNA that is directly used to build the protein. By measuring the specific mRNA present in a cell at a given time, scientists can get a good idea of the different proteins that are being expressed in that cell.

Based on this data, computational biologists can generate a graph or map known as a Markov model, in which the nodes of the graph are proteins and the edges of the graph are connections that exist between proteins. Two proteins are connected if one affects the other’s gene expression.
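The graph structure described above can be sketched in a few lines of Python. This is only an illustration of nodes and edges, not the model used in the research; the gene names and the `build_network` helper are hypothetical.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map from (node_a, node_b) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        # An edge means one node influences the other's expression.
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


# Hypothetical interactions, for illustration only.
example_edges = [("geneA", "geneB"), ("geneB", "geneC")]
network = build_network(example_edges)
```

Here `network["geneB"]` holds both of its neighbors, while `geneA` and `geneC` are linked only through `geneB`.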

Previously, scientists favored microarray technology to determine mRNA expression of a specific gene in an organism and to generate a map similar to a gene regulatory network. However, there are limitations with this method.

According to Puniyani, microarray data present only a binary, on-or-off status of a gene. They are ill-suited for multicellular organisms because they represent only the average expression in a sample, which can lead to severe information loss and inaccuracy in further experiments.
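A toy example shows how averaging can hide spatial detail. The numbers below are made up for illustration: two hypothetical genes are expressed at opposite ends of an embryo, yet their sample-wide averages are identical.

```python
from statistics import mean

# Hypothetical expression levels at four positions, head to tail.
anterior_gene = [9, 8, 1, 0]   # expressed mostly near the head
posterior_gene = [0, 1, 8, 9]  # expressed mostly near the tail

# A sample-wide average collapses each pattern to one number.
avg_anterior = mean(anterior_gene)
avg_posterior = mean(posterior_gene)

# The averages match even though the spatial patterns are opposite.
same_average = avg_anterior == avg_posterior
same_pattern = anterior_gene == posterior_gene
```

In this sketch `same_average` is true while `same_pattern` is false, which is the kind of information loss the averaged measurement cannot avoid.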

Puniyani explained that in situ hybridization (ISH) technology reveals a more holistic view of the activities and functions of genes. In this technique, fluorescent tags bind to a specific mRNA sequence in an organism. The organism is then imaged under a microscope, which shows where and how much of that mRNA is being expressed at a given time.

SPEX, a previously developed computer vision tool, analyzes the images and determines gene expression. With this data, GINI determines the collection of images, or “bag,” in which each gene appears and constructs a Markov random field graph structure. Unlike microarray data, which produce a single value for each gene's expression, each gene in this model is represented by a vector-valued spatial pattern. The program then determines the similarity between bags and estimates a probable gene interaction network.
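The idea of comparing spatial patterns can be sketched as follows. This is a simplified stand-in, not GINI's method: it uses cosine similarity between one vector per gene and a fixed threshold, whereas the article describes similarity between bags of images feeding a Markov random field. The gene names, the vectors, and the `threshold` value are all assumptions.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_edges(patterns, threshold=0.9):
    """Link every pair of genes whose patterns are at least `threshold` similar."""
    edges = []
    for a, b in combinations(sorted(patterns), 2):
        if cosine_similarity(patterns[a], patterns[b]) >= threshold:
            edges.append((a, b))
    return edges


# Hypothetical spatial patterns (expression at four positions).
patterns = {
    "geneA": [9, 8, 1, 0],
    "geneB": [8, 9, 0, 1],
    "geneC": [0, 1, 8, 9],
}
edges = candidate_edges(patterns)
```

With these made-up vectors, only `geneA` and `geneB` share a similar pattern, so they are the only pair linked.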

To test GINI, Puniyani and Xing produced a small artificial data set and found that their program generated a reasonable gene network. The ability to understand gene networks is extremely useful in understanding how cells differentiate into distinct types. For example, multicellular organisms begin as a single cell. As the cell divides and the embryo grows, cells specialize and different proteins are expressed in different cells.

In a fruit fly, for example, the uniqueness of protein expression results in some cells becoming part of the head while others become part of the posterior end. As the embryo grows, cells differentiate into the 14 segments of the adult fly, and the gene interaction network changes as new genes are expressed in different cell types.

Puniyani found that the Berkeley Drosophila Genome Project had generated 110,000 ISH images of fruit fly embryos capturing the expression patterns of 7,516 genes, which provided an ideal data set to test the viability of GINI. Puniyani was glad to find that GINI returned a very probable network and displayed certain gene interactions that had already been mentioned in the literature.

GINI can be applied to a variety of experiments and provides biologists with an efficient tool to build gene interaction networks. It overcomes many of the limitations of the previous microarray method. Puniyani and Xing’s research was published in the October issue of PLOS Computational Biology.

Looking forward, Puniyani hopes to expand the tool to combine data from multiple states and have GINI produce a time-varying Markov model. This idea has been successfully implemented with microarray data, but Puniyani added that it is much more difficult to implement with image data.