
Funded by NSF and NIH



Bioinformatics Summer Institute

Faculty Projects for the 2007 Summer Bioinformatics Institute

We will be adding new projects as they become available.

John Crow

Genetic association studies and statistical learning techniques
Important to the practice and development of personalized medicine is the premise that knowledge of a patient's genotype can help guide his or her treatment. Data sets are analyzed to infer rules and functional relationships between genotype and phenotype (e.g., patient health, treatment outcome); the discovered patterns of association can then be validated scientifically and subsequently used for guidance in the clinical setting. From a different perspective, exploring empirical genotype-phenotype relationships can help identify those genes and molecular mechanisms playing a significant role, for example, in cancer remission.

In this project we will apply statistical learning techniques to discover and characterize patterns of association between genotype and observed myeloma survival. We will apply machine learning techniques, specifically boosting and random forests, to data sets produced by the Van Ness lab's "Bank-On-A-Cure SNP Chip." Main requirements are an interest in applying mathematical ideas and a good working knowledge of a programming language, preferably Java, Ruby, or Perl.
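As a rough illustration of what boosting does with genotype data, the sketch below runs AdaBoost with single-SNP decision stumps on a tiny made-up data set. The genotype coding (0, 1, or 2 copies of the minor allele), the labels, and all function names are assumptions chosen for illustration; none of this is the Van Ness lab's data or the project's actual code.

```python
import math

def stump_predict(x, snp, threshold, sign):
    """Predict +1/-1 from one SNP: `sign` if genotype > threshold, else -sign."""
    return sign if x[snp] > threshold else -sign

def fit_adaboost(X, y, rounds=3):
    """AdaBoost with single-SNP decision stumps; y holds labels +1 / -1."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []  # list of (alpha, snp, threshold, sign)
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for snp in range(len(X[0])):
            for threshold in (0, 1):
                for sign in (1, -1):
                    err = sum(w for xi, yi, w in zip(X, y, weights)
                              if stump_predict(xi, snp, threshold, sign) != yi)
                    if best is None or err < best[0]:
                        best = (err, snp, threshold, sign)
        err, snp, threshold, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, snp, threshold, sign))
        # Misclassified samples gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * yi * stump_predict(xi, snp, threshold, sign))
                   for xi, yi, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, x):
    """Weighted vote of all stumps."""
    score = sum(a * stump_predict(x, snp, t, s) for a, snp, t, s in model)
    return 1 if score >= 0 else -1

# Hypothetical toy data: 6 patients x 3 SNPs; +1 / -1 are made-up outcome labels.
X = [[0, 2, 1], [1, 2, 0], [2, 0, 1], [2, 1, 2], [0, 1, 0], [1, 0, 2]]
y = [-1, -1, 1, 1, -1, 1]

model = fit_adaboost(X, y, rounds=3)
predictions = [predict(model, x) for x in X]
```

No single SNP separates the two classes in this toy set, but the weighted vote of three stumps does, which is the point of boosting. A real analysis would evaluate on held-out patients rather than on the training data.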

Semantic web technologies for distributed research informatics
A highly visible area of bioinformatics involves the development of software tools used directly by researchers to explore their data sets and to look up specialized reference information. For the most part, these tools attempt to link your data to existing information. But where does the underlying information come from?

A distributed information model views its world as a community of autonomous information providers and consumers, in which the software tool a researcher is using is a consumer of information. Semantic web technologies are useful in distributed information models. In this project we will explore the use of semantic web technologies to support the informatics needs of small research collaborations. Information providers and consumers will be created, and the roles of ontologies, metadata, and queries examined. Due to the nature of this effort, a good background in Java or Ruby is required.
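The provider/consumer idea can be sketched with subject-predicate-object triples, the basic data model behind semantic web standards such as RDF. The example below is a minimal in-memory illustration only: the `ex:` identifiers, the two providers, and the `match` helper are all hypothetical, and a real project would more likely use an RDF library and a query language such as SPARQL.

```python
def match(triples, pattern):
    """Yield triples matching a (subject, predicate, object) pattern.

    None in the pattern acts as a wildcard.
    """
    for triple in triples:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple

# Two autonomous providers publish statements independently (hypothetical data).
provider_a = [
    ("ex:gene1", "rdf:type", "ex:Gene"),
    ("ex:gene1", "ex:studiedIn", "ex:projectX"),
]
provider_b = [
    ("ex:gene2", "rdf:type", "ex:Gene"),
    ("ex:gene1", "ex:hasAnnotation", "ex:annotation7"),
]

# A consumer merges whatever the providers expose and then queries the result.
graph = set(provider_a) | set(provider_b)

genes = sorted(s for s, _, _ in match(graph, (None, "rdf:type", "ex:Gene")))
about_gene1 = sorted(match(graph, ("ex:gene1", None, None)))
```

Because both providers use the same identifier for `ex:gene1`, the consumer can combine their statements without either provider knowing about the other; agreeing on such shared vocabularies is the role ontologies play.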

Kevin Dorfman

Graphical User Interface for Brownian Dynamics Simulations of Polymers and Biomolecules
Brownian dynamics is a powerful method for simulating the motion of polymers and biomolecules (such as DNA) as they move through complicated geometries, such as a gel. Our group is interested in developing new methods for separating DNA in very small scale structures, and Brownian dynamics is one of the tools that we use to theoretically investigate possible separation techniques. The goal of this project is to develop a graphical user interface (GUI) that will allow users who are unfamiliar with the simulation code to still take advantage of the method. The interface will allow the user to construct the biomolecule and the surrounding environment, input the force fields governing the motion, and then visualize the dynamical results.
Required Skills: Familiarity with some structured programming language.
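For readers new to the method, a single Brownian dynamics update can be sketched as follows. This is a minimal one-dimensional bead-spring chain under stated assumptions (overdamped motion, Hookean springs, reduced units, no obstacles or external fields); the function and parameter names are illustrative and this is not the group's simulation code.

```python
import math
import random

def spring_forces(positions, k_spring):
    """Hookean spring force on each bead of a linear chain (1D)."""
    forces = [0.0] * len(positions)
    for i in range(len(positions) - 1):
        f = k_spring * (positions[i + 1] - positions[i])
        forces[i] += f       # bead i is pulled toward bead i+1
        forces[i + 1] -= f   # and bead i+1 is pulled back toward bead i
    return forces

def bd_step(positions, dt, k_spring=1.0, kT=1.0, zeta=1.0, rng=random):
    """One overdamped update: drift from forces plus a random thermal kick."""
    diffusivity = kT / zeta
    noise = math.sqrt(2.0 * diffusivity * dt)
    forces = spring_forces(positions, k_spring)
    return [x + (f / zeta) * dt + noise * rng.gauss(0.0, 1.0)
            for x, f in zip(positions, forces)]

# Evolve a 5-bead chain for 1000 steps with a seeded generator.
rng = random.Random(0)
chain = [float(i) for i in range(5)]
for _ in range(1000):
    chain = bd_step(chain, dt=0.01, rng=rng)
```

A GUI of the kind described would essentially let the user set up the initial positions and the force function, and then plot the trajectory that repeated calls to a step like this produce.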

Low Copy Number PCR in Natural Convection Cells
Polymerase Chain Reaction (PCR) is a standard biochemical method for making many copies of a particular sequence of DNA. The reaction requires cycling the temperature of the PCR mixture to perform the various steps of the reaction. Recently, there has been interest in putting the PCR mixture between a hot plate (on the bottom) and a cold plate (on the top) to drive a natural convection flow between the plates, which could be used for the temperature cycling in a portable PCR device. The goal of this project is to study PCR in a natural convection cell when there are very few starting molecules of DNA, which could be used to detect a rare pathogen. The intern will use a reactive Brownian dynamics simulation model to study the process and determine the parameters that play a key role in the ability to detect very small initial concentrations of DNA.
Required Skills: Prior course on chemical kinetics (e.g., first order reactions) and familiarity with some structured programming language.

Lynda Ellis

Encoding Metabolic Logic
Prediction of microbial metabolism is important for annotating genome sequences and for understanding the fate of chemicals in the environment.

A metabolic Pathway Prediction System has been developed that is freely available on the world wide web (http://umbbd.msi.umn.edu/predict/). It recognizes the organic functional groups found in a compound and predicts transformations based on metabolic rules. These rules are based on reactions catalogued in the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD). The rule-based nature of the Pathway Prediction System makes it transparent, expandable, and adaptable. Join us to expand the UM-BBD and its predictive system, and learn metabolic logic and user interface design. Requires knowledge of college-level organic chemistry and computer programming (Java and/or Perl).

Yiannis Kaznessis

Design of genetic regulatory networks
We want to learn to command cells to make specific proteins. These proteins can be catalysts, synthesizing specialty chemicals like pharmaceuticals, or sensor proteins for biological weapons like anthrax, or therapeutic proteins like insulin. To do this we need to understand dynamic gene regulation (when and how DNA gives protein) and to design gene networks (genes influencing the expression of other genes) that perform the tasks at hand, in response to our signals. We have written a code that simulates gene networks and can be used to design novel gene circuits, such as the oscillator and the digital clock. The student's challenge would be to construct interesting designs of genetic circuits that perform specific tasks. Two recent examples of successful designs have been a switch and an oscillator. Future applications of genetic circuits might include biosensors, targeted drug delivery, molecular machines, and biochemical factories.
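To give a feel for what simulating gene expression involves, the sketch below uses the Gillespie stochastic simulation algorithm, one standard technique for such models, on the simplest possible case: a protein produced at a constant rate and degraded in proportion to its copy number. The text does not say which method the group's code uses, so the algorithm choice, the rate constants, and the function name are all assumptions for illustration.

```python
import random

def gillespie_birth_death(k_make, k_decay, t_end, rng, n0=0):
    """Simulate production (rate k_make) and decay (rate k_decay * n)."""
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        total = k_make + k_decay * n   # total rate of all possible events
        if total == 0.0:
            break                      # nothing can happen any more
        t += rng.expovariate(total)    # waiting time to the next event
        if t > t_end:
            break
        # Choose which event fired, in proportion to its rate.
        if n == 0 or rng.random() * total < k_make:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts

# Made-up rates; the copy number fluctuates around k_make / k_decay = 50.
rng = random.Random(1)
times, counts = gillespie_birth_death(k_make=5.0, k_decay=0.1, t_end=50.0, rng=rng)
```

A circuit such as a switch or an oscillator adds more species and makes the production rates depend on the current copy numbers, but the event loop keeps the same shape.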

Design of Antimicrobial Peptides
Antimicrobial peptides (AMPs), molecules produced by the immune systems of animals and plants, are being considered as potential novel antibiotic candidates to combat emerging drug-resistant bacterial strains. The peptides are known to kill bacterial cells by direct membrane attack. Most known AMPs are also toxic, killing mammalian cells, again by direct membrane attack. The mechanism of action of these peptides is not yet clear. We work towards designing and implementing computational solutions to fill this void. We use molecular dynamics simulations of peptides in mammalian and bacterial model membranes to determine the structural characteristics responsible for activity and toxicity. We use this knowledge in designing new peptides that retain their antimicrobial activity but are not toxic.

Develop Wikigene
The goal of this project is to develop Wikigene, built on information stored in SQL, as a community-wide effort to catalogue the thermodynamic and kinetic constants of gene regulatory interactions. Working knowledge of SQL, HTML, and Java and an understanding of wikis are necessary.

Synthetic Biology
The University of Minnesota Bioinformatics Summer Institute will participate in the Summer 2007 International Genetically Engineered Machine Competition (www.igem2007.com). iGEM is an undergraduate Synthetic Biology competition. Student teams are given a kit of biological parts at the beginning of the summer. Working at their own schools over the summer, they use these parts and new parts of their own design to build biological systems and operate them in living cells. During the first weekend of November, they present their work at the iGEM Competition Jamboree at MIT and have a chance to win prizes. They add their new parts to the Registry of Standard Biological Parts for the students in the next year's competition.

Vipin Kumar

Advances in high-throughput experimental methodologies have created a large variety and quantity of biological data. This data is key for tasks such as identification of relationships among groups of genes within a specific genome, prediction of the functions of anonymous genes, construction of functional networks from these relationships, and differential analysis across genomes of related but distinct organisms. Data mining is useful for discovering interesting patterns in large data sets. We propose two projects that apply data mining to extract biologically meaningful patterns from SNP data and to analyze differences in codon frequency between different species of yeast.

Data Mining for Connecting SNPs and Disease
One of the important potential benefits of the genetic revolution is the possibility of personalized medicine, i.e., using detailed genomic information about a person for the detection, treatment, or prevention of disease. The recent availability of individual genomic information, typically in the form of Single Nucleotide Polymorphisms (SNPs), offers one route for making this possibility a reality. In particular, the increasing availability of SNP data has created opportunities for discovering important connections between disease and genomic factors. Although there has been some success in finding such connections with currently available techniques, these approaches have a number of limitations and are most useful for finding connections involving only one or two SNPs. This project would investigate the use of data mining techniques to find more general patterns that capture connections between SNPs and disease, including patterns that may involve a relatively large number of SNPs and patterns that show variation from patient to patient, either because of missing data or natural variation.

Vipin Kumar with Judith Berman

Analysis of Codon Usage in Yeast
This project involves the study of the genome sequences of several yeast species to determine the effects of the ambiguity in the genetic code on the genes themselves. It will also investigate to what extent these effects may differ depending on the functions of the affected genes. More specifically, yeasts are simple single-celled fungi that are the primary workhorses for a variety of industrial processes, such as the fermentation of beer and wine and the production of yogurt and other fermented milk products. Yeasts are also premier genetic models with abundant resources available for genome manipulation, and extensive sequence availability for a variety of related species. Among several sequenced yeasts, an unusual ancestral alteration has occurred, resulting in fundamental changes in the way these organisms translate genetic information into proteins, the versatile molecules that directly carry out the tasks of most cellular processes. The extent of this deviation from the standard genetic code varies throughout the family of yeasts. This makes this an ideal system to study the flexibility of organisms in responding to a change that, in principle, could systematically affect every protein the organism produces.
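The basic measurement behind a codon usage study is a count of how often each codon appears in a coding sequence. The sketch below shows that step only; the example sequence is made up, the function names are illustrative, and a real analysis would read annotated coding sequences for each yeast species and compare the resulting frequencies across species and gene functions.

```python
from collections import Counter

def codon_counts(cds):
    """Count codons in a coding sequence, reading in frame from position 0."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def codon_frequencies(cds):
    """Relative frequency of each codon observed in the sequence."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

# Hypothetical 4-codon sequence: ATG CTG CTG TAA
example = "atgctgctgtaa"
counts = codon_counts(example)
frequencies = codon_frequencies(example)
```

Comparing such frequency tables between species with and without the altered genetic code is one simple way to look for the effects the project describes.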

Nathan Springer

Understanding the mechanisms of intra-specific regulatory variation
The goal of this research project is to understand the molecular basis of phenotypic variation within a species. In other words, why do different strains or breeds of a species, e.g., a poodle and a pit bull, exhibit different phenotypes? This variation can be either quantitative (affecting the amount of a gene product) or qualitative (affecting the nature of the gene product). My lab uses allele-specific expression assays to study the prevalence and mechanisms of quantitative variation. We are interested in understanding how novel gene expression states arise and how they are maintained. We are gathering data on the relative expression of two alleles in a heterozygote for a set of 500 genes. The project would involve the construction of a database for data handling and analysis. In addition, the student would use bioinformatics tools to characterize the genes that are being assayed and would have opportunities for lab work.
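One possible starting point for such a database is sketched below using SQLite. The table layout, column names, measurement values, and the 0.1 cutoff are all hypothetical and chosen only to show how allele-specific measurements might be stored and queried; the lab's actual data format is not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE allele_expression (
        gene_id   TEXT    NOT NULL,
        replicate INTEGER NOT NULL,
        allele_a  REAL    NOT NULL,  -- expression signal from the first allele
        allele_b  REAL    NOT NULL,  -- expression signal from the second allele
        PRIMARY KEY (gene_id, replicate)
    )
""")

# Made-up measurements for two genes, two replicates each.
rows = [
    ("gene_001", 1, 120.0, 118.0),
    ("gene_001", 2, 130.0, 126.0),
    ("gene_002", 1, 300.0, 100.0),
    ("gene_002", 2, 280.0, 120.0),
]
conn.executemany("INSERT INTO allele_expression VALUES (?, ?, ?, ?)", rows)

# Genes whose pooled allele-A fraction is far from the 0.5 expected
# when both alleles are expressed equally.
biased = conn.execute("""
    SELECT gene_id,
           SUM(allele_a) / SUM(allele_a + allele_b) AS fraction_a
    FROM allele_expression
    GROUP BY gene_id
    HAVING ABS(SUM(allele_a) / SUM(allele_a + allele_b) - 0.5) > 0.1
    ORDER BY gene_id
""").fetchall()
```

Keeping one row per gene and replicate leaves room to add annotation tables later and join them on `gene_id`.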

Nevin Young

Assembling a genome sequence and discovering novel genes
The foundation for most bioinformatics research is genome sequence. Efficiently assembling a coherent genome sequence out of thousands of short "reads" remains a challenging informatics problem. Sequencing projects for many complex organisms are now underway, including a project at the University of Minnesota to sequence the model plant Medicago truncatula. In this project, we direct and manage sequencing data coming from several international sequencing centers and use that data to synthesize a reference genome sequence. Using the assembled genome sequence, our partners and we carry out gene annotation (the discovery of genes and gene features along the sequence) and present the sequence through a rich web interface. Of particular interest is the discovery of genes that have not been previously described. Potentially, these genes hold the key to novel biological functions. In addition to laboratory-based experiments to explore function, we take advantage of diverse and powerful software to reveal the properties of these novel genes.
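The assembly problem mentioned above can be illustrated with a toy greedy strategy: repeatedly merge the two reads with the longest suffix-prefix overlap. This is a minimal sketch assuming error-free reads from a single strand and no repeats; the reads and function names are made up, and production assemblers used in real sequencing projects are far more sophisticated.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # remaining reads do not overlap; leave them as contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

# Three hypothetical reads covering a 12-base sequence.
contigs = greedy_assemble(["ATGGCGT", "GCGTACG", "TACGGA"])
```

Here the three reads collapse into the single contig ATGGCGTACGGA. Repeated sequence and sequencing errors are what break this simple strategy on real genomes.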