Faculty Projects for the 2007 Summer Bioinformatics Institute
We will be adding new projects as they become available.
John Crow
Genetic association studies and statistical learning techniques
Important to the practice and development of personalized medicine is the
premise that knowledge of a patient's genotype can help guide his or her
treatment. Data sets are analyzed to infer rules and functional
relationships between genotype and phenotype (e.g., patient health,
treatment outcome); the discovered patterns of association can be validated
scientifically and subsequently used for guidance in the clinical setting.
From a different perspective, exploring empirical genotype-phenotype
relationships can help identify those genes and molecular mechanisms playing
a significant role, for example, in cancer remission.
In this project we will apply statistical learning techniques to discover
and characterize patterns of association between genotype and observed
myeloma survival. We will apply machine learning techniques, specifically
boosting and random forests, to data sets produced by the
Van Ness lab's "Bank-On-A-Cure SNP
Chip." Main
requirements are an interest in applying mathematical ideas and a good
working knowledge of a programming language, preferably Java, Ruby, or
Perl.
Semantic web technologies for distributed research informatics
A highly visible area of bioinformatics involves the development of software
tools used directly by researchers to explore their data sets and to look up
specialized reference information. For the most part, these tools attempt to
link your data to existing information. But where does the underlying
information come from?
A distributed information model views its world as a community of autonomous
information providers and consumers, in which the software tool a
researcher is using is a consumer of information. Semantic web technologies
are useful in distributed information models. In this project we will
explore the use of semantic web technologies to support the informatics
needs of small research collaborations. Information providers and consumers
will be created, and the roles of ontologies, metadata, and queries
examined. Due to the nature of this effort, a good background in Java or Ruby is
required.
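As a rough illustration of the provider/consumer idea, the sketch below treats each information provider as a list of subject-predicate-object triples and lets a consumer query the merged set with a wildcard pattern. It is a minimal in-memory toy, not a real semantic web stack: the `match` helper, the two "labs", and every identifier such as `gene:G1` or `ex:associatedWith` are hypothetical names invented for the example.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)


def match(triples: list[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for t in triples:
        if all(want is None or want == got
               for want, got in zip((s, p, o), t)):
            yield t


# Two hypothetical providers, each publishing its own statements.
lab_a = [("gene:G1", "rdf:type", "ex:Gene"),
         ("gene:G1", "ex:locatedOn", "chr:2")]
lab_b = [("gene:G1", "ex:associatedWith", "trait:T7")]

# A consumer merges the statements and queries them as one data set.
merged = lab_a + lab_b
facts_about_g1 = sorted(match(merged, s="gene:G1"))
```

Because both providers use the same identifier for the gene, the consumer can combine their statements without either provider knowing about the other; shared vocabularies (ontologies) play that role in practice.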
Kevin Dorfman
Graphical User Interface for Brownian Dynamics Simulations of Polymers and Biomolecules
Brownian dynamics is a powerful method for simulating the motion of polymers
and biomolecules (such as DNA) as they move through complicated geometries,
such as a gel. Our group is interested in developing new methods for
separating DNA in very small scale structures, and Brownian dynamics
is one of the tools that we use to theoretically investigate possible
separation techniques. The goal of this project is to develop a graphical
user interface (GUI) that will allow users who are unfamiliar with the
simulation code to still take advantage of the method. The interface will
allow the user to construct the biomolecule and the surrounding environment,
input the force fields governing the motion, and then visualize the dynamical
results.
Required Skills: Familiarity with some structured
programming language.
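For readers new to the method, the core update of a Brownian dynamics simulation can be sketched in a few lines. The example below is a one-dimensional, single-particle toy using the standard overdamped (Euler-Maruyama) step; it is not the group's simulation code, and the function names, parameter values, and the optional harmonic force are assumptions chosen for illustration.

```python
import math
import random


def brownian_step(x, force, D, kT, dt, rng):
    """One Euler-Maruyama step of overdamped Brownian dynamics in 1D."""
    drift = (D / kT) * force(x) * dt                       # deterministic push
    noise = math.sqrt(2.0 * D * dt) * rng.gauss(0.0, 1.0)  # thermal kick
    return x + drift + noise


def simulate(n_steps, x0=0.0, D=1.0, kT=1.0, dt=1e-3, k_spring=0.0, seed=0):
    """Return the trajectory of one particle; k_spring=0 means free diffusion."""
    rng = random.Random(seed)

    def force(x):
        return -k_spring * x  # hypothetical harmonic force field

    x = x0
    path = [x]
    for _ in range(n_steps):
        x = brownian_step(x, force, D, kT, dt, rng)
        path.append(x)
    return path


trajectory = simulate(1000, k_spring=2.0)
```

For free diffusion the mean squared displacement of many such trajectories should grow as 2Dt, which is a convenient sanity check before adding geometry or more realistic force fields.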
Low Copy Number PCR in Natural Convection Cells
Polymerase Chain Reaction (PCR) is a standard biochemical method for making
many copies of a particular sequence of DNA. The reaction requires cycling the
temperature of the PCR mixture to perform the various steps of the reaction.
Recently, there has been interest in putting the PCR mixture between a hot
plate (on the bottom) and a cold plate (on the top) to drive a natural
convection flow between the plates, which could be used for the temperature
cycling in a portable PCR device. The goal of this project is to study PCR
in a natural convection cell when there are very few starting molecules of
DNA, which could be used to detect a rare pathogen. The intern will use a
reactive Brownian dynamics simulation model to study the process and
determine the parameters that play a key role in the ability to detect
very small initial concentrations of DNA.
Required Skills: Prior course on chemical kinetics
(e.g., first order reactions) and familiarity with some structured
programming language.
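To see why very few starting molecules make the outcome noisy, it can help to treat amplification as a simple branching process. The sketch below is a toy model, not the reactive Brownian dynamics model used in the project: it ignores flow, temperature, and reaction kinetics entirely, and the per-cycle duplication probability `efficiency` is an assumed parameter.

```python
import random


def pcr_copies(n_start, n_cycles, efficiency, seed=None):
    """Return the copy number after each cycle of a toy stochastic PCR.

    In every cycle each template is duplicated independently with
    probability `efficiency` (a simple branching process).
    """
    rng = random.Random(seed)
    copies = n_start
    history = [copies]
    for _ in range(n_cycles):
        new = sum(1 for _ in range(copies) if rng.random() < efficiency)
        copies += new
        history.append(copies)
    return history


# Keep the cycle count small: the loop visits every molecule explicitly.
runs = [pcr_copies(1, 12, 0.8, seed=s) for s in range(5)]
```

On average the copy number grows by a factor of (1 + efficiency) per cycle, but with a single starting molecule the first few cycles are dominated by chance, so independent runs can end with noticeably different totals.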
Lynda Ellis
Encoding Metabolic Logic
Prediction of microbial metabolism is important
for annotating genome sequences and for understanding the fate of
chemicals in the environment.
A metabolic Pathway Prediction System has been developed that is freely
available on the world wide web (http://umbbd.msi.umn.edu/predict/).
It recognizes the organic functional groups found in a compound and
predicts transformations based on metabolic rules. These rules are based
on reactions catalogued in the University of Minnesota
Biocatalysis/Biodegradation Database (UM-BBD). The rule-based nature of
the Pathway Prediction System makes it transparent, expandable, and
adaptable. Join us to expand the UM-BBD and its predictive system, and
learn metabolic logic and user interface design. Requires knowledge of
college-level organic chemistry and computer programming (Java and/or
Perl).
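The rule-based idea can be illustrated with a deliberately small sketch. The two rules below (oxidation of a primary alcohol to an aldehyde, and of an aldehyde to a carboxylic acid) are textbook transformations chosen for illustration only; they are not taken from the UM-BBD rule set. The sketch also assumes the functional groups have already been recognized and are passed in as plain labels, and all names (`RULES`, `predict_steps`, `predict_pathway`) are hypothetical.

```python
# Hypothetical rules: functional group -> (product group, description).
RULES = {
    "primary_alcohol": ("aldehyde", "alcohol oxidation"),
    "aldehyde": ("carboxylic_acid", "aldehyde oxidation"),
}


def predict_steps(groups):
    """Return (substrate, product, description) for every rule that fires."""
    return [(g, *RULES[g]) for g in sorted(groups) if g in RULES]


def predict_pathway(start_group, max_depth=5):
    """Follow rules from one functional group until none applies."""
    pathway = [start_group]
    while pathway[-1] in RULES and len(pathway) <= max_depth:
        pathway.append(RULES[pathway[-1]][0])
    return pathway


pathway = predict_pathway("primary_alcohol")
```

Keeping the rules in a plain table is what makes such a system transparent and expandable: adding a transformation means adding one entry, and every prediction can be traced back to the rule that produced it.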
Yiannis Kaznessis
Design of genetic regulatory networks
We want to learn to command cells to make specific proteins. These
proteins can be catalysts, synthesizing specialty chemicals like
pharmaceuticals, or sensor proteins for biological weapons like
anthrax, or therapeutic proteins like insulin. To do this we need
to understand dynamic gene regulation (when and how DNA gives protein)
and to design gene networks (genes influencing the expression of other
genes) that perform the tasks at hand, in response to our signals. We
have written code that simulates gene networks and can be used to
design novel gene circuits, such as the oscillator and the digital
clock. The student’s challenge would be to construct interesting
designs of genetic circuits that perform specific tasks. Two recent
examples of successful designs have been a switch and an oscillator.
Future applications of genetic circuits might include biosensors,
targeted drug delivery, molecular machines, and biochemical
factories.
Design of Antimicrobial Peptides
Antimicrobial peptides (AMPs), molecules produced by the immune systems
of animals and plants, are being considered as potential novel antibiotic
candidates to combat emerging drug-resistant bacterial strains. The
peptides are known to kill bacterial cells by direct membrane attack.
Most known AMPs are also toxic, killing mammalian cells, again by
direct membrane attack. The mechanism of action of these peptides is
not yet clear. We work towards designing and implementing computational
solutions to fill the void. We use molecular dynamics simulations of
peptides in mammalian and bacterial model membranes to determine the
structural characteristics responsible for activity and toxicity. We
use this knowledge in designing new peptides that retain their
antimicrobial activity but are not toxic.
Develop Wikigene
Using information stored in SQL, the goal of this project is to develop
Wikigene as a community-wide effort to catalogue the thermodynamic and
kinetic constants of gene regulatory interactions. Working knowledge of
SQL, HTML, and Java and an understanding of wikis are necessary.
Synthetic Biology
The University of Minnesota Bioinformatics Summer Institute will
participate in the Summer 2007 International Genetically Engineered
Machine Competition (www.igem2007.com).
iGEM is an undergraduate
Synthetic Biology competition. Student teams are given a kit of
biological parts at the beginning of the summer. Working at their own
schools over the summer, they use these parts and new parts of their
own design to build biological systems and operate them in living
cells. During the first weekend of November, they present their work
at the iGEM Competition Jamboree at MIT and have a chance to win
prizes. They add their new parts to the Registry of Standard
Biological Parts for the students in the next year's competition.
Vipin Kumar
Advances in high-throughput experimental methodologies
have created a large variety and quantity of biological data. This data
is key for tasks such as identification of relationships among groups of
genes within a specific genome, prediction of the functions of anonymous
genes, construction of functional networks from these relationships, and
differential analysis across genomes of related, but distinct organisms.
Data mining is useful for discovering interesting patterns in large data
sets. We propose two projects to apply data mining for extracting
biologically meaningful patterns from SNP data and analyzing differences
in codon frequency between different species of yeast.
Data Mining for Connecting SNPs and Disease
One of the important potential benefits of the genetic revolution is the
possibility of personalized medicine, i.e., using detailed genomic
information about a person for the detection, treatment, or prevention
of disease. The recent availability of individual genomic information,
typically in the form of Single Nucleotide Polymorphisms (SNPs), offers
one route for making this possibility a reality. In particular, the
increasing availability of SNP data has created opportunities for
discovering important connections between disease and genomic factors.
Although there has been some success in finding such connections with
currently available techniques, these approaches have a number of limitations
and are most useful for finding connections involving only one or two SNPs.
This project would investigate the use of data mining techniques to find more
general patterns that capture connections between SNPs and disease, including
patterns that may involve a relatively large number of SNPs and patterns that
show variation from patient to patient, either because of missing data or
natural variation.
Vipin Kumar with Judith Berman
Analysis of Codon Usage in Yeast
This project involves the study of the genome sequences of several yeast
species to determine the effects of the ambiguity in the genetic code on
the genes themselves. It will also investigate to what extent these effects
may differ depending on the functions of the affected genes. More
specifically, yeasts are simple single-celled fungi that are the primary
workhorses for a variety of industrial processes, such as the fermentation
of beer and wine and the production of yogurt and other fermented milk
products. Yeasts are also premier genetic models with abundant resources
available for genome manipulation, and extensive sequence availability for
a variety of related species. Among several sequenced yeasts, an unusual
ancestral alteration has occurred, resulting in fundamental changes in the
way these organisms translate genetic information into proteins, the
versatile molecules that directly carry out the tasks of most cellular
processes. The extent of this deviation from the standard genetic code
varies throughout the family of yeasts. This makes this an ideal system
to study the flexibility of organisms in responding to a change that, in
principle, could systematically affect every protein the organism
produces.
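A first step in any codon usage analysis is simply tallying how often each codon appears in a coding sequence. The sketch below shows one minimal way to do that; the function names and the short example sequence are made up for illustration, and the code assumes the sequence is already in frame starting at position 0.

```python
from collections import Counter


def codon_counts(cds):
    """Count codons in a coding sequence read in frame from position 0."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(cds):
    """Return each codon's share of all codons in the sequence."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


example = codon_frequencies("ATGCTGCTGTAA")
```

Comparing such frequency tables between species, or between groups of genes with different functions, is one simple way to start looking for the differences the project describes.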
Nathan Springer
Understanding the mechanisms of intra-specific regulatory variation
The goal of this research project is to understand the
molecular basis of phenotypic variation within a species. In other
words, why do different strains or breeds of a species, e.g., a poodle
and a pit bull, exhibit different phenotypes? This variation can be
either quantitative (affecting the amount of gene product) or
qualitative (affecting the nature of the gene product). My lab uses
allele-specific expression assays to study the prevalence and
mechanisms of quantitative variation. We are interested in
understanding how novel gene expression states arise and how they are
maintained. My lab studies the mechanisms that lead to
quantitative variation. We are gathering data on the relative
expression of two alleles in a heterozygote for a set of 500 genes.
The project would involve the construction of a database for data
handling and analysis. In addition, the student would use
bioinformatics tools to characterize the genes that are being assayed
and would have opportunities for lab work.
Nevin Young
Assembling a genome sequence and discovering novel genes
The foundation for most bioinformatics research is genome sequence.
Efficiently assembling a coherent genome sequence out of thousands of
short "reads" remains a challenging informatics problem. Sequencing
projects for many complex organisms are now underway, including a model
plant sequencing project at the University of Minnesota for Medicago
truncatula. In this project, we direct and manage sequencing data coming
from several international sequencing centers and use that data to
synthesize a reference genome sequence. Using the assembled genome
sequence, we and our partners carry out gene annotation (the
discovery of genes and gene features along the sequence) and
present the sequence through a rich web interface. Of particular
interest is the discovery of genes that have not been previously described.
Potentially, these genes hold the key to novel biological functions.
In addition to laboratory-based experiments to explore function, we take
advantage of diverse and powerful software to reveal the properties of
these novel genes.
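The assembly problem mentioned above can be made concrete with a toy greedy assembler that repeatedly merges the two reads sharing the longest suffix-prefix overlap. This is a teaching sketch only, not the pipeline used in the project: real assemblers must cope with sequencing errors, repeats, reverse complements, and reads contained inside other reads, none of which are handled here. The function names and the example reads are invented.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left; report the separate contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)
    return contigs


contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
```

Even this small version hints at why the problem is hard at scale: every merge requires comparing all pairs of reads, and a greedy choice that looks best locally can be wrong when the genome contains repeated sequence.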