|
| Computational mass-spectrometry |
| |
Significant technological advances have accelerated high-throughput proteomics to the point where millions of tandem
mass spectra are generated automatically every day. As a result,
the existing approaches that compare spectra against databases are already facing a bottleneck, particularly when interpreting
spectra of modified peptides. The computational challenge is to exploit data redundancy, together with an approach that performs an
MS/MS database search without ever comparing a spectrum against a database, to make spectral interpretation faster and more accurate.
|
|
|
|
| Fragment assembly in DNA sequencing |
| |
In the last year, high-throughput sequencing technologies have progressed from proof-of-concept to production quality. While these methods produce high-quality reads, they have yet
to produce reads comparable in length to those of Sanger-based sequencing. Current fragment assembly algorithms have been implemented and optimized for mate-paired Sanger-based reads, and thus
do not perform well on the short reads these technologies produce. The challenge is to develop a new assembler that generates optimal short-read assemblies of genomes.
|
|
|
|
| Molecular modeling of proteins |
| |
The three-dimensional structure of proteins changes during the course of evolution. These changes cause differences between related proteins and eventually
lead the proteins to adopt novel biochemical functions. Another source of structural differences between proteins is their function itself. Different metabolic forms
of the same protein play an essential role in understanding structure-function relationships. The main challenge is to model plausible protein structures
and functions based on known structures and functions.
|
|
|
|
| Disease association in case-control studies |
| |
The accessibility of high-throughput genotyping technology enables genome-wide association studies for common complex diseases.
The main challenges commonly facing such studies are (i) searching an
enormous number of possible gene interactions and (ii) finding reproducible associations.
These challenges have traditionally been addressed with statistical methods, whereas we apply computational approaches: optimization and
cross-validation.
|
|
|
|
| Disease susceptibility prediction |
| |
In the last ten years, substantial progress has been made in identifying why some people are particularly susceptible to specific
genetic or infectious diseases. Extensive evidence has now accumulated that host genes and spontaneous mutations are important determinants
of the outcome phenotypes for many common pathogens. The computational challenge is to find and combine disease risk factors present
in the data and accurately predict disease susceptibility.
|
|
|
|
| Tagging: informative SNP selection |
| |
The search for associations between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has
recently received great attention. For these studies, it is essential to use a small subset of informative SNPs (tag SNPs) that accurately represents the rest
of the SNPs. Tagging saves budget by genotyping only a limited number of SNPs and computationally inferring all the others,
and it compacts extremely long SNP sequences (obtained, e.g., from Affymetrix Map Array) for further fine genotype analysis.
The challenge is to choose those tags from the SNPs under consideration which predict (or statistically cover) the non-tag SNPs
in the best way.
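
To make "statistically cover" concrete, here is a minimal sketch of one possible tag-selection heuristic. It is illustrative only and is not presented as the method used in this work. It assumes phased haplotypes coded as 0/1 tuples, measures how well one SNP covers another by the squared correlation r^2, and picks tags greedily. The function names and the 0.8 threshold are hypothetical choices made for this example.

```python
def r_squared(a, b):
    """Squared correlation (r^2) between two biallelic SNP columns coded 0/1."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        # A monomorphic SNP carries no information about any other SNP.
        return 0.0
    d = pab - pa * pb
    return d * d / denom


def greedy_tags(haplotypes, threshold=0.8):
    """Greedily pick tag SNPs until every SNP is covered by some tag.

    A tag covers itself and every SNP whose r^2 with it is at least
    `threshold` (an assumed cutoff, chosen here for illustration).
    """
    n_snps = len(haplotypes[0])
    cols = [[h[j] for h in haplotypes] for j in range(n_snps)]
    covers = [
        {k for k in range(n_snps)
         if k == j or r_squared(cols[j], cols[k]) >= threshold}
        for j in range(n_snps)
    ]
    uncovered = set(range(n_snps))
    tags = []
    while uncovered:
        # Take the SNP covering the most still-uncovered SNPs;
        # ties go to the lowest index.
        best = max(range(n_snps),
                   key=lambda j: (len(covers[j] & uncovered), -j))
        tags.append(best)
        uncovered -= covers[best]
    return tags


# Toy data: SNPs 0, 1 and 2 are perfectly correlated; SNP 3 is independent.
haplotypes = [(0, 0, 1, 0), (0, 0, 1, 1), (1, 1, 0, 0), (1, 1, 0, 1)]
tags = greedy_tags(haplotypes)
```

On this toy input two tags are enough to cover all four SNPs. The greedy choice is a set-cover heuristic and is not guaranteed to find the smallest possible tag set.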
|
|
|
|
| Phasing: haplotype inference from genotype data |
| |
Phased genotype data are believed to be critical for the genetic analysis of disease susceptibility and other complex traits.
Emerging microarray technologies (Affymetrix) allow genotyping (SNP-typing) of long genome sequences, resulting in huge amounts
of data generated by individual and international efforts (see HapMap). A key challenge in analyzing such
data is the scalable and accurate computational inference of haplotypes (i.e., splitting each genotype into a pair of corresponding haplotypes).
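
The splitting step can be illustrated with a small sketch that lists every haplotype pair consistent with a single genotype. It assumes a common coding that is not taken from this page: 0 and 1 for the two homozygous states and 2 for a heterozygous site. The function name is hypothetical. The code only shows why the problem is ambiguous and is not a phasing algorithm.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs that explain one genotype.

    Assumed coding: 0 = homozygous for allele 0, 1 = homozygous for
    allele 1, 2 = heterozygous.
    """
    if any(g not in (0, 1, 2) for g in genotype):
        raise ValueError("genotype entries must be 0, 1 or 2")
    het = [i for i, g in enumerate(genotype) if g == 2]
    if not het:
        # No heterozygous site: both haplotypes equal the genotype.
        h = tuple(genotype)
        return [(h, h)]
    pairs = []
    # Fix allele 0 on the first haplotype at the first heterozygous site
    # so that (h1, h2) and (h2, h1) are not both listed.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1 = list(genotype)
        h2 = list(genotype)
        for site, allele in zip(het, (0,) + bits):
            h1[site] = allele
            h2[site] = 1 - allele
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


# Two heterozygous sites give two candidate explanations.
pairs = compatible_haplotype_pairs([0, 2, 1, 2])
```

A genotype with k heterozygous sites has 2^(k-1) such pairs for k >= 1, so enumeration alone cannot scale to long genotypes. Inference methods must instead use information shared across many individuals to choose among the candidates.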
|
|