R E S E A R C H  |  << Back  |  

Computational mass-spectrometry
  Significant technological advances have accelerated high-throughput proteomics to the point where millions of tandem mass spectra are generated automatically every day. As a result, existing approaches that compare spectra against databases already face a bottleneck, particularly when interpreting spectra of modified peptides. The computational challenge is to make spectral interpretation faster and more accurate by exploiting data redundancy and a concept that allows an MS/MS database search to be performed without ever comparing a spectrum against a database.



Fragment assembly in DNA sequencing
  In the last year, high-throughput sequencing technologies have progressed from proof-of-concept to production quality. While these methods produce high-quality reads, they have yet to produce reads comparable in length to those of Sanger-based sequencing. Current fragment assembly algorithms have been implemented and optimized for mate-paired Sanger-based reads, and thus do not perform well on the short reads these technologies produce. The challenge is to develop a new assembler that generates optimal short-read assemblies of genomes.



Molecular modeling of proteins
  The three-dimensional structure of proteins changes during the course of evolution. These changes cause differences between related proteins and eventually lead to novel biochemical functions. Another source of structural differences between proteins is their function itself. Different metabolic forms of the same protein play an essential role in understanding structure-function relationships. The main challenge is to model plausible protein structures and functions based on known structures and functions.



Disease association in case-control studies
  The accessibility of high-throughput genotyping technology enables genome-wide association studies of common complex diseases. The main challenges facing such studies are (i) searching an enormous number of possible gene interactions and (ii) finding reproducible associations. These challenges have traditionally been addressed with statistical methods, whereas we apply computational approaches, namely optimization and cross-validation.



Disease susceptibility prediction
  In the last ten years, substantial progress has been made in identifying why some people are particularly susceptible to specific genetic or infectious diseases. Extensive evidence has now accumulated that host genes and spontaneous mutations are important determinants of the outcome phenotypes for many common pathogens. The computational challenge is to find and combine disease risk factors present in the data and accurately predict disease susceptibility.



Tagging: informative SNP selection
  The search for associations between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. For these studies, it is essential to use a small subset of informative SNPs (tag SNPs) that accurately represents the rest of the SNPs. Tagging saves budget by genotyping only a limited number of SNPs and computationally inferring all the others; it also compacts extremely long SNP sequences (obtained, e.g., from Affymetrix Map Array) for further fine genotype analysis. The challenge is to choose, from the SNPs under consideration, those tags that predict (or statistically cover) the non-tag SNPs in the best way.
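
  To make "statistically cover" concrete, here is a minimal sketch of one simple tagging heuristic. It is an illustration under stated assumptions, not necessarily the method used in this research: haplotypes are given as rows of 0/1 alleles, a tag is assumed to cover another SNP when their squared correlation (r^2) reaches a threshold, and tags are picked greedily. The function names and the 0.8 threshold are hypothetical choices.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two SNP columns (lists of 0/1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    return cov * cov / (var_x * var_y)


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedily pick tag SNPs so every SNP is covered by some tag.

    Assumption: SNP j covers SNP k when r^2(j, k) >= threshold.
    Every SNP covers itself, so the loop always terminates.
    """
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]
    covers = [
        {k for k in range(n_snps)
         if k == j or r_squared(columns[j], columns[k]) >= threshold}
        for j in range(n_snps)
    ]
    uncovered = set(range(n_snps))
    tags = []
    while uncovered:
        # Pick the SNP covering the most uncovered SNPs; lowest index wins ties.
        best = max(range(n_snps),
                   key=lambda j: (len(covers[j] & uncovered), -j))
        tags.append(best)
        uncovered -= covers[best]
    return tags


# Toy data: SNPs 0, 1, and 2 are perfectly correlated; SNP 3 is independent.
toy_haplotypes = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
tags = greedy_tag_snps(toy_haplotypes)
```

  On the toy data, one tag stands in for the three correlated SNPs and the independent SNP must be genotyped itself. Greedy covering is fast but not guaranteed to find the smallest tag set.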



Phasing: haplotype inference from genotype data
  Phased genotype data are believed to be critical for the genetic analysis of disease susceptibility and other complex traits. Emerging microarray technologies (Affymetrix) allow genotyping (SNP-typing) of long genome sequences, resulting in a huge amount of data generated by individual and international efforts (see HapMap). A key challenge in analyzing such a huge amount of data is the scalable and accurate computational inference of haplotypes (i.e., splitting each genotype into a pair of corresponding haplotypes).
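
  The splitting step can be illustrated with a small sketch that enumerates every haplotype pair consistent with one genotype. The encoding is an assumption made for this example: '0' and '1' mark homozygous sites and '2' marks a heterozygous site. The function name is hypothetical. A genotype with k heterozygous sites has 2^(k-1) distinct explanations for k >= 1, which is why brute-force enumeration does not scale and inference methods are needed.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List all unordered haplotype pairs that explain a genotype.

    Assumed encoding: '0'/'1' = homozygous for that allele,
    '2' = heterozygous (the two haplotypes differ at this site).
    """
    het_sites = [i for i, g in enumerate(genotype) if g == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het_sites)):
        h1 = list(genotype)
        h2 = list(genotype)
        for site, bit in zip(het_sites, bits):
            h1[site] = bit
            h2[site] = "1" if bit == "0" else "0"  # complementary allele
        # Sort so that (a, b) and (b, a) count as the same pair.
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


pairs = compatible_haplotype_pairs("0212")
```

  For the genotype "0212", the two heterozygous sites give two candidate phasings: ("0010", "0111") and ("0011", "0110"). Choosing among such candidates across many individuals is the inference problem described above.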
