Tim Ting Chen's Research Statement
I am interested in the analysis of algorithms, including string algorithms,
graph algorithms, computational geometry, randomized algorithms, and
approximation algorithms, and in computational biology, which I describe
below.
I have worked on the problems of reconstructing gene regulatory networks,
sequencing DNA and peptides, reconstructing evolutionary trees
and evolutionary distances, gene finding, and protein analysis via
HPLC-tandem mass spectrometry.
My current research focuses on the following topics:
(1) the analysis of protein interactions, functions, and pathways;
(2) protein identification and sequencing
via HPLC-tandem mass spectrometry; and (3) the analysis of human SNPs.
The analysis of protein interactions, functions and pathways
Protein interactions and functions are central to most proteomics
projects. We focus on the development of statistical
and computational methods for the analysis of protein interaction
data coming from high-throughput proteomic technologies such as
yeast two-hybrid assays and mass spectrometry, and protein
function data coming from databases of large-scale function
annotations. The research involves the study of the following two
important problems in biology: (1) identifying domain-domain
interactions and protein-domain interactions from a large number
of protein-protein interactions, and (2) assigning functions to
unknown proteins from annotated proteins using information from gene
expression profiles, protein sequence similarities, and
protein-protein interactions. We develop novel mathematical models
for protein-protein interactions based on domain-domain
interactions and for functional associations based on protein-protein
interactions, gene expression profiles, and protein sequence
similarity, using the theory of Markov random fields (MRFs). Based
on these models, we develop novel computational methods to
estimate domain-domain interactions and to predict protein
functions.
Protein Quantitation and Interaction via HPLC-tandem mass spectrometry
Tandem mass spectrometry,
combined with high performance liquid chromatography (HPLC),
has been widely used
to sequence peptides and analyze protein sequences.
We have developed methods to sequence and identify peptides and proteins
from tandem mass spectra.
However, more interesting and important applications
are to quantitate proteins expressed in the cell and
to discover protein-protein interactions.
We are working on the relation between mass spectral intensities
and peptide concentrations through a collection of
carefully designed experiments.
Cross-linking technology combined with tandem mass spectrometry
is a powerful method that provides a rapid solution
to the discovery of protein-protein interactions and protein structures.
Previous cross-linking studies have produced low-resolution
interatomic distance constraints which, in conjunction with
threading, have led to the determination of the three-dimensional
structure of a model protein.
Similar techniques can be applied to "dock" the structures of
two interacting proteins.
We are designing algorithms for detecting cross-linked peptides
and cross-linked amino acids from tandem mass spectral data.
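As a rough illustration of the detection problem, and not the algorithms under development, one simple first step is a mass filter: look for pairs of candidate peptides whose masses, plus the mass added by the cross-linker, match the observed precursor mass. The sketch below assumes neutral peptide masses are already computed. The function name, the tolerance, and all numeric values in the usage example are hypothetical.

```python
from bisect import bisect_left, bisect_right


def find_crosslink_pairs(peptide_masses, precursor_mass, linker_mass, tol=0.01):
    """Return peptide name pairs (a, b) such that
    mass(a) + mass(b) + linker_mass is within tol of precursor_mass.

    peptide_masses: dict mapping a peptide name to its neutral mass (Da).
    linker_mass: net mass added by the cross-linker (Da); an input assumption.
    """
    # Sort by mass so that the partner mass can be located by binary search.
    items = sorted(peptide_masses.items(), key=lambda kv: kv[1])
    masses = [mass for _, mass in items]
    pairs = []
    for i, (name_a, mass_a) in enumerate(items):
        target = precursor_mass - linker_mass - mass_a
        # Start at index i so each unordered pair is reported once;
        # a peptide may pair with itself (two copies cross-linked).
        lo = max(bisect_left(masses, target - tol), i)
        hi = bisect_right(masses, target + tol)
        for j in range(lo, hi):
            pairs.append((name_a, items[j][0]))
    return pairs


# Illustrative values only; these are not measured masses.
candidates = {"pepA": 800.40, "pepB": 1200.60, "pepC": 950.50}
hits = find_crosslink_pairs(candidates, precursor_mass=2139.07, linker_mass=138.07)
print(hits)
```

A filter of this kind only narrows the candidate list. Confirming a cross-link and locating the linked amino acids would still require matching the fragment ions in the tandem mass spectrum.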
Analysis of Human SNPs
Single nucleotide polymorphisms (SNPs) are promising markers for population genetic studies and for localizing genetic variations responsible for complex diseases. They are preferred to other genetic markers, such as microsatellite markers, because of their high abundance, relatively low mutation rate, and easy adaptability to automated genotyping. It is known that studies using haplotype information generally outperform single-marker analysis. Thus, it is important to know the haplotype structure of the whole genome in the populations under study. Several groups have studied the haplotype structure in specific genes and populations. One objective in these studies is to partition the genome into blocks so as to minimize the total number of representative SNPs required to distinguish the majority of haplotypes in each block. We designed a dynamic programming algorithm for optimal haplotype partitioning. Our algorithm improves on the result of the best existing method by a factor of two and is currently used by a company, Perlegen Sciences, to select representative SNPs for large-scale screening. The program can be easily adapted to other measures of haplotype information.
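The block-partitioning objective can be sketched as a small dynamic program. This is a minimal illustration under stated assumptions, not the program described above. It assumes that haplotypes are given as equal-length strings of alleles, that a haplotype is "common" within a block if it occurs at least min_count times, that a block is admissible only if its common haplotypes cover a given fraction of the samples, and that the representative SNPs of a block are found by brute force. The function names and the parameters min_count, coverage, and max_width are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def tag_snp_count(haplotypes, start, end, min_count=2, coverage=0.8):
    """Fewest SNPs in [start, end) that tell the common haplotypes apart,
    or None if the common haplotypes cover too few samples."""
    counts = Counter(h[start:end] for h in haplotypes)
    common = [seg for seg, c in counts.items() if c >= min_count]
    covered = sum(counts[seg] for seg in common)
    if covered < coverage * len(haplotypes):
        return None
    width = end - start
    # Brute force over SNP subsets, smallest first (exponential in block width).
    for k in range(width + 1):
        for cols in combinations(range(width), k):
            signatures = {tuple(seg[c] for c in cols) for seg in common}
            if len(signatures) == len(common):
                return k
    return width  # not reached: all SNPs together always separate distinct segments


def partition_blocks(haplotypes, max_width=8, min_count=2, coverage=0.8):
    """Return (total representative SNPs, list of (start, end) blocks),
    or None if no admissible partition exists."""
    n_snps = len(haplotypes[0])
    inf = float("inf")
    best = [inf] * (n_snps + 1)   # best[j]: minimum cost to cover SNPs [0, j)
    prev = [None] * (n_snps + 1)  # prev[j]: start of the last block ending at j
    best[0] = 0
    for end in range(1, n_snps + 1):
        for start in range(max(0, end - max_width), end):
            if best[start] == inf:
                continue
            cost = tag_snp_count(haplotypes, start, end, min_count, coverage)
            if cost is not None and best[start] + cost < best[end]:
                best[end] = best[start] + cost
                prev[end] = start
    if best[n_snps] == inf:
        return None
    # Walk the back-pointers to recover the block boundaries.
    blocks = []
    end = n_snps
    while end > 0:
        blocks.append((prev[end], end))
        end = prev[end]
    blocks.reverse()
    return best[n_snps], blocks


# Toy data: two haplotypes in perfect linkage across four SNPs.
toy = ["0000", "0000", "1111", "1111"]
result = partition_blocks(toy)
print(result)
```

On the toy data a single block needs one representative SNP, whereas four single-SNP blocks would need four. Other measures of haplotype information could be substituted by replacing tag_snp_count with a different block cost.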
Postdocs: Minghua Deng, Lei Zhuge
Students: Shipra Metah, Benjamin Lu, Yunhu Wan