Abstract
One of the most surprising discoveries made by analyzing the human genome
was the relatively small number of genes that were found. Many explanations
have been given for this, but the one that most intrigues me is the notion that
these studies looked primarily for protein-coding genes, and may have missed
another class of genes -- non-coding RNA (ncRNA) genes, which are
transcribed into functional RNAs, but not translated into proteins.
In this talk, I will present a set of new algorithms for finding and analyzing
ncRNA genes. The first topic is how to design structure-based filters and
sequence-based filters to speed up the search for homologs in the genomes.
State-of-the-art methods for the problem, like covariance models, suffer
from high computational cost, underscoring the need for efficient filtering
approaches that can identify promising sequence segments and speed up the
detection process. Our approach, based on structural and sequence filters
that eliminate a large portion of the database while retaining the true
homologs, allows us to search a typical bacterial database in minutes on a
standard PC with high sensitivity and specificity. The second topic is
a novel framework to predict the common secondary structure for
unaligned RNA sequences. By matching putative stems across RNA
sequences, we use primary sequence information and thermodynamic
stability simultaneously for prediction. I will also
describe some of our findings from bacterial genomes, metagenomic
data from the ocean, and mammalian genomes obtained using these methods.
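To illustrate the general idea of a sequence-based filter, the following is a minimal sketch, not the filters presented in the talk. It assumes a simple k-mer seed heuristic: a database window is kept only if it shares enough k-mers with the query, and only the surviving windows would be passed on to an expensive scorer such as a covariance model. The function names (kmers, sequence_filter, merge_windows) and the parameters k, window, and min_shared are hypothetical choices made for this illustration.

```python
def kmers(seq, k):
    """Return all overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sequence_filter(query, database, k=4, window=None, min_shared=3):
    """Return (start, end) windows of database that pass a k-mer filter.

    A window passes if at least min_shared of its k-mers also occur in
    the query. This is an assumed, generic heuristic for illustration.
    """
    window = window or len(query)
    query_kmers = set(kmers(query, k))
    hits = []
    for start in range(max(len(database) - window, 0) + 1):
        segment = database[start:start + window]
        shared = sum(1 for km in kmers(segment, k) if km in query_kmers)
        if shared >= min_shared:
            hits.append((start, start + window))
    return hits


def merge_windows(windows):
    """Merge overlapping or touching windows into maximal intervals."""
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Toy example: a query embedded in an otherwise unrelated sequence.
query = "GGCAUCGAUGCC"
database = "A" * 30 + query + "U" * 30
candidates = merge_windows(sequence_filter(query, database))
kept = sum(end - start for start, end in candidates)
print(candidates, f"{kept}/{len(database)} positions kept for full scoring")
```

In this toy case the filter discards most of the database while the retained interval still contains the embedded query, which is the property a practical filter needs: high rejection of the database without losing true homologs.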
Short Bio:
Shaojie Zhang is an Assistant Professor in the School of Electrical Engineering
and Computer Science at the University of Central Florida. His research
is focused on bioinformatics and comparative genomics, which includes ncRNA
gene finding, ncRNA analysis, and other genome annotation problems. Shaojie Zhang
received his B.S. in Computer Science from Peking University, Beijing, China, his M.Eng.
from Nanyang Technological University, Singapore, and his Ph.D. in
Computer Science from the University of California, San Diego.