I am a postdoctoral research scientist at Columbia University, where I collaborate with Chaolin Zhang and other wonderful members of the Zhang Lab. I earned my Ph.D. in computer science at Washington University in St. Louis under the mentorship of Gary Stormo. My current research focuses on developing computational methods for analyzing protein-DNA/RNA interactions.
A central theme of my research is identifying which representations of DNA sequences carry the most useful information. Traditional choices include k-mers, position weight matrices (PWMs), and parametric statistical models such as hidden Markov models (HMMs); more recently, deep neural networks have joined the list. I am particularly interested in sparse representations, a rich framework for building techniques that answer challenging inferential questions in regulatory genomics.
For example, we want to infer:
The regulatory elements in DNA sequences, e.g., motifs.
The key regulatory elements in the DNA sequences that give rise to an observed phenomenon.
Counterfactuals, e.g., a minimal change to a DNA sequence that turns off the observed effect.
We can obtain sparse representations of DNA sequences using principled deep learning techniques such as deep unfolding. With these sparse representations, we have revealed many hidden motifs that are not in the JASPAR database. Our results show that sparse representations are a scalable and interpretable approach to biological sequence problems.
My name in Mandarin is Kuei-Hsien Chu (朱桂賢). "Hsien" is pronounced like "Shane," which is how I got the name Shane. I grew up in Taiwan and have lived in the US since 2008.