I am a post-doctoral research scientist at Columbia University, collaborating with Chaolin Zhang and other wonderful members of the Zhang Lab. Previously, I earned my Ph.D. in computer science at Washington University in St. Louis under the mentorship of Gary Stormo. My current research focuses on deep learning interpretation and on developing computational methods for analyzing (1) protein-DNA/RNA interactions and (2) high-throughput functional genomics/proteomics screens.
A central theme in my research is the question of which representation of biological sequences yields the most useful information. Traditional choices include k-mers, position weight matrices, and parametric statistical models like HMMs; more recently, deep neural networks have joined them. I am particularly interested in the sparse representation perspective, a rich view that allows us to build techniques to answer challenging inferential questions in regulatory genomics.
For example, in regulatory genomics, we want to infer the following:
The DNA sequences' regulatory elements, e.g., the motifs.
The key regulatory elements in the DNA sequences that give rise to the observed phenomenon.
The counterfactuals, e.g., a minimal change to a DNA sequence that turns off the observed effects.
We can obtain a sparse representation of DNA sequences using principled deep learning techniques, e.g., deep unfolding. Using sparse representations, we reveal many additional hidden motifs that are not listed in the JASPAR database. Our results show that sparse representation is a scalable and interpretable approach to biological sequence problems.
My name in Mandarin is Kuei-Hsien Chu (朱桂賢). Hsien is pronounced like Shane, which is how I got the name Shane. I grew up in Taiwan and have been living in the US since 2008.