High Performance Computing

Visual Analytics in High Performance Computing (HPC)

The online analysis of abnormal runtime behavior is essential for performance tuning in streaming scientific workflows. Integrating anomaly detection with visualization is necessary to support efficient and scalable human-centered analysis. We propose a visual analytics system for online performance analysis, aimed at the exascale scenario. Our approach adopts the call stack tree representation, which encodes the structural and temporal information of function executions. We employ online anomaly detection approaches to identify candidate anomalies. We also devise a set of visualization tools for verification and exploration. ICCS 2019 (Accepted)

Machine Learning & Data Science

Prediction for Human Intersexual Attractiveness

Research Assistant @ Data Science Lab, Stony Brook University

Using a dataset from the biggest dating service in South Korea, in which each user profile (self-photos, age, college, job, etc.) was numerically rated by random users of the opposite sex, experimented with Ridge, Lasso, and Elastic Net regression models, SVM classifiers with PCA reduction, and deep convolutional neural networks. Utilized Jupyter Notebook, Pandas, Scikit-Learn, TensorFlow, etc. This project was awarded the URECA Research Fellowship in 2017. (06.2017-12.2017)

CT X-Ray Reconstruction using Deep Learning

Research Assistant @ Visual Analytics & Imaging Lab, Stony Brook University

The goal was to reconstruct CT X-ray scan images from incomplete, limited-angle sinograms using deep learning techniques, in order to reduce radiation exposure. Implemented a Deep Convolutional Generative Adversarial Network (DCGAN) for the image completion task. Trained on 10K randomly varied X-ray images and generated the missing parts to complete them. (02.2017-12.2017)

Sentiment Polarity Classification for Movie Reviews

CSE353: Machine Learning @ Stony Brook University

Using sentiment polarity datasets (collections of movie-review documents labeled as positive or negative), experimented with machine learning algorithms such as K-Means, K-Nearest Neighbors, Naive Bayes, and Multilayer Perceptron. Utilized Jupyter Notebook, Pandas, etc.
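As a rough illustration of one of the listed algorithms, here is a minimal multinomial Naive Bayes sketch with add-one smoothing in plain Python. It is not the coursework code; the function names and the two sample reviews are made up for the example, and tokenization is a simple whitespace split.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """Collect per-class document and word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary

def classify(text, model):
    """Return the label with the highest log-posterior for the given text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothed log likelihood of the token.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy reviews, for illustration only.
model = train_naive_bayes([
    ("great fun film", "pos"),
    ("boring dull film", "neg"),
])
prediction = classify("fun film", model)
```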

San Francisco Crime Classification

CSE391: Data Science @ Stony Brook University

Using a crime dataset derived from the SFPD Crime Incident Reporting system from 2003 to 2015, built Bag-of-Words text features and experimented with Logistic Regression, Random Forest, and Naive Bayes classifiers. Achieved 99% accuracy. Utilized Jupyter Notebook, Pandas, Scikit-Learn, etc.
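The Bag-of-Words step can be sketched in plain Python as below. This is an illustrative assumption about the general technique, not the project's code: the helper names and the two sample incident descriptions are hypothetical, and tokens are obtained by a simple lowercase whitespace split.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = {tok for doc in documents for tok in doc.lower().split()}
    return {tok: i for i, tok in enumerate(sorted(tokens))}

def bag_of_words(document, vocabulary):
    """Return a count vector for one document; unknown tokens are ignored."""
    counts = Counter(document.lower().split())
    # Dict iteration follows insertion order, which is the sorted token order.
    return [counts.get(tok, 0) for tok in vocabulary]

# Hypothetical incident descriptions, for illustration only.
descriptions = [
    "GRAND THEFT FROM LOCKED AUTO",
    "PETTY THEFT FROM LOCKED AUTO",
]
vocabulary = build_vocabulary(descriptions)
vectors = [bag_of_words(d, vocabulary) for d in descriptions]
```

Each resulting vector has one column per vocabulary word, so it can be passed as a feature row to a classifier.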

Gene-Environment Interaction Prediction

AMS315: Data Analysis @ Stony Brook University

Using gene-environment interaction datasets, experimented with correlation analysis, normality tests, Box-Cox transformation, step-wise regression, and lasso regression to find the best model for prediction. Utilized R, Jupyter Notebook, Pandas, etc.
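For reference, the Box-Cox transformation mentioned above maps a positive value y to (y^λ - 1) / λ for λ ≠ 0 and to ln(y) for λ = 0. A minimal sketch in plain Python follows; the function name is hypothetical, and choosing λ (normally done by maximum likelihood) is left out.

```python
import math

def box_cox(values, lam):
    """Apply the Box-Cox power transform with parameter lam to positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # Limiting case of the transform as lam approaches 0.
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

# lam = 0.5 is a square-root style transform; the inputs are illustrative.
transformed = box_cox([1.0, 4.0], 0.5)
```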