High Performance Computing
Online analysis of abnormal runtime behavior is essential for performance tuning in streaming scientific workflows. Integrating anomaly detection with visualization is necessary to support efficient, scalable, human-centered analysis. We propose a visual analytics system for online performance analysis targeting exascale scenarios. Our approach adopts a call stack tree representation, which encodes the structural and temporal information of function executions. We employ online anomaly detection approaches to identify candidate anomalies, and we devise a set of visualization tools for their verification and exploration. ICCS 2019 (Accepted)
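The idea of a call stack tree paired with online anomaly detection can be illustrated with a minimal sketch. It is not the system from the paper and rests on several assumptions:
- It assumes a hypothetical stream of (kind, name, timestamp) enter/exit events.
- It substitutes a simple running z-score test for whatever detector the paper uses. The test relies on Welford's online mean/variance, kept per function name.
- The class and method names (CallNode, OnlineCallStackAnalyzer, on_event) are illustrative only.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Dict, List, Optional


@dataclass
class CallNode:
    """One function execution: structural (children) and temporal (start/end) info."""
    name: str
    start: float
    end: Optional[float] = None
    children: List["CallNode"] = field(default_factory=list)

    @property
    def duration(self) -> float:
        # Only valid once the matching "exit" event has set `end`.
        return self.end - self.start


class RunningStats:
    """Welford's online mean/variance, updated one sample at a time."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two samples exist.
        return sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


class OnlineCallStackAnalyzer:
    """Hypothetical analyzer: builds call stack trees from streamed events and flags slow calls."""

    def __init__(self, k: float = 3.0, min_samples: int = 5) -> None:
        self.k = k                        # z-score threshold (assumed, not from the paper)
        self.min_samples = min_samples    # warm-up period before any call can be flagged
        self.stack: List[CallNode] = []   # currently open calls
        self.roots: List[CallNode] = []   # top-level calls, i.e. one tree per root
        self.stats: Dict[str, RunningStats] = {}
        self.anomalies: List[CallNode] = []

    def on_event(self, kind: str, name: str, t: float) -> None:
        if kind == "enter":
            node = CallNode(name, t)
            # Attach to the caller if one is open, otherwise start a new tree.
            if self.stack:
                self.stack[-1].children.append(node)
            else:
                self.roots.append(node)
            self.stack.append(node)
        elif kind == "exit":
            node = self.stack.pop()
            if node.name != name:
                raise ValueError(f"exit {name!r} does not match open call {node.name!r}")
            node.end = t
            self._score(node)
        else:
            raise ValueError(f"unknown event kind {kind!r}")

    def _score(self, node: CallNode) -> None:
        s = self.stats.setdefault(node.name, RunningStats())
        # Compare against history *before* adding this call, so an outlier
        # cannot widen its own threshold.
        if (s.n >= self.min_samples and s.std > 0
                and abs(node.duration - s.mean) > self.k * s.std):
            self.anomalies.append(node)
        s.update(node.duration)


# Usage: six main() -> solve() executions; the last solve() is about 5x slower.
analyzer = OnlineCallStackAnalyzer(k=3.0, min_samples=5)
for i, d in enumerate([1.0, 1.1, 0.9, 1.0, 1.05, 5.0]):
    t = 10.0 * i
    analyzer.on_event("enter", "main", t)
    analyzer.on_event("enter", "solve", t + 0.1)
    analyzer.on_event("exit", "solve", t + 0.1 + d)
    analyzer.on_event("exit", "main", t + 0.2 + d)
print([(n.name, round(n.duration, 2)) for n in analyzer.anomalies])
```

Because flagged calls remain nodes in their tree, a visualization layer could show each anomaly together with its callers and callees rather than as an isolated timing value. The `min_samples` warm-up is a design choice that avoids flagging calls before enough history exists.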
Machine Learning & Data Science
Research Assistant @ Data Science Lab, Stony Brook University
Using a dataset from the largest dating service in South Korea, in which each user profile (self-photos, age, college, job, etc.) was numerically rated by randomly selected users of the opposite sex, experimented with Ridge, Lasso, and Elastic Net regression models and SVM classifiers with PCA dimensionality reduction, as well as convolutional neural networks. Utilized Jupyter Notebook, Pandas, Scikit-Learn, TensorFlow, etc. This project received the URECA Research Fellowship in 2017. (06.2017-12.2017)
Research Assistant @ Visual Analytics & Imaging Lab, Stony Brook University
The goal was to reconstruct CT X-ray scan images from incomplete, limited-angle sinograms using deep learning techniques, in order to reduce radiation exposure. Implemented a Deep Convolutional Generative Adversarial Network (DCGAN) for image completion. Trained on 10K randomly varied X-ray images and generated the missing parts to complete them. (02.2017-12.2017)
CSE353: Machine Learning @ Stony Brook University
Using sentiment polarity datasets (collections of movie-review documents labeled as positive or negative), experimented with machine learning algorithms such as K-Means, K-Nearest Neighbors, Naive Bayes classifiers, and Multilayer Perceptrons. Utilized Jupyter Notebook, Pandas, etc.
CSE391: Data Science @ Stony Brook University
Using a crime dataset derived from the SFPD Crime Incident Reporting System covering 2003 to 2015, built Bag-of-Words representations of the text features and experimented with Logistic Regression, Random Forest, and Naive Bayes classifiers. Achieved 99% accuracy. Utilized Jupyter Notebook, Pandas, Scikit-Learn, etc.
AMS315: Data Analysis @ Stony Brook University
Using gene-environment interaction datasets, applied correlation analysis, normality tests, Box-Cox transformation, stepwise regression, and lasso regression to find the best-performing model for prediction. Utilized R, Jupyter Notebook, Pandas, etc.