A research project to predict human attractiveness as numerical scores using Ridge regression, Logistic regression, and SVM classification, with PCA dimensionality reduction.

Research Advisors

Professor Steven Skiena, Professor Minh Hoai Nguyen, and Professor Martin Radfar

Abstract

The goal of this project was to predict human attractiveness with several machine learning algorithms from user profile data, such as age, height, college, and self-taken pictures. The work was done in cooperation with the biggest dating service in South Korea. To join the dating service, each user must be rated on attractiveness on a 5.0 scale by random members of the opposite sex.

This project was conducted under highly secured conditions, in cooperation with an authorized colleague located at the corporate branch.

NOTE: The model was designed and coded at Stony Brook University, while the code was applied to the actual data only in South Korea, under the conditions set by the corporation (the research advisors only offered advice and direction and never saw or touched the data).

NOTE: The corporate team and my team agreed not to disclose any data from the service. Detailed information about the data is therefore omitted from all further explanations.

Results

Ridge Regression

Under the advisement of Professor Steven Skiena, Ridge regression was conducted on 10K samples, each with 54 profile features (age, height, college, etc.) plus 1024 image features extracted from the self-taken pictures with a pre-trained deep learning model (Inception-v3).
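The setup above can be illustrated with a minimal sketch: profile features and image features are concatenated into one matrix, and ridge regression is fit in closed form. Everything here is an assumption made for illustration: the data is random, the sample count is reduced, and the function names and the regularization strength `alpha` are hypothetical, since the project's actual code and hyperparameters are not published.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean          # center features so the intercept is not penalized
    yc = y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def predict(X, w, b):
    return X @ w + b

# Synthetic stand-in data (the real data is confidential).
rng = np.random.default_rng(0)
n_samples = 200                                   # placeholder; the study used about 10K
profile = rng.normal(size=(n_samples, 54))        # 54 profile features
image = rng.normal(size=(n_samples, 1024))        # 1024 image features
X = np.hstack([profile, image])                   # 1078 features per sample

true_w = rng.normal(size=X.shape[1]) * 0.05
scores = 3.0 + X @ true_w + rng.normal(scale=0.1, size=n_samples)

w, b = fit_ridge(X, scores, alpha=10.0)           # alpha is a hypothetical choice
pred = predict(X, w, b)
```

In practice the features would usually be standardized first and `alpha` chosen by cross-validation; neither detail is stated in this document.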

| Group | R^2 | RMSE |
| Men | 0.527 | 0.548 |
| Women | 0.417 | 0.581 |
[Figures: ridge regression results for Men and Women]
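The R^2 and RMSE values in the ridge results are assumed to follow the usual definitions. A minimal sketch of both, using made-up scores rather than project data:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted scores."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical ratings on the 5.0 scale (not from the study).
y_true = [2.0, 3.0, 4.0, 5.0]
y_pred = [2.5, 3.0, 3.5, 5.0]

print(f"R^2  = {r_squared(y_true, y_pred):.3f}")   # R^2  = 0.900
print(f"RMSE = {rmse(y_true, y_pred):.3f}")        # RMSE = 0.354
```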

SVM Classifier

Under the advisement of Professor Minh Hoai Nguyen, binary classification between the Non Attractive (1.0 to 2.5) and Attractive (3.5 to 5.0) classes was conducted on the same data through a Support Vector Machine with a One-vs-Rest strategy.

NOTE: The intermediate range of scores (greater than 2.5 and less than 3.5), which was the most populated, was excluded.
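The class assignment described above can be sketched as a small labeling function. The function name and the 0/1 encoding are assumptions for illustration; only the score ranges come from the text.

```python
def label_score(score):
    """Map a 5.0-scale score to a class label.

    Returns 0 for Non Attractive (1.0 to 2.5), 1 for Attractive (3.5 to 5.0),
    and None for the excluded intermediate range (above 2.5, below 3.5).
    """
    if 1.0 <= score <= 2.5:
        return 0
    if 3.5 <= score <= 5.0:
        return 1
    return None

# Hypothetical scores (not from the study).
scores = [1.0, 2.5, 2.6, 3.4, 3.5, 5.0]
labels = [label_score(s) for s in scores]

# Keep only the samples that fall into one of the two classes.
kept = [(s, l) for s, l in zip(scores, labels) if l is not None]
print(labels)  # [0, 0, None, None, 1, 1]
```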

| Group | TP | FP | TN | FN | Precision | Recall | FPR | Accuracy | F1 |
| Men | 1709 | 791 | 3209 | 499 | 0.69 | 0.78 | 0.19 | 0.80 | 0.72 |
| Women | 3159 | 704 | 1296 | 841 | 0.82 | 0.79 | 0.35 | 0.75 | 0.81 |
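For reference, the metric columns are assumed to use the standard confusion-matrix definitions. The sketch below computes them from hypothetical counts chosen for easy arithmetic, not from the project's results:

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)               # share of predicted positives that are correct
    recall = tp / (tp + fn)                  # share of actual positives that are found
    fpr = fp / (fp + tn)                     # share of actual negatives flagged as positive
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        "accuracy": accuracy,
        "f1": f1,
    }

# Hypothetical counts (not from the study).
metrics = classification_metrics(tp=80, fp=20, tn=70, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```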

Logistic Regression

Under the advisement of Professor Martin Radfar, PCA dimensionality reduction was conducted on 13K face images of men, extracted from self-taken profile pictures through the OpenCV face recognition algorithm. In addition, Logistic regression between the Non Attractive (1.0 to 2.5) and Attractive (2.5 to 5.0) classes was conducted.
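A rough sketch of this kind of pipeline, assuming the logistic regression was fit on the PCA-reduced images (the text does not say so explicitly): flattened face images are projected onto their leading principal components, and a logistic regression is fit on the projections. The image size, number of components, learning rate, and all function names are assumptions; the data is random with an artificial class signal, and the logistic regression is a plain gradient-descent version rather than whatever solver the project used.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the feature mean and the top principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project X onto the principal directions."""
    return (X - mean) @ components.T

def fit_logistic(Z, y, lr=0.1, n_iter=500):
    """Logistic regression trained by batch gradient descent."""
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))   # predicted probabilities
        w -= lr * (Z.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def predict_label(Z, w, b):
    return (Z @ w + b >= 0).astype(int)

# Synthetic stand-in for flattened grayscale face crops (hypothetical 16x16 size).
rng = np.random.default_rng(0)
n, side = 300, 16
y = rng.integers(0, 2, size=n)
faces = rng.normal(size=(n, side * side))
faces[:, :8] += 2.0 * y[:, None]                 # artificial signal so the classes differ

mean, components = pca_fit(faces, n_components=20)
Z = pca_transform(faces, mean, components)       # shape (300, 20)
w, b = fit_logistic(Z, y)
train_acc = np.mean(predict_label(Z, w, b) == y)
print(f"training accuracy: {train_acc:.2f}")
```

On real data the PCA basis would be fit on the training split only and accuracy measured on held-out images.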

PCA

| Group | TP | FP | TN | FN | Precision | Recall | FPR | Accuracy | F1 |
| Men | 6505 | 2162 | 3256 | 1434 | 0.73 | 0.73 | 0.39 | 0.73 | 0.78 |

Conclusion

Although this project was intended for IEEE ICMLA 2018, legal issues emerged regarding the publication of research on data from South Korea. As a result, all further work on this project has unfortunately been suspended.