Prediction of Human Intersexual Attractiveness, in Collaboration with the Biggest Dating Service in South Korea

Institute: Stony Brook University

Advisors

Professor Steven Skiena, Professor Minh Hoai Nguyen, and Professor Martin Radfar

Summary

The goal of this project was to predict human intersexual attractiveness with several machine learning models, using individual profile information (such as age, height, and college) and self-photos collected by the biggest dating service in South Korea. More specifically, to join the service, every user must receive an average rating above 3.0 out of 5.0 from other randomly selected members of the opposite sex. Hence, every member of the service has their own rating score, on a scale from 1.0 to 5.0, that represents their intersexual attractiveness. The service has more than 1M members.

Data Credentials and Policy

This project was conducted under highly secure conditions, in cooperation with an authorized colleague at the company's branch office. The models were designed and implemented at Stony Brook University, while the actual data was used only within the secure environment that the company provided.

Ridge Regression

Approach

Since the task is to predict numerical scores, I first looked into regression models such as ridge and elastic net regression. Each user also has many self-photos. From a psychological standpoint, I assumed that attractiveness comes mainly from appearance. Thus, I used the self-photos as independent variables of the prediction model by converting the images into meaningful feature vectors. To do this, I applied a pre-trained model, Inception-v3, to extract image feature vectors.
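The feature setup above can be sketched as follows. This is a minimal illustration, not the original implementation: it assumes the photo embeddings have already been extracted by a pre-trained network, uses randomly generated stand-in data with the same dimensions as in the outcomes (54 profile features plus 1024 image features), and solves ridge regression in closed form with NumPy. The helper names `fit_ridge` and `predict_ridge` and the value of `alpha` are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on standardized features.

    Solves (Xs^T Xs + alpha * I) w = Xs^T (y - mean(y)); because the columns of Xs
    are centered, the unpenalized intercept is simply mean(y).
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0                     # guard against constant columns
    Xs = (X - mu) / sigma
    y_mean = y.mean()
    A = Xs.T @ Xs + alpha * np.eye(Xs.shape[1])
    w = np.linalg.solve(A, Xs.T @ (y - y_mean))
    return w, y_mean, mu, sigma


def predict_ridge(model, X):
    """Apply the stored standardization, then the linear model."""
    w, b, mu, sigma = model
    return ((X - mu) / sigma) @ w + b


# Hypothetical stand-in data: 54 profile features + 1024 photo-embedding features.
rng = np.random.default_rng(0)
n = 2000
profile = rng.normal(size=(n, 54))              # e.g. encoded age, height, college
photo_emb = rng.normal(size=(n, 1024))          # e.g. pooled CNN features per user
X = np.hstack([profile, photo_emb])             # one row per user
true_w = rng.normal(scale=0.05, size=X.shape[1])
y = np.clip(3.0 + X @ true_w + rng.normal(scale=0.3, size=n), 1.0, 5.0)

model = fit_ridge(X, y, alpha=10.0)
y_hat = predict_ridge(model, X)
```

Standardizing the columns keeps the single penalty `alpha` comparable across profile and image features, which sit on very different scales; in practice `alpha` would be chosen by cross-validation.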

Outcomes

Ridge regression on 10K samples, using 54 profile features (age, height, college, etc.) plus 1024 self-photo features extracted with a pre-trained deep neural network (Inception-v3):

Group   R^2     RMSE
Men     0.527   0.548
Women   0.417   0.581
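The two metrics in the table can be computed as in this short sketch (plain NumPy; the function names are illustrative). R^2 is 1 - SS_res / SS_tot, and RMSE is expressed in the same units as the 1.0 to 5.0 rating scale.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in rating-scale units."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))


actual = [3.0, 3.5, 4.0, 2.5]
predicted = [3.2, 3.4, 3.7, 2.9]
print(r2_score(actual, predicted), rmse(actual, predicted))
```

For these hypothetical scores, `r2_score` gives about 0.76 and `rmse` about 0.274.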

Scatterplot (Actual vs. Predicted), ridge regression: panels for Men and Women

SVM Classifier

The R^2 scores of the ridge regression model showed that the model did not fit the data well. I therefore tried binary classification between Non Attractive (1.0 to 2.5) and Attractive (2.5 to 5.0) using a Support Vector Machine with a One-vs-Rest strategy, to examine whether a binary relationship exists.
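A minimal sketch of this setup, under stated assumptions: ratings strictly above 2.5 are labeled Attractive (+1) and the rest Non Attractive (-1), since the two ranges share the 2.5 boundary; and a linear soft-margin SVM is trained from scratch by subgradient descent on the hinge loss, as a stand-in for whatever SVM implementation and kernel the project actually used. With only two classes, a one-vs-rest scheme reduces to a single binary classifier. The names `THRESHOLD`, `to_binary_labels`, `train_linear_svm`, and `predict_svm`, the hyperparameters, and the synthetic data are hypothetical.

```python
import numpy as np

THRESHOLD = 2.5  # assumption: ratings strictly above 2.5 are "Attractive"


def to_binary_labels(scores, threshold=THRESHOLD):
    """Map 1.0-5.0 ratings to +1 (Attractive) or -1 (Non Attractive)."""
    return np.where(np.asarray(scores) > threshold, 1, -1)


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM: minimize lam/2 * ||w||^2 + mean(hinge loss)
    by full-batch subgradient descent. Labels y must be +1/-1."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                    # points inside the margin or misclassified
        y_act = y[active]
        grad_w = lam * w - (y_act[:, None] * X[active]).sum(axis=0) / n
        grad_b = -y_act.sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_svm(w, b, X):
    """Sign of the decision function, as +1/-1 labels."""
    return np.where(X @ w + b >= 0, 1, -1)


rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))                   # hypothetical user feature vectors
scores = 2.5 + 0.8 * X[:, 0] + 0.1 * rng.normal(size=400)
y = to_binary_labels(scores)
w, b = train_linear_svm(X, y)
accuracy = np.mean(predict_svm(w, b, X) == y)
```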

Outcomes

Group   Precision   Recall   FPR    Accuracy   F1
Men     0.69        0.78     0.19   0.80       0.72
Women   0.82        0.79     0.35   0.75       0.81
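The metrics reported in these tables follow from confusion-matrix counts. This small, self-contained sketch (the function name is hypothetical) uses precision = TP/(TP+FP), recall = TP/(TP+FN), FPR = FP/(FP+TN), accuracy = (TP+TN)/N, and F1 as the harmonic mean of precision and recall.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, FPR, accuracy, and F1 from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # a.k.a. true positive rate
    fpr = fp / (fp + tn) if fp + tn else 0.0            # share of negatives flagged positive
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "fpr": fpr,
            "accuracy": accuracy, "f1": f1}


# +1 = Attractive, -1 = Non Attractive (hypothetical labels)
y_true = [1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
y_pred = [1, 1, 1, -1, 1, 1, -1, -1, -1, -1]
metrics = binary_metrics(y_true, y_pred)
```

Here TP = 3, FN = 1, FP = 2, and TN = 4, so the function returns precision 0.6, recall 0.75, FPR about 0.333, accuracy 0.7, and F1 about 0.667.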

Logistic Regression

Approach

Binary classification of human attractiveness worked to some extent. Since appearance may be a more important factor than the other features, I decided to focus on appearance, especially the face. I extracted faces from the photos using OpenCV's face detection. I also applied PCA to the face images to reduce their dimensionality, which lowers the computational cost and helps surface meaningful feature relationships. Then, I applied logistic regression for binary classification between Non Attractive (1.0 to 2.5) and Attractive (2.5 to 5.0).
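The classification step can be sketched as binary logistic regression fit by batch gradient descent in NumPy. This is an illustrative stand-in rather than the project's code: it assumes faces have already been cropped with OpenCV and reduced with PCA, uses labels 1 (Attractive) and 0 (Non Attractive), and generates synthetic features. The function names, learning rate, and L2 strength are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))   # clip to avoid overflow


def train_logistic_regression(X, y, lr=0.5, epochs=500, l2=1e-3):
    """Binary logistic regression (y in {0, 1}) fit by batch gradient descent
    on mean cross-entropy plus an L2 penalty on the weights."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        err = sigmoid(X @ w + b) - y            # gradient of cross-entropy w.r.t. logits
        w -= lr * (X.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b


def predict_proba(w, b, X):
    """Estimated probability that each row belongs to the Attractive class."""
    return sigmoid(X @ w + b)


rng = np.random.default_rng(2)
X = rng.normal(size=(500, 20))                  # stand-in for PCA-reduced face features
y = (X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=500) > 0).astype(float)
w, b = train_logistic_regression(X, y)
accuracy = np.mean((predict_proba(w, b, X) >= 0.5) == (y == 1))
```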

PCA

PCA reduced the face images from 4096-dimensional features (64 by 64 pixels) to 400 principal components, cutting the number of features by roughly 90%.
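The reduction from 4096 pixel features to 400 components can be sketched with an SVD of the mean-centered data matrix. The sketch assumes 64 by 64 grayscale face crops and uses random arrays in place of real faces; the function name `pca_fit_transform` is hypothetical.

```python
import numpy as np


def pca_fit_transform(X, n_components):
    """Project rows of X onto the top principal components via SVD of the centered data.

    Returns the projections, the component matrix, the feature means, and the
    fraction of total variance retained by the kept components.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]              # (n_components, n_features), orthonormal rows
    explained = S ** 2 / np.sum(S ** 2)         # variance ratio per component
    Z = Xc @ components.T                       # (n_samples, n_components)
    return Z, components, mean, explained[:n_components].sum()


rng = np.random.default_rng(0)
faces = rng.random((600, 64, 64))               # hypothetical grayscale face crops
X = faces.reshape(len(faces), -1)               # flatten to 4096 features per face
Z, components, mean, kept = pca_fit_transform(X, n_components=400)
```

`kept` reports the fraction of total variance retained, a common way to sanity-check the choice of 400 components; the projections `Z` would then feed the classifier.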

Outcomes

Group   Precision   Recall   FPR    Accuracy   F1
Men     0.73        0.73     0.39   0.73       0.78
Women   0.71        0.71     0.34   0.71       0.79

As this research progressed, legal issues emerged regarding the publication of research based on data from South Korea. As a result, all further work on this project has unfortunately been suspended.

Although I did not train a deep learning model for this prediction, attractiveness prediction showed some success as a binary classification task with both SVM and logistic regression.