Institute: Stony Brook University
The goal of this project was to predict human intersexual attractiveness with several machine learning models, using individual profile data such as age, height, college, and self-photos collected by the biggest dating service in South Korea. To be more specific about the data, every applicant must receive an average rating above 3.0 out of 5.0 from random members of the opposite sex in order to join the service. Hence, every member has a rating score, on a scale from 1.0 to 5.0, that represents their intersexual attractiveness. The service has more than 1 million members.
Data Credentials and Policy
This project was conducted under highly secured conditions in cooperation with an authorized colleague located at the corporate branch. The model was designed and implemented at Stony Brook University, while the actual data was applied only within the secure environment that the corporation provided.
Since the task is to predict numerical scores, I first looked into regression models such as ridge and elastic net regression. There were also many self-photos corresponding to each user. From a psychological standpoint, I assumed that attractiveness comes mainly from appearance. Thus, I treated the self-photos as independent variables of the prediction model by converting the images into meaningful feature vectors. To do this, I applied a pre-trained model such as Inception-v3 to extract image vectors.
Ridge regression with 10K samples: 54 profile features (age, height, college, etc.) + 1024 self-photo features extracted from the pre-trained deep neural network (Inception-v3)
Scatterplot (Actual vs. Predicted)
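The ridge regression setup above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code or data used in the project: the arrays `profile` and `photo`, the helper names `fit_ridge` and `predict`, the sample count of 200, and the penalty `alpha=10.0` are all assumptions chosen for the example. Only the feature counts (54 profile features and 1024 image features) come from the description above.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean            # center features so the intercept is not penalized
    yc = y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(A, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def predict(X, coef, intercept):
    return X @ coef + intercept

# Synthetic stand-ins (assumption): random numbers, not real user data.
rng = np.random.default_rng(0)
n_samples = 200
profile = rng.normal(size=(n_samples, 54))    # age, height, college, etc.
photo = rng.normal(size=(n_samples, 1024))    # image feature vectors
X = np.hstack([profile, photo])               # 54 + 1024 = 1078 features

# Hypothetical ratings on the 1.0 to 5.0 scale.
ratings = np.clip(
    3.0 + 0.3 * profile[:, 0] + rng.normal(scale=0.5, size=n_samples),
    1.0, 5.0,
)

coef, intercept = fit_ridge(X, ratings, alpha=10.0)
pred = predict(X, coef, intercept)
```

Plotting `ratings` against `pred` would give the kind of actual-versus-predicted scatterplot referred to above. In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.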
The R-squared score of the ridge regression model showed that the predictions did not fit the data well. I then tried binary classification, Non Attractive (1.0 to 2.5) versus Attractive (2.5 to 5.0), using a Support Vector Machine with a One-vs-Rest strategy to examine whether a binary relationship exists.
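As a small illustration of the two ideas in this step, the sketch below computes an R-squared score and converts ratings into the two classes. The function names `r_squared` and `to_binary_label` are hypothetical. The text gives 2.5 as the boundary of both ranges, so assigning a rating of exactly 2.5 to the Attractive class is an assumption made here.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def to_binary_label(rating, threshold=2.5):
    """Return 1 (Attractive) or 0 (Non Attractive).

    Assumption: a rating exactly equal to the threshold counts as Attractive.
    """
    return 1 if rating >= threshold else 0

# Hypothetical ratings, for illustration only.
example_ratings = [1.8, 2.4, 2.5, 3.7, 4.2]
labels = [to_binary_label(r) for r in example_ratings]
```

An R-squared near 1 means the predictions track the actual ratings closely, while a value near 0 means the model does no better than always predicting the mean rating.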
Binary classification of human attractiveness worked to some extent. Since appearance may be a more important factor than the other features, I decided to focus more on appearance, especially the face. I extracted faces from the photos using an OpenCV face detection algorithm. I also applied PCA dimensionality reduction to the face images to reduce the computational cost and emphasize meaningful feature relationships. Then, I applied logistic regression for binary classification between Non Attractive (1.0 to 2.5) and Attractive (2.5 to 5.0).
PCA dimensionality reduction removed about 90% of the face-image features, from 4096 dimensions (64 by 64 pixels) to 400 principal components.
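The PCA and logistic regression steps can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the project's code: the face images are random arrays, the ratings are random, the helper names are hypothetical, PCA is done with a plain SVD, and logistic regression is fit with simple gradient descent. The only values taken from the text are the 64 by 64 image size, the 400 components, and the 2.5 threshold (with exactly 2.5 counted as Attractive by assumption).

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the feature mean and the top principal directions via SVD."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    return (X - mean) @ components.T

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(Z, y, lr=0.1, n_iter=200):
    """Binary logistic regression trained with batch gradient descent."""
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(Z @ w + b)
        w -= lr * (Z.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Synthetic stand-ins (assumption): random "faces" and random ratings.
rng = np.random.default_rng(0)
faces = rng.random(size=(500, 64, 64))
X = faces.reshape(len(faces), -1)            # 64 * 64 = 4096 features
ratings = rng.uniform(1.0, 5.0, size=len(faces))
y = (ratings >= 2.5).astype(float)           # 1 = Attractive, 0 = Non Attractive

mean, components = pca_fit(X, n_components=400)
Z = pca_transform(X, mean, components)       # shape (500, 400)
reduction = 1.0 - Z.shape[1] / X.shape[1]    # fraction of features removed

w, b = fit_logistic(Z, y)
prob_attractive = sigmoid(Z @ w + b)
```

Going from 4096 to 400 features removes 1 - 400/4096, or roughly 90%, of the dimensions, which matches the figure quoted above. Because the data here is random, the fitted classifier carries no real signal; the sketch only shows how the pieces connect.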
Legal Issues Emerged and Conclusion
While I was conducting this research in more depth, legal issues emerged regarding the publication of research based on data from South Korea. As a result, all further work on this project has unfortunately been suspended.
Even though I have not yet tried a deep learning model for this prediction task, human attractiveness prediction showed some promise as a binary classification problem using SVM and logistic regression.