5 Steps to Improve Life Satisfaction

Caelon Smith
8 min read · Apr 29, 2021

DATA SUMMARY

  • Overview: Life satisfaction scores for 15,792 individuals, based on self-submitted surveys of 20 inputs (see below) covering an individual's Physical health, Mental health, Expertise, Connection to others, and Meaning
  • Inputs: (1) Daily Fruits/Vegetable consumption; (2) Daily Stress experienced; (3) # of new places visited in the past year; (4) number of close friends; (5) number of people helped in the past year; (6) amount of human interaction daily; (7) # of personal awards received in life; (8) Amount of time or money donated in the past year; (9) Daily success in completing to do list; (10) Daily hours spent working in a flow state; (11) Average daily steps walked/ran; (12) # Years life vision is planned ahead; (13) # Hours slept daily; (14) # Vacation days lost per year due to work; (15) Daily # of times person shouts; (16) Hours per week used to work on passion projects; (17) # of times per week spent on self-reflection; (18) BMI healthy/overweight; (19) Income sufficient for life expenses; (20) # of remarkable achievements to be proud of
  • Output: Based on the 20 inputs, a life satisfaction score is determined to reflect how well a person shapes their lifestyle, habits and behaviors to maximize their overall life satisfaction across the five aforementioned dimensions
  • Objective: This data is universally applicable to all living humans. The hope is that by examining the data thoroughly, we might distill some key insights on the factors most helpful for an individual to attain a healthy and satisfactory life

TAKEAWAY

  • The initial results were nothing new or groundbreaking: be in good shape and make good money to have a satisfactory life.
  • Some surprises were how little impact sleep duration, social networks, and meditation had on life satisfaction scores.
  • After stripping out Income and BMI, we were able to mine some value by seeing the influential areas that are more actionable than simply ‘earn more, be fit’.

Five Highly Influential Factors that are Actionable Today

  • Visit new places (doesn’t have to be Europe, but a different neighborhood, a museum, a park, etc.).
  • Go on daily walks — walk to new places to cover 2 features!
  • Plan ahead, think about your future.
  • Spend some time on your passion.
  • Help others.

You can have a highly satisfying lifestyle without focusing solely on getting swole and making paper. These five steps can be acted on today, and over time they can greatly boost your life satisfaction. The dividends may not be visible at first, but they could very likely lead to getting fit and earning more, the two steps that seem hardest from the get-go.

TARGET TO PREDICT

  • Life satisfaction score that individuals receive based upon the recorded values across the 20 inputs mentioned above
  • In the dataset, life satisfaction scores range from 480–820 points
  • According to the survey, a score below 550 is a poor life satisfaction score, a score between 550 and 680 is average, a score above 680 and below 700 is good, and a score above 700 is excellent

DATA WRANGLING

  • The dataset is mostly numerical, so data wrangling and feature engineering were light; the steps taken are listed below
  • (1) Set our dataframe’s index to be the date, in datetime format
  • (2) Find columns with messy data and clean them to the proper format (e.g., make strings into numbers, reformat object columns to integer columns)
  • (3) Encode binary strings (e.g., Gender) to binary numerical
  • (4) Make a second life satisfaction column in classification format so we can run a classification analysis as well; create function to return 4 possible values (poor, average, good, excellent) based on original numerical score
  • (5) Drop potentially “leaky” columns. Our models were trained and tested on all inputs, then again after dropping two columns, “BMI Score” and “Sufficient Income”. Many other inputs are likely dependent on these two (e.g., Fruits/Veggies, Daily Stress, and Daily Steps on BMI Score; Donations, To Do, Places Visited, Passion Time, etc. on Sufficient Income), so we imposed this restriction on ourselves to see how well the model performs without them versus with them
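
The wrangling steps above could look roughly like the pandas sketch below. It is only an illustration: the column names, the toy rows, and the choice to drop rows whose values cannot be parsed are all assumptions, not the code used for this analysis. Step 4 (the classification column) is left out to keep the sketch short.

```python
import pandas as pd

# Toy rows invented for illustration; column names are assumptions.
raw = pd.DataFrame({
    "Timestamp": ["2020-01-05", "2020-01-06", "2020-01-07"],
    "DAILY_STRESS": ["2", "4", "not recorded"],  # numeric column stored as strings
    "GENDER": ["Female", "Male", "Female"],
    "BMI_RANGE": [1, 2, 1],
    "SUFFICIENT_INCOME": [2, 1, 2],
    "DAILY_STEPS": [8, 3, 10],
})

def wrangle(df, drop_leaky=False):
    df = df.copy()
    # (1) Use the survey date as a datetime index.
    df["Timestamp"] = pd.to_datetime(df["Timestamp"])
    df = df.set_index("Timestamp")
    # (2) Coerce the messy string column to numbers. Unparseable entries
    #     become NaN and are dropped here (an assumption of this sketch).
    df["DAILY_STRESS"] = pd.to_numeric(df["DAILY_STRESS"], errors="coerce")
    df = df.dropna(subset=["DAILY_STRESS"])
    df["DAILY_STRESS"] = df["DAILY_STRESS"].astype(int)
    # (3) Encode the binary string column as 0/1.
    df["GENDER"] = (df["GENDER"] == "Female").astype(int)
    # (5) Optionally drop the two dominant columns.
    if drop_leaky:
        df = df.drop(columns=["BMI_RANGE", "SUFFICIENT_INCOME"])
    return df

clean = wrangle(raw, drop_leaky=True)
```

Keeping the column drop behind a flag makes it easy to build both versions of the dataset (with and without BMI and Income) from the same cleaning code.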

REGRESSION or CLASSIFICATION

  • The life satisfaction score is numerical, and as such we can train some regression models to try and predict scores based on an array of input values
  • Since the survey makers have also categorized results into different buckets, we can also train a classification model to predict if an individual will have a poor score (i.e., score < 550), an average score (i.e., 550 < Score < 680), a good score (i.e., 680 < Score < 700), or an excellent score (i.e., Score > 700)
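
A minimal sketch of the bucketing function, using the thresholds listed above. The function name is hypothetical, and so is the handling of scores that land exactly on 550, 680, or 700: the survey buckets use strict inequalities, so this sketch assumes a boundary score falls into the higher bucket.

```python
def score_to_class(score):
    """Map a numeric life satisfaction score to one of four buckets."""
    if score < 550:
        return "poor"
    if score < 680:
        return "average"   # assumption: exactly 550 counts as average
    if score < 700:
        return "good"      # assumption: exactly 680 counts as good
    return "excellent"     # assumption: exactly 700 counts as excellent

labels = [score_to_class(s) for s in (480, 600, 690, 820)]
```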

PERFORMANCE METRICS

  • Regression Model: Mean Absolute Error (MAE)
  • Baseline MAE → if we predicted the score for each individual to be 666 points, which is the average score across the entire dataset, the average absolute value of our error would be 36.1 points
  • Classification Model: Accuracy Score, F1 Score, Precision and Recall
  • Baseline Accuracy Score → 60.3% of individuals in our dataset had an average life satisfaction score (i.e., 550 < Score < 680). Our most naive model would predict that every individual has an average score, and on our dataset it would be correct 60.3% of the time
  • F1 Score, Precision, Recall → these metrics will be more important than the accuracy score in assessing our model’s performance. The accuracy score measures performance across the whole dataset, and our 4 classification buckets are far from evenly populated. While average scores are 60% of the population, poor scores represent <1% of the population; a strong accuracy score could therefore falsely imply the model is working well while it misclassifies every individual with a poor score. These metrics are a better proxy for gauging our model’s predictive power
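
To make the two baselines concrete, here is a small sketch in plain Python. The ten scores are invented for illustration (they are not survey data) and the `bucket` helper is a hypothetical stand-in that reuses the thresholds above. It shows how a predict-the-mean baseline gives an MAE, how a predict-the-majority baseline gives an accuracy, and why that accuracy hides a recall of zero on the poor class.

```python
from collections import Counter

# Toy scores, invented for illustration only.
scores = [500, 600, 610, 620, 640, 650, 690, 710, 720, 730]

def bucket(score):
    # Hypothetical helper using the survey thresholds.
    if score < 550:
        return "poor"
    if score < 680:
        return "average"
    if score < 700:
        return "good"
    return "excellent"

# Regression baseline: predict the mean score for everyone.
mean_score = sum(scores) / len(scores)
baseline_mae = sum(abs(s - mean_score) for s in scores) / len(scores)

# Classification baseline: predict the most common bucket for everyone.
labels = [bucket(s) for s in scores]
majority, count = Counter(labels).most_common(1)[0]
baseline_accuracy = count / len(labels)

# Recall on the poor class under that naive model: it never predicts "poor".
predictions = [majority] * len(labels)
poor_total = sum(1 for y in labels if y == "poor")
poor_found = sum(1 for y, p in zip(labels, predictions) if y == p == "poor")
poor_recall = poor_found / poor_total
```

On this toy data the naive classifier is right half the time, yet it finds none of the poor scorers, which is exactly the failure that accuracy alone would not reveal.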

RESULTS: Test Set Performance

Linear Regression

  • BMI and Income Columns Dropped → 9.3 MAE
  • No Dropped Columns → 0.000000000000048 MAE

Ridge Regression

  • BMI and Income Columns Dropped → 9.3 MAE
  • No Dropped Columns → 0.001 MAE

Random Forest Classifier

BMI and Income Columns Dropped

  • Accuracy Score → 81.4%
  • F1 Score → 60% Macro Average; 89% (Average), 85% (Excellent), 19% (Good), 46% (Poor)
  • Precision → 77% Macro Average; 82% (Average), 86% (Excellent), 39% (Good), 100% (Poor)
  • Recall → 56% Macro Average; 98% (Average), 84% (Excellent), 13% (Good), 30% (Poor)

No Dropped Columns

  • Accuracy Score → 84.0%
  • F1 Score → 63% Macro Average; 91% (Average), 91% (Excellent), 24% (Good), 46% (Poor)
  • Precision → 81% Macro Average; 83% (Average), 92% (Excellent), 49% (Good), 100% (Poor)
  • Recall → 59% Macro Average; 99% (Average), 90% (Excellent), 16% (Good), 30% (Poor)

XGB Classifier

BMI and Income Columns Dropped

  • Accuracy Score → 84.0%
  • F1 Score → 62% Macro Average; 91% (Average), 89% (Excellent), 22% (Good), 46% (Poor)
  • Precision → 81% Macro Average; 85% (Average), 86% (Excellent), 53% (Good), 100% (Poor)
  • Recall → 59% Macro Average; 99% (Average), 92% (Excellent), 14% (Good), 30% (Poor)

No Dropped Columns

  • Accuracy Score → 86.8%
  • F1 Score → 87% Macro Average; 93% (Average), 91% (Excellent), 34% (Good), 67% (Poor)
  • Precision → 86% Macro Average; 88% (Average), 87% (Excellent), 71% (Good), 100% (Poor)
  • Recall → 67% Macro Average; 99% (Average), 96% (Excellent), 23% (Good), 50% (Poor)
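
For readers unfamiliar with macro averages: a macro average is the plain, unweighted mean of the per-class scores, so a rare class like Poor counts as much as the dominant Average class. The sketch below recomputes the macro F1 from the per-class F1 scores reported above for the Random Forest with BMI and Income dropped. The class counts used for the weighted average are hypothetical, included only to show the contrast.

```python
# Per-class F1 scores reported above (Random Forest, BMI and Income dropped).
f1_by_class = {"average": 0.89, "excellent": 0.85, "good": 0.19, "poor": 0.46}

# Hypothetical class counts, invented to illustrate an imbalanced test set.
support = {"average": 600, "excellent": 250, "good": 140, "poor": 10}

def macro_average(per_class):
    """Unweighted mean: every class counts equally."""
    return sum(per_class.values()) / len(per_class)

def weighted_average(per_class, counts):
    """Mean weighted by class size: large classes dominate."""
    total = sum(counts.values())
    return sum(per_class[c] * counts[c] for c in per_class) / total

macro_f1 = macro_average(f1_by_class)
weighted_f1 = weighted_average(f1_by_class, support)
```

The macro figure rounds to the 60% reported above. The weighted figure comes out noticeably higher because the well-predicted Average class dominates it, which is why the macro view is the stricter test for the small classes.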

VISUALIZATIONS

Coefficient Impacts: Ridge Regression (w/ all columns)

  • BMI and Income inputs are the most influential inputs for the model to determine a life satisfaction score

Coefficient Impacts: Ridge Regression (w/o BMI, Income)

  • In the absence of BMI and Income, Places visited and Lost vacation become the most influential inputs in our model’s ability to determine a life satisfaction score; additionally, Daily steps moves from the 8th most important feature up to the 3rd

PDP Plot: Ridge Regression (BMI, Sufficient Income)

  • A PDP plot of BMI and Income shows just how profound an impact these features have on the model. When an individual has an overweight BMI and does not have sufficient income, their score is expected to be 645 points; when the opposite is true, the expected score increases by 34 points to 679 (for reference, our baseline MAE was 36 points)
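
A partial dependence value is computed by overriding the chosen features in every row, predicting, and averaging the predictions. The sketch below does this by hand for a toy linear model. The coefficients, intercept, and rows are hypothetical, picked only so that the output reproduces the 645 and 679 figures quoted above; they are not the fitted ridge coefficients.

```python
# Hypothetical linear model standing in for the fitted ridge regression.
COEF = {"bmi_healthy": 18.0, "income_sufficient": 16.0, "daily_steps": 2.0}
INTERCEPT = 633.0

def predict(row):
    return INTERCEPT + sum(COEF[name] * row[name] for name in COEF)

# Toy rows, invented for illustration.
rows = [
    {"bmi_healthy": 0, "income_sufficient": 1, "daily_steps": 4},
    {"bmi_healthy": 1, "income_sufficient": 0, "daily_steps": 8},
    {"bmi_healthy": 1, "income_sufficient": 1, "daily_steps": 6},
]

def partial_dependence(data, fixed):
    """Average prediction when the `fixed` features are overridden in every row."""
    preds = [predict({**row, **fixed}) for row in data]
    return sum(preds) / len(preds)

low = partial_dependence(rows, {"bmi_healthy": 0, "income_sufficient": 0})
high = partial_dependence(rows, {"bmi_healthy": 1, "income_sufficient": 1})
gap = high - low
```

For a linear model the gap is simply the sum of the two coefficients (18 + 16 = 34 here), whatever the other features look like, which is why the PDP and the coefficient chart tell the same story for ridge regression.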

PDP Plot: XGB Classification (BMI, Income)

  • In our classification model, the PDP plot for Income and BMI shows some interesting statistics. Of the people with an unhealthy BMI and insufficient income, only 13% scored above 700 points, whereas 31% of those with a healthy BMI and sufficient income had excellent scores. The purple boxes confirm this: people with unhealthy BMIs and insufficient incomes are the least likely to receive good or excellent scores (classes 1 and 2). The inverse holds as well: the purple boxes in the top left corner of Average Score (class 0) and Poor Score (class 3) show that people with sufficient income and healthy BMIs were the minority in those classes.

PDP Plot: XGB Classification (daily steps, places visited)

  • In the absence of BMI and Income, the top two features for predicting class became Places Visited and Daily Steps. We can see that for excellent scorers (top right), 37% walked 10,000 steps per day and had visited 10 new places in the past year. In contrast, 76% of folks walking 1,000 steps a day and visiting 0 new places in the past year had an average score (i.e., 550–680).

SHAPLEY FORCE: Ridge Regression (w/o BMI, Income)

  • Another interesting visualization was the Shapley force chart. The example below shows a model prediction of 703 points, which is an excellent score. In the absence of Income and BMI, we can see the features that really stand out: a life vision planned 10 years out, 10,000 steps walked per day, 6 hours per week working on a passion project, and 90% of the daily to-do list completed. The heaviest opposing forces were 10 vacation days lost per year due to work and only 4 new places visited in the past year.
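
For a linear model such as ridge regression, the Shapley value of a feature reduces (if features are treated as independent) to its coefficient times how far the person's value sits from the dataset average. The sketch below works through that arithmetic. The person's inputs follow the example described above, but the coefficients, feature means, and intercept are hypothetical, so the resulting prediction is not the 703 points from the chart.

```python
# Hypothetical coefficients, dataset means, and intercept (not the fitted values).
coef = {"life_vision": 2.0, "daily_steps": 2.5, "passion_hours": 1.5,
        "lost_vacation": -1.5, "places_visited": 3.0}
means = {"life_vision": 4.0, "daily_steps": 6.0, "passion_hours": 3.0,
         "lost_vacation": 3.0, "places_visited": 5.0}
intercept = 640.0

# Inputs from the example above (daily steps expressed in thousands).
person = {"life_vision": 10, "daily_steps": 10, "passion_hours": 6,
          "lost_vacation": 10, "places_visited": 4}

# Base value: the model's average prediction over the dataset.
base_value = intercept + sum(coef[f] * means[f] for f in coef)

# Each feature's push away from the base value.
contributions = {f: coef[f] * (person[f] - means[f]) for f in coef}

# Base value plus all pushes recovers the model's prediction for this person.
prediction = base_value + sum(contributions.values())
```

Positive contributions are the forces pushing the score up in the chart, and negative ones (here lost vacation days and the below-average number of places visited) are the opposing forces.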
