Wednesday, 30 December 2015

Research Topic for data management

I am particularly interested in the general health and well-being of people. I am using the Addhealth data to explore the relationships between various variables. After going through the codebook, I have decided to analyze weight-related issues in the population.

For now, I will focus on how people perceive their weight and how they approach weight issues. I am primarily focusing on weight-loss methods, which include exercise, diet pills and laxatives.

The research questions I will be looking into are as follows:

  1. What is the level of weight satisfaction among people?
  2. What is the association between people's weight and the weight-loss methods (exercise, diet pills, laxatives) they use?
  3. What is the association between people's age and their exercise routines?

Using the search terms "weight loss", "age", "diet pills" and "exercise", a literature review turns up many studies on the relationship between age and weight loss. One research article, "Body image satisfaction, dieting beliefs, and weight loss behaviors in Adolescent Girls and Boys" (http://link.springer.com/article/10.1007/BF01537402#page-2), finds that young girls are more likely than boys to consider themselves overweight. It highlights that, despite having a healthy BMI, a majority of young girls are concerned about their weight and adopt various methods to lose weight.

Another study (http://www.sciencedirect.com/science/article/pii/S0749379704000285) focuses on weight-loss practices among US adults. This article gives clear statistics on the various methods people adopt to lose weight. Notably, people put more emphasis on cutting calories and exercising, while only a small percentage use diet pills.

I hypothesize that people aged 15 to 40 are more weight conscious and adopt exercise as a weight-loss method. The literature also suggests that girls worry more about their weight than boys do. I will analyze these scenarios and try to find a clear association between the various parameters.

I chose this topic because, when we look around us, the people we generally see exercising are fit and not overweight, whereas most overweight people do not seem to do any physical exercise to lose weight. In addition, young people appear to be more health conscious, and I want to establish whether there is a clear association between age group and exercise habits.

It is also important to know whether overweight people turn to diet pills or laxatives to lose weight.

I will use the "not marked" and "marked" entries, which identify those who do not and those who do take the pills or do the activity, respectively. "Legitimate skip" entries will be excluded because the question did not apply to those respondents. "Refused" and "don't know" will be coded as missing.
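A minimal Python sketch of this coding rule follows. It assumes the numeric codes used in the SAS program later in this post: 0 = not marked, 1 = marked, 6 = refused, 7 = legitimate skip, 8 = don't know. The function name `recode_response` and the sample answers are hypothetical. Legitimate skips and missing answers both end up as `None`, so they drop out of any percentage calculated on the usable answers.

```python
from typing import Optional

# Codes assumed from the SAS recoding later in this post (not a codebook quote):
# 0 = not marked, 1 = marked, 6 = refused, 7 = legitimate skip, 8 = don't know
VALID = {0, 1}
NOT_APPLICABLE = {7}   # legitimate skip: the question did not apply
MISSING = {6, 8}       # refused / don't know


def recode_response(code: int) -> Optional[int]:
    """Return 0 or 1 for usable answers and None for excluded ones."""
    if code in VALID:
        return code
    if code in NOT_APPLICABLE or code in MISSING:
        return None
    raise ValueError(f"unexpected response code: {code}")


# Hypothetical answers to one item, e.g. "took diet pills in last 7 days"
answers = [0, 1, 1, 7, 6, 0, 8]
usable = [r for r in map(recode_response, answers) if r is not None]
print(usable)                     # [0, 1, 1, 0]
print(sum(usable) / len(usable))  # 0.5
```

Raising an error on any other code makes an unexpected value visible instead of silently counting it.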

Below is a snapshot from my codebook of the variables I will be considering for my study:

Monday, 28 December 2015

Creating graphs for data

I have selected the Addhealth data set to analyze individual variables with univariate graphs and the associations between pairs of variables with bivariate graphs.

Below is the SAS code for both kinds of graphs:



LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;

DATA new;
   SET mydata.addhealth_pds;
   LABEL H1GI1Y  = "Year of birth"
         H1GH60  = "Weight (lbs)"
         H1GH30D = "Diet pills consumption in last 7 days"
         H1GH30E = "Laxatives consumption in last 7 days"
         H1GH31B = "Exercised in last 7 days";

   /* Group the two-digit birth years into 5 age groups (group 1 = youngest).
      Missing values are tested first: SAS sorts missing below every number,
      so a bare "LE 73" test would otherwise put them in group 5. */
   IF MISSING(H1GI1Y) OR H1GI1Y = 96 THEN agegroup = .;
   ELSE IF H1GI1Y LE 73 THEN agegroup = 5;
   ELSE IF H1GI1Y LE 76 THEN agegroup = 4;
   ELSE IF H1GI1Y LE 79 THEN agegroup = 3;
   ELSE IF H1GI1Y LE 82 THEN agegroup = 2;
   ELSE IF H1GI1Y LE 85 THEN agegroup = 1;

   /* Group the weights into 100-lb ranges. Codes 996 and 998 stay missing;
      code 999 is kept as a separate group 5, as in the original program. */
   IF MISSING(H1GH60) OR H1GH60 IN (996, 998) THEN weightgroup = .;
   ELSE IF H1GH60 = 999 THEN weightgroup = 5;
   ELSE IF H1GH60 LE 100 THEN weightgroup = 1;
   ELSE IF H1GH60 LE 200 THEN weightgroup = 2;
   ELSE IF H1GH60 LE 300 THEN weightgroup = 3;
   ELSE IF H1GH60 LE 400 THEN weightgroup = 4;

   /* The reserved codes are not real weights, so remove them from H1GH60
      before it is plotted or averaged in the bivariate graphs. */
   IF H1GH60 IN (996, 998, 999) THEN H1GH60 = .;

   /* Refused (6), legitimate skip (7) and don't know (8) become missing */
   IF H1GH30D IN (6, 7, 8) THEN H1GH30D = .;
   IF H1GH30E IN (6, 7, 8) THEN H1GH30E = .;
   IF H1GH31B IN (6, 7, 8) THEN H1GH31B = .;
RUN;

PROC SORT DATA=new; BY AID; RUN;
PROC FREQ DATA=new; TABLES agegroup weightgroup; RUN;

/* Univariate graphs */
PROC GCHART DATA=new;
   VBAR agegroup    / DISCRETE TYPE=PCT WIDTH=30;
   VBAR weightgroup / DISCRETE TYPE=PCT WIDTH=30;
   VBAR H1GH30D     / DISCRETE TYPE=PCT WIDTH=30;
   VBAR H1GH30E     / DISCRETE TYPE=PCT WIDTH=30;
   VBAR H1GH31B     / DISCRETE TYPE=PCT WIDTH=30;
RUN; QUIT;

/* Bivariate graphs showing the association between two variables */
PROC GPLOT DATA=new; PLOT H1GH60*agegroup; RUN; QUIT;

PROC GCHART DATA=new;
   VBAR agegroup    / DISCRETE TYPE=MEAN SUMVAR=H1GH60;
   VBAR agegroup    / DISCRETE TYPE=PERCENT SUMVAR=H1GH31B;
   VBAR weightgroup / DISCRETE TYPE=PERCENT SUMVAR=H1GH31B;
RUN; QUIT;
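For readers who want to check the grouping logic outside SAS, the following minimal Python sketch mirrors the IF/ELSE rules of the DATA step. The function names and the toy records are hypothetical, not Add Health data. The sketch also shows why the reserved weight codes (996, 998, 999) should be removed before averaging H1GH60: a single 998 would otherwise dominate a group mean.

```python
from typing import Optional

# Hypothetical Python mirror of the SAS DATA step rules (not part of the
# original program). H1GI1Y is a two-digit birth year; H1GH60 is weight in lbs.
RESERVED_WEIGHT = {996, 998, 999}  # codes, not real weights


def age_group(birth_year: Optional[int]) -> Optional[int]:
    """Same cut-offs as the IF/ELSE chain on H1GI1Y (group 1 = youngest)."""
    if birth_year is None or birth_year == 96:  # 96 is treated as missing
        return None
    for upper, group in ((73, 5), (76, 4), (79, 3), (82, 2), (85, 1)):
        if birth_year <= upper:
            return group
    return None  # years above 85 stay ungrouped, as in the SAS code


def weight_group(pounds: Optional[int]) -> Optional[int]:
    """100-lb ranges; 996/998 stay missing, 999 is kept as its own group 5."""
    if pounds is None or pounds in (996, 998):
        return None
    if pounds == 999:
        return 5
    for upper, group in ((100, 1), (200, 2), (300, 3), (400, 4)):
        if pounds <= upper:
            return group
    return None


def mean_weight_by_age(records):
    """Mean weight per age group, skipping the reserved weight codes."""
    sums: dict[int, float] = {}
    counts: dict[int, int] = {}
    for year, pounds in records:
        group = age_group(year)
        if group is None or pounds is None or pounds in RESERVED_WEIGHT:
            continue
        sums[group] = sums.get(group, 0) + pounds
        counts[group] = counts.get(group, 0) + 1
    return {g: sums[g] / counts[g] for g in sorted(sums)}


# Made-up toy records (birth year, weight); not Add Health data.
toy = [(77, 120), (78, 998), (80, 150), (81, 130), (75, 160)]
print(mean_weight_by_age(toy))  # {2: 140.0, 3: 120.0, 4: 160.0}
```

If the 998 record were averaged as a weight, group 3's mean would jump from 120 to 559 lbs.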


The PROC FREQ output is as follows:
agegroup  Frequency  Percent  Cumulative Frequency  Cumulative Percent
1                 8     0.12                     8                0.12
2              2563    39.42                  2571               39.55
3              3480    53.53                  6051               93.08
4               450     6.92                  6501              100.00
Frequency Missing = 3

weightgroup  Frequency  Percent  Cumulative Frequency  Cumulative Percent
1                  517     8.14                   517                8.14
2                 5502    86.63                  6019               94.77
3                  322     5.07                  6341               99.84
4                    7     0.11                  6348               99.95
5                    3     0.05                  6351              100.00
Frequency Missing = 153
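The Percent and cumulative columns follow directly from the frequencies, with only the non-missing observations used as the denominator. The following short Python sketch reproduces both tables from the counts above. The helper `freq_table` is hypothetical and is not part of SAS.

```python
from itertools import accumulate


def freq_table(counts: dict[int, int]) -> list[tuple[int, int, float, int, float]]:
    """Rows of (level, frequency, percent, cumulative freq, cumulative percent)."""
    levels = sorted(counts)
    total = sum(counts.values())  # non-missing observations only
    cumulative = list(accumulate(counts[k] for k in levels))
    return [
        (k, counts[k], round(100 * counts[k] / total, 2), c, round(100 * c / total, 2))
        for k, c in zip(levels, cumulative)
    ]


agegroup_counts = {1: 8, 2: 2563, 3: 3480, 4: 450}
weightgroup_counts = {1: 517, 2: 5502, 3: 322, 4: 7, 5: 3}

for row in freq_table(agegroup_counts):
    print(row)
for row in freq_table(weightgroup_counts):
    print(row)
```

Because the denominators exclude the missing values (6501 and 6351 rather than 6504), the cumulative percent reaches exactly 100 in both tables.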
Univariate graph for agegroup:

This graph is unimodal, with its highest peak at group 3 (birth years 1977 to 1979). It appears to be skewed to the left.
Univariate graph for weightgroup:
This graph is unimodal, with its highest peak at group 2. It appears to be skewed to the right, since most of the frequency sits in the lower groups and the tail extends toward the heavier groups.
Univariate graph for diet pill consumption in the last 7 days:
This graph shows that most respondents did not take diet pills to lose weight in the last 7 days.
Univariate graph for laxative consumption in the last 7 days:
This graph shows that most respondents did not take laxatives to lose weight in the last 7 days.
Univariate graph for exercise in the last 7 days:
This graph shows that a large number of respondents exercised in the last 7 days, though the majority did not.
Bivariate graph for agegroup vs. weight:
This scatter plot shows that most of the heavier respondents fall in age groups 2 and 3. The most common weights lie roughly between 100 and 160 lbs.
This graph shows the association between agegroup and the mean weight of each group.
This graph, of agegroup vs. exercise in the last 7 days, is unimodal with its peak at group 3 and is skewed to the left.
This graph, of weightgroup vs. exercise in the last 7 days, is unimodal with its peak at group 2 and is skewed to the right. It suggests that people in weightgroup 2 are more inclined to exercise to lose weight.
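The bars in the two exercise charts may partly reflect how many respondents each group contains, not only how likely its members are to exercise. Weightgroup 2 alone holds 86.63% of the respondents with a recorded weight group. The following minimal Python sketch contrasts a group's share of all exercisers with the proportion of that group who exercised. It runs on made-up toy records (not Add Health data), and the function names are hypothetical. For a 0/1 variable, the within-group rate is simply the group mean. In SAS this corresponds to TYPE=MEAN with SUMVAR=H1GH31B, once H1GH31B holds only 0 and 1.

```python
from collections import Counter

# Made-up toy records: (weightgroup, exercised in last 7 days as 0/1).
# These are NOT Add Health values; group 2 is deliberately the largest.
records = [(1, 1), (1, 0)] + [(2, 1)] * 3 + [(2, 0)] * 7 + [(3, 1), (3, 1)]


def share_of_exercisers(rows):
    """Percent of all exercisers who fall in each group (grows with group size)."""
    exercised = Counter(g for g, e in rows if e == 1)
    total = sum(exercised.values())
    return {g: round(100 * n / total, 1) for g, n in sorted(exercised.items())}


def rate_within_group(rows):
    """Percent of each group who exercised, i.e. the mean of the 0/1 variable."""
    size = Counter(g for g, _ in rows)
    exercised = Counter(g for g, e in rows if e == 1)
    return {g: round(100 * exercised[g] / size[g], 1) for g in sorted(size)}


print(share_of_exercisers(records))  # {1: 16.7, 2: 50.0, 3: 33.3}
print(rate_within_group(records))    # {1: 50.0, 2: 30.0, 3: 100.0}
```

In this toy example, group 2 has the tallest bar by share of exercisers but the lowest exercise rate. Comparing within-group rates would therefore be a more direct test of which groups are more inclined to exercise.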
Summary:
The purpose of this study is to analyze the distributions of several variables and the associations between them. We will look at the association between age group and exercise routine, and between weight group and exercise routine.
The first five graphs above are univariate graphs showing the percentage of respondents at each value on the x-axis. To simplify the data management, I have grouped birth years into age groups and weights into weight groups.
For the categorical variables H1GH30D, H1GH30E and H1GH31B, only values 0 and 1 are used, since they record whether the respondent took diet pills or laxatives, or exercised, in the last 7 days. Value 7, which represents "legitimate skip" (not applicable), has been excluded because these entries are not useful for this study. For H1GH30D and H1GH30E, values 6 and 8 (refused, don't know) are also set to missing.
Graphs 6 onward are bivariate graphs showing the association between pairs of variables: age group and weight, age group and exercise in the last 7 days, and weight group and exercise in the last 7 days.
The graphs suggest that people in weightgroup 2 are more health conscious and exercised in the last 7 days, whereas people in weightgroups 3 and 4, despite their higher weight, seem less inclined to exercise.
Similarly, people in age groups 2 and 3 (born 1977 to 1982) appear more inclined to exercise. This may suggest that people are more health conscious while they are young and tend to gain weight without working out as they get older. However, every respondent with a recorded age group was born between 1974 and 1985, so these data cannot test the pattern for people over 40. Likewise, above a certain weight (roughly 250 lbs and above), people do not seem interested in losing weight by working out.