Data Management Discussion Paper
The purpose of this assignment is to practice organizing data through ordering and grouping variables.
Data often appear disordered and it is difficult to see any connections or relationships. Ordering the data by certain variables or grouping variables into specific categories, such as age or sex categories, can help bring clarity to the data. Knowing how to organize data is an important skill to initiate the analytical process.
For this assignment, students will use Excel and SPSS Statistics to order variables. Using the “Example Dataset,” complete the steps below using both Excel and SPSS Statistics. View the Excel and SPSS tutorials for assistance in completing this assignment. Submit one Word document and include a screenshot of the data after completing the first two steps of Part 1 in Excel and SPSS to compare your results. Use a second Word document to complete Part 2 of the assignment.
Part 1: Ordering and Grouping Data Using Excel and SPSS
For Part 1, accomplish the following:
Order (sort) observations according to age.
Group observations by sex and investigate the age and income for males and females.
Create a new variable titled “Exercise Group” based on the variable “Minutes Exercise.” Use the following categories to create your groups: 1 = 0-30 minutes; 2 = 31-60 minutes; 3 = 61-90 minutes; 4 = 91-120 minutes; and 5 = 120+ minutes.
Part 2: Data Interpretation
Study the results of the dataset grouping and ordering. Discuss the following in a 500-750 word summary:
Describe the measurement levels for each of the variables in the dataset.
Discuss what you learned from ordering the data by age and why this information is important.
Describe the process you used to group the data in Excel and SPSS.
Describe what you learned by grouping the variables by category of exercise.
Are these data from a correlational study, experimental study, or quasi-experimental (observational) study? Discuss your rationale and identify a study question appropriate for this dataset.
General Requirements
Submit the Word document to the instructor.
APA style is not required, but solid academic writing is expected.
This assignment uses a rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.
You are required to submit this assignment to LopesWrite. Refer to the LopesWrite Technical Support articles for assistance.
Data management
Part 1: Ordering and Grouping Data Using Excel and SPSS
Order (sort) observations according to age
Figure 1. Excel screenshot for observations sorted according to age
Figure 2. SPSS screenshot for observations sorted according to age
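The sort by age shown in the two screenshots can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows and the field names (id, sex, age, income) are hypothetical and are not taken from the Example Dataset.

```python
from operator import itemgetter

# Hypothetical rows for illustration only; not the Example Dataset.
records = [
    {"id": 1, "sex": "Female", "age": 42, "income": 52000},
    {"id": 2, "sex": "Male", "age": 27, "income": 38000},
    {"id": 3, "sex": "Female", "age": 35, "income": 61000},
    {"id": 4, "sex": "Male", "age": 27, "income": 45000},
]

# sorted() returns a new list in ascending order of age.
# The sort is stable, so rows with the same age keep their original order.
by_age = sorted(records, key=itemgetter("age"))

for row in by_age:
    print(row["id"], row["age"])
```

As in Excel and SPSS, each whole row moves together, so every participant's other values stay attached to the sorted age.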
Group observations by sex and investigate the age and income for males and females
Figure 3. Excel screenshot of observations grouped by gender, showing age and annual income
Figure 4. SPSS screenshot of observations grouped by gender, showing age and annual income
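Grouping by sex and then looking at age and income within each group can be sketched as follows. The rows, the field names, and the choice of the mean as the summary statistic are assumptions made for illustration; the assignment does not specify which statistic to report.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows for illustration only; not the Example Dataset.
records = [
    {"id": 1, "sex": "Female", "age": 42, "income": 52000},
    {"id": 2, "sex": "Male", "age": 27, "income": 38000},
    {"id": 3, "sex": "Female", "age": 35, "income": 61000},
    {"id": 4, "sex": "Male", "age": 27, "income": 45000},
]

# Collect the rows that belong to each sex category.
groups = defaultdict(list)
for row in records:
    groups[row["sex"]].append(row)

# Summarize age and income separately for each group.
summary = {
    sex: {
        "n": len(rows),
        "mean_age": mean(r["age"] for r in rows),
        "mean_income": mean(r["income"] for r in rows),
    }
    for sex, rows in groups.items()
}

for sex, stats in summary.items():
    print(sex, stats)
```

The same pattern extends to other summaries, such as the minimum, maximum, or median, by swapping the function applied to each group.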
Exercise group
Figure 5. Excel screenshot for exercise grouping
Part 2: Data Interpretation
Describe the measurement levels for each of the variables in the dataset.
The dataset contains nine variables. The first variable is the participant ID, which is stored as a numeric type but measured at the nominal level, because the numbers only label participants and the distances between them carry no meaning. The second variable is gender (sex), a string variable measured at the nominal level, because its categories are named without any numerical value or order. The third variable is smoking status, also a string variable at the nominal level. The fourth variable is education level, stored as a numeric type; if its codes represent ordered levels of schooling, it is measured at the ordinal level rather than the nominal level. The fifth variable is the number of minutes participants spent exercising, a numeric variable measured at the ratio level, because differences between values are meaningful and zero means no exercise. The sixth variable is age, a numeric variable at the ratio level. The seventh variable is employment status, a string variable at the nominal level. The eighth variable is annual income, a numeric variable at the ratio level. The final variable is neighborhood, a string variable at the nominal level (Dytham, 2011). In SPSS, the ratio-level variables are assigned the Scale measure, which covers both interval and ratio data.
Discuss what you learned from ordering the data by age and why this information is important.
Ordering the data meaningfully offers two advantages. First, it enables efficient data searches, since sorted data are easier to visualize, analyze, and understand. Second, it allows the data to be processed sequentially, which makes it easier to see the links between variables.
Describe the process you used to group the data in Excel and SPSS.
The Excel process for grouping data followed four steps. The first step was to determine the data ranges. The second step was to create a new column for grouping the exercise data. The third step was to copy and paste the data on exercise minutes into the new column. The final step was to replace each value with its grouping label.
The SPSS process for grouping data followed three steps. The first step was to click the Transform menu and select the Recode into Different Variables option. The second step was to select exercise minutes as the numeric (input) variable and exercise group as the output variable. The third step was to open Old and New Values and enter each identified range together with its new group value. Running these commands creates the desired grouping.
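The recoding logic behind both procedures can be sketched in Python. The function name exercise_group and the sample minute values are hypothetical. Because the assignment lists group 4 as 91-120 minutes and group 5 as 120+ minutes, the sketch assumes that group 5 means more than 120 minutes, so a value of exactly 120 falls in group 4.

```python
from collections import Counter


def exercise_group(minutes):
    """Map minutes of exercise to the group codes 1-5 listed in Part 1."""
    if minutes < 0:
        raise ValueError("minutes cannot be negative")
    if minutes <= 30:
        return 1  # 0-30 minutes
    if minutes <= 60:
        return 2  # 31-60 minutes
    if minutes <= 90:
        return 3  # 61-90 minutes
    if minutes <= 120:
        return 4  # 91-120 minutes
    return 5  # assumption: "120+" means more than 120 minutes


# Hypothetical values chosen to exercise each boundary; not the Example Dataset.
minutes_exercise = [0, 30, 31, 75, 120, 121, 200]

groups = [exercise_group(m) for m in minutes_exercise]
counts = Counter(groups)  # how many observations fall in each group

print(groups)
print(sorted(counts.items()))
```

Writing the rule once as a function avoids the manual find-and-replace step in Excel, where a mistyped label is easy to miss.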
Describe what you learned by grouping the variables by category of exercise.
Grouping the data made estimation more efficient. It also helped with visualizing the data and identifying trends, and with identifying important subpopulations that could become the focus of further analysis.
Are these data from a correlational study, experimental study, or quasi-experimental (observational) study? Discuss your rationale and identify a study question appropriate for this dataset.
These data come from a correlational study. The rationale is that the participants were not subjected to any treatment; rather, the research sought to understand the relationships between naturally occurring variables (Creswell, 2013).
References
Creswell, J. (2013). Research design: Qualitative, quantitative, and mixed methods approaches. Thousand Oaks, CA: Sage Publications.
Dytham, C. (2011). Choosing and using statistics: A biologist’s guide (3rd ed.). Hoboken, NJ: Wiley-Blackwell.