Statistics in Nursing
- What are the frequency and percentage of the COPD patients in the severe airflow limitation group who are employed in the Eckerblad et al. (2014) study?
- What percentage of the total sample is retired? What percentage of the total sample is on sick leave?
- What is the total sample size of this study? What frequency and percentage of the total sample were still employed? Show your calculations and round your answer to the nearest whole percent.
- What is the total percentage of the sample with a smoking history, either still smoking or former smokers? Is the smoking history for study participants clinically important? Provide a rationale for your answer.
- What are pack years of smoking? Is there a significant difference between the moderate and severe airflow limitation groups regarding pack years of smoking? Provide a rationale for your answer.
- What were the four most common psychological symptoms reported by this sample of patients with COPD? What percentage of these subjects experienced these symptoms? Was there a significant difference between the moderate and severe airflow limitation groups for psychological symptoms?
- What frequency and percentage of the total sample used short-acting β2-agonists? Show your calculations and round to the nearest whole percent.
- Is there a significant difference between the moderate and severe airflow limitation groups regarding the use of short-acting β2-agonists? Provide a rationale for your answer.
- Was the percentage of COPD patients with moderate and severe airflow limitation using short-acting β2-agonists what you expected? Provide a rationale with documentation for your answer.
- Are these findings ready for use in practice? Provide a rationale for your answer.
Understanding Frequencies and Percentages
STATISTICAL TECHNIQUE IN REVIEW
Frequency is the number of times a score or value for a variable occurs in a set of data. Frequency distribution is a statistical procedure that involves listing all the possible values or scores for a variable in a study. Frequency distributions are used to organize study data for a detailed examination to help determine the presence of errors in coding or computer programming (Grove, Burns, & Gray, 2013). In addition, frequencies and percentages are used to describe demographic and study variables measured at the nominal or ordinal levels.

Percentage can be defined as a portion or part of the whole or a named amount in every hundred measures. For example, a sample of 100 subjects might include 40 females and 60 males. In this example, the whole is the sample of 100 subjects, and gender is described as including two parts, 40 females and 60 males. A percentage is calculated by dividing the smaller number, which would be a part of the whole, by the larger number, which represents the whole. The result of this calculation is then multiplied by 100%. For example, if 14 nurses out of a total of 62 are working on a given day, you can divide 14 by 62 and multiply by 100% to calculate the percentage of nurses working that day. Calculations: (14 ÷ 62) × 100% ≈ 0.2258 × 100% = 22.58% ≈ 22.6%. The answer also might be expressed as a whole percentage, which would be 23% in this example.

A cumulative percentage distribution involves the summing of percentages from the top of a table to the bottom. Therefore the bottom category has a cumulative percentage of 100% (Grove, Gray, & Burns, 2015). Cumulative percentages can also be used to determine percentile ranks, especially when discussing standardized scores. For example, if 75% of a group scored equal to or lower than a particular examinee's score, then that examinee's rank is at the 75th percentile.
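The two calculations described above can be sketched in a few lines of Python. This is only an illustration: the function names `percentage` and `cumulative_percentages` are made up for this sketch, and the five category frequencies are assumed values for five ordered categories.

```python
from itertools import accumulate


def percentage(part, whole):
    """Return the part expressed as a percentage of the whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part * 100 / whole


def cumulative_percentages(frequencies):
    """Sum the category percentages from the top category to the bottom."""
    total = sum(frequencies)
    return [percentage(running, total) for running in accumulate(frequencies)]


# Nurses example from the text: 14 of 62 nurses are working.
nurses_pct = percentage(14, 62)
print(round(nurses_pct, 1))  # 22.6
print(round(nurses_pct))     # 23

# Assumed frequencies for five ordered categories (200 subjects in total).
frequencies = [20, 50, 80, 40, 10]
print(cumulative_percentages(frequencies))  # [10.0, 35.0, 75.0, 95.0, 100.0]
```

The last cumulative value is always 100%, which is a quick check that no category was left out.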
When reported as a percentile rank, the percentage is often rounded to the nearest whole number. Percentile ranks can be used to analyze ordinal data that can be assigned to categories that can be ranked. Percentile ranks and cumulative percentages might also be used in any frequency distribution where subjects have only one value for a variable. For example, demographic characteristics are usually reported with the frequency (f) or number (n) of subjects and percentage (%) of subjects for each level of a demographic variable. Income level is presented as an example for 200 subjects:

| Income Level | Frequency (f) | Percentage (%) | Cumulative % |
|---|---|---|---|
| 1. < $40,000 | 20 | 10% | 10% |
| 2. $40,000–$59,999 | 50 | 25% | 35% |
| 3. $60,000–$79,999 | 80 | 40% | 75% |
| 4. $80,000–$100,000 | 40 | 20% | 95% |
| 5. > $100,000 | 10 | 5% | 100% |

EXERCISE 6 • Understanding Frequencies and Percentages. Copyright © 2017, Elsevier Inc. All rights reserved.

In data analysis, percentage distributions can be used to compare findings from different studies that have different sample sizes, and these distributions are usually arranged in tables in order either from greatest to least or least to greatest percentages (Plichta & Kelvin, 2013).

RESEARCH ARTICLE
Source
Eckerblad, J., Tödt, K., Jakobsson, P., Unosson, M., Skargren, E., Kentsson, M., & Theander, K. (2014). Symptom burden in stable COPD patients with moderate to severe airflow limitation. Heart & Lung, 43(4), 351–357.

Introduction
Eckerblad and colleagues (2014, p. 351) conducted a comparative descriptive study to examine the symptoms of “patients with stable chronic obstructive pulmonary disease (COPD) and determine whether symptom experience differed between patients with moderate or severe airflow limitations.” The Memorial Symptom Assessment Scale (MSAS) was used to measure the symptoms of 42 outpatients with moderate airflow limitations and 49 patients with severe airflow limitations.
The results indicated that the mean number of symptoms was 7.9 (± 4.3) for both groups combined, with no significant differences found in symptoms between the patients with moderate and severe airflow limitations. For patients with the highest MSAS symptom burden scores in both the moderate and the severe limitations groups, the symptoms most frequently experienced included shortness of breath, dry mouth, cough, sleep problems, and lack of energy. The researchers concluded that patients with moderate or severe airflow limitations experienced multiple severe symptoms that caused high levels of distress. Quality assessment of COPD patients' physical and psychological symptoms is needed to improve the management of their symptoms.

Relevant Study Results
Eckerblad et al. (2014, p. 353) noted in their research report that “In total, 91 patients assessed with MSAS met the criteria for moderate (n = 42) or severe airflow limitations (n = 49). Of those 91 patients, 47% were men, and 53% were women, with a mean age of 68 (± 7) years for men and 67 (± 8) years for women. The majority (70%) of patients were married or cohabitating. In addition, 61% were retired, and 15% were on sick leave. Twenty-eight percent of the patients still smoked, and 69% had stopped smoking. The mean BMI (kg/m2) was 26.8 (± 5.7). There were no significant differences in demographic characteristics, smoking history, or BMI between patients with moderate and severe airflow limitations (Table 1). A lower proportion of patients with moderate airflow limitation used inhalation treatment with glucocorticosteroids, long-acting β2-agonists and short-acting β2-agonists, but a higher proportion used analgesics compared with patients with severe airflow limitation.
Symptom prevalence and symptom experience
The patients reported multiple symptoms with a mean number of 7.9 (± 4.3) symptoms (median = 7, range 0–32) for the total sample, 8.1 (± 4.4) for moderate airflow limitation and 7.7 (± 4.3) for severe airflow limitation (p = 0.36) . . . . Highly prevalent physical symptoms (≥ 50% of the total sample) were shortness of breath (90%), cough (65%), dry mouth (65%), and lack of energy (55%). Five additional physical symptoms, feeling drowsy, numbness/tingling in hands/feet, feeling irritable, and dizziness, were reported by between 25% and 50% of the patients. The most commonly reported psychological symptom was difficulty sleeping (52%), followed by worrying (33%), feeling irritable (28%) and feeling sad (22%). There were no significant differences in the occurrence of physical and psychological symptoms between patients with moderate and severe airflow limitations” (Eckerblad et al., 2014, p. 353).

TABLE 1 BACKGROUND CHARACTERISTICS AND USE OF MEDICATION FOR PATIENTS WITH STABLE CHRONIC OBSTRUCTIVE LUNG DISEASE CLASSIFIED IN PATIENTS WITH MODERATE AND SEVERE AIRFLOW LIMITATION

| Characteristic | Moderate (n = 42) | Severe (n = 49) | p Value |
|---|---|---|---|
| Sex, n (%) | | | 0.607 |
| Women | 19 (45) | 29 (59) | |
| Men | 23 (55) | 20 (41) | |
| Age (yrs), mean (SD) | 66.5 (8.6) | 67.9 (6.8) | 0.396 |
| Married/cohabitant, n (%) | 29 (69) | 34 (71) | 0.854 |
| Employed, n (%) | 7 (17) | 7 (14) | 0.754 |
| Smoking, n (%) | | | 0.789 |
| Smoking | 13 (31) | 12 (24) | |
| Former smokers | 28 (67) | 35 (71) | |
| Never smokers | 1 (2) | 2 (4) | |
| Pack years smoking, mean (SD) | 29.1 (13.5) | 34.0 (19.5) | 0.177 |
| BMI (kg/m2), mean (SD) | 27.2 (5.2) | 26.5 (6.1) | 0.555 |
| FEV1 % of predicted, mean (SD) | 61.6 (8.4) | 42.2 (5.8) | < 0.001 |
| SpO2 %, mean (SD) | 95.8 (2.4) | 94.5 (3.0) | 0.009 |
| Physical health, mean (SD) | 3.2 (0.8) | 3.0 (0.8) | 0.120 |
| Mental health, mean (SD) | 3.7 (0.9) | 3.6 (1.0) | 0.628 |
| Exacerbation previous 6 months, n (%) | 14 (33) | 15 (31) | 0.781 |
| Admitted to hospital previous year, n (%) | 10 (24) | 14 (29) | 0.607 |
| Medication use, n (%) | | | |
| Inhaled glucocorticosteroids | 30 (71) | 44 (90) | 0.025 |
| Systemic glucocorticosteroids | 3 (6.3) | 0 (0) | 0.094 |
| Anticholinergic | 32 (76) | 42 (86) | 0.245 |
| Long-acting β2-agonists | 30 (71) | 45 (92) | 0.011 |
| Short-acting β2-agonists | 13 (31) | 32 (65) | 0.001 |
| Analgesics | 11 (26) | 5 (10) | 0.046 |
| Statins | 8 (19) | 11 (23) | 0.691 |

Eckerblad, J., Tödt, K., Jakobsson, P., Unosson, M., Skargren, E., Kentsson, M., & Theander, K. (2014). Symptom burden in stable COPD patients with moderate to severe airflow limitation. Heart & Lung, 43(4), p. 353.

STUDY QUESTIONS
1. What are the frequency and percentage of women in the moderate airflow limitation group?
2. What were the frequencies and percentages of the moderate and the severe airflow limitation groups who experienced an exacerbation in the previous 6 months?
3. What is the total sample size of COPD patients included in this study? What number or frequency of the subjects is married/cohabitating? What percentage of the total sample is married or cohabitating?
4. Were the moderate and severe airflow limitation groups significantly different regarding married/cohabitating status? Provide a rationale for your answer.
5. List at least three other relevant demographic variables the researchers might have gathered data on to describe this study sample.
6. For the total sample, what physical symptoms were experienced by ≥ 50% of the subjects? Identify the physical symptoms and the percentages of the total sample experiencing each symptom.
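Several questions in this exercise ask for a total-sample frequency and percentage, while Table 1 reports frequencies by group only. The sketch below shows one way to combine the two groups, using the sex rows of Table 1 as the worked case because the study text already states the expected result (47% men, 53% women). The function name `total_sample_summary` is an assumption made for this illustration.

```python
# Group sizes and the "Sex" rows as reported in Table 1.
GROUP_SIZES = {"moderate": 42, "severe": 49}
SEX_COUNTS = {
    "women": {"moderate": 19, "severe": 29},
    "men": {"moderate": 23, "severe": 20},
}


def total_sample_summary(counts_by_group, group_sizes):
    """Return (total frequency, whole percent of the total sample)."""
    total_n = sum(group_sizes.values())        # 42 + 49 = 91
    frequency = sum(counts_by_group.values())  # add the two group frequencies
    return frequency, round(frequency * 100 / total_n)


for label, counts in SEX_COUNTS.items():
    frequency, percent = total_sample_summary(counts, GROUP_SIZES)
    print(f"{label}: f = {frequency}, {percent}% of the total sample")
```

The same two steps (add the group frequencies, then divide by 91 and multiply by 100) apply to any other n (%) row in the table.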
Interpreting Line Graphs EXERCISE 7
STATISTICAL TECHNIQUE IN REVIEW
Tables and figures are commonly used to present findings from studies or to provide a way for researchers to become familiar with research data. Using figures, researchers are able to illustrate the results from descriptive data analyses, assist in identifying patterns in data, identify changes over time, and interpret exploratory findings. A line graph is a figure that is developed by joining a series of plotted points with a line to illustrate how a variable changes over time. A line graph figure includes a horizontal scale, or x-axis, and a vertical scale, or y-axis. The x-axis is used to document time, and the y-axis is used to document the mean scores or values for a variable (Grove, Burns, & Gray, 2013; Plichta & Kelvin, 2013). Researchers might include a line graph to compare the values for three or four variables in a study or to identify the changes in groups for a selected variable over time. For example, Figure 7-1 presents a line graph that documents time in weeks on the x-axis and mean weight loss in pounds on the y-axis for an experimental group consuming a low carbohydrate diet and a control group consuming a standard diet. This line graph illustrates the trend of a strong, steady increase in the mean weight lost by the experimental or intervention group and minimal mean weight loss by the control group.

FIGURE 7-1 ■ LINE GRAPH COMPARING EXPERIMENTAL AND CONTROL GROUPS FOR WEIGHT LOSS OVER FOUR WEEKS.
[Line graph: x-axis, Weeks (1 to 4); y-axis, Weight loss (lbs) (0 to 10); one line each for the Control and Experimental groups.]

RESEARCH ARTICLE
Source
Azzolin, K., Mussi, C. M., Ruschel, K. B., de Souza, E. N., Lucena, A. D., & Rabelo-Silva, E. R. (2013). Effectiveness of nursing interventions in heart failure patients in home care using NANDA-I, NIC, and NOC. Applied Nursing Research, 26(4), 239–244.
Introduction
Azzolin and colleagues (2013) analyzed data from a larger randomized clinical trial to determine the effectiveness of 11 nursing interventions (NIC) on selected nursing outcomes (NOC) in a sample of patients with heart failure (HF) receiving home care. A total of 23 patients with HF were followed for 6 months after hospital discharge and provided four home visits and four telephone calls. The home visits and phone calls were organized using the nursing diagnoses from the North American Nursing Diagnosis Association International (NANDA-I) classification list. The researchers found that eight nursing interventions significantly improved the nursing outcomes for these HF patients. Those interventions included “health education, self-modification assistance, behavior modification, telephone consultation, nutritional counselling, teaching: prescribed medications, teaching: disease process, and energy management” (Azzolin et al., 2013, p. 243). The researchers concluded that the NANDA-I, NIC, and NOC linkages were useful in managing patients with HF in their home.

Relevant Study Results
Azzolin and colleagues (2013) presented their results in a line graph format to display the nursing outcome changes over the 6 months of the home visits and phone calls. The nursing outcomes were measured with a five-point Likert scale with 1 = worst and 5 = best. “Of the eight outcomes selected and measured during the visits, four belonged to the health & knowledge behavior domain (50%), as follows: knowledge: treatment regimen; compliance behavior; knowledge: medication; and symptom control. Significant increases were observed in this domain for all outcomes when comparing mean scores obtained at visits no. 1 and 4 (Figure 1; p < 0.001 for all comparisons).
The other four outcomes assessed belong to three different NOC domains, namely, functional health (activity tolerance and energy conservation), physiologic health (fluid balance), and family health (family participation in professional care). The scores obtained for activity tolerance and energy conservation increased significantly from visit no. 1 to visit no. 4 (p = 0.004 and p < 0.001, respectively). Fluid balance and family participation in professional care did not show statistically significant differences (p = 0.848 and p = 0.101, respectively) (Figure 2)” (Azzolin et al., 2013, p. 241). The significance level or alpha (α) was set at 0.05 for this study.

FIGURE 2 ■ NURSING OUTCOMES MEASURED OVER 6 MONTHS (OTHER DOMAINS): Activity tolerance (95% CI −1.38 to −0.18, p = 0.004); energy conservation (95% CI −0.62 to −0.19, p < 0.001); fluid balance (95% CI −0.25 to 0.07, p = .848); family participation in professional care (95% CI −2.31 to −0.11, p = 0.101). HV = home visit. CI = confidence interval.
Azzolin, K., Mussi, C. M., Ruschel, K. B., de Souza, E. N., Lucena, A. D., & Rabelo-Silva, E. R. (2013). Effectiveness of nursing interventions in heart failure patients in home care using NANDA-I, NIC, and NOC. Applied Nursing Research, 26(4), p. 242.
[Line graph for Figure 2: x-axis, home visits HV1 to HV4; y-axis, Mean (0 to 5.0); one line each for fluid balance, family participation in professional care, activity tolerance, and energy conservation.]

FIGURE 1 ■ NURSING OUTCOMES MEASURED OVER 6 MONTHS (HEALTH & KNOWLEDGE BEHAVIOR DOMAIN): Knowledge: medication (95% CI −1.66 to −0.87, p < 0.001); knowledge: treatment regimen (95% CI −1.53 to −0.98, p < 0.001); symptom control (95% CI −1.93 to −0.95, p < 0.001); and compliance behavior (95% CI −1.24 to −0.56, p < 0.001). HV = home visit. CI = confidence interval.
[Line graph for Figure 1: x-axis, home visits HV1 to HV4; y-axis, Mean (0 to 5.0); one line each for compliance behavior, symptom control, knowledge: medication, and knowledge: treatment regimen.]

STUDY QUESTIONS
1. What is the purpose of a line graph? What elements are included in a line graph?
2. Review Figure 1 and identify the focus of the x-axis and the y-axis. What is the time frame for the x-axis? What variables are presented on this line graph?
3. In Figure 1, did the nursing outcome compliance behavior change over the 6 months of home visits? Provide a rationale for your answer.
4. State the null hypothesis for the nursing outcome compliance behavior.
5. Was there a significant difference in compliance behavior from the first home visit (HV1) to the fourth home visit (HV4)? Was the null hypothesis accepted or rejected? Provide a rationale for your answer.
6. In Figure 1, what outcome had the lowest mean at HV1? Did this outcome improve over the four home visits? Provide a rationale for your answer.
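The study results compare each reported p value with the alpha of 0.05. A minimal sketch of that comparison follows, using the Figure 2 p values quoted from the article. Two things are assumptions of this sketch rather than anything the article specifies: the helper name `is_significant`, and the choice to store "p < 0.001" as its upper bound, 0.001, so that it can be compared numerically.

```python
ALPHA = 0.05  # significance level stated for the Azzolin et al. (2013) study

# p values reported for the Figure 2 outcomes (HV1 compared with HV4).
# "p < 0.001" is stored as 0.001, its upper bound (an assumption of this sketch).
P_VALUES = {
    "activity tolerance": 0.004,
    "energy conservation": 0.001,
    "fluid balance": 0.848,
    "family participation in professional care": 0.101,
}


def is_significant(p_value, alpha=ALPHA):
    """Return True when the p value is less than or equal to alpha."""
    return p_value <= alpha


for outcome, p_value in P_VALUES.items():
    label = "significant" if is_significant(p_value) else "not significant"
    print(f"{outcome}: p = {p_value} -> {label} at alpha = {ALPHA}")
```

The labels printed here match the article's own statement that activity tolerance and energy conservation changed significantly, while fluid balance and family participation in professional care did not.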
Questions to Be Graded EXERCISE 7
Follow your instructor's directions to submit your answers to the following questions for grading. Your instructor may ask you to write your answers below and submit them as a hard copy for grading. Alternatively, your instructor may ask you to use the space below for notes and submit your answers online at http://evolve.elsevier.com/Grove/statistics/ under “Questions to Be Graded.”
- What is the focus of the example Figure 7-1 in the section introducing the statistical technique of this exercise?
- In Figure 2 of the Azzolin et al. (2013, p. 242) study, did the nursing outcome activity tolerance change over the 6 months of home visits (HVs) and telephone calls? Provide a rationale for your answer.
- State the null hypothesis for the nursing outcome activity tolerance.
- Was there a significant difference in activity tolerance from the first home visit (HV1) to the fourth home visit (HV4)? Was the null hypothesis accepted or rejected? Provide a rationale for your answer.
- In Figure 2, what nursing outcome had the lowest mean at HV1? Did this outcome improve over the four HVs? Provide a rationale for your answer.
- What nursing outcome had the highest mean at HV1 and at HV4? Was this outcome significantly different from HV1 to HV4? Provide a rationale for your answer.
- State the null hypothesis for the nursing outcome family participation in professional care.
- Was there a statistically significant difference in family participation in professional care from HV1 to HV4? Was the null hypothesis accepted or rejected? Provide a rationale for your answer.
- Was Figure 2 helpful in understanding the nursing outcomes for patients with heart failure (HF) who received four HVs and telephone calls? Provide a rationale for your answer.
- What nursing interventions significantly improved the nursing outcomes for these patients with HF? What implications for practice do you note from these study results?
Measures of Central Tendency: Mean, Median, and Mode
EXERCISE 8
STATISTICAL TECHNIQUE IN REVIEW
Mean, median, and mode are the three measures of central tendency used to describe study variables. These statistical techniques are calculated to determine the center of a distribution of data, and the central tendency that is calculated is determined by the level of measurement of the data (nominal, ordinal, interval, or ratio; see Exercise 1).

The mode is a category or score that occurs with the greatest frequency in a distribution of scores in a data set. The mode is the only acceptable measure of central tendency for analyzing nominal-level data, which are not continuous and cannot be ranked, compared, or subjected to mathematical operations. If a distribution has two scores that occur more frequently than others (two modes), the distribution is called bimodal. A distribution with more than two modes is multimodal (Grove, Burns, & Gray, 2013).

The median (MD) is a score that lies in the middle of a rank-ordered list of values of a distribution. If a distribution consists of an odd number of scores, the MD is the middle score that divides the rest of the distribution into two equal parts, with half of the values falling above the middle score and half of the values falling below this score. In a distribution with an even number of scores, the MD is half of the sum of the two middle numbers of that distribution. If several scores in a distribution are of the same value, then the MD will be the value of the middle score. The MD is the most precise measure of central tendency for ordinal-level data and for nonnormally distributed or skewed interval- or ratio-level data. The following formula can be used to locate the median in a distribution of scores; it gives the position of the median in the rank-ordered list, not the value of the score itself.

$$\text{Median } (MD) = (N + 1) \div 2$$

N is the number of scores.

Example: N = 31

$$\text{Median} = (31 + 1) \div 2 = 32 \div 2 = 16\text{th score}$$

Example: N = 40

$$\text{Median} = (40 + 1) \div 2 = 41 \div 2 = 20.5\text{th score}$$

Thus in the second example, the median is halfway between the 20th and the 21st scores.
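The position formula and the odd/even rule above can be sketched as follows. The function names `median_position` and `median_of` are illustrative only, and the short score lists in the usage lines are made-up values.

```python
def median_position(n):
    """Position of the median in a rank-ordered list of n scores: (N + 1) / 2."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n + 1) / 2


def median_of(scores):
    """Median value: middle score (odd N) or mean of the two middle scores (even N)."""
    ordered = sorted(scores)  # the scores must be rank ordered first
    n = len(ordered)
    if n == 0:
        raise ValueError("scores must not be empty")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_position(31))  # 16.0, the 16th score
print(median_position(40))  # 20.5, halfway between the 20th and 21st scores

# Made-up score lists to show the odd and even cases.
print(median_of([7, 3, 5]))     # 5
print(median_of([7, 3, 5, 9]))  # 6.0
```

A fractional position such as 20.5 signals the even-N case, where the two neighboring scores are averaged.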
The mean (X̄) is the arithmetic average of all scores of a sample, that is, the sum of its individual scores divided by the total number of scores. The mean is the most accurate measure of central tendency for normally distributed data measured at the interval and ratio levels and is only appropriate for these levels of data (Grove, Gray, & Burns, 2015). In a normal distribution, the mean, median, and mode are essentially equal (see Exercise 26 for determining the normality of a distribution). The mean is sensitive to extreme
Calculating Descriptive Statistics
There are two major classes of statistics: descriptive statistics and inferential statistics. Descriptive statistics are computed to reveal characteristics of the sample data set and to describe study variables. Inferential statistics are computed to gain information about effects and associations in the population being studied. For some types of studies, descriptive statistics will be the only approach to analysis of the data. For other studies, descriptive statistics are the first step in the data analysis process, to be followed by inferential statistics. For all studies that involve numerical data, descriptive statistics are crucial in understanding the fundamental properties of the variables being studied. Exercise 27 focuses only on descriptive statistics and will illustrate the most common descriptive statistics computed in nursing research and provide examples using actual clinical data from empirical publications.
MEASURES OF CENTRAL TENDENCY
A measure of central tendency is a statistic that represents the center or middle of a frequency distribution. The three measures of central tendency commonly used in nursing research are the mode, median (MD), and mean ($\bar{X}$). The mean is the arithmetic average of all of a variable's values. The median is the exact middle value (or the average of the middle two values if there is an even number of observations). The mode is the most commonly occurring value or values (see Exercise 8).
The following data have been collected from veterans with rheumatoid arthritis (Tran, Hooker, Cipher, & Reimold, 2009). The values in Table 27-1 were extracted from a larger sample of veterans who had a history of biologic medication use (e.g., infliximab [Remicade], etanercept [Enbrel]). Table 27-1 contains data collected from 10 veterans who had stopped taking biologic medications, and the variable represents the number of years that each veteran had taken the medication before stopping.
Because the number of study subjects represented below is 10, the correct statistical notation to reflect that number is:
$$n = 10$$
Note that the n is lowercase, because we are referring to a sample of veterans. If the data being presented represented the entire population of veterans, the correct notation is the uppercase N. Because most nursing research is conducted using samples, not populations, all formulas in the subsequent exercises will incorporate the sample notation, n.
Mode
The mode is the numerical value or score that occurs with the greatest frequency; it does not necessarily indicate the center of the data set. The data in Table 27-1 contain two modes: 1.5 and 3.0. Each of these numbers occurred twice in the data set. When two modes exist, the data set is referred to as bimodal; a data set that contains more than two modes would be multimodal.
Median
The median (MD) is the score at the exact center of the ungrouped frequency distribution. It is the 50th percentile. To obtain the MD, sort the values from lowest to highest. If the number of values is an uneven number, exactly 50% of the values are above the MD and 50% are below it. If the number of values is an even number, the MD is the average of the two middle values. Thus the MD may not be an actual value in the data set. For example, the data in Table 27-1 consist of 10 observations, and therefore the MD is calculated as the average of the two middle values.
$$MD = \frac{1.5 + 2.0}{2} = 1.75$$
Mean
The most commonly reported measure of central tendency is the mean. The mean is the sum of the scores divided by the number of scores being summed. Thus like the MD, the mean may not be a member of the data set.
The formula for calculating the mean is as follows:
$$\bar{X} = \frac{\sum X}{n}$$
where
$\bar{X}$ = mean
$\sum$ = sigma, the statistical symbol for summation
X = a single value in the sample
n = total number of values in the sample
The mean number of years that the veterans used a biologic medication is calculated as follows:
$$\bar{X} = \frac{0.1 + 0.3 + 1.3 + 1.5 + 1.5 + 2.0 + 2.2 + 3.0 + 3.0 + 4.0}{10} = 1.9 \text{ years}$$
TABLE 27-1 DURATION OF BIOLOGIC USE AMONG VETERANS WITH RHEUMATOID ARTHRITIS (n = 10)
Duration of Biologic Use (years)
0.1
0.3
1.3
1.5
1.5
2.0
2.2
3.0
3.0
4.0
The mean is an appropriate measure of central tendency for approximately normally distributed populations with variables measured at the interval or ratio level. It is also appropriate for ordinal level data such as Likert scale values, where higher numbers represent more of the construct being measured and lower numbers represent less of the construct (such as pain levels, patient satisfaction, depression, and health status).
The mean is sensitive to extreme scores such as outliers. An outlier is a value in a sample data set that is unusually low or unusually high in the context of the rest of the sample data. An example of an outlier in the data presented in Table 27-1 might be a value such as 11. The existing values range from 0.1 to 4.0, meaning that no veteran used a biologic beyond 4 years. If an additional veteran were added to the sample and that person used a biologic for 11 years, the mean would be much larger: 2.7 years. Simply adding this outlier to the sample raised the mean by more than 40%, from 1.9 to 2.7 years. The outlier would also change the frequency distribution. Without the outlier, the frequency distribution is approximately normal, as shown in Figure 27-1. Including the outlier changes the shape of the distribution to appear positively skewed.
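The mode, median, and mean of the Table 27-1 values, and the effect of the hypothetical 11-year outlier, can be checked with a short script. This is a minimal sketch using only Python's standard `statistics` module; the variable names are illustrative and are not part of the exercise.

```python
import statistics

# Duration of biologic use in years, copied from Table 27-1 (n = 10).
durations = [0.1, 0.3, 1.3, 1.5, 1.5, 2.0, 2.2, 3.0, 3.0, 4.0]

# multimode returns every value tied for the highest frequency.
modes = statistics.multimode(durations)
# With an even n, median averages the two middle values.
median = statistics.median(durations)
# fmean is the sum of the values divided by n.
mean = statistics.fmean(durations)

# Add the hypothetical outlier discussed in the text (11 years of use).
with_outlier = durations + [11.0]
mean_with_outlier = statistics.fmean(with_outlier)
median_with_outlier = statistics.median(with_outlier)

print(f"modes: {modes}")
print(f"median: {median:.2f}, mean: {mean:.2f}")
print(f"with outlier -> median: {median_with_outlier:.2f}, "
      f"mean: {mean_with_outlier:.2f}")
```

The median moves only from 1.75 to 2.0 when the outlier is added, while the mean moves from about 1.9 to about 2.7, which illustrates why the median is preferred for skewed data.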
Although the use of summary statistics has been the traditional approach to describing data or describing the characteristics of the sample before inferential statistical analysis, its ability to clarify the nature of data is limited. For example, using measures of central tendency, particularly the mean, to describe the nature of the data obscures the impact of extreme values or deviations in the data. Thus, significant features in the data may be concealed or misrepresented. Often, anomalous, unexpected, or problematic data and discrepant patterns are evident, but are not regarded as meaningful. Measures of dispersion, such as the range, difference scores, variance, and standard deviation (SD), provide important insight into the nature of the data.
MEASURES OF DISPERSION
Measures of dispersion, or variability, are measures of individual differences of the members of the population and sample. They indicate how values in a sample are dispersed around the mean. These measures provide information about the data that is not available from measures of central tendency. They indicate how different the scores are, that is, the extent to which individual values deviate from one another. If the individual values are similar, measures of variability are small and the sample is relatively homogeneous in terms of those values. Heterogeneity (wide variation in scores) is important in some statistical procedures, such as correlation. Heterogeneity is determined by measures of variability. The measures most commonly used are range, difference scores, variance, and SD (see Exercise 9).
FIGURE 27-1 ■ FREQUENCY DISTRIBUTION OF YEARS OF BIOLOGIC USE, WITHOUT OUTLIER AND WITH OUTLIER.
[Two histograms. x-axis: years of biologic use; y-axis: frequency.]
Range
The simplest measure of dispersion is the range. In published studies, range is presented in two ways: (1) the range is the lowest and highest scores, or (2) the range is calculated by subtracting the lowest score from the highest score. The range for the scores in Table 27-1 is 0.1 and 4.0, or it can be calculated as follows: 4.0 − 0.1 = 3.9. In this form, the range is a difference score that uses only the two extreme scores for the comparison. The range is generally reported but is not used in further analyses.
Difference Scores
Difference scores are obtained by subtracting the mean from each score. Sometimes a difference score is referred to as a deviation score because it indicates the extent to which a score deviates from the mean. Of course, most variables in nursing research are not “scores,” yet the term difference score is used to represent a value's deviation from the mean. The difference score is positive when the score is above the mean, and it is negative when the score is below the mean (see Table 27-2). Difference scores are the basis for many statistical analyses and can be found within many statistical equations. The formula for difference scores is:
$$X - \bar{X}$$
TABLE 27-2 DIFFERENCE SCORES OF DURATION OF BIOLOGIC USE
X      $-\bar{X}$    $X - \bar{X}$
0.1    − 1.9         −1.8
0.3    − 1.9         −1.6
1.3    − 1.9         −0.6
1.5    − 1.9         −0.4
1.5    − 1.9         −0.4
2.0    − 1.9          0.1
2.2    − 1.9          0.3
3.0    − 1.9          1.1
3.0    − 1.9          1.1
4.0    − 1.9          2.1
Σ of absolute values: 9.5
The mean deviation is the average difference score, using the absolute values. The formula for the mean deviation is:
$$\bar{X}_{\text{deviation}} = \frac{\sum \left| X - \bar{X} \right|}{n}$$
In this example, the mean deviation is 0.95. This value was calculated by taking the sum of the absolute value of each difference score (1.8, 1.6, 0.6, 0.4, 0.4, 0.1, 0.3, 1.1, 1.1, 2.1) and dividing by 10. The result indicates that, on average, subjects' duration of biologic use deviated from the mean by 0.95 years.
Variance
Variance is another measure commonly used in statistical analysis. The equation for a sample variance ($s^2$) is below.
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Note that the lowercase letter $s^2$ is used to represent a sample variance. The lowercase Greek sigma ($\sigma^2$) is used to represent a population variance, in which the denominator is N instead of n − 1.
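The range, difference scores, and mean deviation described above can be reproduced in a few lines. The following is a minimal sketch in plain Python with illustrative variable names. It uses the unrounded mean (1.89) rather than the rounded 1.9 shown in Table 27-2, which leaves the sum of absolute deviations at 9.5. With the Table 27-1 minimum of 0.1 and maximum of 4.0, the subtraction form of the range comes to 3.9.

```python
# Duration of biologic use in years, copied from Table 27-1 (n = 10).
durations = [0.1, 0.3, 1.3, 1.5, 1.5, 2.0, 2.2, 3.0, 3.0, 4.0]
n = len(durations)
mean = sum(durations) / n

# Range as a single number: highest score minus lowest score.
data_range = max(durations) - min(durations)

# Difference (deviation) scores: each value minus the mean.
difference_scores = [x - mean for x in durations]

# Mean deviation: average of the absolute difference scores.
mean_deviation = sum(abs(d) for d in difference_scores) / n

print(f"range: {data_range:.1f}")
print("difference scores:", [round(d, 2) for d in difference_scores])
print(f"mean deviation: {mean_deviation:.2f}")
```

The signed difference scores always sum to zero (apart from floating-point error), which is why the mean deviation uses absolute values and the variance uses squares.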
Because most nursing research is conducted using samples, not populations, formulas in the subsequent exercises that contain a variance or standard deviation will incorporate the sample notation, using n − 1 as the denominator. Moreover, statistical software packages compute the variance and standard deviation using the sample formulas, not the population formulas.
The variance is always a positive value and has no upper limit. In general, the larger the variance, the larger the dispersion of scores. The variance is most often computed to derive the standard deviation because, unlike the variance, the standard deviation reflects important properties about the frequency distribution of the variable it represents. Table 27-3 displays how we would compute a variance by hand, using the biologic duration data.
TABLE 27-3 VARIANCE COMPUTATION OF BIOLOGIC USE
X      $-\bar{X}$    $X - \bar{X}$    $(X - \bar{X})^2$
0.1    − 1.9         −1.8             3.24
0.3    − 1.9         −1.6             2.56
1.3    − 1.9         −0.6             0.36
1.5    − 1.9         −0.4             0.16
1.5    − 1.9         −0.4             0.16
2.0    − 1.9          0.1             0.01
2.2    − 1.9          0.3             0.09
3.0    − 1.9          1.1             1.21
3.0    − 1.9          1.1             1.21
4.0    − 1.9          2.1             4.41
Σ                                     13.41
$$s^2 = \frac{13.41}{9}$$
$$s^2 = 1.49$$
Standard Deviation
Standard deviation is a measure of dispersion that is the square root of the variance. The standard deviation is represented by the notation s or SD. The equation for obtaining a standard deviation is
$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
Table 27-3 displays the computations for the variance. To compute the SD, simply take the square root of the variance. We know that the variance of biologic duration is $s^2$ = 1.49. Therefore, the s of biologic duration is SD = 1.22. The SD is an important statistic, both for understanding dispersion within a distribution and for interpreting the relationship of a particular value to the distribution.
SAMPLING ERROR
A standard error describes the extent of sampling error. For example, a standard error of the mean is calculated to determine the magnitude of the variability associated with the mean.
A small standard error is an indication that the sample mean is close to the population mean, while a large standard error yields less certainty that the sample mean approximates the population mean. The formula for the standard error of the mean ($s_{\bar{X}}$) is:
$$s_{\bar{X}} = \frac{s}{\sqrt{n}}$$
Using the biologic medication duration data, we know that the standard deviation of biologic duration is s = 1.22. Therefore, the standard error of the mean for biologic duration is computed as follows:
$$s_{\bar{X}} = \frac{1.22}{\sqrt{10}}$$
$$s_{\bar{X}} = 0.39$$
The standard error of the mean for biologic duration is 0.39.
Confidence Intervals
To determine how closely the sample mean approximates the population mean, the standard error of the mean is used to build a confidence interval. For that matter, a confidence interval can be created for many statistics, such as a mean, proportion, and odds ratio. To build a confidence interval around a statistic, you must have the standard error value and the t value to adjust the standard error. The degrees of freedom (df) to use to compute a confidence interval is df = n − 1.
To compute the confidence interval for a mean, the lower and upper limits of that interval are created by multiplying the $s_{\bar{X}}$ by the t statistic, where df = n − 1, and then subtracting that product from the mean and adding it to the mean. For a 95% confidence interval, the t value should be selected at α = 0.05. For a 99% confidence interval, the t value should be selected at α = 0.01.
Using the biologic medication duration data, we know that the standard error of the mean duration of biologic medication use is $s_{\bar{X}}$ = 0.39. The mean duration of biologic medication use is 1.89. Therefore, the 95% confidence interval for the mean duration of biologic medication use is computed as follows:
$$\bar{X} \pm s_{\bar{X}} t$$
$$1.89 \pm (0.39)(2.26)$$
$$1.89 \pm 0.88$$
As referenced in Appendix A, the t value required for the 95% confidence interval with df = 9 is 2.26.
The computation above results in a lower limit of 1.01 and an upper limit of 2.77. This means that our confidence interval of 1.01 to 2.77 estimates the population mean duration of biologic use with 95% confidence (Kline, 2004). Technically and mathematically, it means that if we drew an infinite number of samples of veterans and computed a 95% confidence interval for the mean duration of biologic medication use from each sample, 95% of the intervals would contain the true population mean, and 5% would not contain the population mean (Gliner, Morgan, & Leech, 2009).
If we were to compute a 99% confidence interval, we would require the t value that is referenced at α = 0.01. Therefore, the 99% confidence interval for the mean duration of biologic medication use is computed as follows:
$$1.89 \pm (0.39)(3.25)$$
$$1.89 \pm 1.27$$
As referenced in Appendix A, the t value required for the 99% confidence interval with df = 9 is 3.25. The computation above results in a lower limit of 0.62 and an upper limit of 3.16. This means that our confidence interval of 0.62 to 3.16 estimates the population mean duration of biologic use with 99% confidence.
Degrees of Freedom
The concept of degrees of freedom (df) was used in reference to computing a confidence interval. For any statistical computation, degrees of freedom are the number of independent pieces of information that are free to vary in order to estimate another piece of information (Zar, 2010). In the case of the confidence interval, the degrees of freedom are n − 1. This means that there are n − 1 independent observations in the sample that are free to vary (to be any value) to estimate the lower and upper limits of the confidence interval.
SPSS COMPUTATIONS
A retrospective descriptive study examined the duration of biologic use from veterans with rheumatoid arthritis (Tran et al., 2009).
The values in Table 27-4 were extracted from a larger sample of veterans who had a history of biologic medication use (e.g., infliximab [Remicade], etanercept [Enbrel]). Table 27-4 contains simulated demographic data collected from 10 veterans who had stopped taking biologic medications. Age at study enrollment, duration of biologic use, race/ethnicity, gender (F = female), tobacco use (F = former use, C = current use, N = never used), primary diagnosis (3 = irritable bowel syndrome, 4 = psoriatic arthritis, 5 = rheumatoid arthritis, 6 = reactive arthritis), and type of biologic medication used were among the study variables examined.
TABLE 27-4 DEMOGRAPHIC VARIABLES OF VETERANS WITH RHEUMATOID ARTHRITIS
Patient ID   Duration (yrs)   Age   Race/Ethnicity                  Gender   Tobacco   Diagnosis   Biologic
1            0.1              42    Caucasian                       F        F         5           Infliximab
2            0.3              41    Black, not of Hispanic Origin   F        F         5           Etanercept
3            1.3              56    Caucasian                       F        N         5           Infliximab
4            1.5              78    Caucasian                       F        F         3           Infliximab
5            1.5              86    Black, not of Hispanic Origin   F        F         4           Etanercept
6            2.0              49    Caucasian                       F        F         6           Etanercept
7            2.2              82    Caucasian                       F        F         5           Infliximab
8            3.0              35    Caucasian                       F        N         3           Infliximab
9            3.0              59    Black, not of Hispanic Origin   F        C         3           Infliximab
10           4.0              37    Caucasian                       F        F         5           Etanercept
This is how our data set looks in SPSS.
[SPSS data view screenshot]
Step 1: For a nominal variable, the appropriate descriptive statistics are frequencies and percentages. From the “Analyze” menu, choose “Descriptive Statistics” and “Frequencies.” Move “Race/Ethnicity” and “Gender” over to the right. Click “OK.”
Step 2: For a continuous variable, the appropriate descriptive statistics are means and standard deviations. From the “Analyze” menu, choose “Descriptive Statistics” and “Explore.” Move “Duration” over to the right. Click “OK.”
INTERPRETATION OF SPSS OUTPUT
The following tables are generated from SPSS. The first set of tables (from the first set of SPSS commands in Step 1) contains the frequencies of race/ethnicity and gender. Most (70%) were Caucasian, and 100% were female.
Frequencies
Frequency Table
RaceEthnicity
                                        Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Black, not of Hispanic Origin   3           30.0      30.0            30.0
        Caucasian                       7           70.0      70.0            100.0
        Total                           10          100.0     100.0
Gender
                                        Frequency   Percent   Valid Percent   Cumulative Percent
Valid   F                               10          100.0     100.0           100.0
Explore
Descriptives
Duration of Biologic Use                          Statistic   Std. Error
Mean                                              1.890       .3860
95% Confidence Interval for Mean   Lower Bound    1.017
                                   Upper Bound    2.763
5% Trimmed Mean                                   1.872
Median                                            1.750
Variance                                          1.490
Std. Deviation                                    1.2206
Minimum                                           .1
Maximum                                           4.0
Range                                             3.9
Interquartile Range                               2.0
Skewness                                          .159        .687
Kurtosis                                          -.437       1.334
The second set of output (from the second set of SPSS commands in Step 2) contains the descriptive statistics for “Duration,” including the mean, s (standard deviation), SE, 95% confidence interval for the mean, median, variance, minimum value, maximum value, range, and skewness and kurtosis statistics. As shown in the output, mean number of years for duration is 1.89, and the SD is 1.22. The 95% CI is 1.02–2.76.
STUDY QUESTIONS
1. Define mean.
2. What does this symbol, $s^2$, represent?
3. Define outlier.
4. Are there any outliers among the values representing duration of biologic use?
5. How would you interpret the 95% confidence interval for the mean of duration of biologic use?
6. What percentage of patients were Black, not of Hispanic origin?
7. Can you compute the variance for duration of biologic use by using the information presented in the SPSS output above?
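As a cross-check on the hand calculations and the SPSS Explore output for duration of biologic use, the variance, standard deviation, standard error, and confidence intervals can be sketched in Python. This is an illustration under stated assumptions, not part of the exercise: the function and variable names are hypothetical, and the t values (2.26 and 3.25 for df = 9) are simply the ones quoted in the text from Appendix A rather than computed.

```python
import math
import statistics

# Duration of biologic use in years, copied from Table 27-1 (n = 10).
durations = [0.1, 0.3, 1.3, 1.5, 1.5, 2.0, 2.2, 3.0, 3.0, 4.0]
n = len(durations)

mean = statistics.fmean(durations)
variance = statistics.variance(durations)  # sample variance, n - 1 denominator
sd = statistics.stdev(durations)           # square root of the sample variance
se = sd / math.sqrt(n)                     # standard error of the mean

# t values for df = n - 1 = 9, as quoted in the text (Appendix A).
T_95_DF9 = 2.26
T_99_DF9 = 3.25


def confidence_interval(center, std_error, t_value):
    """Return (lower, upper) limits: center -/+ t_value * std_error."""
    margin = t_value * std_error
    return center - margin, center + margin


ci_95 = confidence_interval(mean, se, T_95_DF9)
ci_99 = confidence_interval(mean, se, T_99_DF9)

print(f"variance: {variance:.2f}, SD: {sd:.2f}, SE: {se:.2f}")
print(f"95% CI: {ci_95[0]:.2f} to {ci_95[1]:.2f}")
print(f"99% CI: {ci_99[0]:.2f} to {ci_99[1]:.2f}")
```

Because the script carries unrounded intermediate values, its limits can differ in the second decimal place from the hand calculation, which rounds the standard error to 0.39 before multiplying.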
Questions to Be Graded
EXERCISE 27
Name: _______________________________________________________
Class: _____________________
Date: _____________________
Follow your instructor's directions to submit your answers to the following questions for grading. Your instructor may ask you to write your answers below and submit them as a hard copy for grading. Alternatively, your instructor may ask you to use the space below for notes and submit your answers online at http://evolve.elsevier.com/Grove/statistics/ under “Questions to Be Graded.”
- What is the mean age of the sample data?
- What percentage of patients never used tobacco?
- What is the standard deviation for age?
- Are there outliers among the values of age? Provide a rationale for your answer.
- What is the range of age values?
- What percentage of patients were taking infliximab?
- What percentage of patients had rheumatoid arthritis as their primary diagnosis?
- What percentage of patients had irritable bowel syndrome as their primary diagnosis?
- What is the 95% CI for age?
- What percentage of patients had psoriatic arthritis as their primary diagnosis?
Calculating Pearson Product-Moment Correlation Coefficient EXERCISE 28
Correlational analyses identify associations between two variables. There are many different kinds of statistics that yield a measure of correlation. All of these statistics address a research question or hypothesis that involves an association or relationship. Examples of research questions that are answered with correlation statistics are, “Is there an association between weight loss and depression?” and “Is there a relationship between patient satisfaction and health status?” A hypothesis is developed to identify the nature (positive or negative) of the relationship between the variables being studied.
The Pearson product-moment correlation was the first of the correlation measures developed and is the most commonly used. As is explained in Exercise 13, this coefficient (statistic) is represented by the letter r, and the value of r is always between −1.00 and +1.00. A value of zero indicates no relationship between the two variables. A positive correlation indicates that higher values of x are associated with higher values of y. A negative or inverse correlation indicates that higher values of x are associated with lower values of y. The sign of the r value indicates the direction of the slope of the line (called a regression line) that can be drawn through a standard scatterplot of the two variables (see Exercise 11). The strengths of different relationships are identified in Table 28-1 (Cohen, 1988).
TABLE 28-1 STRENGTH OF ASSOCIATION FOR PEARSON r
Strength of Association   Positive Association   Negative Association
Weak association          0.00 to < 0.30         > −0.30 to 0.00
Moderate association      0.30 to 0.49           −0.49 to −0.30
Strong association        0.50 or greater        −1.00 to −0.50
RESEARCH DESIGNS APPROPRIATE FOR THE PEARSON r
Research designs that may utilize the Pearson r include any associational design (Gliner, Morgan, & Leech, 2009).
The variables involved in the design are attributional, meaning the variables are characteristics of the participant, such as health status, blood pressure, gender, diagnosis, or ethnicity. Regardless of the nature of variables, the variables submitted to a Pearson correlation must be measured as continuous or at the interval or ratio level.
STATISTICAL FORMULA AND ASSUMPTIONS
Use of the Pearson correlation involves the following assumptions:
1. Interval or ratio measurement of both variables (e.g., age, income, blood pressure, cholesterol levels). However, if the variables are measured with a Likert scale, and the frequency distribution is approximately normally distributed, these data are
Understanding Frequencies and Percentages EXERCISE 6
STATISTICAL TECHNIQUE IN REVIEW
Frequency is the number of times a score or value for a variable occurs in a set of data. Frequency distribution is a statistical procedure that involves listing all the possible values or scores for a variable in a study. Frequency distributions are used to organize study data for a detailed examination to help determine the presence of errors in coding or computer programming (Grove, Burns, & Gray, 2013). In addition, frequencies and percentages are used to describe demographic and study variables measured at the nominal or ordinal levels.
Percentage can be defined as a portion or part of the whole or a named amount in every hundred measures. For example, a sample of 100 subjects might include 40 females and 60 males. In this example, the whole is the sample of 100 subjects, and gender is described as including two parts, 40 females and 60 males. A percentage is calculated by dividing the smaller number, which would be a part of the whole, by the larger number, which represents the whole. The result of this calculation is then multiplied by 100%. For example, if 14 nurses out of a total of 62 are working on a given day, you can divide 14 by 62 and multiply by 100% to calculate the percentage of nurses working that day. Calculations: (14 ÷ 62) × 100% = 0.2258 × 100% = 22.58%, or 22.6% when rounded to one decimal place. The answer also might be expressed as a whole percentage, which would be 23% in this example.
A cumulative percentage distribution involves the summing of percentages from the top of a table to the bottom. Therefore the bottom category has a cumulative percentage of 100% (Grove, Gray, & Burns, 2015). Cumulative percentages can also be used to determine percentile ranks, especially when discussing standardized scores. For example, if 75% of a group scored equal to or lower than a particular examinee's score, then that examinee's rank is at the 75th percentile.
When reported as a percentile rank, the percentage is often rounded to the nearest whole number. Percentile ranks can be used to analyze ordinal data that can be assigned to categories that can be ranked. Percentile ranks and cumulative percentages might also be used in any frequency distribution where subjects have only one value for a variable. For example, demographic characteristics are usually reported with the frequency (f) or number (n) of subjects and percentage (%) of subjects for each level of a demographic variable. Income level is presented as an example for 200 subjects:
Income Level            Frequency (f)   Percentage (%)   Cumulative %
1. < $40,000            20              10%              10%
2. $40,000–$59,999      50              25%              35%
3. $60,000–$79,999      80              40%              75%
4. $80,000–$100,000     40              20%              95%
5. > $100,000           10              5%               100%
In data analysis, percentage distributions can be used to compare findings from different studies that have different sample sizes, and these distributions are usually arranged in tables in order either from greatest to least or least to greatest percentages (Plichta & Kelvin, 2013).
RESEARCH ARTICLE
Source
Eckerblad, J., Tödt, K., Jakobsson, P., Unosson, M., Skargren, E., Kentsson, M., & Theander, K. (2014). Symptom burden in stable COPD patients with moderate to severe airflow limitation. Heart & Lung, 43(4), 351–357.
Introduction
Eckerblad and colleagues (2014, p. 351) conducted a comparative descriptive study to examine the symptoms of “patients with stable chronic obstructive pulmonary disease (COPD) and determine whether symptom experience differed between patients with moderate or severe airflow limitations.” The Memorial Symptom Assessment Scale (MSAS) was used to measure the symptoms of 42 outpatients with moderate airflow limitations and 49 patients with severe airflow limitations.
The results indicated that the mean number of symptoms was 7.9 (±4.3) for both groups combined, with no significant differences found in symptoms between the patients with moderate and severe airflow limitations. For patients with the highest MSAS symptom burden scores in both the moderate and the severe limitations groups, the symptoms most frequently experienced included shortness of breath, dry mouth, cough, sleep problems, and lack of energy. The researchers concluded that patients with moderate or severe airflow limitations experienced multiple severe symptoms that caused high levels of distress. Quality assessment of COPD patients' physical and psychological symptoms is needed to improve the management of their symptoms.
Relevant Study Results
Eckerblad et al. (2014, p. 353) noted in their research report that “In total, 91 patients assessed with MSAS met the criteria for moderate (n = 42) or severe airflow limitations (n = 49). Of those 91 patients, 47% were men, and 53% were women, with a mean age of 68 (±7) years for men and 67 (±8) years for women. The majority (70%) of patients were married or cohabitating. In addition, 61% were retired, and 15% were on sick leave. Twenty-eight percent of the patients still smoked, and 69% had stopped smoking. The mean BMI (kg/m²) was 26.8 (±5.7). There were no significant differences in demographic characteristics, smoking history, or BMI between patients with moderate and severe airflow limitations (Table 1). A lower proportion of patients with moderate airflow limitation used inhalation treatment with glucocorticosteroids, long-acting β2-agonists and short-acting β2-agonists, but a higher proportion used analgesics compared with patients with severe airflow limitation.
Symptom prevalence and symptom experience
The patients reported multiple symptoms with a mean number of 7.9 (±4.3) symptoms (median = 7, range 0–32) for the total sample, 8.1 (±4.4) for moderate airflow limitation and 7.7 (±4.3) for severe airflow limitation (p = 0.36) . . . . Highly prevalent physical symptoms (≥ 50% of the total sample) were shortness of breath (90%), cough (65%), dry mouth (65%), and lack of energy (55%). Five additional physical symptoms, feeling drowsy, numbness/tingling in hands/feet, feeling irritable, and dizziness, were reported by between 25% and 50% of the patients. The most commonly reported psychological symptom was difficulty sleeping (52%), followed by worrying (33%), feeling irritable (28%) and feeling sad (22%). There were no significant differences in the occurrence of physical and psychological symptoms between patients with moderate and severe airflow limitations” (Eckerblad et al., 2014, p. 353).
TABLE 1 BACKGROUND CHARACTERISTICS AND USE OF MEDICATION FOR PATIENTS WITH STABLE CHRONIC OBSTRUCTIVE LUNG DISEASE CLASSIFIED IN PATIENTS WITH MODERATE AND SEVERE AIRFLOW LIMITATION
                                            Moderate n = 42   Severe n = 49   p Value
Sex, n (%)                                                                    0.607
  Women                                     19 (45)           29 (59)
  Men                                       23 (55)           20 (41)
Age (yrs), mean (SD)                        66.5 (8.6)        67.9 (6.8)      0.396
Married/cohabitant, n (%)                   29 (69)           34 (71)         0.854
Employed, n (%)                             7 (17)            7 (14)          0.754
Smoking, n (%)                                                                0.789
  Smoking                                   13 (31)           12 (24)
  Former smokers                            28 (67)           35 (71)
  Never smokers                             1 (2)             2 (4)
Pack years smoking, mean (SD)               29.1 (13.5)       34.0 (19.5)     0.177
BMI (kg/m²), mean (SD)                      27.2 (5.2)        26.5 (6.1)      0.555
FEV1 % of predicted, mean (SD)              61.6 (8.4)        42.2 (5.8)      < 0.001
SpO2 %, mean (SD)                           95.8 (2.4)        94.5 (3.0)      0.009
Physical health, mean (SD)                  3.2 (0.8)         3.0 (0.8)       0.120
Mental health, mean (SD)                    3.7 (0.9)         3.6 (1.0)       0.628
Exacerbation previous 6 months, n (%)       14 (33)           15 (31)         0.781
Admitted to hospital previous year, n (%)   10 (24)           14 (29)         0.607
Medication use, n (%)
  Inhaled glucocorticosteroids              30 (71)           44 (90)         0.025
  Systemic glucocorticosteroids             3 (6.3)           0 (0)           0.094
  Anticholinergic                           32 (76)           42 (86)         0.245
  Long-acting β2-agonists                   30 (71)           45 (92)         0.011
  Short-acting β2-agonists                  13 (31)           32 (65)         0.001
  Analgesics                                11 (26)           5 (10)          0.046
  Statins                                   8 (19)            11 (23)         0.691
Eckerblad, J., Tödt, K., Jakobsson, P., Unosson, M., Skargren, E., Kentsson, M., & Theander, K. (2014). Symptom burden in stable COPD patients with moderate to severe airflow limitation. Heart & Lung, 43(4), p. 353.
STUDY QUESTIONS
1. What are the frequency and percentage of women in the moderate airflow limitation group?
2. What were the frequencies and percentages of the moderate and the severe airflow limitation groups who experienced an exacerbation in the previous 6 months?
3. What is the total sample size of COPD patients included in this study? What number or frequency of the subjects is married/cohabitating? What percentage of the total sample is married or cohabitating?
4. Were the moderate and severe airflow limitation groups significantly different regarding married/cohabitating status? Provide a rationale for your answer.
5. List at least three other relevant demographic variables the researchers might have gathered data on to describe this study sample.
6. For the total sample, what physical symptoms were experienced by ≥ 50% of the subjects? Identify the physical symptoms and the percentages of the total sample experiencing each symptom.
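The percentage and cumulative percentage calculations reviewed in this exercise can be sketched in a few lines of Python. The example below uses the income-level frequencies for 200 subjects and the 14-of-62 nurses example from the technique review; the function names `percentage` and `frequency_table` are illustrative assumptions, not part of the exercise.

```python
# Income-level frequencies for the 200 subjects in the review example.
income_frequencies = [
    ("< $40,000", 20),
    ("$40,000-$59,999", 50),
    ("$60,000-$79,999", 80),
    ("$80,000-$100,000", 40),
    ("> $100,000", 10),
]


def percentage(part, whole):
    """Part divided by whole, multiplied by 100."""
    return 100 * part / whole


def frequency_table(frequencies):
    """Return rows of (label, frequency, percentage, cumulative percentage)."""
    total = sum(count for _, count in frequencies)
    rows = []
    cumulative = 0.0
    for label, count in frequencies:
        pct = percentage(count, total)
        cumulative += pct  # summed from the top of the table to the bottom
        rows.append((label, count, pct, cumulative))
    return rows


table = frequency_table(income_frequencies)
for label, count, pct, cumulative in table:
    print(f"{label:<18} {count:>4} {pct:>6.0f}% {cumulative:>6.0f}%")

# The nurses example: 14 of 62 nurses working on a given day.
nurses_pct = percentage(14, 62)
print(f"{nurses_pct:.1f}% (one decimal), {round(nurses_pct)}% (whole percent)")
```

The same `percentage` helper applies to any part-of-whole question, provided the denominator is the size of the group the question asks about (a subgroup n or the total sample).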
Interpreting Line Graphs EXERCISE 7
STATISTICAL TECHNIQUE IN REVIEW

Tables and figures are commonly used to present findings from studies or to provide a way for researchers to become familiar with research data. Using figures, researchers are able to illustrate the results from descriptive data analyses, assist in identifying patterns in data, identify changes over time, and interpret exploratory findings. A line graph is a figure that is developed by joining a series of plotted points with a line to illustrate how a variable changes over time. A line graph figure includes a horizontal scale, or x-axis, and a vertical scale, or y-axis. The x-axis is used to document time, and the y-axis is used to document the mean scores or values for a variable (Grove, Burns, & Gray, 2013; Plichta & Kelvin, 2013). Researchers might include a line graph to compare the values for three or four variables in a study or to identify the changes in groups for a selected variable over time. For example, Figure 7-1 presents a line graph that documents time in weeks on the x-axis and mean weight loss in pounds on the y-axis for an experimental group consuming a low carbohydrate diet and a control group consuming a standard diet. This line graph illustrates the trend of a strong, steady increase in the mean weight lost by the experimental or intervention group and minimal mean weight loss by the control group.

FIGURE 7-1 ■ LINE GRAPH COMPARING EXPERIMENTAL AND CONTROL GROUPS FOR WEIGHT LOSS OVER FOUR WEEKS.
[Line graph: x-axis, Weeks (1 to 4); y-axis, Weight loss (lbs) (0 to 10); lines, Experimental and Control.]

RESEARCH ARTICLE

Source
Azzolin, K., Mussi, C. M., Ruschel, K. B., de Souza, E. N., Lucena, A. D., & Rabelo-Silva, E. R. (2013). Effectiveness of nursing interventions in heart failure patients in home care using NANDA-I, NIC, and NOC. Applied Nursing Research, 26(4), 239–244.
Introduction
Azzolin and colleagues (2013) analyzed data from a larger randomized clinical trial to determine the effectiveness of 11 nursing interventions (NIC) on selected nursing outcomes (NOC) in a sample of patients with heart failure (HF) receiving home care. A total of 23 patients with HF were followed for 6 months after hospital discharge and provided four home visits and four telephone calls. The home visits and phone calls were organized using the nursing diagnoses from the North American Nursing Diagnosis Association International (NANDA-I) classification list. The researchers found that eight nursing interventions significantly improved the nursing outcomes for these HF patients. Those interventions included “health education, self-modification assistance, behavior modification, telephone consultation, nutritional counselling, teaching: prescribed medications, teaching: disease process, and energy management” (Azzolin et al., 2013, p. 243). The researchers concluded that the NANDA-I, NIC, and NOC linkages were useful in managing patients with HF in their home.

Relevant Study Results
Azzolin and colleagues (2013) presented their results in a line graph format to display the nursing outcome changes over the 6 months of the home visits and phone calls. The nursing outcomes were measured with a five-point Likert scale with 1 = worst and 5 = best. “Of the eight outcomes selected and measured during the visits, four belonged to the health & knowledge behavior domain (50%), as follows: knowledge: treatment regimen; compliance behavior; knowledge: medication; and symptom control. Significant increases were observed in this domain for all outcomes when comparing mean scores obtained at visits no. 1 and 4 (Figure 1; p < 0.001 for all comparisons).
The other four outcomes assessed belong to three different NOC domains, namely, functional health (activity tolerance and energy conservation), physiologic health (fluid balance), and family health (family participation in professional care). The scores obtained for activity tolerance and energy conservation increased significantly from visit no. 1 to visit no. 4 (p = 0.004 and p < 0.001, respectively). Fluid balance and family participation in professional care did not show statistically significant differences (p = 0.848 and p = 0.101, respectively) (Figure 2)” (Azzolin et al., 2013, p. 241). The significance level or alpha (α) was set at 0.05 for this study.

FIGURE 2 ■ NURSING OUTCOMES MEASURED OVER 6 MONTHS (OTHER DOMAINS): Activity tolerance (95% CI −1.38 to −0.18, p = 0.004); energy conservation (95% CI −0.62 to −0.19, p < 0.001); fluid balance (95% CI −0.25 to 0.07, p = 0.848); family participation in professional care (95% CI −2.31 to −0.11, p = 0.101). HV = home visit. CI = confidence interval.
[Line graph: x-axis, HV1 to HV4; y-axis, Mean (0 to 5.0); lines, Fluid balance, Family participation in professional care, Activity tolerance, and Energy conservation.]
Azzolin, K., Mussi, C. M., Ruschel, K. B., de Souza, E. N., Lucena, A. D., & Rabelo-Silva, E. R. (2013). Effectiveness of nursing interventions in heart failure patients in home care using NANDA-I, NIC, and NOC. Applied Nursing Research, 26(4), p. 242.

FIGURE 1 ■ NURSING OUTCOMES MEASURED OVER 6 MONTHS (HEALTH & KNOWLEDGE BEHAVIOR DOMAIN): Knowledge: medication (95% CI −1.66 to −0.87, p < 0.001); knowledge: treatment regimen (95% CI −1.53 to −0.98, p < 0.001); symptom control (95% CI −1.93 to −0.95, p < 0.001); and compliance behavior (95% CI −1.24 to −0.56, p < 0.001). HV = home visit. CI = confidence interval.
[Line graph: x-axis, HV1 to HV4; y-axis, Mean (0 to 5.0); lines, Compliance behavior, Symptom control, Knowledge: medication, and Knowledge: treatment regimen.]

STUDY QUESTIONS
1. What is the purpose of a line graph? What elements are included in a line graph?
2. Review Figure 1 and identify the focus of the x-axis and the y-axis. What is the time frame for the x-axis? What variables are presented on this line graph?
3. In Figure 1, did the nursing outcome compliance behavior change over the 6 months of home visits? Provide a rationale for your answer.
4. State the null hypothesis for the nursing outcome compliance behavior.
5. Was there a significant difference in compliance behavior from the first home visit (HV1) to the fourth home visit (HV4)? Was the null hypothesis accepted or rejected? Provide a rationale for your answer.
6. In Figure 1, what outcome had the lowest mean at HV1? Did this outcome improve over the four home visits? Provide a rationale for your answer.
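The significance decisions reported in the study results rest on comparing each p value with the alpha of 0.05 set for the study. A minimal sketch of that comparison follows; the function name and the returned labels are illustrative assumptions, and 0.001 is used only as a stand-in upper bound for the reported "p < 0.001".

```python
ALPHA = 0.05  # significance level stated for the Azzolin et al. (2013) study

def null_hypothesis_decision(p_value, alpha=ALPHA):
    """Reject the null hypothesis only when p is smaller than alpha."""
    return "rejected" if p_value < alpha else "not rejected"

# p values taken from the Figure 2 results; 0.001 stands in for "p < 0.001".
p_values = {"fluid balance": 0.848, "energy conservation": 0.001}

decisions = {outcome: null_hypothesis_decision(p) for outcome, p in p_values.items()}
print(decisions)
```

A result labeled "not rejected" means only that the change from HV1 to HV4 was not statistically significant at the chosen alpha.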
Questions to Be Graded EXERCISE 7

Name: _______________________________________________________ Class: _____________________ Date: _____________________

Follow your instructor’s directions to submit your answers to the following questions for grading. Your instructor may ask you to write your answers below and submit them as a hard copy for grading. Alternatively, your instructor may ask you to use the space below for notes and submit your answers online at http://evolve.elsevier.com/Grove/statistics/ under “Questions to Be Graded.”

1. What is the focus of the example Figure 7-1 in the section introducing the statistical technique of this exercise?
2. In Figure 2 of the Azzolin et al. (2013, p. 242) study, did the nursing outcome activity tolerance change over the 6 months of home visits (HVs) and telephone calls? Provide a rationale for your answer.
3. State the null hypothesis for the nursing outcome activity tolerance.
4. Was there a significant difference in activity tolerance from the first home visit (HV1) to the fourth home visit (HV4)? Was the null hypothesis accepted or rejected? Provide a rationale for your answer.
5. In Figure 2, what nursing outcome had the lowest mean at HV1? Did this outcome improve over the four HVs? Provide a rationale for your answer.
6. What nursing outcome had the highest mean at HV1 and at HV4? Was this outcome significantly different from HV1 to HV4? Provide a rationale for your answer.
7. State the null hypothesis for the nursing outcome family participation in professional care.
8. Was there a statistically significant difference in family participation in professional care from HV1 to HV4? Was the null hypothesis accepted or rejected? Provide a rationale for your answer.
9. Was Figure 2 helpful in understanding the nursing outcomes for patients with heart failure (HF) who received four HVs and telephone calls? Provide a rationale for your answer.
10. What nursing interventions significantly improved the nursing outcomes for these patients with HF? What implications for practice do you note from these study results?

Measures of Central Tendency: Mean, Median, and Mode EXERCISE 8

STATISTICAL TECHNIQUE IN REVIEW

Mean, median, and mode are the three measures of central tendency used to describe study variables. These statistical techniques are calculated to determine the center of a distribution of data, and the central tendency that is calculated is determined by the level of measurement of the data (nominal, ordinal, interval, or ratio; see Exercise 1). The mode is a category or score that occurs with the greatest frequency in a distribution of scores in a data set. The mode is the only acceptable measure of central tendency for analyzing nominal-level data, which are not continuous and cannot be ranked, compared, or subjected to mathematical operations. If a distribution has two scores that occur more frequently than others (two modes), the distribution is called bimodal. A distribution with more than two modes is multimodal (Grove, Burns, & Gray, 2013).

The median (MD) is a score that lies in the middle of a rank-ordered list of values of a distribution. If a distribution consists of an odd number of scores, the MD is the middle score that divides the rest of the distribution into two equal parts, with half of the values falling above the middle score and half of the values falling below this score. In a distribution with an even number of scores, the MD is half of the sum of the two middle numbers of that distribution. If several scores in a distribution are of the same value, then the MD will be the value of the middle score.
The MD is the most precise measure of central tendency for ordinal-level data and for nonnormally distributed or skewed interval- or ratio-level data. The following formula can be used to locate the position of the median in a rank-ordered distribution of scores.

$$\text{Median } (MD) = (N + 1) \div 2$$

N is the number of scores

$$\text{Example: } N = 31,\quad \text{Median} = (31 + 1) \div 2 = 32 \div 2 = 16\text{th score}$$

$$\text{Example: } N = 40,\quad \text{Median} = (40 + 1) \div 2 = 41 \div 2 = 20.5\text{th score}$$

Thus in the second example, the median is halfway between the 20th and the 21st scores.

The mean (X̄) is the arithmetic average of all scores of a sample, that is, the sum of its individual scores divided by the total number of scores. The mean is the most accurate measure of central tendency for normally distributed data measured at the interval and ratio levels and is only appropriate for these levels of data (Grove, Gray, & Burns, 2015). In a normal distribution, the mean, median, and mode are essentially equal (see Exercise 26 for determining the normality of a distribution). The mean is sensitive to extreme scores.
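The median position formula, (N + 1) ÷ 2, can be sketched in a few lines of Python. This is an illustrative example only: the function names are assumptions, and the data are simply the whole numbers 1 through N, chosen so that each score equals its own rank.

```python
import math

def median_position(n_scores):
    """Position of the median in a rank-ordered list: (N + 1) / 2."""
    return (n_scores + 1) / 2

def median(scores):
    """Median of a non-empty list, located with the position formula."""
    ordered = sorted(scores)
    position = median_position(len(ordered))
    # A whole-number position picks one score; a .5 position averages
    # the two scores on either side of it (positions are 1-indexed).
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2

print(median_position(31), median_position(40))  # 16.0 20.5
print(median(range(1, 32)), median(range(1, 41)))  # 16.0 20.5
```

With 31 scores the median is the 16th score itself; with 40 scores it is the average of the 20th and 21st scores.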
Calculating Descriptive Statistics
There are two major classes of statistics: descriptive statistics and inferential statistics. Descriptive statistics are computed to reveal characteristics of the sample data set and to describe study variables. Inferential statistics are computed to gain information about effects and associations in the population being studied. For some types of studies, descriptive statistics will be the only approach to analysis of the data. For other studies, descriptive statistics are the first step in the data analysis process, to be followed by inferential statistics. For all studies that involve numerical data, descriptive statistics are crucial in understanding the fundamental properties of the variables being studied. Exercise 27 focuses only on descriptive statistics and will illustrate the most common descriptive statistics computed in nursing research and provide examples using actual clinical data from empirical publications.

MEASURES OF CENTRAL TENDENCY

A measure of central tendency is a statistic that represents the center or middle of a frequency distribution. The three measures of central tendency commonly used in nursing research are the mode, median (MD), and mean (X̄). The mean is the arithmetic average of all of a variable’s values. The median is the exact middle value (or the average of the middle two values if there is an even number of observations). The mode is the most commonly occurring value or values (see Exercise 8).

The following data have been collected from veterans with rheumatoid arthritis (Tran, Hooker, Cipher, & Reimold, 2009). The values in Table 27-1 were extracted from a larger sample of veterans who had a history of biologic medication use (e.g., infliximab [Remicade], etanercept [Enbrel]). Table 27-1 contains data collected from 10 veterans who had stopped taking biologic medications, and the variable represents the number of years that each veteran had taken the medication before stopping.
Because the number of study subjects represented below is 10, the correct statistical notation to reflect that number is:

$$n = 10$$

Note that the n is lowercase, because we are referring to a sample of veterans. If the data being presented represented the entire population of veterans, the correct notation is the uppercase N. Because most nursing research is conducted using samples, not populations, all formulas in the subsequent exercises will incorporate the sample notation, n.

Mode
The mode is the numerical value or score that occurs with the greatest frequency; it does not necessarily indicate the center of the data set. The data in Table 27-1 contain two modes: 1.5 and 3.0. Each of these numbers occurred twice in the data set. When two modes exist, the data set is referred to as bimodal; a data set that contains more than two modes would be multimodal.

Median
The median (MD) is the score at the exact center of the ungrouped frequency distribution. It is the 50th percentile. To obtain the MD, sort the values from lowest to highest. If the number of values is an uneven number, exactly 50% of the values are above the MD and 50% are below it. If the number of values is an even number, the MD is the average of the two middle values. Thus the MD may not be an actual value in the data set. For example, the data in Table 27-1 consist of 10 observations, and therefore the MD is calculated as the average of the two middle values.

$$MD = \frac{1.5 + 2.0}{2} = 1.75$$

Mean
The most commonly reported measure of central tendency is the mean. The mean is the sum of the scores divided by the number of scores being summed. Thus like the MD, the mean may not be a member of the data set.
The formula for calculating the mean is as follows:

$$\bar{X} = \frac{\sum X}{n}$$

where
X̄ = mean
∑ = sigma, the statistical symbol for summation
X = a single value in the sample
n = total number of values in the sample

The mean number of years that the veterans used a biologic medication is calculated as follows:

$$\bar{X} = \frac{0.1 + 0.3 + 1.3 + 1.5 + 1.5 + 2.0 + 2.2 + 3.0 + 3.0 + 4.0}{10} = 1.9 \text{ years}$$

TABLE 27-1 DURATION OF BIOLOGIC USE AMONG VETERANS WITH RHEUMATOID ARTHRITIS (n = 10)
Duration of Biologic Use (years)
0.1   0.3   1.3   1.5   1.5   2.0   2.2   3.0   3.0   4.0

The mean is an appropriate measure of central tendency for approximately normally distributed populations with variables measured at the interval or ratio level. It is also appropriate for ordinal level data such as Likert scale values, where higher numbers represent more of the construct being measured and lower numbers represent less of the construct (such as pain levels, patient satisfaction, depression, and health status).

The mean is sensitive to extreme scores such as outliers. An outlier is a value in a sample data set that is unusually low or unusually high in the context of the rest of the sample data. An example of an outlier in the data presented in Table 27-1 might be a value such as 11. The existing values range from 0.1 to 4.0, meaning that no veteran used a biologic beyond 4 years. If an additional veteran were added to the sample and that person used a biologic for 11 years, the mean would be much larger: 2.7 years. Simply adding this outlier to the sample raised the mean from 1.9 to 2.7 years, an increase of more than 40%. The outlier would also change the frequency distribution. Without the outlier, the frequency distribution is approximately normal, as shown in Figure 27-1. Including the outlier changes the shape of the distribution to appear positively skewed.
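The three measures of central tendency for the Table 27-1 data, and the effect of the 11-year outlier, can be cross-checked with Python's standard `statistics` module. This is a sketch for checking the hand calculations; the variable names are illustrative assumptions.

```python
import statistics

# Duration of biologic use in years, copied from Table 27-1 (n = 10).
durations = [0.1, 0.3, 1.3, 1.5, 1.5, 2.0, 2.2, 3.0, 3.0, 4.0]

modes = statistics.multimode(durations)   # every value tied for most frequent
median = statistics.median(durations)     # average of the two middle values
mean = statistics.mean(durations)         # sum of values divided by n

# Add the hypothetical 11-year outlier discussed in the text.
with_outlier = durations + [11.0]
mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)

print(modes, median, round(mean, 2))
print(round(mean_with_outlier, 1), median_with_outlier)
```

The mean moves from about 1.9 to about 2.7 years when the outlier is added, while the median shifts only from 1.75 to 2.0, which illustrates why the mean is described as sensitive to extreme scores.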
Although the use of summary statistics has been the traditional approach to describing data or describing the characteristics of the sample before inferential statistical analysis, its ability to clarify the nature of data is limited. For example, using measures of central tendency, particularly the mean, to describe the nature of the data obscures the impact of extreme values or deviations in the data. Thus, significant features in the data may be concealed or misrepresented. Often, anomalous, unexpected, or problematic data and discrepant patterns are evident, but are not regarded as meaningful. Measures of dispersion, such as the range, difference scores, variance, and standard deviation (SD), provide important insight into the nature of the data.

MEASURES OF DISPERSION

Measures of dispersion, or variability, are measures of individual differences of the members of the population and sample. They indicate how values in a sample are dispersed around the mean. These measures provide information about the data that is not available from measures of central tendency. They indicate how different the scores are, that is, the extent to which individual values deviate from one another. If the individual values are similar, measures of variability are small and the sample is relatively homogeneous in terms of those values. Heterogeneity (wide variation in scores) is important in some statistical procedures, such as correlation. Heterogeneity is determined by measures of variability. The measures most commonly used are range, difference scores, variance, and SD (see Exercise 9).

FIGURE 27-1 ■ FREQUENCY DISTRIBUTION OF YEARS OF BIOLOGIC USE, WITHOUT OUTLIER AND WITH OUTLIER.
[Two histograms: x-axis, Years of biologic use (intervals 0-0.9 through 4-4.9 without the outlier; 0-0.9 through 11-11.9 with the outlier); y-axis, Frequency (0 to 3.0).]
Range
The simplest measure of dispersion is the range. In published studies, range is presented in two ways: (1) the range is the lowest and highest scores, or (2) the range is calculated by subtracting the lowest score from the highest score. The range for the scores in Table 27-1 is 0.1 and 4.0, or it can be calculated as follows: 4.0 − 0.1 = 3.9. In this form, the range is a difference score that uses only the two extreme scores for the comparison. The range is generally reported but is not used in further analyses.

Difference Scores
Difference scores are obtained by subtracting the mean from each score. Sometimes a difference score is referred to as a deviation score because it indicates the extent to which a score deviates from the mean. Of course, most variables in nursing research are not “scores,” yet the term difference score is used to represent a value’s deviation from the mean. The difference score is positive when the score is above the mean, and it is negative when the score is below the mean (see Table 27-2). Difference scores are the basis for many statistical analyses and can be found within many statistical equations. The formula for difference scores is:

$$X - \bar{X}$$

TABLE 27-2 DIFFERENCE SCORES OF DURATION OF BIOLOGIC USE
X      X̄      X − X̄
0.1    1.9    −1.8
0.3    1.9    −1.6
1.3    1.9    −0.6
1.5    1.9    −0.4
1.5    1.9    −0.4
2.0    1.9    0.1
2.2    1.9    0.3
3.0    1.9    1.1
3.0    1.9    1.1
4.0    1.9    2.1
Σ of absolute values: 9.5

The mean deviation is the average difference score, using the absolute values. The formula for the mean deviation is:

$$\bar{X}_{\text{deviation}} = \frac{\sum \left| X - \bar{X} \right|}{n}$$

In this example, the mean deviation is 0.95. This value was calculated by taking the sum of the absolute value of each difference score (1.8, 1.6, 0.6, 0.4, 0.4, 0.1, 0.3, 1.1, 1.1, 2.1) and dividing by 10. The result indicates that, on average, subjects’ duration of biologic use deviated from the mean by 0.95 years.

Variance
Variance is another measure commonly used in statistical analysis. The equation for a sample variance (s²) is below.

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Note that the lowercase letter s² is used to represent a sample variance. The lowercase Greek sigma (σ²) is used to represent a population variance, in which the denominator is N instead of n − 1.
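The difference scores and the mean deviation described above can be reproduced with a short Python sketch. It assumes, as Table 27-2 does, that the mean is rounded to 1.9 before subtracting; the variable names are illustrative.

```python
# Duration of biologic use in years, copied from Table 27-1 (n = 10).
durations = [0.1, 0.3, 1.3, 1.5, 1.5, 2.0, 2.2, 3.0, 3.0, 4.0]
mean_rounded = 1.9  # mean rounded to one decimal place, as used in Table 27-2

# Difference score: each value minus the mean (rounded to undo float noise).
difference_scores = [round(x - mean_rounded, 1) for x in durations]

# Mean deviation: sum of the absolute difference scores divided by n.
sum_of_absolute_values = round(sum(abs(d) for d in difference_scores), 1)
mean_deviation = round(sum_of_absolute_values / len(durations), 2)

print(difference_scores)
print(sum_of_absolute_values, mean_deviation)  # 9.5 0.95
```

Taking absolute values matters here: the signed difference scores largely cancel each other out, so their plain sum would not describe how far values typically fall from the mean.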
Because most nursing research is conducted using samples, not populations, formulas in the subsequent exercises that contain a variance or standard deviation will incorporate the sample notation, using n − 1 as the denominator. Moreover, statistical software packages compute the variance and standard deviation using the sample formulas, not the population formulas.

The variance is always a positive value and has no upper limit. In general, the larger the variance, the larger the dispersion of scores. The variance is most often computed to derive the standard deviation because, unlike the variance, the standard deviation reflects important properties about the frequency distribution of the variable it represents. Table 27-3 displays how we would compute a variance by hand, using the biologic duration data.

TABLE 27-3 VARIANCE COMPUTATION OF BIOLOGIC USE
X      X̄      X − X̄    (X − X̄)²
0.1    1.9    −1.8     3.24
0.3    1.9    −1.6     2.56
1.3    1.9    −0.6     0.36
1.5    1.9    −0.4     0.16
1.5    1.9    −0.4     0.16
2.0    1.9    0.1      0.01
2.2    1.9    0.3      0.09
3.0    1.9    1.1      1.21
3.0    1.9    1.1      1.21
4.0    1.9    2.1      4.41
Σ                      13.41

$$s^2 = \frac{13.41}{9}$$

$$s^2 = 1.49$$

Standard Deviation
Standard deviation is a measure of dispersion that is the square root of the variance. The standard deviation is represented by the notation s or SD. The equation for obtaining a standard deviation is

$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Table 27-3 displays the computations for the variance. To compute the SD, simply take the square root of the variance. We know that the variance of biologic duration is s² = 1.49. Therefore, the s of biologic duration is SD = 1.22. The SD is an important statistic, both for understanding dispersion within a distribution and for interpreting the relationship of a particular value to the distribution.

SAMPLING ERROR

A standard error describes the extent of sampling error. For example, a standard error of the mean is calculated to determine the magnitude of the variability associated with the mean.
A small standard error is an indication that the sample mean is close to the population mean, while a large standard error yields less certainty that the sample mean approximates the population mean. The formula for the standard error of the mean (s_X̄) is:

$$s_{\bar{X}} = \frac{s}{\sqrt{n}}$$

Using the biologic medication duration data, we know that the standard deviation of biologic duration is s = 1.22. Therefore, the standard error of the mean for biologic duration is computed as follows:

$$s_{\bar{X}} = \frac{1.22}{\sqrt{10}}$$

$$s_{\bar{X}} = 0.39$$

The standard error of the mean for biologic duration is 0.39.

Confidence Intervals
To determine how closely the sample mean approximates the population mean, the standard error of the mean is used to build a confidence interval. For that matter, a confidence interval can be created for many statistics, such as a mean, proportion, and odds ratio. To build a confidence interval around a statistic, you must have the standard error value and the t value to adjust the standard error. The degrees of freedom (df) to use to compute a confidence interval is df = n − 1.

To compute the confidence interval for a mean, the lower and upper limits of that interval are created by multiplying the s_X̄ by the t statistic, where df = n − 1, and then subtracting that product from the mean and adding it to the mean. For a 95% confidence interval, the t value should be selected at α = 0.05. For a 99% confidence interval, the t value should be selected at α = 0.01.

Using the biologic medication duration data, we know that the standard error of the mean duration of biologic medication use is s_X̄ = 0.39. The mean duration of biologic medication use is 1.89. Therefore, the 95% confidence interval for the mean duration of biologic medication use is computed as follows:

$$\bar{X} \pm s_{\bar{X}}\, t$$

$$1.89 \pm (0.39)(2.26)$$

$$1.89 \pm 0.88$$

As referenced in Appendix A, the t value required for the 95% confidence interval with df = 9 is 2.26.
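The hand computations above for the variance, standard deviation, standard error of the mean, and 95% confidence interval can be cross-checked with Python's standard library. This is a sketch under two assumptions: the t value of 2.26 is copied from the text rather than looked up, and the variable names are illustrative. Because the code carries full precision instead of rounding each intermediate value, the interval limits can differ from the hand-rounded figures in the second decimal place.

```python
import math
import statistics

# Duration of biologic use in years, copied from Table 27-1 (n = 10).
durations = [0.1, 0.3, 1.3, 1.5, 1.5, 2.0, 2.2, 3.0, 3.0, 4.0]
n = len(durations)

mean = statistics.mean(durations)
variance = statistics.variance(durations)  # sample variance, denominator n - 1
sd = statistics.stdev(durations)           # square root of the sample variance
standard_error = sd / math.sqrt(n)         # standard error of the mean

t_value = 2.26  # t for a 95% interval with df = n - 1 = 9, as given in the text
margin = t_value * standard_error
lower_limit, upper_limit = mean - margin, mean + margin

print(round(variance, 2), round(sd, 2), round(standard_error, 2))
print(round(lower_limit, 2), round(upper_limit, 2))
```

Swapping in the t value for a different confidence level changes only `t_value`; the rest of the computation stays the same.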
The 95% computation results in a lower limit of 1.01 and an upper limit of 2.77. This means that our confidence interval of 1.01 to 2.77 estimates the population mean duration of biologic use with 95% confidence (Kline, 2004). Technically and mathematically, it means that if we drew an infinite number of samples of veterans and computed a 95% confidence interval for the mean duration of biologic medication use from each sample, 95% of the intervals would contain the true population mean, and 5% would not contain the population mean (Gliner, Morgan, & Leech, 2009).

If we were to compute a 99% confidence interval, we would require the t value that is referenced at α = 0.01. Therefore, the 99% confidence interval for the mean duration of biologic medication use is computed as follows:

$$1.89 \pm (0.39)(3.25)$$

$$1.89 \pm 1.27$$

As referenced in Appendix A, the t value required for the 99% confidence interval with df = 9 is 3.25. The computation above results in a lower limit of 0.62 and an upper limit of 3.16. This means that our confidence interval of 0.62 to 3.16 estimates the population mean duration of biologic use with 99% confidence.

Degrees of Freedom
The concept of degrees of freedom (df) was used in reference to computing a confidence interval. For any statistical computation, degrees of freedom are the number of independent pieces of information that are free to vary in order to estimate another piece of information (Zar, 2010). In the case of the confidence interval, the degrees of freedom are n − 1. This means that there are n − 1 independent observations in the sample that are free to vary (to be any value) to estimate the lower and upper limits of the confidence interval.

SPSS COMPUTATIONS

A retrospective descriptive study examined the duration of biologic use from veterans with rheumatoid arthritis (Tran et al., 2009).
The values in Table 27-4 were extracted from a larger sample of veterans who had a history of biologic medication use (e.g., infliximab [Remicade], etanercept [Enbrel]). Table 27-4 contains simulated demographic data collected from 10 veterans who had stopped taking biologic medications. Age at study enrollment, duration of biologic use, race/ethnicity, gender (F = female), tobacco use (F = former use, C = current use, N = never used), primary diagnosis (3 = irritable bowel syndrome, 4 = psoriatic arthritis, 5 = rheumatoid arthritis, 6 = reactive arthritis), and type of biologic medication used were among the study variables examined.

TABLE 27-4 DEMOGRAPHIC VARIABLES OF VETERANS WITH RHEUMATOID ARTHRITIS
Patient ID  Duration (yrs)  Age  Race/Ethnicity                 Gender  Tobacco  Diagnosis  Biologic
1           0.1             42   Caucasian                      F       F        5          Infliximab
2           0.3             41   Black, not of Hispanic Origin  F       F        5          Etanercept
3           1.3             56   Caucasian                      F       N        5          Infliximab
4           1.5             78   Caucasian                      F       F        3          Infliximab
5           1.5             86   Black, not of Hispanic Origin  F       F        4          Etanercept
6           2.0             49   Caucasian                      F       F        6          Etanercept
7           2.2             82   Caucasian                      F       F        5          Infliximab
8           3.0             35   Caucasian                      F       N        3          Infliximab
9           3.0             59   Black, not of Hispanic Origin  F       C        3          Infliximab
10          4.0             37   Caucasian                      F       F        5          Etanercept

[Figure: the data set as it appears in SPSS]

Step 1: For a nominal variable, the appropriate descriptive statistics are frequencies and percentages. From the “Analyze” menu, choose “Descriptive Statistics” and “Frequencies.” Move “Race/Ethnicity” and “Gender” over to the right. Click “OK.”

Step 2: For a continuous variable, the appropriate descriptive statistics are means and standard deviations. From the “Analyze” menu, choose “Descriptive Statistics” and “Explore.” Move “Duration” over to the right. Click “OK.”

INTERPRETATION OF SPSS OUTPUT
The following tables are generated from SPSS. The first set of tables (from the first set of SPSS commands in Step 1) contains the frequencies of race/ethnicity and gender. Most (70%) were Caucasian, and 100% were female.

Frequencies
Frequency Table

RaceEthnicity
                                      Frequency  Percent  Valid Percent  Cumulative Percent
Valid  Black, not of Hispanic Origin  3          30.0     30.0           30.0
       Caucasian                      7          70.0     70.0           100.0
       Total                          10         100.0    100.0

Gender
          Frequency  Percent  Valid Percent  Cumulative Percent
Valid  F  10         100.0    100.0          100.0

The second set of output (from the second set of SPSS commands in Step 2) contains the descriptive statistics for “Duration,” including the mean, s (standard deviation), SE, 95% confidence interval for the mean, median, variance, minimum value, maximum value, range, and skewness and kurtosis statistics. As shown in the output, the mean number of years for duration is 1.89, and the SD is 1.22. The 95% CI is 1.02–2.76.

Explore
Descriptives: Duration of Biologic Use
                                   Statistic  Std. Error
Mean                               1.890      .3860
95% Confidence Interval for Mean
  Lower Bound                      1.017
  Upper Bound                      2.763
5% Trimmed Mean                    1.872
Median                             1.750
Variance                           1.490
Std. Deviation                     1.2206
Minimum                            .1
Maximum                            4.0
Range                              3.9
Interquartile Range                2.0
Skewness                           .159       .687
Kurtosis                           -.437      1.334

STUDY QUESTIONS
1. Define mean.
2. What does this symbol, s², represent?
3. Define outlier.
4. Are there any outliers among the values representing duration of biologic use?
5. How would you interpret the 95% confidence interval for the mean of duration of biologic use?
6. What percentage of patients were Black, not of Hispanic origin?
7. Can you compute the variance for duration of biologic use by using the information presented in the SPSS output above?
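The values reported in the SPSS output above can be cross-checked outside SPSS. The sketch below is an illustration using Python's standard library, not part of the original exercise. The variable names are assumptions, and the data are the 10 durations and race/ethnicity values listed in Table 27-4.

```python
import math
import statistics
from collections import Counter

# Duration of biologic use (years) for the 10 veterans in Table 27-4.
duration = [0.1, 0.3, 1.3, 1.5, 1.5, 2.0, 2.2, 3.0, 3.0, 4.0]

# Race/ethnicity for the same 10 veterans, in patient ID order.
black = "Black, not of Hispanic Origin"
race = ["Caucasian", black, "Caucasian", "Caucasian", black,
        "Caucasian", "Caucasian", "Caucasian", black, "Caucasian"]

n = len(duration)
mean = statistics.mean(duration)
median = statistics.median(duration)
variance = statistics.variance(duration)  # sample variance (divides by n - 1)
sd = statistics.stdev(duration)           # sample standard deviation
se = sd / math.sqrt(n)                    # standard error of the mean
value_range = max(duration) - min(duration)

# Frequencies and percentages for the nominal variable.
counts = Counter(race)
percentages = {group: 100 * count / n for group, count in counts.items()}

print(f"Mean = {mean:.2f}, Median = {median:.2f}")
print(f"Variance = {variance:.3f}, SD = {sd:.4f}, SE = {se:.4f}")
print(f"Range = {value_range:.1f}")
for group, count in counts.items():
    print(f"{group}: n = {count}, {percentages[group]:.1f}%")
```

The sample variance here is the square of the sample standard deviation, which is the relationship between the two rows of the Descriptives table.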
Copyright © 2017, Elsevier Inc. All rights reserved.
Questions to Be Graded
EXERCISE 27
Name: _______________________________________________________
Class: _____________________
Date: _____________________
Follow your instructor’s directions to submit your answers to the following questions for grading. Your instructor may ask you to write your answers below and submit them as a hard copy for grading. Alternatively, your instructor may ask you to use the space below for notes and submit your answers online at http://evolve.elsevier.com/Grove/statistics/ under “Questions to Be Graded.”
EXERCISE 28
Calculating Pearson Product-Moment Correlation Coefficient
Correlational analyses identify associations between two variables. There are many different kinds of statistics that yield a measure of correlation. All of these statistics address a research question or hypothesis that involves an association or relationship. Examples of research questions that are answered with correlation statistics are, “Is there an association between weight loss and depression?” and “Is there a relationship between patient satisfaction and health status?” A hypothesis is developed to identify the nature (positive or negative) of the relationship between the variables being studied.

The Pearson product-moment correlation was the first of the correlation measures developed and is the most commonly used. As is explained in Exercise 13, this coefficient (statistic) is represented by the letter r, and the value of r is always between −1.00 and +1.00. A value of zero indicates no relationship between the two variables. A positive correlation indicates that higher values of x are associated with higher values of y. A negative or inverse correlation indicates that higher values of x are associated with lower values of y. The r value is indicative of the slope of the line (called a regression line) that can be drawn through a standard scatterplot of the two variables (see Exercise 11). The strengths of different relationships are identified in Table 28-1 (Cohen, 1988).

TABLE 28-1 STRENGTH OF ASSOCIATION FOR PEARSON r
Strength of Association  Positive Association  Negative Association
Weak association         0.00 to < 0.30        0.00 to < −0.30
Moderate association     0.30 to 0.49          −0.49 to −0.30
Strong association       0.50 or greater       −1.00 to −0.50

RESEARCH DESIGNS APPROPRIATE FOR THE PEARSON r
Research designs that may utilize the Pearson r include any associational design (Gliner, Morgan, & Leech, 2009). The variables involved in the design are attributional, meaning the variables are characteristics of the participant, such as health status, blood pressure, gender, diagnosis, or ethnicity. Regardless of the nature of variables, the variables submitted to a Pearson correlation must be measured as continuous or at the interval or ratio level.

STATISTICAL FORMULA AND ASSUMPTIONS
Use of the Pearson correlation involves the following assumptions:
1. Interval or ratio measurement of both variables (e.g., age, income, blood pressure, cholesterol levels). However, if the variables are measured with a Likert scale, and the frequency distribution is approximately normally distributed, these data are
Core Competencies of Nurse Practitioner and Nurse Educator.
Hi, I need to submit a 750-word essay on the topic Core Competencies of Nurse Practitioner and Nurse Educator.

It is therefore necessary to compare and contrast the various roles and core competencies of nurse practitioners and nurse educators.

Both nurse educators and nurse practitioners pursue a continuous quality improvement role in the nursing field. For instance, the role of nurse educators follows a multidimensional scope that requires an enduring commitment (National League for Nursing [NLN], 2005, p. 6). In order to improve the general quality in the nursing field, nurse educators constantly engage in practices that enhance their career knowledge and participate in professional development activities, such as seminars, that increase their effectiveness. The educators usually use the feedback gained from nursing students to improve the effectiveness of their roles. Similarly, nurse practitioners engage in regular activities that tend to improve the nursing field. For instance, they carry out a critical analysis of data and evidence while integrating knowledge from various disciplines with the aim of improving the quality of nursing.

Both nurse educators and nurse practitioners make use of various assessment and evaluation strategies in their nursing roles. Ideally, nurse educators make wide use of existing literature in the medical sphere to develop evidence (NLN, 2005, p. 3) and evaluate different situations that emerge during their day-to-day practice. They also incorporate different strategies, such as offering various assessment and evaluation tests, to determine and review the level of competence among the learners. In comparison, nurse practitioners incorporate various clinical investigation strategies, using patients’ electronic databases such as health records to carry out an analysis of their patients (The National Organization of Nurse
Computer Science homework help
Study Questions:
disadvantage of a code of ethics?
example.
obtain one?
one?
Exercises :
Childhood Obesity
Literature Review (Childhood Obesity)
Details:
While the implementation plan prepares students to apply their research to the problem or issue they have identified for their capstone change proposal project, the literature review enables students to map out and move into the active planning and development stages of the project.
A literature review analyzes how current research supports the PICOT, as well as identifies what is known and what is not known in the evidence. Students will use the information from the earlier PICOT Statement Paper and Literature Evaluation Table assignments to develop a 750-1,000 word review that includes the following sections:
Prepare this assignment according to the guidelines found in the APA Style Guide, located in the Student Success Center. An abstract is not required.
This assignment uses a rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion.
You are required to submit this assignment to Turnitin. Please refer to the directions in the Student Success Center
What are some strengths and weaknesses of the proposal?
Executive Summary – Nursing
In this assignment, you will select a program, quality improvement initiative, or other project from your place of employment. Assume you are presenting this program to the board for approval of funding. Write an executive summary (850-1,000 words) to present to the board, from which they will make their decision to fund your program or project. The summary should include:
Share your written proposal with your manager, supervisor or other colleague in a formal leadership position within a health care organization. Request their feedback using the following questions as prompts:
Submit the written proposal along with the “Executive Summary Feedback Form.”
Prepare this assignment according to the APA guidelines found in the APA Style Guide, located in the Student Success Center. An abstract is not required.
Needs to be 100% PLAGIARISM FREE
Computer Science homework help
Describe how the Course work 1 and Course work 2 points are related to the Job description points (correlate Course work 1 and Course work 2 with the Job Description; write 40 points) → 800 words
Course work 1
Course work 2
Job Description points
Essay
Nursing homework help
If you talk about a possible poor health outcome, do you believe that outcome will occur? Do you believe eye contact and personal contact should be avoided?
You would have a difficult time practicing as a nurse if you believed these to be true. But they are very real beliefs in some cultures.
Differences in cultural beliefs, subcultures, religion, ethnic customs, dietary customs, language, and a host of other factors contribute to the complex environment that surrounds global healthcare issues. Failure to understand and account for these differences can create a gulf between practitioners and the public they serve.
In this Assignment, you will examine a global health issue and consider the approach to this issue by the United States and by one other country.
To Prepare:
The Assignment: (1- to 2-page Global Health Comparison Matrix; 1-page Plan for Social Change)
Part 1: Global Health Comparison Matrix
Focusing on the country you selected and the U.S., complete the Global Health Comparison Matrix. Be sure to address the following:
Part 2: A Plan for Social Change
Reflect on the global health policy comparison and analysis you conducted in Part 1 of the Assignment and the impact that global health issues may have on the world, the U.S., your community, as well as your practice as a nurse leader.
In a 1-page response, create a plan for social change that incorporates a global perspective or lens into your local practice and role as a nurse leader.
HOW WOULD YOU DESCRIBE THE DUAL PURPOSE OF FINANCIAL MANAGEMENT IN HEALTH CARE TODAY?
1. The demand for healthcare administrators has gone up substantially over recent decades, as has the salary range for the position. The job of managing healthcare facilities has become more challenging and more important than ever before. How would you describe the dual purpose of financial management in health care today?
Your response must be at least 200 words in length.
2. Consider the Ridgeland Heights Medical Center (RHMC) actions presented on pages 11 and 12 of your textbook. The medical center is trying to counter a dwindling inpatient census at the facility. What do you think of the three strategies proposed here? Can you think of any other strategies that RHMC should consider here?
Your response must be at least 200 words in length.
3. Consider the big picture of our American economy with health care being an important part of it. How does the large share of American gross domestic product (GDP) that we spend on health care each year impact our nation’s economy overall? What other areas of our economy may currently be suffering because of so much medical spending? In your view, can America continue to spend this much money on health care without sacrificing the quality of American life in other areas? Why or why not? Please support your position.
Your response must be at least 200 words in length.
4. Preparing for the annual healthcare facility audit is a major undertaking. It consumes a considerable portion of the CFO’s time and energy each year.
Review the items presented in Exhibit 1.3 of your textbook, and consider which of these areas might be most susceptible to adjustments by the auditors. What can the facility’s management team do to reduce audit adjustments in these areas?
Your response must be at least 200 words in length.
Berger, S. (2014). Fundamentals of health care financial management: A practical guide to fiscal issues and activities (4th ed.). San Francisco, CA: Jossey-Bass.
Operations Management homework help
Volkswagen’s Diesel Deception
Between 2009 and 2015, Volkswagen manufactured and marketed clean diesel automobiles that were designed to provide high performance without the polluting emissions commonly associated with diesel engines. These turbocharged direct injection (TDI) clean diesel vehicles were very popular in Western Europe, where environmentally conscious or “green” consumers found they could have fast, responsive cars that seemed to sip diesel. On September 18, 2015, the U.S. Environmental Protection Agency announced that it was suing the Volkswagen Group for selling over 482,000 diesel Volkswagens and Audis with software “defeat devices” that caused the vehicles to be far more polluting than expected during normal driving. The vehicles would be recalled for repairs.
The Volkswagen group manufactures and markets automobiles, vans, and trucks around the world in a variety of brands. The Volkswagen marque is the company’s most popular brand. Prestige brands such as Audi, Porsche, and Bentley have significantly lower sales volumes, but much higher margins.
In May 2016, VW reported a quarterly profit on Volkswagen-branded cars of only €73 million for the first quarter of 2016, a significant decrease from the €514 million profit it posted in the first quarter of 2015. Much of the profits were erased by dealer incentives and consumer rebates that supported sales of gasoline-powered Volkswagen-branded vehicles. As a whole, Volkswagen Group posted a quarterly profit of €2.4 billion; Audi and Porsche accounted for two-thirds of that profit.
In the following weeks, the U.S. and German investigators swarmed into Volkswagen offices, including the company’s international headquarters in Wolfsburg, Germany, and the corporate offices of the company’s U.S. subsidiary, Volkswagen Group of America (VWoA).
Volkswagen’s History and Culture
Founded in 1937, Volkswagen was intended to produce a “people’s car,” designed by Ferdinand Porsche, for the citizens of the Third Reich. The town of Wolfsburg was established in 1938 for VW employees.
VW’s international success helped spur the recovery of West Germany.
VW opened a U.S.$1 billion manufacturing facility in Chattanooga, Tennessee, in 2008. To secure Volkswagen’s commitment, the state of Tennessee offered Volkswagen a package of tax incentives that grew to almost U.S.$1 billion by 2015.
Porsche took over VW in 2009 after decades of cooperation and conflict between the Porsche family and Volkswagen management.
Volkswagen had a fleet of corporate jets, including an Airbus A319; VW owned over 100 factories in 31 countries across 12 different brands (see Figure 1), and the Volkswagen air services subsidiary that flew company executives as needed.
U.S. distribution of the VW Beetle, a modified version of the original “people’s car” design, began in 1949. The company founded Volkswagen Group of America (VWoA) in 1955, and created the Audi marque in 1969.
In 2015, Volkswagen was tightly controlled by the billionaire descendants of Ferdinand Porsche, who own 50 percent. Independent shareholders own about 12 percent of the stock. The north German state government of Lower Saxony, where Wolfsburg is located, and Qatar’s sovereign wealth fund own the rest. A network of powerful German labor unions participate in management decisions, as compensation for funds that were confiscated after World War II.
Figure 1
Volkswagen Automotive Brands
Volkswagen
Audi
Bentley
Bugatti
Lamborghini
Ducati
MAN
Porsche
Scania
SEAT
ŠKODA
Sources: Volkswagen, “Brands and products,” http://www.volkswagenag.com/content/vwcorp/content/en/brands_and_products.html. Accessed June 8, 2016.
“Be aggressive at all times” was how one Volkswagen executive described the company’s confident approach to global competition. Volkswagen chief executives including Ferdinand Piëch, a grandson of Ferdinand Porsche, and Piëch’s successor, Martin Winterkorn, heavily promoted clean diesel technology as part of the company’s environmental commitment. Winterkorn had promised that Volkswagen would surpass Toyota to become the world’s largest automobile manufacturer, and that clean diesel vehicles, not hybrids, were the key to global domination.
Soon after the EPA recall announcement in September 2015, Winterkorn resigned. In December 2015, the new CEO, Mathias Müller, and the chairman of Volkswagen’s supervisory board announced in a press conference that Volkswagen employees had created the emissions test scheme in 2005, after realizing the company’s diesel technology could not pass U.S. environmental standards.
CEO Müller announced that the company might have to sell the corporate Airbus A319 jet, among other major changes. The company set aside €6.7 billion to cover the costs of repairing faulty diesel cars, including the option of repurchasing some diesel vehicles from consumers.
While Volkswagen planned to keep its 12 different brands, plans for a €100 million corporate design center intended for Wolfsburg were scrapped.
In an NPR interview recorded during a visit to Detroit, Müller apologized for the scandal, and promised to “deliver appropriate solutions to [VWoA] customers.”
Earlier in the interview, Müller claimed that Volkswagen did not lie to the American public:
Frankly spoken, it was a technical problem. We made a default, we had a … not the right interpretation of the American law. And we had some targets for our technical engineers, and they solved this problem and reached targets with some software solutions which haven’t been compatible to the American law. That is the thing. And the other question you mentioned — it was an ethical problem? I cannot understand why you say that.
NPR interviewed Müller the next day, and the CEO attempted to mitigate the damage of his previous statements:
We have to accept that the problem was not created three months ago. It was created, let me say, 10 years ago. … We had the wrong reaction when we got information year by year from the EPA and from the [California Air Resources Board]…. We have to apologize for that, and we’ll do our utmost to do things right for the future.
In January 2016, members of the Porsche and Piëch families, who owned half of Volkswagen, made public statements endorsing Müller after his controversial visit to the United States.
In April 2016, Volkswagen agreed to repurchase almost all the affected 2-liter diesel vehicles in the United States, and further agreed to provide owners with additional compensation. This buyback program was estimated to cost U.S.$7 billion, but it did not include 3-liter diesel vehicles from Audi and Porsche.
Later in April, Müller personally apologized to President Barack Obama for the emissions scandal.
The following month, Volkswagen challenged the U.S. Department of Justice’s authority in the matter, claiming that the affected cars were sold not by the European parent companies, but by local businesses in the United States.
While Volkswagen’s European operations designed the automobiles and their emissions systems, many of the affected diesel automobiles were manufactured in Volkswagen’s Chattanooga facility.
Cheating the System
The emissions control systems used in the affected Volkswagen, Audi, and Porsche cars included software designed by Volkswagen engineers to deceive or cheat emissions tests. Automakers often use common body frames, engines, components, and software across multiple brands to reduce duplication and costs.
Diesel engines produce emissions that include nitrogen oxides and ozone. These are chemical compounds that, according to the EPA, can cause “adverse respiratory effects including airway inflammation in healthy people and increased respiratory symptoms in people with asthma,” especially inside vehicles and near roads.
Emissions control systems are installed in vehicles to reduce the production and/or emissions of these compounds. Volkswagen started selling diesel cars in the United States in 1977, taking advantage of increased consumer interest in diesel fuel economy.
One form of Volkswagen’s diesel emissions control systems used a technology called selective catalytic reduction (SCR). This method used a solution of 70 percent water and 30 percent urea to convert emissions to nitrogen, oxygen, water, and carbon dioxide. A computerized controller sprayed an optimal amount of liquid as the emissions passed through the exhaust system. The liquid is sold in the United States as AdBlue. This system required drivers to have the urea tank refilled periodically at a service center.
A different system was installed in the Golf and other small cars, partly because the SCR system required more space than was available. This version did not require refills; it used a nitrogen oxide trap located before the exhaust valve and catalytic converter to capture and reduce emissions. The vehicle used about 4 percent more diesel fuel when the trap was operating at full power. Some industry experts claimed that traps were less effective than urea-based systems.
Emissions tests usually involve running at several different speeds while the driving wheels of the vehicle rest on a treadmill. When testing a front wheel drive model, the back wheels remain stationary. To test an all-wheel or four-wheel drive vehicle, treadmills are placed under both axles. The vehicle is connected to a dynamometer, a device that measures the torque or power of an engine. Sensors attached to the vehicle’s exhaust pipe measure the vehicle’s emissions.
The test or “dyno” mode used in the engine control unit (ECU) of VW diesel vehicles was activated only when the following conditions were met:
the steering wheel was not being moved;
the vehicle was operating at a constant speed; and
the atmospheric barometric pressure was steady.
Under normal driving conditions, the vehicle’s braking and stability control systems might take over the vehicle because of a lack of steering column movement; this is one indication of a loss of vehicular control, such as a skid. Therefore, the test or “dyno” mode performed a useful function by allowing the vehicle to be driven normally on a dynamometer.
The ECU, braking, and stability control modules for VW diesel vehicles were manufactured by Bosch, a major manufacturer of automotive components. These components were programmed by VW engineers, using proprietary code developed within the company. The U.S. Environmental Protection Agency (EPA) performs emission testing on only about 10 to 15 percent of new cars each year, and relies on automobile manufacturers to certify the emissions performance of their vehicles. According to Columbia University law professor Eben Moglen, “[s]oftware is in everything … proprietary software is an unsafe building material. You can’t inspect it.”
In the summer of 2015, the EPA announced that it opposed inspection of proprietary automobile software, supporting automobile manufacturers who claimed that people might try to reprogram their vehicles’ systems to increase performance in unsafe ways.
Volkswagen engineers took advantage of “dyno” mode by programming the ECU to shift the vehicle’s emissions control systems into a full power mode that significantly reduced emissions, but used significantly more fuel to operate.
VW engineers changed the vehicle’s software to turn off the nitrogen oxide trap or catalytic scrubbers during the “on road” mode that was used for normal operation of the vehicle. This boosted the vehicle’s overall speed and acceleration but reduced fuel economy while increasing emissions by a factor of 40. VW’s diesel emissions control systems also increased the price of each vehicle between U.S.$5,000 and U.S.$8,000.
In April 2016, German newspapers and television broadcasts revealed that an early version of this “dyno” mode plan was found in a 2006 PowerPoint presentation that had been prepared by a German Volkswagen executive.
Catching the Cheat
Government reliance upon manufacturer testing can be problematic. According to Zeynep Tufekci, an assistant professor at the University of North Carolina, smart cars and other smart devices should be tested in realistic conditions, not in a controlled environment. Companies should not be able to use copyright and intellectual property laws to restrict inspection of proprietary software, especially when the code is used in important processes such as voting and public safety. Developers should also include logs and audit trails in their software, to help document its operation.

Volkswagen’s “dyno” or cheat mode was discovered in 2014 by researchers at West Virginia University (WVU) who measured the emissions of VW diesel vehicles during long-distance driving tests. One vehicle had a nitrogen oxide trap, while two other vehicles used urea-based SCR systems. WVU was contracted by an NGO, the International Council on Clean Transportation (ICCT), to perform these tests after European investigators noticed discrepancies in their emissions tests of VW and BMW diesel vehicles. U.S. emissions testing is more stringent than European testing, and California automobile emissions standards are more stringent than Federal standards.
While the WVU report only mentioned Volkswagen once, it was clear that the VW diesel vehicles produced much higher levels of emissions during the WVU road tests than were seen in dynamometer tests performed by the California Air Resources Board.
ICCT posted the findings to its Web site in May 2014 and notified the EPA. Investigations by CARB and the EPA led to the EPA’s September 2015 announcement. The regulators refused to certify VW’s 2016 diesel vehicles for sale, leaving VW and its North American dealers with billions of dollars in new car inventory that could not legally be sold.
On September 21, VW’s stock price dropped 23 percent.
Over 11 million diesel vehicles worldwide had engines that were affected by VW’s unorthodox technology; 660,000 were sold in the United States. The EPA ordered a recall of over a dozen diesel-powered models. (See Figure 2)
Figure 2
Diesel Automobiles Recalled by the EPA
Audi A3 (2010–2015)
Audi A6 Quattro (2014–2016)
Audi A7 Quattro (2014–2016)
Audi A8 (2014–2016)
Audi A8L (2014–2016)
Audi Q5 (2014–2016)
Audi Q7 (2009–2016)
Porsche Cayenne (2013–2016)
Volkswagen Beetle (2012–2015)
Volkswagen Beetle Convertible (2012–2015)
Volkswagen Golf (2010–2015)
Volkswagen Golf SportWagen (2015)
Volkswagen Jetta (2009–2015)
Volkswagen Passat (2012–2015)
Volkswagen Touareg (2009–2016)
U.S. consumers were assured that they could continue to drive their affected vehicles while the recall was being organized. For 2015 and 2016 model year vehicles that used the nitrogen oxide trap, the repair was most likely a software patch, installed by a dealer. More extensive modifications were needed for SCR models.
Marketing the Clean Diesel
Between 2009 and 2015, VWoA bought significant amounts of advertising for diesel vehicles in the United States, which was one of Volkswagen’s most profitable markets. Diesel vehicle sales accounted for about 5 percent of the North American market, but about 25 percent of VW’s sales were in the diesel category.
While VW is a market leader in China, diesel engines are unpopular there. There are stringent emissions control rules in European countries, especially in cities such as Paris, but diesel vehicles held a 50 percent market share in Western Europe.
Between January and September 2015, VW spent $77 million on U.S. television advertising for diesel vehicles, which was about 45 percent of the company’s total in that market.
VW diesel ads used humor to emphasize the high performance and clean emissions of its diesel cars. In a 2015 campaign, three older women discussed the drawbacks of diesel cars while being driven in a VW diesel vehicle. The series, titled “Old Wives Tales,” focused on consumer complaints regarding diesel cars, including sluggish performance, loud noise, and the scarcity of diesel fuel. The passengers in the commercials were always surprised when their VW vehicle overcame the problems they discussed.
Another 2015 VW advertisement showed precocious boys who cause chaos in a convenience store, to the sounds of Waylon Jennings’ country music song “Mommas, don’t let your babies grow up to be cowboys.” Their mother notices the boys are missing while she refuels their vehicle outside. A VW diesel Jetta drives by, and the viewers see the mother who is driving that vehicle while her three boys sit quietly.
Another benefit that VW and Audi emphasized in their marketing was decreased diesel fuel consumption. During the 2010 Super Bowl, Audi ran a television advertisement for its A3 TDI hatchback that showed the car as the only vehicle that could pass through a fictional “green police” checkpoint. For the 2015 diesel Jetta, VW aired a television advertisement that claimed “When you’re driving, things aren’t always what they appear to be.” The advertisement only aired a few times before it was pulled in September 2015.
After the EPA’s September 18 announcement, VWoA paused its national advertising through October 11, including advertising for the company’s non-diesel vehicles.
Advertising for gasoline and electric vehicles resumed slowly, as VWoA managers and ad agencies scrambled to create new campaigns and content.
Government Investigations
Over 450 VW and third-party investigators conducted a probe during late 2015 and early 2016, coordinated by the accounting firm Deloitte and a U.S.-based law firm, Jones Day. VW’s internal reports and documentation on the affected diesel systems presented many obstacles. VW engineers used dozens of code words such as “acoustical software” when referring to the emission control countermeasures. The investigators turned their focus to about 20 VW employees. Many persons interviewed during the investigation were “reluctant to provide insight because they were afraid of the legal consequences.”
The German employees under investigation were not executives. However, the idea that VW executives were unaware of the diesel defeat designs “just doesn’t pass the laugh test,” to quote John German, a former EPA official who became a senior fellow at ICCT and helped begin that group’s investigation of VW in 2013.
French authorities launched their own investigation into intentional fraud by VW.
Margo Oge, who was director of the EPA Office of Transportation and Air Quality in 2011, revealed that German Volkswagen executives had pressured the EPA for “special fuel economy credits for environmental friendliness” that were equivalent to those awarded to zero-emissions vehicles such as electric cars.
Oge perceived that the German Volkswagen executives believed their diesel technology was superior to electric motors: “I never had a problem dealing with the Americans. The U.S. Volkswagen people would always come and apologize to us after meeting with the Germans. My sense was that things were being dictated by Germany.”
Volkswagen acknowledged that there were at least 50 other whistleblowers.
Investors criticized Volkswagen’s executive compensation practices. Billionaire investor Christopher Hohn of TCI Fund Management wrote in a letter to Volkswagen’s executive supervisory boards that top management compensation appeared to be “excessive,” and was “unlinked to transparent metrics and paid in cash with no vesting or deferral, and has encouraged aggressive management behavior, contributing to the diesel scandal.”

German law exempts companies from being prosecuted for crimes; the German Penal Code or Strafgesetzbuch (StGB) stipulates that only individuals can be held liable for criminal acts. Six Volkswagen employees were under investigation for charges of corporate tax evasion. In the United States, Senators Ron Wyden (D-OR) and Orrin Hatch (R-UT) accused Volkswagen and VWAG of accepting as much as U.S.$51 million in tax incentive credits for diesel vehicles.
Whistleblowers also came forward. David Donovan, who worked at VWoA in electronic discovery and information management, claims he was fired in December 2015 after he reported his concerns to the company’s legal department.
The legal responsibilities of Volkswagen and VWoA executives are also of concern. CIOs are responsible for finding and archiving data, messages, and other corporate information. In September 2015, U.S. Deputy Attorney General Sally Quillian Yates announced that the U.S. Department of Justice planned to increase its efforts to prosecute corporate executives for their involvement in corporate misconduct.
Michael Schrage, a research fellow at MIT’s Center for Digital Business, noted that Volkswagen had brought the crisis on itself by failing to acknowledge societal and technological change. The emergence of the Internet of Things (IoT), in which products are embedded with sensors and smart systems, coupled with societal acceptance of social media, made the revelation of corporate deception far more likely than ever before.
Volkswagen Diesel Timeline
2005: Volkswagen executives make diesel the focus of the company’s U.S. marketing efforts. A small group of Volkswagen engineers and employees in Germany decide to find ways to cheat emissions testing.
2006: A Volkswagen executive prepares a PowerPoint presentation that describes how to cheat U.S. emissions testing.
2007: Martin Winterkorn becomes CEO of Volkswagen.
2008: Volkswagen announces a U.S.$1 billion production facility in Chattanooga, Tennessee, in return for U.S.$577 million in state tax incentives.
2009: Volkswagen and Porsche merge. Diesel vehicles with the altered software go on sale. VWAG launches diesel vehicle marketing campaign in the United States.
2011: Volkswagen opens a new manufacturing facility in Chattanooga, Tennessee.
2014: Volkswagen decides to expand the Chattanooga plant instead of moving production to Puebla, Mexico, based on an additional U.S.$230 million in state tax incentives.
September 18, 2015: The EPA orders Volkswagen to recall 486,000 diesel vehicles because they used software designed to cheat emissions tests.
September 22, 2015: Volkswagen reveals that 11 million diesel cars worldwide used the affected software.
September 25, 2015: Winterkorn resigns as CEO. Matthias Müller, the head of the company’s Porsche unit, is named as his replacement.
November 2, 2015: The EPA discovers cheating software on more cars than previously disclosed and, for the first time, also finds the illegal software in a Porsche model.
November 3, 2015: Volkswagen announces that it understated emissions of gasoline powered cars in Europe.
November 9, 2015: VWoA offers $1,000 gift cards to owners of affected diesel vehicles in the United States. Volkswagen later states that this offer does not apply to owners in the EU.
November 11, 2015: Volkswagen halts production of the 2016 diesel Passat at its Chattanooga manufacturing facility.
November 25, 2015: Volkswagen announces that a set of simple repairs could bring the affected diesel cars into compliance with European standards.
December 10, 2015: The chairman and CEO of VW present the results of an internal inquiry, revealing that the decision by employees to cheat on emissions tests was made in 2005.
January 10, 2016: CEO Müller claims in a radio interview that the emissions scandal was a technical issue, not an ethical concern. He changes his statement the next day.
March 2, 2016: Volkswagen reveals that former CEO Winterkorn received a memo on problems with diesel emissions in Volkswagen vehicles in May 2014, but did not indicate if Winterkorn had ever read the document.
April 22, 2016: Volkswagen agrees to fix or buy back almost all affected diesel cars in the United States.
April 24, 2016: CEO Müller personally apologizes to President Barack Obama for the emissions scandal, during a state dinner hosted by German Chancellor Angela Merkel.
May 24, 2016: Volkswagen claims that the U.S. Government has no jurisdiction over the emissions scandal. The company will continue its own internal investigation.
Questions for Discussion