STATISTICS OF THE SCIENCES

euroasia

Students are often intimidated by statistics. This brief overview is intended to place statistics in context and to provide a reference sheet for those who are trying to interpret the statistics they read. It does not attempt to show or explain the mathematics involved. Although it is helpful for those who use statistics to understand the mathematics, the computer age has made that understanding unnecessary for many purposes. In practice, students often simply want to know whether a particular result is significant, i.e., how unlikely it is that the obtained result arose by chance alone. Computer programs can easily produce the numbers needed for such conclusions, provided the student knows which tests to use and understands what the numbers mean. This summary is intended to help achieve that understanding [1].

As with all t tests, the one-sample t test assumes that the data are reasonably normally distributed, especially with respect to skewness. Extreme or outlying values should be checked carefully.
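Skewness can also be checked numerically, as a rough complement to a histogram. The sketch below assumes the adjusted Fisher-Pearson coefficient G1 as the skewness measure (the text does not name a specific statistic) and uses only Python's standard library. Values near 0 suggest a roughly symmetric distribution; large positive or negative values indicate a long right or left tail.

```python
import statistics


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness G1 (an assumed choice of statistic).

    G1 = n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3),
    where s is the sample standard deviation (n - 1 in the denominator).
    """
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    if s == 0:
        raise ValueError("all values are identical; skewness is undefined")
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


# Lead data used later in this overview (ppm)
lead = [10, 8, 7, 11, 16]
print(round(sample_skewness(lead), 2))  # positive: the 16 ppm value stretches the right tail
```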

The One-Sample T Test procedure:

Tests the difference between a sample mean and a known or hypothesized value
Allows you to specify the level of confidence for the difference
Produces a table of descriptive statistics for each test variable

Example: This example uses the file score.save. Use the One-Sample T Test to determine whether the mean math score of the sample differs significantly from 75.

Note: The one-sample t test requires quantitative data.

Before proceeding with the one-sample t test, we must verify the assumption of normally distributed data, using a histogram, a stem-and-leaf plot, a boxplot, or a formal normality test. The histogram plot for math [2] indicates that the distribution of the math scores is approximately normal.
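The boxplot check can be approximated numerically by computing the points where a boxplot's whiskers end. This is a minimal sketch assuming Tukey's common rule (fences at Q1 - 1.5 × IQR and Q3 + 1.5 × IQR) and the default quartile method of Python's `statistics.quantiles`; statistical packages may compute quartiles slightly differently, so fences can differ at the margins. The function names are illustrative.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) from the quartiles, as drawn on a boxplot.

    Assumption: Tukey's rule, fences = Q1 - k*IQR and Q3 + k*IQR with k = 1.5.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)  # default method='exclusive'
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """List the values lying outside the fences (candidates for closer inspection)."""
    low, high = tukey_fences(values, k)
    return [x for x in values if x < low or x > high]


lead = [10, 8, 7, 11, 16]  # lead data used later in this overview (ppm)
print(tukey_fences(lead))
print(flag_outliers(lead))  # an empty list means no value lies beyond the whiskers
```

A flagged value is only a candidate for inspection; whether it may be discarded is a separate question, addressed by the Q-test later in this overview.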

Sample Mean

The sample mean (x̄) is defined as the mean, or average, of a limited number of samples drawn from a population of experimental data [4]. The mean can be calculated manually or with the statistical function of a scientific calculator; the latter method is faster and more convenient. Even with such tools, it is important to understand how the value is derived.

$$\bar{x} = \frac{\sum_{k=1}^{n} x_k}{n}$$

where x_k is an individual experimental value, Σx_k is the sum of all the experimental values, and n is the number of experimental values used to obtain the sum.

For example, a certain experiment yielded the following data values for lead: 10 ppm, 8 ppm, 7 ppm, 11 ppm and 16 ppm. The mean value is calculated by the following:

$$\bar{x} = \frac{(10 + 8 + 7 + 11 + 16)\ \text{ppm}}{5} = 10.4\ \text{ppm} \approx 10\ \text{ppm}$$

(rounded to the appropriate number of significant digits)
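The same arithmetic can be cross-checked with Python's standard library. The sketch computes the mean both from the formula (sum divided by count) and with `statistics.mean`; rounding to significant figures remains the analyst's decision.

```python
import statistics

lead_ppm = [10, 8, 7, 11, 16]

# Explicit form of the formula: sum of the values divided by their count
mean_by_hand = sum(lead_ppm) / len(lead_ppm)

# Library form; should agree with the explicit sum / n
mean_lib = statistics.mean(lead_ppm)

print(mean_by_hand, mean_lib)  # 10.4 10.4
```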

Standard Deviation

The standard deviation (s) is used as a measure of precision. Precision describes how closely two or more measurements agree when they are obtained by exactly the same method or procedure. The standard deviation can easily be calculated with the statistical function of any scientific calculator, but, again, it is important to understand how it is derived. The standard deviation is calculated by [5]:

$$s = \sqrt{\frac{\sum_{k=1}^{n} (x_k - \bar{x})^2}{n - 1}}$$

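Using the example above (with the mean rounded to 10 ppm), the standard deviation is:

$$s = \sqrt{\frac{(10-10)^2 + (8-10)^2 + (7-10)^2 + (11-10)^2 + (16-10)^2}{5-1}}\ \text{ppm} = \sqrt{\frac{0 + 4 + 9 + 1 + 36}{4}}\ \text{ppm} = \sqrt{12.5}\ \text{ppm} \approx 3.5\ \text{ppm} \approx 4\ \text{ppm}$$

Using the unrounded mean of 10.4 ppm gives the same result to two significant figures, 3.5 ppm.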

The mean and standard deviation for the experiment can be expressed as: (10 ± 4) ppm.
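The standard deviation calculation is sketched below. It mirrors the formula (n - 1 in the denominator) and checks the result against `statistics.stdev`, which uses the same sample definition. The sketch works from the unrounded mean of 10.4 ppm, so the sum of squares is 49.2 rather than the 50 obtained with the rounded mean; the result still rounds to 3.5 ppm.

```python
import math
import statistics

lead_ppm = [10, 8, 7, 11, 16]
n = len(lead_ppm)
mean = sum(lead_ppm) / n

# Sum of squared deviations from the (unrounded) mean
sum_sq = sum((x - mean) ** 2 for x in lead_ppm)

# Sample standard deviation: divide by n - 1 (degrees of freedom), then take the root
s_by_hand = math.sqrt(sum_sq / (n - 1))
s_lib = statistics.stdev(lead_ppm)  # also uses n - 1

print(f"({mean:.0f} ± {s_lib:.0f}) ppm")  # (10 ± 4) ppm, one significant figure
```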

Types of Student's t-tests

A variety of Student's t-tests can be used to evaluate methods for method development or quality control. Typically, a Student's t-test is used to determine whether the difference between two means is statistically significant.

Case 1: When an accepted value, such as that of a Certified Reference Material (CRM), is known

This case compares an experimental mean with an accepted value that has been certified by analytical means; a sample with such a certified value is known as a Certified Reference Material (CRM). CRMs undergo rigorous testing to establish accurate concentration levels, so there is a high degree of confidence in their certified concentrations. To compare an experimental value with a CRM value in order to validate a method or procedure, the following t-test is used:

$$\mu = \bar{x} \pm \frac{t s}{\sqrt{N}}$$

If the equation is rearranged for the value of t:

$$\pm t = \frac{(\bar{x} - \mu)\sqrt{N}}{s},$$

where μ is the certified value of the reference material, x̄ and s are the mean and standard deviation of the N experimental measurements, and t is Student's t-value for N − 1 degrees of freedom at a preselected confidence level, typically 95%. The t-values are obtained from a table similar to the one below:

Table 1.

Values for t at N-1 Degrees of Freedom for Various Confidence Intervals (CI)

N − 1	90% CI	95% CI	99.5% CI
1	6.314	12.706	127.32
2	2.920	4.303	14.089
3	2.353	3.182	7.453
4	2.132	2.776	5.598
5	2.015	2.571	4.773
6	1.943	2.447	4.317
7	1.895	2.365	4.029
8	1.860	2.306	3.832
9	1.833	2.262	3.690
10	1.812	2.228	3.581
∞	1.645	1.960	2.807
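The first Case 1 equation, read as x̄ ± ts/√N, is a confidence interval for the true mean. A minimal sketch, with the 90% and 95% columns of Table 1 copied into a dictionary; the names `T_TABLE` and `mean_confidence_interval` are illustrative, not from the source. For the lead data it gives roughly 10.4 ± 4.4 ppm at 95%.

```python
import math
import statistics

# Two-tailed t-values from Table 1, keyed by confidence level, then df = N - 1.
T_TABLE = {
    0.90: {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015,
           6: 1.943, 7: 1.895, 8: 1.860, 9: 1.833, 10: 1.812},
    0.95: {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
           6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228},
}


def mean_confidence_interval(values, confidence=0.95):
    """Return (lower, upper) for mu = x_bar ± t*s/sqrt(N), with t looked up in T_TABLE."""
    n = len(values)
    t = T_TABLE[confidence][n - 1]  # KeyError if this df/level is not tabulated here
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    half_width = t * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


lead_ppm = [10, 8, 7, 11, 16]
low, high = mean_confidence_interval(lead_ppm)
print(f"95% interval: {low:.1f} to {high:.1f} ppm")
```

Because this interval contains 9.43 ppm, it agrees with the Case 1 conclusion reached below.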

Using the same lead data set as before (10 ppm, 8 ppm, 7 ppm, 11 ppm and 16 ppm), assume there is a CRM value of 9.43 ppm for lead in a sandy soil sample. Case 1 can be used to determine whether the results of the experimental method agree with the CRM value:

$$\pm t = \frac{(\bar{x} - \mu)\sqrt{N}}{s}$$

Substituting the values (x̄ = 10 ppm, μ = 9.43 ppm, N = 5, s = 4 ppm):

$$\pm t = \frac{(10\ \text{ppm} - 9.43\ \text{ppm})\sqrt{5}}{4\ \text{ppm}} = 0.32$$

From Table 1, the tabulated t-value at the 95% confidence level for N − 1 = 4 degrees of freedom is 2.776. If the calculated t-value is lower than the tabulated value, there is no statistically significant difference; if it is higher, there is a statistically significant difference. In this case, the calculated t-value (0.32) is lower than the tabulated value (2.776), so the experimental mean does not differ significantly from the CRM value and the method is considered valid.
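The Case 1 comparison can be sketched as follows. The function (an illustrative name) computes |t| = |x̄ − μ|√N/s, and the critical value 2.776 is taken from Table 1 (95%, N − 1 = 4). Working from the unrounded mean (10.4 ppm) and standard deviation (3.51 ppm) gives t ≈ 0.62 rather than 0.32; the conclusion is unchanged, because both values are well below 2.776.

```python
import math
import statistics


def t_vs_accepted_value(values, mu):
    """Student's t for comparing a sample mean with an accepted (e.g. CRM) value."""
    n = len(values)
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return abs(x_bar - mu) * math.sqrt(n) / s


lead_ppm = [10, 8, 7, 11, 16]
crm_ppm = 9.43  # certified value assumed in the example
t_crit = 2.776  # Table 1: 95% CI, N - 1 = 4

t_calc = t_vs_accepted_value(lead_ppm, crm_ppm)
if t_calc < t_crit:
    print(f"t = {t_calc:.2f} < {t_crit}: no significant difference from the CRM value")
else:
    print(f"t = {t_calc:.2f} >= {t_crit}: significant difference from the CRM value")
```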

Case 2: When the accepted value is unknown

When the accepted value is unknown, the experimental mean is compared with a second, independently obtained mean, typically from a different instrument, another laboratory, or a second method within the same laboratory. A two-sample t-test with a pooled standard deviation is used for this comparison. The experimental t-value is calculated by:

$$\pm t = \frac{\bar{x}_1 - \bar{x}_2}{s_p}\sqrt{\frac{N_1 N_2}{N_1 + N_2}}$$

where x̄₁ is the mean of one data set, x̄₂ is the mean of the second data set, N₁ and N₂ are the numbers of measurements in each set, and s_p is the pooled standard deviation, given by:

$$s_p = \sqrt{\frac{s_1^2(N_1-1) + s_2^2(N_2-1) + \cdots + s_k^2(N_k-1)}{N_T - k}}$$

where N_T is the total number of measurements in all the data sets and k is the number of experimental means being compared. For example, if two data sets are compared, k = 2.

Example:

Table 2.

Lead Concentrations For Two Different Method Determinations Using ICP-MS From Lab A and Lab B

Lab A, Pb (ppm)	Lab B, Pb (ppm)
17.1	17.2
16.2	17.1
14.6	17.0
22.8	19.0
18.7	18.3
x̄₁ = 17.9, s₁ = 3.2	x̄₂ = 17.7, s₂ = 0.9

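$$s_p = \sqrt{\frac{39.1 + 3.1}{10 - 2}}\ \text{ppm} \approx 2.3\ \text{ppm}$$

where 39.1 and 3.1 are the terms s₁²(N₁ − 1) and s₂²(N₂ − 1), i.e., the sums of squared deviations from each mean, calculated from the unrounded data in Table 2.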

$$\pm t = \frac{17.9 - 17.7}{2.3}\sqrt{\frac{5 \times 5}{5 + 5}} = (0.087)(1.58) \approx 0.1$$

The tabulated t-value at the 95% confidence level for N_T − k = 10 − 2 = 8 degrees of freedom is 2.306 (Table 1). Since the calculated t-value is less than the tabulated value, there is no statistically significant difference between the two methods, and both can be considered valid procedures.
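A sketch of the Case 2 calculation follows. `pooled_sd` implements the general formula for k data sets (N_T − k degrees of freedom), and `two_sample_t` applies it to two sets; both names are illustrative. Computed from the raw Table 2 values without intermediate rounding, s₁ ≈ 3.1 ppm, s_p ≈ 2.3 ppm and t ≈ 0.11, to be compared with 2.306 (Table 1, 95%, 8 degrees of freedom).

```python
import math
import statistics


def pooled_sd(*groups):
    """Pooled SD of k data sets: sqrt(sum(s_i^2 * (N_i - 1)) / (N_T - k))."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    sum_sq = sum(statistics.variance(g) * (len(g) - 1) for g in groups)
    return math.sqrt(sum_sq / (n_total - k))


def two_sample_t(a, b):
    """|t| for two means using the pooled SD; degrees of freedom = N1 + N2 - 2."""
    s_p = pooled_sd(a, b)
    n1, n2 = len(a), len(b)
    diff = abs(statistics.mean(a) - statistics.mean(b))
    t = diff / s_p * math.sqrt(n1 * n2 / (n1 + n2))
    return t, n1 + n2 - 2


lab_a = [17.1, 16.2, 14.6, 22.8, 18.7]  # Table 2, ppm Pb
lab_b = [17.2, 17.1, 17.0, 19.0, 18.3]

t_calc, df = two_sample_t(lab_a, lab_b)
print(f"s_p = {pooled_sd(lab_a, lab_b):.1f} ppm, t = {t_calc:.2f}, df = {df}")
# Compare with Table 1: 2.306 at 95% for 8 degrees of freedom
```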

Rejection of Data Points

Often in research, some data points seem out of range or questionable compared with the rest of the data set, and it may be desirable to omit them from the overall calculations. Such a questionable data point is called an outlier. However, omitting data points must be rigorously justified with a statistical method such as the Q-test. For a suspect value, the gap a is the difference between the suspect value and its nearest neighbour, and the range w is the difference between the suspect value and the data point furthest from it (i.e., the range of the data set). Q is then calculated as:

Q = a/w

Considering the original data set for lead (10 ppm, 8 ppm, 7 ppm, 11 ppm and 16 ppm), we may suspect that 16 ppm is an outlier. To test this assumption, the Q-test is applied:

$$Q = \frac{a}{w} = \frac{16 - 11}{16 - 7} = \frac{5}{9} \approx 0.56$$

To assess the value of 0.56, one needs to refer to a table of rejection quotients for various confidence levels, similar to the one below:

Table 3.

Rejection Quotients (Q) at Various Confidence Levels

# of Observations	Q₉₀	Q₉₅	Q₉₉
3	0.941	0.970	0.994
4	0.765	0.829	0.926
5	0.642	0.710	0.821
6	0.560	0.625	0.740
7	0.507	0.568	0.680
8	0.468	0.526	0.634
9	0.437	0.493	0.598
10	0.412	0.466	0.568

Because there are five observations (n = 5), the tabulated value at the 95% confidence level is Q = 0.710. If the calculated Q-value is greater than the tabulated Q-value, the suspect value can be rejected as an outlier. If the calculated Q-value is less than the tabulated Q-value, the suspect value cannot be rejected and must be retained as a valid data point.

Referring back to the example, the calculated Q-value is 0.56 and the tabulated Q-value is 0.710 at the 95% confidence level. Since the calculated Q-value is less than the tabulated Q-value, the value of 16 ppm cannot be rejected.
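A minimal sketch of the Q-test, using the 95% column of Table 3. The dictionary and function names are illustrative, and the sketch assumes the suspect value is the smallest or largest value in the data set, which is the situation the test is designed for.

```python
# Critical Q at 95% confidence from Table 3, keyed by number of observations.
Q95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def q_test(values, suspect):
    """Return (Q, Q_crit, reject) for a suspect value that is the minimum or maximum."""
    data = sorted(values)
    if suspect not in (data[0], data[-1]):
        raise ValueError("the Q-test applies to the smallest or largest value")
    # Gap a: distance to the nearest neighbour; range w: max - min
    gap = data[-1] - data[-2] if suspect == data[-1] else data[1] - data[0]
    spread = data[-1] - data[0]
    q = gap / spread
    q_crit = Q95[len(data)]
    return q, q_crit, q > q_crit


q, q_crit, reject = q_test([10, 8, 7, 11, 16], suspect=16)
print(f"Q = {q:.2f}, Q_crit = {q_crit}, reject = {reject}")
```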

F-test: Comparison of Measurement Precision

An F-test is a simple calculation that compares the precision of two sets of measurements. The sets do not have to be obtained from the identical sample, as long as the samples are similar enough that their indeterminate (random) errors can be considered the same. An F-test can answer two main questions: 1) Is method A more precise than method B? 2) Is there a difference in the precision of the two methods? To calculate F, the variance (the square of the standard deviation) of the method assumed to be more precise is placed in the denominator, and the variance of the method assumed to be less precise is placed in the numerator.

Using the two lead data sets in Table 2, the standard deviations are s₁ = 3.2 ppm (less precise) and s₂ = 0.9 ppm (more precise).

$$F = \frac{s_1^2}{s_2^2} = \frac{(3.2)^2}{(0.9)^2} = \frac{10.24}{0.81} \approx 12.6$$

To interpret the calculated F-value, it must be compared with a tabulated critical value of F, such as those below:

Table 4.

Critical Values of F at the 5% Level

Denominator df \ Numerator df	2	3	4	5
2	19.00	19.16	19.25	19.30
3	9.55	9.28	9.12	9.01
4	6.94	6.59	6.39	6.26
5	5.79	5.41	5.19	5.05

Each data set contains five measurements and therefore has N − 1 = 4 degrees of freedom, so the tabulated F-value is 6.39. The calculated value of 12.6 is greater than the tabulated value of 6.39; therefore, the difference in precision is significant, and the method that produced data set two (Lab B) is indeed the more precise one [3].
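The F-test can be sketched the same way. The function below (an illustrative name) places the larger sample variance in the numerator so that F ≥ 1, and the critical value is read from a dictionary copy of Table 4 keyed by (denominator df, numerator df). From the unrounded Table 2 data the variances are about 9.77 and 0.787 ppm², giving F ≈ 12.4, slightly different from the value obtained from the rounded standard deviations; the conclusion is the same.

```python
import statistics


def f_ratio(a, b):
    """F = larger variance / smaller variance, returned with (numerator df, denominator df)."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1


# Critical F at the 5% level from Table 4, keyed by (denominator df, numerator df)
F_CRIT_5 = {
    (2, 2): 19.00, (2, 3): 19.16, (2, 4): 19.25, (2, 5): 19.30,
    (3, 2): 9.55, (3, 3): 9.28, (3, 4): 9.12, (3, 5): 9.01,
    (4, 2): 6.94, (4, 3): 6.59, (4, 4): 6.39, (4, 5): 6.26,
    (5, 2): 5.79, (5, 3): 5.41, (5, 4): 5.19, (5, 5): 5.05,
}

lab_a = [17.1, 16.2, 14.6, 22.8, 18.7]  # Table 2, ppm Pb
lab_b = [17.2, 17.1, 17.0, 19.0, 18.3]

f, df_num, df_den = f_ratio(lab_a, lab_b)
f_crit = F_CRIT_5[(df_den, df_num)]
print(f"F = {f:.1f} with ({df_num}, {df_den}) df; critical F = {f_crit}; significant: {f > f_crit}")
```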

List of references:

[1] csub.edu/~bhartsell/StatisticsReview.doc
[2] iugaza.edu.ps/nbarakat/files/2010/02/part4.doc
[3] Statistical tables were derived from: Skoog, D.A.; West, D.M.; Holler, F.J. Fundamentals of Analytical Chemistry, 6th ed. Saunders College Publishing, Florida, USA, 1992.
[4] Гулай Т.А., Долгополова А.Ф., Литвин Д.Б. Analysis and assessment of the priority of sections of mathematical disciplines studied by students of economics specialties at agrarian universities // Вестник АПК Ставрополья (Bulletin of the Agro-Industrial Complex of Stavropol). No. 1.
[5] Гулай Т.А., Долгополова А.Ф., Литвин Д.Б. Improving the professional training of economists through the orientation of the content of mathematical education // Аграрная наука, творчество, рост (Agrarian Science, Creativity, Growth): proceedings of the International Scientific and Practical Conference (Stavropol, 08-14 February 2013) / СтГАУ. Stavropol, 2013. Vol. 2. pp. 252-254.

Authors: Litvin Dmitry Borisovich, Sami Atiyah Sayyid Al-Farttoosi. Published in: ЕВРАЗИЙСКИЙ СОЮЗ УЧЕНЫХ (Eurasian Union of Scientists), 28.02.2015, No. 02(11). Publisher: БАСАРАНОВИЧ ЕКАТЕРИНА. Online publication date: 2017-05-06.