WHAT IS STATISTICS?

 




Statistics has both plural and singular senses. 

In its plural sense, it refers to numerical facts that are systematically collected and analyzed, such as the responses to a questionnaire.

In the singular sense, it refers to the scientific discipline consisting of theory and methods for processing numerical information that one can use when making decisions in the face of uncertainty.

In both senses, the term Statistics refers to quantities computed from numerical information. As such, statisticians are involved with methods of data collection, data summarization or presentation, and data analysis, as well as with communicating the results of their analyses, that is, data interpretation. Hence, Statistics may also be defined as the branch of science that deals with the collection, presentation, analysis, and interpretation of quantitative data.



DATA MANAGEMENT

    Data collection – refers to the process of gathering or obtaining information, measurements, observations, etc.

    Data presentation – refers to the organization of collected data into meaningful forms such as the use of tables, graphs, or charts so that logical and statistical conclusions can be derived from the collected measurements.

    Data analysis – pertains to the process of extracting from the given data meaningful or relevant information from which numerical description can be formulated.

    Data interpretation – refers to the task of drawing conclusions from the analyzed data, which normally involves the formulation of forecasts or predictions about larger groups based on the data collected from small groups.


BRANCHES OF STATISTICS

    Descriptive Statistics – is concerned with data organization and presentation without drawing conclusions or inferences beyond the data.
    Inferential Statistics – is concerned with drawing generalizations beyond the data collected, provided that the data collected are part of a larger set of items.
BASIC TERMS

  • Universe – is the set of all entities under study
  • Variable – is an attribute of interest that is observable or measurable for each entity in the universe
  • Population – is the set of all possible values of the variable
  • Sample – is a subset of the population
  • Parameters – are numerical measures that describe the population of interest

VARIABLE

A variable typically takes on more than one value. For example, the monthly weight gain (or loss) of an individual changes from month to month. Data obtained from variables may be broadly classified as either quantitative or qualitative.

QUANTITATIVE VARIABLE

    - Takes on numerical values and is otherwise known as numerical data.
    - Sizes are meaningful.
    - Answers questions such as “how much” or “how many”.
    - Has actual units of measure.
    - Examples are height, weight, household size, number of nuts, and distance traveled.

QUALITATIVE VARIABLE
    - Does not strictly take on numerical values, although numeric codes may be developed from the values.
    - Answers the question “what kind”.
    - May be either ordered or non-ordered.
    - Examples are gender, occupation, taste, income group, brand, and color.


DISCRETE AND CONTINUOUS DATA
  • Quantitative data may be further classified into discrete or continuous types.
  • Discrete data can be counted and assume only a countable number of values. Examples are ages to the nearest year, number of nuts cracked, and number of leaves.
  • Continuous data are those that can be measured. Examples are height, weight, yield, and IQ.
MEASUREMENT
  • Is how one assigns a value to a trait, an attribute, or a characteristic such as sex, weight, religion, etc.
  • Is a basis for determining the statistical tool to use.
  • Has levels arranged in hierarchical order from lowest to highest: nominal, ordinal, interval, and ratio.
FORMULAS BEING USED




SUMMARY OF STATISTICAL TESTS


SAMPLE PROBLEMS:

1. A successful attack by an interceptor requires (a) the reliable operation of a computing system, (b) the transmission of correct directions, and (c) the proper functioning of the striking mechanism.  When the P[(a)] is 0.8 and (b) is assured, the overall probability of success is 0.6.  If the computing system is improved to 90% reliability, while P[(b)] is reduced to 0.8 and P[(c)] remains unchanged, what is the new overall probability of success?
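One way to approach Problem 1 is sketched below. It assumes that the three requirements (a), (b), and (c) are independent, so that the probability of success is the product of their probabilities. This assumption is not stated in the problem, and the variable names are chosen only for illustration.

```python
# Assumption: (a), (b), and (c) are independent, so
# P(success) = P(a) * P(b) * P(c).
p_a_old = 0.8        # original reliability of the computing system
p_b_old = 1.0        # correct transmission is assured
p_success_old = 0.6  # original overall probability of success

# Solve the original product for the unknown P(c).
p_c = p_success_old / (p_a_old * p_b_old)

# Apply the changed reliabilities while P(c) stays the same.
p_a_new = 0.9
p_b_new = 0.8
p_success_new = p_a_new * p_b_new * p_c

print(f"P(c) = {p_c:.2f}")
print(f"New overall probability of success = {p_success_new:.2f}")
```

Under the independence assumption, P(c) works out to 0.75 and the new overall probability of success to 0.54.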

2. It is found that in manufacturing a certain article, defects of one type occur with a probability of 0.1 and another type with a probability of 0.05, and that the two defects occur independently of one another.  Calculate the probability that

a)     an article does not have both kinds of defects

b)     an article is defective

c)     a defective article has only one type of defect.
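A possible computation for Problem 2 is sketched below, using the stated independence of the two defects. It reads "does not have both kinds of defects" as the complement of having both defects at once, and reads the last quantity as a conditional probability given that the article is defective. Both readings are assumptions; the variable names are illustrative.

```python
p_a = 0.10  # P(defect of the first type)
p_b = 0.05  # P(defect of the second type)

# Independence lets us multiply the individual probabilities.
p_both = p_a * p_b

# Article does not have both kinds of defects (complement of "both").
p_not_both = 1 - p_both

# Article is defective: it has at least one kind of defect.
p_defective = 1 - (1 - p_a) * (1 - p_b)

# Defective article has only one type of defect:
# P(exactly one defect | defective).
p_exactly_one = p_a * (1 - p_b) + (1 - p_a) * p_b
p_one_given_defective = p_exactly_one / p_defective

print(f"P(not both defects)       = {p_not_both:.3f}")
print(f"P(defective)              = {p_defective:.3f}")
print(f"P(only one | defective)   = {p_one_given_defective:.4f}")
```

With these readings the three values are about 0.995, 0.145, and 0.9655.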


3.      Suppose that in a particular assembly line of two sections, only 75% of all items produced were found to be satisfactory.  It is claimed that the first section has a 95% satisfactory rating in its production.  What is the probability that a satisfactory item from section 1 will turn out to be a satisfactory item at the end of the assembly line?
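Problem 3 can be read as a conditional probability. The sketch below assumes that an item can be satisfactory at the end of the line only if it was already satisfactory after section 1, so that P(satisfactory at section 1 and at the end) equals P(satisfactory at the end). That assumption is not stated in the problem.

```python
p_end = 0.75       # P(item is satisfactory at the end of the line)
p_section1 = 0.95  # P(item is satisfactory after section 1)

# Assumption: "satisfactory at the end" implies "satisfactory at section 1",
# so the joint probability equals p_end.
p_joint = p_end

# Conditional probability P(satisfactory at end | satisfactory at section 1).
p_end_given_section1 = p_joint / p_section1

print(f"P(end | section 1) = {p_end_given_section1:.4f}")
```

Under this reading the result is about 0.7895.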

4.      Test of Hypothesis on One Population Mean

A certain brand of milk is advertised as having a net weight of 250 grams. If the net weights of a random sample of 10 cans are: 251, 246, 250, 243, 245, 247, 249, 248, 245, and 246 grams, can it be concluded that the average net weight of the cans is less than the advertised amount?  Use α = 0.05.
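The computation for Problem 4 can be sketched as a left-tailed one-sample t-test of Ho: μ = 250 against Ha: μ < 250. The sketch uses only Python's standard library; the critical value 1.833 is the tabled t value for α = 0.05 with 9 degrees of freedom and is entered by hand, since it does not appear in the problem.

```python
import math
import statistics

weights = [251, 246, 250, 243, 245, 247, 249, 248, 245, 246]
mu_0 = 250  # advertised net weight in grams

n = len(weights)
x_bar = statistics.mean(weights)
s = statistics.stdev(weights)  # sample standard deviation (n - 1 divisor)

# Computed t statistic for a test on one population mean.
t_c = (x_bar - mu_0) / (s / math.sqrt(n))

# Tabled t value for alpha = 0.05 and n - 1 = 9 degrees of freedom
# (taken from a standard t table, not from the problem).
t_crit = 1.833

# Left-tailed test: reject Ho if t_c < -t_crit.
reject_h0 = t_c < -t_crit

print(f"mean = {x_bar}, s = {s:.4f}, t_c = {t_c:.3f}")
print("Reject Ho" if reject_h0 else "Fail to reject Ho")
```

The sample mean is 247 grams and the computed statistic is about -3.80, which falls below -1.833, so Ho is rejected at the 5% level under this setup.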


5.      Test of Hypothesis on One Population Mean (Note: The t-tabular value is approximated by the Z-tabular value because of the large sample size)

An electrical company claims that the lives of the light bulbs it manufactures are normally distributed with a mean of 10,000 hours and a standard deviation of 500 hours.  If a random sample of 100 bulbs produced by this company has a mean life of 9800 hours, does the data support the claim of the electrical company? Use α = 0.05.
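A sketch of the computation for Problem 5 follows. It treats the test as two-tailed (Ho: μ = 10,000 against Ha: μ ≠ 10,000), which is an assumption about how the claim should be read, and it obtains the Z-tabular value from `statistics.NormalDist` in the standard library.

```python
import math
from statistics import NormalDist

mu_0 = 10_000   # claimed mean life in hours
sigma = 500     # claimed standard deviation in hours
n = 100         # sample size
x_bar = 9_800   # sample mean life in hours
alpha = 0.05

# Computed Z statistic for a test on one population mean.
z_c = (x_bar - mu_0) / (sigma / math.sqrt(n))

# Two-tailed critical value, about 1.96 for alpha = 0.05.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

reject_h0 = abs(z_c) > z_crit

print(f"z_c = {z_c:.2f}, critical value = {z_crit:.3f}")
print("Reject Ho" if reject_h0 else "Fail to reject Ho")
```

The computed statistic is -4.00, well beyond ±1.96, so under this reading the data do not support the company's claim.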


6. Z-test on One Population Proportion

A manufacturer claims that at least 98% of the equipment she supplies to a factory conforms to specifications. An examination of a sample of 200 pieces of equipment revealed that 8 were faulty.  Test her claim at a 5% level of significance.
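Problem 6 can be sketched as a left-tailed Z-test on one proportion, with Ho: p ≥ 0.98 against Ha: p < 0.98. Setting up the hypotheses this way is an assumption based on the wording "at least 98%"; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p_0 = 0.98   # claimed proportion of conforming equipment
n = 200      # sample size
faulty = 8   # faulty pieces found in the sample
alpha = 0.05

# Sample proportion of conforming equipment.
p_hat = (n - faulty) / n

# Computed Z statistic, using the standard error under Ho.
z_c = (p_hat - p_0) / math.sqrt(p_0 * (1 - p_0) / n)

# Left-tailed critical value, about -1.645 for alpha = 0.05.
z_crit = NormalDist().inv_cdf(alpha)

reject_h0 = z_c < z_crit

print(f"p_hat = {p_hat:.2f}, z_c = {z_c:.3f}, critical value = {z_crit:.3f}")
print("Reject Ho" if reject_h0 else "Fail to reject Ho")
```

The sample proportion is 0.96 and the computed statistic is about -2.02, which is below -1.645, so the claim is rejected at the 5% level under this setup.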


7.  F-test for One-Way Analysis of Variance (Comparison of More Than Two Means)

Let μ1 = mean speed of machine operator 1 in accomplishing a task

μ2 = mean speed of machine operator 2 in accomplishing a task

μ3 = mean speed of machine operator 3 in accomplishing a task

μ4 = mean speed of machine operator 4 in accomplishing a task

Ho: μ1 = μ2 = μ3 = μ4. The mean speeds of the 4 machine operators in accomplishing a task are the same.

Ha: At least one mean speed is different from the rest.

Test Procedure: F-test for One-Way ANOVA at α = 0.05

Decision Rule: Reject Ho if Fc > F0.05(3,16) = 3.24, otherwise fail to reject Ho.


8. T-test on Two Population Means (obtained from independent samples)

Consider an experiment to investigate the effectiveness of cloud seeding in the artificial production of rainfall.  Two farming areas with similar past meteorological records were selected for the experiment.  One is seeded regularly throughout the year while the other is left unseeded.  The monthly precipitation in inches at the farms will be recorded for six randomly selected months.  The data gathered is given below:

                        Farm Area \ Month     1        2        3        4        5        6

                        Seeded                1.75     2.12     1.53     1.10     1.70     2.42

                        Unseeded              1.62     1.83     1.40     0.75     1.71     2.33


Let μ1 = mean monthly precipitation for the seeded farm

μ2 = mean monthly precipitation for the unseeded farm

Ho: μ1 = μ2. The mean monthly precipitation for the seeded farm is the same as for the unseeded farm.

Ha: μ1 ≠ μ2. The mean monthly precipitation for the seeded farm is not the same as for the unseeded farm.

Test Procedure: t-test for two population means from independent samples at α = 0.05

Decision Rule: Reject Ho if |tc| > t0.025(10) = 2.228, otherwise fail to reject Ho.
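The decision rule above can be applied with a short computation. The sketch below assumes equal population variances and therefore uses the pooled variance, which is consistent with the 10 degrees of freedom in the decision rule; the variable names are illustrative.

```python
import math
import statistics

seeded = [1.75, 2.12, 1.53, 1.10, 1.70, 2.42]
unseeded = [1.62, 1.83, 1.40, 0.75, 1.71, 2.33]

n1, n2 = len(seeded), len(unseeded)
mean1, mean2 = statistics.mean(seeded), statistics.mean(unseeded)
var1, var2 = statistics.variance(seeded), statistics.variance(unseeded)

# Pooled variance (assumes the two populations have equal variances).
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df

# Computed t statistic for two independent samples.
t_c = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Tabled value from the decision rule: t0.025(10) = 2.228.
t_crit = 2.228
reject_h0 = abs(t_c) > t_crit

print(f"mean1 = {mean1:.3f}, mean2 = {mean2:.3f}, t_c = {t_c:.3f}")
print("Reject Ho" if reject_h0 else "Fail to reject Ho")
```

The computed statistic is about 0.575, which does not exceed 2.228, so Ho is not rejected: these data do not show a difference in mean monthly precipitation between the two farms.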


9. Correlation and Simple Linear Regression Analysis

Theoretically, heat transfer will be related to the area at the top of the tube that is “unflooded” by the condensation of the vapor.  The data below are the unflooded ratio (x) and heat transfer enhancement (y) values recorded for the twelve integral fin tubes.

x    1.98     1.95     1.78     1.64     1.54     1.32     2.12     1.88     1.70     1.58     2.47     2.37

y    4.4      5.3      4.5      4.5      3.7      2.8      6.1      4.9      4.9      4.1      7.0      6.7

a)     Compute the correlation coefficient and interpret its value.

b)     Find the regression line relating heat transfer enhancement to the unflooded ratio and interpret the values of b0 and b1.
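Both parts of Problem 9 can be computed from the usual sums of squares and cross products. The sketch below does this by hand with the standard library so that each step is visible; the names `s_xx`, `s_yy`, and `s_xy` are illustrative.

```python
import math

x = [1.98, 1.95, 1.78, 1.64, 1.54, 1.32, 2.12, 1.88, 1.70, 1.58, 2.47, 2.37]
y = [4.4, 5.3, 4.5, 4.5, 3.7, 2.8, 6.1, 4.9, 4.9, 4.1, 7.0, 6.7]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross products.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Pearson correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# Least-squares estimates for the line y = b0 + b1 * x.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

print(f"r  = {r:.4f}")
print(f"b0 = {b0:.4f}")
print(f"b1 = {b1:.4f}")
```

The correlation coefficient is about 0.95, indicating a strong positive linear relationship. The fitted line is roughly y = -1.44 + 3.41x: the slope b1 is the estimated change in heat transfer enhancement per unit increase in the unflooded ratio, while the intercept b0 is the fitted value at x = 0, which lies outside the observed range of x and so has little practical meaning here.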


10. Chi-Square Goodness-of-Fit Test

“Snack time”, a newly introduced product in the market, claims to contain popcorn, peanuts, and beans in the proportion 5:3:3.  A sample of 3000 packs was found to contain 1350 popcorn, 700 peanuts, and 950 beans.  Test the hypothesis that the company produces the product in the proportion 5:3:3. Use α = 0.05.

Ho: “Snack time” contains popcorn, peanuts, and beans in the 5:3:3 proportion.

Ha: “Snack time” does not contain popcorn, peanuts, and beans in the 5:3:3 proportion.

Test Procedure:  Chi-Square Goodness-of-Fit Test at α = 0.05

Decision Rule: Reject Ho if χ²c > χ²0.05(2) = 5.991, otherwise fail to reject Ho.
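The computed chi-square value for this decision rule can be obtained as sketched below. Expected counts come from splitting the total of 3000 in the ratio 5:3:3; the variable names are illustrative.

```python
observed = [1350, 700, 950]  # popcorn, peanuts, beans
ratio = [5, 3, 3]            # claimed proportion 5:3:3

total = sum(observed)
ratio_sum = sum(ratio)

# Expected counts under Ho.
expected = [total * part / ratio_sum for part in ratio]

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi2_c = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Tabled value from the decision rule: chi-square 0.05 with 2 df.
chi2_crit = 5.991
reject_h0 = chi2_c > chi2_crit

print("expected =", [round(e, 2) for e in expected])
print(f"chi2_c = {chi2_c:.3f}")
print("Reject Ho" if reject_h0 else "Fail to reject Ho")
```

The computed value is about 38.44, far above 5.991, so Ho is rejected: the sample is not consistent with the 5:3:3 proportion.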


11. Chi-Square Test of Independence

A random sample of 400 married men, all retired or at least 65 years old, were classified according to educational attainment and number of children.

 

                              Number of Children

Educational Attainment        0-2       3-5       Over 5

None                          12        22        26

Elementary                    14        59        37

High School                   40        80        34

College                       26        31        19


Test the hypothesis that the number of children is independent of the level of education attained by the father.  Use α = 0.05.

Ho: The number of children is independent of the level of education attained by the father.

Ha: The number of children is related to the level of education attained by the father.

Test Procedure:  Chi-Square Test of Independence at α = 0.05

Decision Rule: Reject Ho if χ²c > χ²0.05(6) = 12.592, otherwise fail to reject Ho.
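A sketch of the computation behind this decision rule follows. Each expected count is (row total × column total) / grand total, and the degrees of freedom are (rows - 1)(columns - 1); the variable names are illustrative.

```python
# Rows: None, Elementary, High School, College.
# Columns: 0-2, 3-5, and over 5 children.
observed = [
    [12, 22, 26],
    [14, 59, 37],
    [40, 80, 34],
    [26, 31, 19],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Chi-square statistic summed over all cells.
chi2_c = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected count
        chi2_c += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Tabled value from the decision rule: chi-square 0.05 with 6 df.
chi2_crit = 12.592
reject_h0 = chi2_c > chi2_crit

print(f"chi2_c = {chi2_c:.3f}, df = {df}")
print("Reject Ho" if reject_h0 else "Fail to reject Ho")
```

The computed value is about 21.71 with 6 degrees of freedom, which exceeds 12.592, so Ho is rejected: in this sample the number of children is related to the father's educational attainment.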
