# Introductory Statistics

Almost everything we do in real life involves statistics.

Consider just a few examples:

- The snack you ate for lunch was the result of supermarket *surveys* about people's favourite tastes.
- Have you ever *voted* for your favourite song in an online survey?
- Have you ever seen a *table* of sports results?
- The chair on which you are sitting has a height determined from countless studies involving the *mean* of lower leg lengths.
- If you are an average-sized person, you would have no problem buying running shoes because most retail stores stock shoe sizes within the *inter-quartile range*.
- If you have ever washed with soap, scientists have done numerous *standard deviation* calculations on the effectiveness of the chemicals in the soap.
- Have you ever seen a *graph* of the dollar's value, bank profits, or the number of people who voted for the X-Factor winner?
- Have you ever heard the term "Global Warming"? This concept results from investigations involving *correlation* between carbon emissions and air temperatures.

## Maths Fun - The Ig-Nobel Prize!

Each year, the prestigious **Nobel Prize** recognizes the extraordinary achievements of brilliant scientists, mathematicians,
literary figures and humanitarians.

As a joke, there is also an **Ig-Nobel Prize** awarded each year; however, it goes to scientists who have researched something bizarre!
For example, the achievements of two previous award-winners were teaching a red-footed turtle to yawn, and finding that a person with a
hairy belly collects more belly button lint from clothes.

Researchers must be passionate about the topics they research. What is a topic about which you have a genuine burning passion and would like to know more?

## Data Types

**Numerical Data** - Data in the form of *numbers*, such as the number of pets or sprint times. There are two types of numerical data:

- *Discrete data* - where the numbers are separate from each other; usually whole numbers (e.g. number of wins by your favourite basketball team; shoe sizes)
- *Continuous data* - where the numbers are ongoing; usually including decimals (e.g. mass of flour; heights of people)

**Categorical Data** - Data in the form of *words*, such as the name of your favourite singer. There are two types of categorical data:

- *Nominal data* - *Names* (e.g. months of the year; surnames; colours)
- *Ordinal data* - Words that indicate an *order* (e.g. t-shirt sizes such as small, medium and large; race positions such as first, second and third; hair colour such as light brown and dark brown)

## Did You Know That...?

With over 200 nations in the world, the 2009 **Happy Planet Index** rated countries' "happiness" as follows:

Costa Rica (1st), Vietnam (5th), Brazil (9th), Egypt (12th), Saudi Arabia (13th), Philippines (14th), Indonesia (16th), China (20th), Mexico (23rd), Pakistan (24th), Malaysia (33rd), India (35th), Singapore (49th), Germany (51st), Israel (67th), France (71st), United Kingdom (74th), Japan (75th), Iraq (79th), Iran (81st), Canada (89th), Australia (102nd), Russia (108th), United States of America (114th), South Africa (118th), Sudan (121st), Tanzania (142nd).

## Data Collection

**Population** - The population is everyone or everything that you could ask to get your data. Examples include:

- *Every* student in your school (including those who are absent)
- *All* the cars registered by your state's motor vehicle department
- *All* of a music CD production company's sales data
- *Every* man, woman and child in a country

**Sample** - Because it is often impossible, impractical or unnecessary to obtain data from the entire population, a smaller selected group is often asked. Examples include:

- Phoning telephone numbers which end with the digit 5
- Asking every tenth student on your school's enrolment list
- Asking cinema-goers exiting via the centre door for their opinion of a movie
- Obtaining car damage statistics from reported car accidents (Remember that many accidents are minor and unreported.)

**Census** - When data is obtained from the entire population, this data collection is called a census. In the USA, a census of the whole population
is done every ten years. In Australia, a census occurs every five years. The information helps the government decide where it is best to
spend the money collected from our taxes, for example on building hospitals and schools.

**Survey** - When data is obtained from a smaller sample, this is called a survey. A survey is different from a questionnaire, the latter being a document containing the questions to be asked.

## Did You Know That...?

The earliest **census** is in dispute. It may have been done by the Babylonians about 8000 years ago, the Chinese about 5000 years ago, or the Egyptians about 3000 years ago.

## Sampling Methods

**Random Sampling** - This is the fairest sampling method. Examples include picking a name from a hat, or choosing a telephone number using a random
number generating program such as the one on your calculator.
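As a rough illustration of picking names from a hat, the sketch below uses Python's standard `random` module. The enrolment list of 50 student ID numbers and the function name `simple_random_sample` are made up for this example.

```python
import random

# Hypothetical population: an enrolment list of 50 student ID numbers.
population = list(range(1, 51))

def simple_random_sample(items, sample_size, seed=None):
    """Pick sample_size different items, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(items, sample_size)

sample = simple_random_sample(population, 5, seed=1)
print(sample)  # five different student ID numbers
```

Leaving out the seed gives a fresh draw each time, which is closer to what a calculator's random number button does.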

**Systematic Sampling** - This is arguably the second fairest sampling method. Examples include choosing every tenth student from a school's
enrolment list, or choosing the last name on every page of a telephone book.
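The "every tenth student" idea can be sketched in a few lines of Python. The list of 40 students and the function name `systematic_sample` are assumptions made for illustration only.

```python
def systematic_sample(items, step, start=0):
    """Take every step-th item, beginning at position start (counting from 0)."""
    return items[start::step]

# Hypothetical enrolment list of 40 students.
enrolment = [f"Student {n}" for n in range(1, 41)]

# Start at position 9 so that the 10th, 20th, 30th and 40th students are chosen.
chosen = systematic_sample(enrolment, step=10, start=9)
print(chosen)  # ['Student 10', 'Student 20', 'Student 30', 'Student 40']
```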

**Stratified Sampling** - The word "strata" means "layers" or "groups". For example, a survey of male and female students at your school requires that the
numbers of *boys* and *girls* in the survey be proportional to the total numbers of boys and girls enrolled at the school. Stratified sampling is usually
done in conjunction with another method such as random or systematic sampling. (A worked example is shown below.)

**Cluster Sampling** - The samples are taken from typical geographic areas. For example, a questionnaire about driving skills
may give quite different responses from city teenagers and country teenagers.

**Accessibility Sampling** - This form of sampling is convenient because the people are easy to access, but it can be biased. For example, a questionnaire asking
all the athletes who use the gym about the school lunch menu is convenient, but may obtain different responses than if the sample were students in the school library.

**Capture-Recapture Method** - This sampling method is used to estimate wildlife population numbers. A scientist captures, tags and releases a certain
number of animals on the first occasion. On a second and much later occasion, the scientist captures another group, counts how many of them are already tagged, and then uses the following formula
to estimate the animal population. (A worked example is shown below.)

$$
\text{Whole population} = \frac{\text{Number caught in 2nd capture}}{\text{Number tagged in 2nd capture}} \times \text{Number caught in 1st capture}
$$

## Example One - Stratified Sampling

In a company, there are 10 managers, 20 supervisors and 70 tradespeople. A survey regarding working conditions is to be carried out and a sample of 30 people from all staffing levels is required. How many from each staffing level should be in the sample?

**Answer:**

Total Number of Staff = 100

Total Number in Sample = 30

Number of managers = $\frac{10}{100} \times 30 = 3$ managers

Number of supervisors = $\frac{20}{100} \times 30 = 6$ supervisors

Number of tradespeople = $\frac{70}{100} \times 30 = 21$ tradespeople
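The same proportional calculation can be written as a short Python sketch. The function name `stratified_allocation` is invented for this example, and it assumes the shares come out as whole numbers, as they do here. Shares that are not whole numbers would need rounding, which this sketch does not handle.

```python
def stratified_allocation(strata_sizes, sample_size):
    """Share the sample among the groups in proportion to each group's size."""
    total = sum(strata_sizes.values())
    return {
        name: size * sample_size / total
        for name, size in strata_sizes.items()
    }

staff = {"managers": 10, "supervisors": 20, "tradespeople": 70}
allocation = stratified_allocation(staff, sample_size=30)
print(allocation)  # {'managers': 3.0, 'supervisors': 6.0, 'tradespeople': 21.0}
```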

## Example Two - Capture-Recapture Method

A marine biologist captures and tags 200 turtles during a week of research. One year later, the scientist captures 100 turtles and finds that 40 of these already have tags from the previous year. Use the formula to estimate the turtle population.

**Answer:**

$$
\begin{aligned}
\text{Whole population} &= \frac{\text{Number caught in 2nd capture}}{\text{Number tagged in 2nd capture}} \times \text{Number caught in 1st capture} \\
&= \frac{100}{40} \times 200 \\
&= 500 \text{ turtles}
\end{aligned}
$$
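For readers who like to check such estimates with a program, here is a minimal Python sketch of the same formula. The function and parameter names are made up for illustration. The check for zero tagged animals is an added assumption, since the formula cannot be used in that case.

```python
def estimate_population(first_capture, second_capture, tagged_in_second):
    """Estimate the whole population with the capture-recapture formula."""
    if tagged_in_second == 0:
        # Dividing by zero is impossible, so no estimate can be made.
        raise ValueError("no tagged animals were found in the second capture")
    return second_capture / tagged_in_second * first_capture

turtles = estimate_population(first_capture=200, second_capture=100, tagged_in_second=40)
print(turtles)  # 500.0
```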

## Questions

**Q1.** In an environmental engineering company, there is a total of 200 staff, of whom 150 are women and 50 are men. In a sample of 40 staff, how many men and women should there be?

**Q2.** A scientist captures and tags 500 sea lions during a week of research. One year later, the scientist captures 200 sea lions and finds that 50 of these already have tags from the previous year. Use the formula to estimate the sea lion population.

**Answers:**

**A1.** 30 women, 10 men
**A2.** 2000 sea lions

## Bias

**Bias in the questionnaire design** may come from a question that is likely to receive inaccurate responses (e.g. "Do you take illegal drugs?"); that
creates too much emotion or pressure (e.g. "Bats spread viruses to livestock. You would be in favour of culling them, wouldn't you?"); or that is incomprehensible
to the ordinary person (e.g. "Which operating system - Alpha, Bravo or Xenon - do you prefer?").

**Bias in the sampling method** occurs if the sample is too small (e.g. Only five people are asked.); is not representative of the population
(e.g. Elderly folks at a retirement home are asked about teenagers' preferred music.); or is taken at inappropriate times or places (e.g.
Supermarket customers at 10 am Monday may be different to those at 10 am Saturday.).

**Bias in the interpretation of results** may occur where statistical calculations are performed on data which is not
suited to the method (e.g. Finding the "average" of people's favourite colours is invalid.).
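To illustrate matching the calculation to the data type, the sketch below uses Python's standard `statistics` module: a mean for numerical data, and the most common answer (the mode) for categorical data. Both data sets are invented for this example.

```python
from statistics import mean, mode

# Hypothetical survey results.
heights_cm = [150, 160, 160, 170]                      # numerical data
favourite_colours = ["blue", "red", "blue", "green"]  # categorical data

average_height = mean(heights_cm)           # an average makes sense for numbers
most_popular = mode(favourite_colours)      # for words, report the most common answer

print(average_height)  # 160
print(most_popular)    # blue
```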

## Did You Know That...?

Prior to the 1951 English census, women were asked to be more **honest** in answering the question about their age.

## Questionnaire of Your Choice

Choose a topic about which you are passionate. **Write about 10 questions for a questionnaire**. The first three or four questions should find
out characteristics of the people you will ask (e.g. age, gender, suburb) and the rest should be about the chosen topic (e.g. favourite music).
Test this on your friends. Rewrite any questions that need improvement.