Basic Concepts of Statistics
Statistics
is the branch of mathematics that deals with the collection, organization,
analysis, and interpretation of numerical data.
Since research yields such quantitative data, statistics is a basic tool
of measurement, evaluation, and research.
The word statistics is sometimes used to refer to any measure computed on the basis of data obtained from a sample of the population under study. If one intends to know the average number of senior citizens in a certain district, and this average is taken from only 392 of the 20,000 families of that community, then this average constitutes a statistic. However, if all 20,000 families were used in the calculation of the average number of senior citizens, the resulting average is referred to as a parameter. The whole group of 20,000 families about which we make estimates is the population or universe, and the smaller group of 392 families, which we selected as part of the population, is the sample.
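The distinction between a parameter and a statistic can be sketched in a few lines of Python. The family data below are randomly generated and purely hypothetical, and the names `population`, `sample`, `parameter` and `statistic` are illustrative only.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical data: number of senior citizens in each of 20,000 families.
population = [random.randint(0, 3) for _ in range(20_000)]

# A sample of 392 families drawn at random from the population.
sample = random.sample(population, 392)

parameter = mean(population)  # computed from every family: a parameter
statistic = mean(sample)      # computed from the sample only: a statistic

print(f"parameter = {parameter:.3f}, statistic = {statistic:.3f}")
```

The two averages will usually be close but not identical, which is why the sample average is treated as an estimate of the population average.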
VARIABLES
An important characteristic of many research questions is that they imply a relationship of some sort to be investigated. At this point, it is necessary to introduce the concept of variables, for a relationship is a statement about variables. A variable is any observable characteristic of a person or object which may take on several values or may be expressed in several different categories. The individual members in the class of objects must differ or vary to qualify the class as a variable. If all members of the class are identical, we do not have a variable; such a characteristic is called a constant, since the individual members of the class do not vary.
QUANTITATIVE AND CATEGORICAL VARIABLES
Variables can be classified as either quantitative or categorical. Quantitative variables exist in some degree (rather than all or none) along a continuum from less to more, and we can assign numbers to different individuals or objects to indicate how much of the variable they possess. Variables that can take only specific or isolated values along a scale are called discrete variables. For example, when we count the number of OFWs and the like, we have a discrete variable.
Variables that do not fall under the category of discrete are called continuous variables. These variables are measurable, such as the exact ages of students in a certain class, the heights and weights of children, and temperature.
By way of contrast, categorical variables do not vary in degree, amount or quantity but are categorically different. Variables such as gender, eye color, religion, occupation, position and level of performance on a job are categorical.
Variables
may also be classified as dependent, independent and intervening
variables. The independent variable is
presumed to affect or influence other variables. The dependent or outcome variable is presumed
to be affected by one or more independent variables. An intervening variable is an independent
variable that may have unintended effects on a dependent variable in a
particular study.
Figure 1 illustrates independent, intervening and dependent variables.

Independent Variable: Educational attainment
Intervening Variables: Age, Gender, Civil status, Length of service, Socio-economic status
Dependent Variable: Performance
INSTRUMENTATION
The
collection of data is an extremely important part of any type of research, for
the conclusions of a study are based on what the data show. Thus, the kind of data to be gathered, the
method to be used in the gathering of data, and the treatment of the data need
to be considered carefully. The success
and usefulness of the result of the study will depend much on the accuracy and
reliability of the data. Bear in mind
that no statistical treatment can make unreliable data correct.
Data refers to the kinds of information researchers obtain on the subjects of their research. Data can be classified according to source and to form.
TYPES OF DATA ACCORDING TO SOURCE
1. Primary data. These are data that are gathered directly from the respondents of the study through observation, interview, questionnaire, experiment, or measurement.
2. Secondary data. These are data that have been previously gathered and compiled and are made available to the researcher for analysis. These include books, journals, records, reports and other publications.
TYPES OF DATA ACCORDING TO FORM
1.
Quantitative data. These are data that are measured on a scale.
2.
Qualitative or Categorical data. These are observations that can be classified
into a single category or a set of categories.
An important decision every researcher makes during the planning stage of an investigation is the selection of the kind of data to collect. The device (such as a pencil-and-paper test, a questionnaire, or a rating scale) the researcher uses to gather data is called an instrument. The whole process of gathering data is known as instrumentation.
DATA COLLECTION METHODS
Observation. Observation is one of the earliest methods for acquiring knowledge. In this method, the researcher closely watches the overt behaviors of the subjects under investigation in various natural settings. Observation may be done by actual participation, which allows the researcher to gain a detailed and comprehensive picture of the respondents. This is known as participant observation. However, the researcher is cautioned not to get emotionally involved in the group, for the study may lose its objectivity.
It would be advantageous for the study if the respondents are not aware that they are being observed, so that they will behave naturally. This kind of observation is known as non-participant observation.
Observation may also be classified as structured or unstructured. In structured observation, the researcher makes use of an observation guide that limits the focus of the observations to aspects of behavior, activities or events relevant to the research problem. Unstructured observation is open and flexible because the researcher does not restrict the activity to an observation guide. This gives the researcher an opportunity to modify the objectives of the study as more data are gathered about the research problem.
Interview. The interview is a method of personal communication between the researcher and the respondents. This method provides consistent and precise information to the researcher because the respondent may clarify the information. It is probably the most effective way there is to enlist the cooperation of the respondents.
TYPES OF INTERVIEW
1. Structured interview. This type of interview uses a research instrument called an interview schedule. An interview schedule is made up of carefully prepared and logically ordered questions.
2. Unstructured interview. This type of interview is open and flexible. The content, sequence and wording of the questions are up to the researcher, who makes use of an interview guide, which is a listing of topics that will be taken up during the interview process.
Questionnaire. In this method the subject responds to the questions by writing or, more commonly, marking an answer sheet. The advantage of questionnaires is that they can be mailed or given to large numbers of people at the same time. The disadvantages are that unclear or seemingly ambiguous questions cannot be clarified, and the respondents have no chance to expand on, or react verbally to, a question of particular interest or importance.
MEASUREMENT SCALES
Different variables call for different types of analysis and measurement, requiring the use of different measurement scales. Measurement scales are ways of assigning numerals to variables. There are four types of measurement scales: nominal, ordinal, interval and ratio.
Nominal scale. This scale is the simplest and most limited form of measurement researchers can use. It is merely used to differentiate categories. For example, the respondents under study may be grouped according to gender, and the researcher may then assign the number 1 to females and the number 2 to males. Since these numbers are simply used for identification purposes, there is no implication that the males (assigned number 2) are more of anything than the females (assigned number 1).
Ordinal scale. An ordinal scale is one in which data are not only classified but also ordered in some way, from high to low or from least to most. For instance, a researcher might rank-order teachers' performance ratings from high to low. Notice, however, that the difference in ratings or in actual performance between the first- and second-ranked teachers and between the third- and fourth-ranked teachers would not necessarily be the same. Ordinal scales indicate relative standing among individuals.
Interval scale. An interval scale has the attributes of an ordinal scale plus another feature: the distances between the points on the scale are equal. Examples of interval measurements are achievement test scores, mental ability scores and temperature scales. Thus, if two students got scores of 75 and 80, respectively, in an achievement test, the distance between these scores is said to be the same as the distance between the scores of two pupils who got 90 and 95. The zero point on an interval scale does not reflect a total absence of what is being measured. Thus, 0 on the Celsius scale does not indicate that the object has no temperature.
Ratio scale. A ratio scale is similar to an interval scale, except that it has an actual, or true, zero point which indicates a total absence of the property being measured. For example, a scale designed to measure weight would be a ratio scale, because the zero on the scale represents zero, or no weight at all.
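The practical difference between interval and ratio scales can be illustrated with a short Python sketch. The Kelvin conversion is brought in here only as an illustration of a temperature scale with a true zero; it is not part of the discussion above, and the function name is hypothetical.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (a scale with a true zero)."""
    return c + 273.15

# Differences are meaningful on an interval scale:
# the gap between 75 and 80 equals the gap between 90 and 95.
equal_gaps = (80 - 75) == (95 - 90)

# Ratios are not meaningful on an interval scale:
# 20 degrees Celsius is not "twice as hot" as 10 degrees Celsius.
naive_ratio = 20 / 10
true_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Weight has a true zero, so 80 kg really is twice 40 kg.
weight_ratio = 80 / 40

print(equal_gaps, naive_ratio, round(true_ratio, 3), weight_ratio)
```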
According to John Best, the researcher who uses statistics
goes beyond the manipulation of data. He
is aware that the proper application of statistical method involves answering
the following questions:
1.
What
facts need to be gathered to provide the information necessary to answer the
question or to test the hypothesis?
2.
How are these observations to be selected,
gathered, organized and analyzed?
3.
What assumptions underlie the statistical
methodology to be employed?
4.
What conclusions can be validly drawn from the
analysis of the data?
Research consists of careful, systematic, patient study and
investigation in some field of knowledge undertaken for the purpose of
discovering relationships between variables.
The ultimate purpose is to obtain evidence to support or refute proposed
facts or principles that may be used to explain phenomena and predict future
occurrences. To conduct research,
principles must be established so that the observation and description have a
commonly understood meaning. Measurement
is the most reliable and universally accepted process of description, assigning
quantitative values to the properties of objects and events.
DESCRIPTIVE AND INFERENTIAL ANALYSIS
After instruments have been administered and data have been
collected and organized, the first step in data analysis is to describe it in a
summary fashion using one or more descriptive analysis.
Descriptive analysis. This type of statistical analysis limits generalization to the particular group of individuals observed. No conclusions can be made beyond this group, and any similarity to those outside the group cannot be assumed. The data describe one group and that group only.
[...] lead to committing the type II error, that is, accepting a false null hypothesis instead of rejecting it.
A correlated t-test is more powerful than is an independent
t-test when the subjects are truly dependent on each other. The independent t-test is more powerful when
the subjects are independently selected and assigned.
Experimental research. Experimental research is one of the most
powerful research methodologies researchers can use. Of the many types of research, it is the best
way to establish cause-and-effect relationships between variables.
In an
experimental study, researchers look at the effects of at least one independent
variable on one or more dependent variables.
The independent variable in experimental research is frequently referred
to as the experimental or treatment variable.
The dependent variable, also known as the criterion or outcome variable
refers to the results or outcomes of the study.
The major characteristic of experimental research, which
distinguishes it from all other types of research, is that researchers
manipulate the independent variable.
They decide the nature of the treatment, to whom it is to be applied,
and to what extent. Independent
variables frequently manipulated in educational research include methods of
instruction, types of assignment, learning materials, rewards given to
students, and types of questions asked by teachers.
SAMPLING
A population refers to the entire group or set of
individuals or items to whom the researchers would like to generalize the
results of the study. A population is
further distinguished by its role in the study.
Consider the following examples.
Research Problem: The Effects of Multimedia Instruction on the Mathematical Achievement of First Year High School Students in the Division of City Schools of Manila.
Target Population: All first year high school students in the Division of City Schools, Manila.
Accessible Population: All first year high school students in the pilot high schools, Division of City Schools, Manila.
Sample: Ten percent of the first year high school students in the pilot high schools, Division of City Schools, Manila.
A sample is the group of individuals in a research study from which information is obtained and on which generalizations about the population are based. It is essential that researchers describe the population and the sample in sufficient detail so that others can determine the applicability of the findings to their own situations.
DETERMINING SUFFICIENT SAMPLE SIZE FROM THE POPULATION
A common practice of some researchers who do not know a systematic way of determining an adequate sample size is to take an arbitrary percentage of the population. They base their sample size on a subjective decision.
To avoid this problem, we shall discuss two formulas for determining a sufficient sample size. It is up to you to choose the one you prefer when you conduct your research study.
The Slovin Formula. The Slovin formula is the most common formula researchers use in determining a sufficient sample size. Most researchers use this formula because they find it convenient to compute. The formula is indicated below:

n = N / (1 + N e^2)

Where: n = sample size; N = population size; e = the desired margin of error, which is usually 0.05
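As a worked illustration, Slovin's formula is commonly written n = N / (1 + N e^2). The sketch below assumes that form and rounds to the nearest whole number, which reproduces the sample sizes quoted in this article (392 from 20,000; 359 from 3,500; 267 from 800). The function name `slovin_sample_size` is illustrative, and some texts round up instead.

```python
def slovin_sample_size(population_size, margin_of_error=0.05):
    """Return n = N / (1 + N * e**2), rounded to the nearest whole number."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return round(n)

for N in (20_000, 3_500, 800):
    print(N, slovin_sample_size(N))
```

For N = 3,500 and e = 0.05, the denominator is 1 + 3,500 x 0.0025 = 9.75, so n = 3,500 / 9.75 = 358.97, or 359.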
There are a few guidelines suggested by Fraenkel with regard to the minimum number of subjects needed. For descriptive studies, a sample with a minimum number of 100 is essential. For correlational studies, a sample of at least 50 is deemed necessary to establish the existence of a relationship. For experimental and causal-comparative studies, a minimum of 30 individuals per group is recommended, although sometimes experimental studies with only 15 individuals in each group can be defended if they are very tightly controlled; studies using 15 subjects per group should be replicated, however, before too much is made of any findings that occur.
SAMPLING DESIGNS
The term sampling, as used in research, refers to the process of selecting the subjects who will participate in a research study. Sampling can be classified into two basic types: non-probability and probability. In the non-probability type, there is no way of estimating the probability that each individual or element will be included in the sample. In probability sampling, each member of the population has a known (often equal) chance of being chosen as part of a representative sample.
PROBABILITY SAMPLING
1. Simple random sampling. The basic and most popular sampling design is simple random sampling. A simple random sample is one in which each and every member of the population has an equal and independent chance of being chosen. As such, it is considered the best sampling design.
In this design, samples are picked either by the use of a table of random numbers or by lottery techniques. The table of random numbers is an extremely large list of numbers that has no order or pattern. Such lists can be found in the appendices of most statistics books. It uses columns or rows of numerical digits that were mechanically generated. In choosing sample units, the number of digits read from the table should correspond to the number of digits in the population size. For example, to obtain a sample of 200 from a population of 2000 individuals, select a column of numbers, start anywhere in the column, and begin reading four-digit numbers. This is the case since the final number, 2000, consists of four digits. Use the same number of digits for each individual: individual 1 would be known as 0001, individual 2 as 0002, and so forth. The researcher would then proceed to write down the first 200 numbers in the column that have a value of 2000 or less.
In the lottery technique, each population unit is assigned a number that is written on a slip of paper. The slips, which are physically identical, are put into a bowl or a box and mixed thoroughly. Then the samples are drawn one at a time until the desired sample size is reached.
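In practice, a pseudo-random number generator can stand in for the table of random numbers or the lottery bowl. The following is a minimal sketch using Python's standard `random` module; the function name `simple_random_sample` and the seed value are illustrative assumptions, not part of the procedure described above.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick `sample_size` distinct ID numbers from 1..population_size."""
    rng = random.Random(seed)
    ids = range(1, population_size + 1)
    chosen = rng.sample(ids, sample_size)  # sampling without replacement
    # Zero-pad each ID to the width of the largest one (e.g. 0001 ... 2000).
    width = len(str(population_size))
    return [str(i).zfill(width) for i in sorted(chosen)]

sample = simple_random_sample(2000, 200, seed=42)
print(len(sample), sample[:5])
```

Because `random.sample` draws without replacement, no individual can be selected twice, which mirrors drawing slips from a bowl one at a time.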
2. Systematic sampling. This is a modified form of simple random sampling. It involves selecting every kth element in the population until the desired number of samples is obtained. The value of k (the sampling interval) is determined by dividing the population size (N) by the sample size (n). The quotient is then rounded up to the next integer. For example, in a population list of 5000 names, to select a sample of 370 (5000/370 = 13.51, rounded up to 14), a researcher would select every 14th name on the list until a total of 370 names were chosen.
There is a problem with systematic sampling that is sometimes overlooked. If the population has been ordered systematically, that is, if the individuals are arranged in some sort of pattern that accidentally coincides with the sampling interval, a markedly biased sample can result. When planning to select a sample from a list of some sort, researchers should carefully examine the list to make sure there is no cyclical pattern present.
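The selection of every kth element can be sketched as follows. The random starting point within the first interval is an added assumption (a common convention that is not stated above), and the list of names is hypothetical.

```python
import math
import random

def systematic_sample(population, sample_size, seed=None):
    """Select every k-th element, where k = ceil(N / n), from a random start."""
    k = math.ceil(len(population) / sample_size)  # sampling interval
    start = random.Random(seed).randrange(k)      # random start in the first interval
    return k, population[start::k]

names = [f"name_{i:04d}" for i in range(1, 5001)]  # hypothetical list of 5000 names
k, chosen = systematic_sample(names, 370, seed=7)
print(k, len(chosen))
```

Note that because k is rounded up to 14, a single pass through 5000 names yields only 357 or 358 names rather than 370, so a researcher who needs the full 370 would have to adjust the interval or draw the remainder separately.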
3.
Stratified Random Sampling. This is the process of subdividing the
population into subgroups or strata and drawing members at random from each
subgroup or stratum in the same proportion as they exist in the
population. Suppose a researcher wants
to find out the effectiveness of the new Mathematics 1 worktext a certain
school district is considering adopting.
She intends to compare the achievement of students using the new book
with that of students using the more traditional text the district has been
using for years. Since she has reason to
believe that gender is a variable that may affect the outcomes of her study,
she decides to ensure that the proportion of males and females in the research
is the same as in the population. The
steps in the sampling process would be as follows:
A)
She identifies the target
population. The population consists of
all 3500 First year public high school students in the district enrolled during
the school year 1999-2000.
B)
She determines the desired sample size. Using Slovin's formula, she gets a sample of 359 students from the population of 3,500.
C)
She identifies the strata into which the
population has been subdivided.
The population has been subdivided by gender. The researcher finds out that there are 2,100
females (60 percent) and 1,400 males (40 percent) in the population.
D)
She determines the number of respondents to be selected from each stratum: 60 percent of 359 = 215 females; 40 percent of 359 = 144 males.
E)
Using a table of random numbers, she then randomly selects 215 females and 144 males from the population.
The advantage of stratified random sampling is that it increases the likelihood of representativeness, especially if one's sample is not very large. It ensures that the key characteristics of the respondents in the population are present in the same proportions in the sample.
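The proportional allocation in steps C to E can be sketched in Python. The student IDs are hypothetical, and the function names are illustrative. Rounding each stratum separately happens to add up to 359 here, but that is not guaranteed for every set of proportions.

```python
import random

def proportional_allocation(strata_sizes, sample_size):
    """Split `sample_size` across strata in proportion to their population share."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

def stratified_sample(strata, sample_size, seed=None):
    """Draw a simple random sample from each stratum using proportional allocation."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    quota = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(members, quota[name])
            for name, members in strata.items()}

# Hypothetical student IDs for the 2,100 females and 1,400 males in the example.
strata = {
    "female": [f"F{i}" for i in range(2100)],
    "male": [f"M{i}" for i in range(1400)],
}
sample = stratified_sample(strata, 359, seed=1)
print({name: len(members) for name, members in sample.items()})
```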
4.
Cluster random sampling. The selection of groups, or clusters, of subjects rather than individuals is known as cluster sampling. This sampling design is used when the population is very large and spread out over a wide geographical area. Just as simple random sampling is more effective with a larger number of subjects, so is cluster random sampling more effective with a larger number of clusters.
The following steps show how to determine
a cluster of mathematics teachers in a particular Division of City Schools
A)
Identify the population. The population, for example, consists of all
the 800 mathematics teachers in 36 schools in a given division.
B)
Determine the sample size. A sample of 267 teachers will be selected
from the population.
C)
Identify the logical cluster for the
given population. The logical cluster
for the population would be by school.
D)
Obtain a list of the clusters comprising
the population. A list of all the
schools in the given city is needed.
E)
Estimate the average number of members
per cluster in the population. Let us
say that the average number of teachers per school is 14.
F)
Determine the number of clusters to be selected from the population by dividing the required sample size by the average number of members per cluster in the population. In the example, the desired size is 267 and there is an average of 14 teachers per school. Hence the number of clusters needed is 267/14 = 19.07, or 19.
G)
Randomly select the needed number of
clusters. Using simple random sampling,
19 schools shall be selected from the population list of 36 schools.
H)
Include all the members in the selected clusters. All the teachers in each of the 19 schools
selected will comprise the desired sample.
The advantages of cluster sampling are that it can be used when it is difficult or impossible to select a random sample of individuals, it is often far easier to implement, and it is frequently less time-consuming. Its disadvantage is that there is a far greater chance of selecting a sample that is not representative of the population.
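Steps E to H of the cluster procedure can be sketched as below. The division is hypothetical: each of the 36 schools is given exactly 14 teachers so that the average matches the figure assumed in step E, and the school and teacher labels are invented for illustration.

```python
import random

def cluster_sample(clusters, sample_size, seed=None):
    """Randomly pick whole clusters until roughly `sample_size` members are covered."""
    rng = random.Random(seed)
    average_size = sum(len(m) for m in clusters.values()) / len(clusters)
    n_clusters = round(sample_size / average_size)  # clusters needed
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Every member of each selected cluster joins the sample.
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members

# Hypothetical division: 36 schools with exactly 14 mathematics teachers each.
schools = {f"school_{s:02d}": [f"teacher_{s:02d}_{t:02d}" for t in range(14)]
           for s in range(1, 37)}
chosen, teachers = cluster_sample(schools, 267, seed=3)
print(len(chosen), len(teachers))
```

With 19 schools of 14 teachers each, the sample contains 266 teachers, one short of the target of 267; with clusters of unequal size the final count would vary with the schools drawn.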
5.
Multi-Stage Sampling. This design is an extended version of cluster
sampling. It involves several stages in
drawing the representative sample from the population. The population units are grouped into
a hierarchy of elements, and sampling is done successively. For example, in a nationwide research,
regions are selected at the first stage, provinces at the second stage, cities
and municipalities at the third stage, barangays of the sample cities and municipalities
at the fourth stage, and finally individual respondents within the selected
barangays at the fifth stage. At each stage,
simple random, systematic, or stratified designs may be utilized.
NON-PROBABILITY SAMPLING
1. Convenience sampling. This design is resorted to when it is extremely difficult to select a random sample. Thus a researcher simply takes the closest persons (a convenience sample) who are available for study. For example, if a researcher is interested in finding out how jeepney drivers feel about the oil deregulation law, she just goes to the nearest jeepney station and interviews the first 50 jeepney drivers who are there. Convenience samples cannot be considered representative of any population and should be avoided. However, if they are the only choice a researcher has, the study should be replicated with a number of similar samples to decrease the likelihood that the results obtained were simply a one-time occurrence.
2. Purposive Sampling. This design is also known as judgmental sampling. A purposive sample is a sample selected because the individuals have special qualifications of some sort. Usually, this set of qualifications meets the purposes of the researcher's study. For example, a researcher is interested in the changes in the sexual behavior of people in middle life. She sets the qualification of a middle-aged person as one whose age is in the range of 40 to 55 years old. Whoever qualifies and is available is taken until the desired sample size is attained.
3. Snowball sampling. This design requires identification of a few persons whose qualifications meet the purposes of the study. These persons serve as informants leading the researchers to other individuals who qualify for inclusion in the sample, who in turn lead to more persons who can be interviewed. This process goes on until the desired number of respondents is obtained.
THE FREQUENCY DISTRIBUTION
Data may be arranged alphabetically,
chronologically, in ranked form or by using arrays. The choice of arrangement depends on the
purpose/s of the researcher.
The aforementioned ways of organizing
data, however, would not yield much information nor would give the reader a
clear picture of the whole situation.
They would just make the task of treating the data statistically more
convenient.
UNGROUPED DATA DISTRIBUTIONS
The list of test scores in a teacher's class record provides an example of ungrouped data. Since the usual method of listing is alphabetical, the scores are difficult to interpret without some other type of organization.
Example:
Alvarez, Oliver        68
Ballesteros, Antonio   79
Dacanay, Alfredo       98
Salinas, Romel         70
Wenceslao, Paulino     88
The Array. Arranging the same set of scores in descending or ascending order of magnitude produces what is known as an array.
98
88
79
70
68
The array provides a more convenient
arrangement. In the above example, the
highest score (98), the lowest score (68) and the middle score (79) are easily
identified. The range (the difference
between the highest score and the lowest score) can easily be determined.
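The array and the range for the five scores above can be produced with a few lines of Python; the variable names are illustrative.

```python
scores = {
    "Alvarez, Oliver": 68,
    "Ballesteros, Antonio": 79,
    "Dacanay, Alfredo": 98,
    "Salinas, Romel": 70,
    "Wenceslao, Paulino": 88,
}

array = sorted(scores.values(), reverse=True)  # descending order of magnitude
highest, lowest = array[0], array[-1]
middle = array[len(array) // 2]                # middle score of an odd-sized array
score_range = highest - lowest

print(array, middle, score_range)
```

Here the range is 98 - 68 = 30.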
GROUPED DATA DISTRIBUTIONS
An array may help make the overall
pattern of data apparent. However, if
the number of scores is large, construction of the array may have to be done on
a computer, and even then, the array may turn out to be so large that it is
difficult to understand.
Data
are more clearly presented when scores are grouped or presented in a frequency
distribution. A frequency distribution
is a tabular summary of a set of data that shows the frequency or number of
data items that fall in each of several distinct classes. To construct a frequency distribution, study
the test scores of students presented below and then follow the steps specified
below the table.
THE SCORES OF 50 STUDENTS IN THE ENGLISH 1 MIDTERM EXAM
29 25 23 20 18 17 15 13 10 9
28 24 21 20 18 16 15 12 10 9
27 24 20 19 18 16 15 12 9 8
26 23 20 19 17 16 14 10 9 8
26 23 20 19 17 16 14 10 9 6
1. Find the range. Range = highest score – lowest score; Range = 29 – 6 = 23.
2. Determine the tentative or approximate number of classes (k). k = 1 + 3.322 log N = 1 + 3.322 log 50 = 6.64, so k = 7. (Note: always round the result up to the next integer.)
3. Determine the approximate size of the class interval (c). c = range/k = 23/7 = 3.29, so c = 4. (Note: always round the quotient up to the next integer.)
4. Write the class intervals starting with the lowest score. Stop when the class already includes the highest score.
5. Determine the class frequency for each class interval by referring to the tally column and present the results in a table.
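The five steps above, applied to the 50 English 1 midterm scores, can be sketched in Python as follows. The sketch assumes a base-10 logarithm in the formula for k and whole-number scores, so that a class of size c starting at a lower limit L runs from L to L + c - 1.

```python
import math

scores = [
    29, 25, 23, 20, 18, 17, 15, 13, 10, 9,
    28, 24, 21, 20, 18, 16, 15, 12, 10, 9,
    27, 24, 20, 19, 18, 16, 15, 12, 9, 8,
    26, 23, 20, 19, 17, 16, 14, 10, 9, 8,
    26, 23, 20, 19, 17, 16, 14, 10, 9, 6,
]

# Step 1: range = highest score - lowest score
score_range = max(scores) - min(scores)

# Step 2: tentative number of classes, rounded up
k = math.ceil(1 + 3.322 * math.log10(len(scores)))

# Step 3: class size, rounded up
c = math.ceil(score_range / k)

# Step 4: build class intervals from the lowest score; stop once the highest is included
intervals = []
lower = min(scores)
while not intervals or intervals[-1][1] < max(scores):
    intervals.append((lower, lower + c - 1))
    lower += c

# Step 5: count how many scores fall in each class
frequencies = [sum(lo <= s <= hi for s in scores) for lo, hi in intervals]

for (lo, hi), f in zip(intervals, frequencies):
    print(f"{lo:2d} - {hi:2d}: {f}")
```

With a class size of 4 starting at 6, the highest score of 29 is already covered by the sixth class, so the final table has six classes rather than the tentative seven.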
Weights (in kg) of 40 Grade 1 Pupils in a Public School

Class Limits (kg)   Class Boundaries   Class Mark   Class Frequency   Relative Frequency   Percentage of Observations
89 – 93             88.5 – 93.5        91           3                 0.075                7.5
84 – 88             83.5 – 88.5        86           5                 0.125                12.5
79 – 83             78.5 – 83.5        81           11                0.275                27.5
74 – 78             73.5 – 78.5        76           12                0.300                30.0
69 – 73             68.5 – 73.5        71           5                 0.125                12.5
64 – 68             63.5 – 68.5        66           3                 0.075                7.5
59 – 63             58.5 – 63.5        61           1                 0.025                2.5
The lowest and highest values that can fit in a class are called the lower class limit and upper class limit, respectively. In the class 59 – 63 of the table, 59 is the lower class limit and 63 is the upper class limit. The difference between the lower class limit of one class and the lower class limit of the next class is the class width. Each class should have the same class width or interval (c), although it is not uncommon to see either the first or the last class a little wider or narrower than the others. The center of the class is called the midpoint or class mark. The midpoint is computed by getting one-half of the sum of the lower and the upper class limits of one class. For example: Class mark (midpoint) = (59 + 63)/2 = 61.
The class boundaries are the true limits of a class, defined by a lower boundary and an upper boundary. Each class boundary equals the number midway between the upper limit of a class and the lower limit of the next higher class. A relative frequency distribution indicates the proportion of the number of observations occurring in each interval. That is, relative frequency = f/n.
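The derived columns of the weights table can be recomputed from the class limits and frequencies alone. This sketch assumes whole-number class limits that are one unit apart, so each boundary lies 0.5 below the lower limit or 0.5 above the upper limit; the dictionary keys are illustrative.

```python
classes = [  # (lower limit, upper limit, frequency) from the weights table
    (89, 93, 3), (84, 88, 5), (79, 83, 11), (74, 78, 12),
    (69, 73, 5), (64, 68, 3), (59, 63, 1),
]
n = sum(f for _, _, f in classes)  # total number of observations

rows = []
for lower, upper, f in classes:
    rows.append({
        "limits": (lower, upper),
        "boundaries": (lower - 0.5, upper + 0.5),  # midway to the neighboring class
        "mark": (lower + upper) / 2,               # class mark (midpoint)
        "frequency": f,
        "relative": f / n,                         # relative frequency = f/n
        "percent": 100 * f / n,
    })

for row in rows:
    print(row)
```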