Producing Data:
Sampling
Dr. Nusar Hajarisman, MS.
Program Studi Statistika – Universitas Islam Bandung
Introduction
What is sampling?
Sampling is the process of selecting a group of individuals from a population in order to study them and characterize the population as a whole.
Introduction Population
• The entire group of individuals is called the population.
• For example, a researcher may be interested in the relation between class size (variable 1) and academic performance (variable 2) for the population of third-grade children.
Introduction Sample
• Usually populations are so large that a researcher cannot examine the entire group. Therefore, a sample is selected to represent the population in a research study. The goal is to use the results obtained from the sample to help answer questions about the population.
Survey Sampling
• Survey sampling is a procedure within the design of an investigation in which data are collected using tools such as questionnaires or surveys.
• In survey research, sampling refers to how we select members from the population to be in the study. How the sample is selected largely determines the accuracy of the research or survey results.
• The principle of sample surveys is not to observe the entire population studied but rather a properly selected subset, called a sample.
Survey Sampling
Types of survey sampling/Sampling Methods
Characteristics of Probability Sampling
• In probability sampling we infer from the sample to the population.
• In probability sampling every individual of the population has a known, non-zero probability of being taken into the sample (an equal probability in simple random sampling).
• A probability sample may be representative of the population.
• The observations (data) of a probability sample are used for inferential purposes.
• A probability sample is not distribution-free: its variables are assumed to follow some distribution.
• Inferential or parametric statistics are used with probability samples.
• There is a risk in drawing conclusions from a probability sample, and that risk can be quantified.
• A probability sample is comprehensive and representative. Representativeness refers to characteristics.
• Comprehensiveness refers to size and area.
Probability Sampling
Simple Random Sampling
• A simple random sample is one in which each element of the population has an equal and independent chance of being included in the sample. A sample selected by a randomization method is known as a simple random sample, and the technique is known as simple random sampling.
• Randomization can be carried out using a number of techniques, such as:
– Tossing a coin.
– Throwing a die.
– Lottery method.
– Blindfold method.
– Using a table of random numbers such as Tippett’s table.
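The randomization techniques above can also be carried out in software. The following is a minimal sketch, assuming the population is available as a list (a sampling frame); the names `simple_random_sample` and `students` are illustrative and not part of the original slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every unit has the same chance of selection."""
    rng = random.Random(seed)  # the seed only makes the draw reproducible
    return rng.sample(list(population), n)  # sampling without replacement

# Hypothetical sampling frame of 200 students
students = [f"student_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(students, 20, seed=42)
print(len(sample), sample[:3])
```

Fixing the seed is a convenience for reproducing a draw; leaving it as `None` gives a fresh random sample each time.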
Probability Sampling
Simple Random Sampling: Advantages
• It requires only minimal knowledge of the population.
• It is free from subjectivity and personal error.
• It provides appropriate data for our purpose.
• The observations of the sample can be used for inferential purposes.
Probability Sampling
Simple Random Sampling: Disadvantages
• This method cannot ensure the representativeness of a sample.
• This method does not use existing knowledge about the population.
• The inferential accuracy of the findings depends upon the size of the sample.
Probability Sampling
Stratified Random Sampling
• It is an improvement over simple random sampling.
• When employing this technique, the researcher divides the population into strata on the basis of some characteristic and draws at random a predetermined number of units from each of these smaller homogeneous groups (strata).
• The researcher should choose the characteristic or criterion that seems most relevant to the research work.
• Stratified sampling may be of three types:
– Disproportionate stratified sampling.
– Proportionate stratified sampling.
– Optimum allocation stratified sampling.
• Disproportionate sampling means that the size of the sample drawn from each stratum is not proportionate to the size of the stratum but depends upon considerations involving personal judgement and convenience. This method of sampling is more effective for comparing strata which have different error possibilities. It is less efficient for determining population characteristics.
• Proportionate sampling refers to the selection from each stratum of a sample that is proportionate to the size of the stratum. Advantages of this procedure include representativeness with respect to the variables used as the basis for classifying categories and increased chances of being able to make comparisons between strata. Disadvantages include a lack of information on the proportion of the population in each category and faulty classification.
• Optimum allocation stratified sampling is more representative and comprehensive than other stratified samples. The number of units selected from each stratum is in proportion to the corresponding stratum in the population, commonly weighted as well by the variability within the stratum. The sample thus obtained is known as an optimum allocation stratified sample.
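To make the difference between the allocation rules concrete, here is a small sketch. It assumes that "optimum allocation" means Neyman allocation, in which each stratum's share is proportional to its size times its standard deviation; the slides do not state the formula, so treat this as one common interpretation. The stratum names, sizes and standard deviations are invented for illustration.

```python
def proportionate_allocation(stratum_sizes, n):
    """Allocate n units so that each stratum's share matches its population share."""
    total = sum(stratum_sizes.values())
    return {h: round(n * size / total) for h, size in stratum_sizes.items()}

def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Allocate n units in proportion to (stratum size x stratum standard deviation).

    Assumption: this is the Neyman form of optimum allocation with equal costs.
    """
    weights = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total = sum(weights.values())
    return {h: round(n * w / total) for h, w in weights.items()}

# Hypothetical strata (sizes and standard deviations are made up)
sizes = {"primary": 500, "secondary": 300, "tertiary": 200}
sds = {"primary": 4.0, "secondary": 10.0, "tertiary": 5.0}

print(proportionate_allocation(sizes, 60))  # {'primary': 30, 'secondary': 18, 'tertiary': 12}
print(neyman_allocation(sizes, sds, 60))    # {'primary': 20, 'secondary': 30, 'tertiary': 10}
```

Because each share is rounded separately, the allocations may not sum exactly to n for other inputs; a real design would adjust the largest stratum to compensate.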
Probability Sampling
Stratified Random Sampling: Advantages
• It yields a good representative sample of the population, most precisely with the third type (optimum allocation).
• It is an improvement over simple random sampling.
• It is an objective method of sampling.
• Observations can be used for inferential purposes.
Probability Sampling
Stratified Random Sampling: Disadvantages
• A serious disadvantage of this method is that it is difficult for the researcher to decide the relevant criterion for stratification.
• Only one criterion can be used for stratification, although more than one criterion generally seems relevant.
• It is a costly and time-consuming method.
• The selected sample may be representative with reference to the criterion used but not with reference to others.
• There is a risk in generalization.
Probability Sampling
Systematic Sampling
• Systematic sampling is an improvement over simple random sampling. This method requires complete information about the population.
• There should be a list of all the individuals of the population, arranged in some systematic way.
• Now we decide the size of the sample:
Let sample size = n and population size = N
• Now we select every (N/n)th individual from the list; this gives a sample of the desired size, which is known as a systematic sample.
• Thus, for this technique of sampling, the population should be arranged in some systematic way.
Step one: Define the target population (a structured audience) to sample from.
Step two: As a researcher, figure out the ideal size of the sample, i.e., how many people from the entire population to choose to be a part of the sample.
Step three: Once you decide the sample size, assign a number to every member of the population.
Step four: Define the sampling interval. This will be the standard distance between the selected elements.
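The steps above can be sketched in a few lines. This is an illustrative sketch, not a procedure given on the slides: it assumes the interval is k = N // n and that the first element is chosen at a random position within the first interval (a common convention that the slides do not mention). The names `systematic_sample` and `frame` are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th unit from an ordered list, where k = N // n."""
    N = len(frame)
    k = N // n  # sampling interval
    if k < 1:
        raise ValueError("sample size n cannot exceed population size N")
    start = random.Random(seed).randrange(k)  # assumed random start in [0, k)
    return [frame[start + i * k] for i in range(n)]

# Hypothetical numbered list of N = 100 members, sample size n = 10, so k = 10
frame = list(range(1, 101))
picked = systematic_sample(frame, 10, seed=3)
print(picked)
```

Every selected member is exactly k positions after the previous one, so the ordering of the list directly shapes the sample.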
Probability Sampling
Systematic Sampling: Advantages
• This is a simple method of selecting a sample.
• It reduces the field cost.
• Inferential statistics may be used.
• The sample may be comprehensive and representative of the population.
• Observations of the sample may be used for drawing conclusions and generalizations.
Probability Sampling
Systematic Sampling: Disadvantages
• This is not free from error, since different individuals may arrange the list in different ways, which introduces subjectivity.
• Knowledge of the population is essential.
• Information on each individual is essential.
• This method cannot ensure representativeness.
• There is a risk in drawing conclusions from the observations of the sample.
Probability Sampling
Cluster Sampling
• Selecting intact groups as a whole is known as cluster sampling.
• In cluster sampling the sample units contain groups of elements (clusters) instead of individual members or items in the population.
• Rather than listing all elementary school children in a given city and randomly selecting 15 per cent of these students for the sample, a researcher lists all of the elementary schools in the city, selects at random 15 per cent of these clusters of units, and uses all of the children in the selected schools as the sample.
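The school example can be sketched as follows. This is a minimal illustration that assumes the clusters are held in a dictionary mapping a school name to its list of children; the data and the name `cluster_sample` are invented.

```python
import random

def cluster_sample(clusters, fraction, seed=None):
    """Randomly select a fraction of the clusters and keep every member of each."""
    rng = random.Random(seed)
    n_clusters = max(1, round(fraction * len(clusters)))
    chosen = rng.sample(sorted(clusters), n_clusters)  # randomize clusters, not individuals
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical city with 20 schools of 30 children each
schools = {
    f"school_{i:02d}": [f"school_{i:02d}_child_{j:02d}" for j in range(30)]
    for i in range(20)
}
chosen, children = cluster_sample(schools, 0.15, seed=7)
print(chosen, len(children))
```

Only the list of schools is needed as a sampling frame, which is why the method is economical when no list of individual children exists.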
Probability Sampling
Cluster Sampling: Advantages
• It may yield a good representative sample of the population.
• It is an easy method.
• It is an economical method.
• It is practicable and highly applicable in education.
• Observations can be used for inferential purposes.
Probability Sampling
Cluster Sampling: Disadvantages
• Cluster sampling is not free from error.
• It is not comprehensive.
Probability Sampling
Stratified Sample vs Simple random sample
Probability Sampling
Stratified Sample vs Cluster sample
Non Probability Sampling
Characteristics of Non Probability Sampling
• The population is not clearly defined in non-probability sampling.
• The probability of selecting any given individual is unknown.
• A non-probability sample is distribution-free.
• The observations of a non-probability sample are not used for generalization.
• Non-parametric or non-inferential statistics are used with non-probability samples.
• Because no generalization is made, no quantifiable inferential risk is attached to conclusions from a non-probability sample.
Non Probability Sampling
Type of Non Probability Sampling
Non Probability Sampling
Convenience Sampling
• The terms convenience, incidental or accidental are applied to samples that are taken because they are most readily available, i.e. groups that are used as samples of a population because they are at hand or because the researcher is unable to employ more acceptable sampling methods.
Convenience Sampling
Advantages
• It is a very easy method of sampling.
• It is frequently used in behavioural sciences.
• It reduces time, money and energy, i.e. it is an economical method.
Disadvantages
• It is not representative of the population.
• It is not free from error.
• Parametric statistics cannot be used.
Non Probability Sampling
Purposive Sampling
• A purposive sample is selected by some arbitrary (judgement-based) method because it is believed to be representative of the total population, or because it is expected to produce well-matched groups.
• The idea is to pick out the sample in relation to some criteria that are considered important for the particular study.
• This method is appropriate when the study places special emphasis upon the control of certain specific variables.
Non Probability Sampling
Quota Sampling
• Quota sampling combines judgement sampling with elements of probability sampling.
• The population is classified into several categories. On the basis of judgement, assumption or previous knowledge, the proportion of the population falling into each category is decided.
• Thereafter a quota of cases to be drawn from each category is fixed, and the observer is allowed to fill the quota as he or she likes.
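The quota-filling step can be illustrated with a short sketch. It assumes that respondents arrive one at a time with a known category and that the observer simply accepts whoever comes first until every quota is full; the function name, the categories and the arrivals are invented for illustration.

```python
def quota_sample(respondents, quotas):
    """Accept respondents in arrival order until each category's quota is filled."""
    remaining = dict(quotas)  # copy so the caller's quotas are left untouched
    sample = []
    for person, category in respondents:
        if remaining.get(category, 0) > 0:
            sample.append((person, category))
            remaining[category] -= 1
        if not any(remaining.values()):  # every quota has reached zero
            break
    return sample

# Hypothetical arrivals and quotas of 2 per category
arrivals = [
    ("p1", "female"), ("p2", "female"), ("p3", "male"),
    ("p4", "female"), ("p5", "male"), ("p6", "male"),
]
sample = quota_sample(arrivals, {"female": 2, "male": 2})
print(sample)
```

Nothing here is random: who ends up in the sample depends entirely on who the observer meets first, which is why the method is classed as non-probability sampling.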
Quota Sampling
Advantages
• It is an improvement over judgement sampling.
• It is an easy sampling technique.
• It is most frequently used in social surveys.
Disadvantages
• It is not a representative sample.
• It is not free from error.
• It is influenced by regional, geographical and social factors.
Non Probability Sampling
Snowball Sampling
• Snowball sampling, or chain-referral sampling, is a non-probability sampling technique used when the subjects have traits that are rare or hard to find.
• In this technique, existing subjects provide referrals to recruit the samples required for a research study.
• For example, if you are studying the level of customer satisfaction among the members of an elite country club, you will find it extremely difficult to collect primary data unless a member of the club agrees to have a direct conversation with you and provides the contact details of the other members of the club.
• This sampling method involves a primary data source nominating other potential data sources that will be able to participate in the research study.
• The snowball sampling method is based purely on referrals, and that is how a researcher is able to generate a sample.
• Snowball sampling is a popular method in business studies. It is used extensively where a population is unknown or rare and it is difficult to choose subjects to assemble as a sample for research.
Snowball Sampling
Types of Snowball Sampling
• Linear Snowball Sampling: The formation of a sample group starts with one individual subject providing information about just one other subject, and the chain then continues with only one referral from each subject. This pattern is continued until enough subjects are available for the sample.
• Exponential Non-Discriminative Snowball Sampling: In this type, the first subject is recruited and then he/she provides multiple referrals. Each new referral then provides further referrals, and so on, until there are enough subjects for the sample.
• Exponential Discriminative Snowball Sampling: In this technique, each subject gives multiple referrals; however, only one new subject is recruited from each set of referrals. The choice of a new subject depends on the nature of the research study.
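The referral chains described above can be simulated on a small, made-up referral network. This is only a sketch: it assumes referrals are known in advance as a dictionary, recruits breadth-first, and models the linear type by limiting each subject to one randomly chosen referral. The discriminative type, where the researcher chooses among referrals according to the study, is not modelled. All names are hypothetical.

```python
import random
from collections import deque

def snowball_sample(referrals, first_subject, target_size,
                    referrals_per_subject=None, seed=None):
    """Recruit subjects by following referrals from an initial subject.

    referrals_per_subject=1 mimics linear snowball sampling;
    None follows every referral (exponential non-discriminative).
    """
    rng = random.Random(seed)
    sample = [first_subject]
    seen = {first_subject}
    queue = deque([first_subject])
    while queue and len(sample) < target_size:
        current = queue.popleft()
        candidates = [p for p in referrals.get(current, []) if p not in seen]
        if referrals_per_subject is not None:
            take = min(referrals_per_subject, len(candidates))
            candidates = rng.sample(candidates, take)
        for person in candidates:
            if len(sample) >= target_size:
                break
            seen.add(person)
            sample.append(person)
            queue.append(person)
    return sample

# Hypothetical referral network: who can refer whom
network = {"A": ["B", "C"], "B": ["D"], "C": ["E", "F"], "D": ["G"]}
print(snowball_sample(network, "A", 4))  # ['A', 'B', 'C', 'D']
print(snowball_sample(network, "A", 4, referrals_per_subject=1, seed=1))
```

The chain stops early if the referrals run out, which mirrors the practical risk that a snowball sample stays confined to a small circle.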
Snowball Sampling
Advantages
• It’s quicker to find samples: Referrals make it easy and quick to find subjects, as they come from reliable sources. This saves the researcher an additional task, and the time saved can be used in conducting the study.
• Cost effective: This method is cost effective as the referrals are obtained from a primary data source. It is convenient and less expensive than other methods.
• Sample hesitant subjects: Some people do not want to come forward and participate in research studies because they don’t want their identity to be exposed. Snowball sampling helps in this situation because subjects are recruited through references from people they know. It also helps reach sections of the target population that are hard to contact.
Snowball Sampling
Disadvantages
• Sampling bias and margin of error: Since people refer those whom they know and who have similar traits, this sampling method is prone to sampling bias and a larger margin of error. A researcher might only be able to reach a small group of people and may not be able to complete the study with conclusive results.
• Lack of cooperation: Even after referrals, there is a fair chance that people will be uncooperative and refuse to participate in the research study.
Characteristics of a Good Sample Design
• The sample design should yield a truly representative sample;
• The sample design should be such that it results in small sampling error;
• The sample design should be viable in the context of budgetary constraints of the research study;
• The sample design should be such that the systematic bias can be controlled; and
• The sample must be such that the results of the sample study would be applicable, in general, to the universe at a reasonable level of confidence.
Potential Source of Error
Sampling Error vs Non-sampling Error
Sampling Error
• Population specification error: A population specification error occurs when researchers don’t know precisely whom to survey. For example, imagine a research study about kids’ apparel. Who is the right person to survey? It can be both parents, only the mother, or the child. The parents make purchase decisions, but the kids may influence their choice.
• Sample frame error: Sampling frame errors arise when researchers target the wrong sub-population while selecting the sample. For example, a sampling frame drawn from the telephone white pages may contain erroneous inclusions because people move to other cities. Erroneous exclusions occur when people choose to un-list their numbers. Wealthy households may have more than one connection, leading to multiple inclusions.
• Selection error: A selection error occurs when respondents self-select to participate in the study, so that only the interested ones respond. You can control selection errors by going the extra step to request responses from the entire sample. Pre-survey planning, follow-ups, and a neat and clean survey design will boost respondents’ participation rate. Also, try methods like CATI (computer-assisted telephone interviewing) surveys and in-person interviews to maximize responses.
• Sampling errors: Sampling errors occur when the respondents are not representative of the population. They mostly arise when the researcher does not plan the sample carefully. Sampling errors can be controlled and reduced by creating a careful sample design, having a large enough sample to reflect the entire population, or using an online sample or survey audience to collect responses.
Sampling Error
What are the steps to reduce sampling errors?
• Increase sample size: A larger sample yields a more accurate result because the sample gets closer to the actual population size.
• Divide the population into groups: Sample groups according to their share of the population instead of drawing one undivided random sample. For example, if people of a specific demographic make up 20% of the population, make sure that about 20% of your sample comes from that demographic to reduce sampling bias.
• Know your population: Study your population and understand its demographic mix. Know which demographics use your product or service and ensure you target only the sample that matters.
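The effect of sample size can be made concrete with the standard error of a sample proportion, sqrt(p(1 - p) / n). This formula is not given on the slides; it is the usual textbook expression under simple random sampling from a large population, and the function name below is illustrative.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion p based on n responses."""
    return math.sqrt(p * (1 - p) / n)

# Quadrupling the sample size halves the standard error
for n in (100, 400, 1600):
    print(n, round(standard_error(0.5, n), 4))
```

The diminishing return is the practical point: each halving of the error costs four times as many respondents.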
Non-Sampling Error
How to Calculate Sample Size
• Scientific studies often rely on surveys distributed among a sample of some total population.
• Your sample will need to include a certain number of people, however, if you want it to accurately reflect the conditions of the overall population it's meant to represent.
• To calculate your necessary sample size, you'll need to determine several set values and plug them into an appropriate formula.
Determining Key Values
Know your population size
• Population size refers to the total number of people within your demographic. For larger studies, you can use an approximated value instead of the precise number.
• Precision has a greater statistical impact when you work with a smaller group. For instance, if you wish to perform a survey among members of a local organization or employees of a small business, the population size should be accurate within a dozen or so people.
• Larger surveys allow for a greater deviation from the actual population size. For example, if your demographic includes everyone living in the United States, you could estimate the size at roughly 320 million people, even though the actual value may vary by hundreds of thousands.
Determining Key Values
Determine your margin of error
• Margin of error, which is the half-width of the "confidence interval," refers to the amount of error you wish to allow in your results.
• The margin of error is a percentage that indicates how close your sample results will be to the true value of the overall population discussed in your study.
• A smaller margin of error will result in more accurate answers, but choosing a smaller margin of error will also require a larger sample.
• When the results of a survey are presented, the margin of error usually appears as a plus or minus percentage. For example: "35% of people agree with option A, with a margin of error of +/- 5%"
• In this example, the margin of error essentially indicates that, if the entire population were asked the same poll question, you are "confident" that somewhere between 30% (35 - 5) and 40% (35 + 5) would agree with option A.
Determining Key Values
Specify your standard deviation
• The standard deviation indicates how much variation you expect among your responses. Extreme (lopsided) answers are more likely to be accurate than moderate results.
• Plainly stated, if 99% of your survey responses answer "Yes" and only 1% answer "No," the sample probably represents the overall population very accurately.
• On the other hand, if 45% answer "Yes" and 55% answer "No," there is a greater chance of error.
• Since this value is difficult to determine before you give the actual survey, most researchers set this value at 0.5 (50%). This is the worst-case scenario percentage, so sticking with this value ensures that your calculated sample size is large enough to represent the overall population within your margin of error and confidence level.
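Why 0.5 is the worst case can be checked numerically. The sketch below assumes the quantity that drives the sample size is the term p(1 - p), which is how this value enters the usual sample size formulas; the function name is illustrative.

```python
def response_variability(p):
    """Variance term p * (1 - p) for a yes/no question with proportion p of 'Yes'."""
    return p * (1 - p)

# The term peaks at p = 0.5 and shrinks as answers become more lopsided
for p in (0.01, 0.45, 0.50, 0.55, 0.99):
    print(p, round(response_variability(p), 4))
```

Any other value of p gives a smaller term and therefore a smaller required sample, so planning with 0.5 errs on the safe side.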
Determining Key Values
Find your Z-score
• The Z-score is a constant value determined by your confidence level. It indicates the "standard normal score," or the number of standard deviations between any selected value and the average/mean of the population.
• You can calculate z-scores by hand, look for an online calculator, or find your z-score on a z-score table. Each of these methods can be fairly complex, however.
• Since confidence levels are fairly standardized, most researchers simply memorize the necessary z-score for the most common confidence levels:
– 80% confidence => 1.28 z-score
– 85% confidence => 1.44 z-score
– 90% confidence => 1.65 z-score
– 95% confidence => 1.96 z-score
– 99% confidence => 2.58 z-score
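The table values can be reproduced with Python's standard library (version 3.8 or later). This sketch assumes a two-sided confidence level, so that the central share of the standard normal distribution lies within plus or minus z; the function name `z_score` is illustrative.

```python
from statistics import NormalDist

def z_score(confidence_level):
    """Two-sided critical value of the standard normal for a confidence level in (0, 1)."""
    upper_tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - upper_tail)

for level in (0.80, 0.85, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence => {z_score(level):.3f}")
```

The 90% value comes out at about 1.645, which tables commonly round to 1.65 as in the list above.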
Sample Size
Using the Standard Formula
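The formula shown on this slide did not survive in the text. A commonly used form for a known population size N, assumed here, is n = [z^2 p(1 - p) / e^2] / [1 + z^2 p(1 - p) / (e^2 N)], where z is the z-score, p the expected proportion (0.5 in the worst case) and e the margin of error. The sketch below applies that assumed formula; the function name and the example population of 425 are illustrative.

```python
import math

def sample_size_finite(N, z=1.96, p=0.5, e=0.05):
    """Sample size for a known population size N (assumed finite-population formula)."""
    x = z ** 2 * p * (1 - p) / e ** 2       # sample size ignoring population size
    return math.ceil(x / (1 + x / N))       # shrink for the finite population, round up

print(sample_size_finite(425))     # 202
print(sample_size_finite(10_000))  # 370
```

Rounding up rather than to the nearest integer keeps the achieved margin of error at or below the target.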
Sample Size
Creating a Formula for Unknown or Very Large Populations
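The slide content for this case is not preserved in the text. The usual large-population formula, assumed here, drops the population size entirely: n = z^2 p(1 - p) / e^2. The sketch below uses that assumption with illustrative names and default values.

```python
import math

def sample_size_large_population(z=1.96, p=0.5, e=0.05):
    """Sample size when the population is unknown or very large (assumed formula)."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

print(sample_size_large_population())        # 385
print(sample_size_large_population(e=0.03))  # a tighter margin needs a larger sample
```

With 95% confidence, p = 0.5 and a 5% margin of error the unrounded value is 384.16, so rounding up gives 385 respondents.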
Sample Size
Using Slovin's Formula
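The slide's formula is not preserved in the text. Slovin's formula is commonly written as n = N / (1 + N e^2), where N is the population size and e the margin of error; it needs no assumption about the variability of the responses. A minimal sketch under that assumption, with an illustrative function name and population size:

```python
import math

def slovin_sample_size(N, e=0.05):
    """Slovin's formula: n = N / (1 + N * e^2), rounded up to a whole respondent."""
    return math.ceil(N / (1 + N * e ** 2))

print(slovin_sample_size(1000))  # 286
```

For N = 1000 and e = 0.05 the unrounded value is 1000 / 3.5, or about 285.7, which rounds up to 286.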
THANK YOU!