03 Producing Data-Sampling

(1)

Producing Data:

Sampling

Dr. Nusar Hajarisman, MS.

Program Studi Statistika – Universitas Islam Bandung

(2)

Introduction

What is sampling?

Sampling is the process of selecting a group of individuals from a population in order to study them and characterize the population as a whole.

(3)

Introduction Population

• The entire group of individuals is called the population.

• For example, a researcher may be interested in the relation between class size (variable 1) and academic performance (variable 2) for the population of third-grade children.

(4)

Introduction Sample

• Usually populations are so large that a researcher cannot examine the entire group. Therefore, a sample is selected to represent the population in a research study. The goal is to use the results obtained from the sample to help answer questions about the population.

(5)

Introduction

(6)

Introduction

(7)

Introduction

(8)

Survey Sampling

• Survey sampling is a procedure within the design of an investigation through which data is collected through tools such as questionnaires or surveys.

• Sampling is central to research. In survey research, sampling refers to how we select members of the population to be in the study, and it largely determines the accuracy of the research or survey results.

• The principle of sample surveys is not to observe the entire population studied but rather a properly selected subset, called a sample.

(9)

Survey Sampling

Types of survey sampling/Sampling Methods

(10)

Survey Sampling

Types of survey sampling/Sampling Methods

(11)

Survey Sampling

Characteristics of Probability Sampling

• In probability sampling we infer from the sample to the population.

• In probability sampling every individual in the population has a known, non-zero probability of being taken into the sample; in simple random sampling these probabilities are equal.

• A probability sample may be representative of the population.

• The observations (data) from a probability sample are used for inferential purposes.

• A probability sample is not distribution-free: inference from it assumes a distribution for the variable studied.

• Inferential or parametric statistics are used with probability samples.

• There is a risk in drawing conclusions from a probability sample.

• A probability sample is comprehensive as well as representative. Representativeness refers to characteristics.

• Comprehensiveness refers to size and area.

(12)

Probability Sampling

(13)

Probability Sampling

(14)

Simple Random Sampling

(15)

• A simple random sample is one in which each element of the population has an equal and independent chance of being included in the sample. A sample selected by a randomization method is known as a simple random sample, and the technique is called simple random sampling.

• Randomization can be carried out with a number of techniques, such as:

– Tossing a coin.

– Throwing a die.

– Lottery method.

– Blindfold method.

– Using a table of random numbers such as Tippett’s table.
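In practice the lottery method or a random number table is usually replaced by a pseudo-random number generator. The following is a minimal sketch, assuming a hypothetical sampling frame of 100 numbered students; the function name and the seed are illustrative only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)               # seed is only for reproducibility
    return rng.sample(list(population), n)  # sampling without replacement

# Hypothetical sampling frame of 100 students.
students = [f"student_{i:03d}" for i in range(1, 101)]
sample = simple_random_sample(students, 10, seed=42)
print(sample)
```

Only the list of units is needed here, which matches the point that simple random sampling requires minimal knowledge of the population.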

(16)

Probability Sampling

(17)

Simple Random Sampling: Advantages

• It requires only minimal knowledge of the population.

• It is free from subjectivity and personal error.

• It provides appropriate data for our purpose.

• The observations of the sample can be used for inferential purposes.

(18)

Simple Random Sampling: Disadvantages

• The representativeness of a sample cannot be ensured by this method.

• This method does not make use of existing knowledge about the population.

• The inferential accuracy of the findings depends upon the size of the sample.

(19)

Probability Sampling

Stratified Random Sampling

(20)

• It is an improvement over simple random sampling.

• When employing this technique, the researcher divides the population into strata on the basis of some characteristic and draws a predetermined number of units at random from each of these smaller, homogeneous groups (strata).

• The researcher should choose the characteristic or criterion that seems most relevant to the research work.

• Stratified sampling may be of three types:

– Disproportionate stratified sampling.

– Proportionate stratified sampling.

– Optimum allocation stratified sampling.

(21)

• Disproportionate sampling means that the size of the sample drawn from each stratum is not proportionate to the size of the stratum but depends on considerations involving personal judgement and convenience. This method is more effective for comparing strata that have different error possibilities. It is less efficient for determining population characteristics.

• Proportionate sampling refers to selecting from each stratum a sample that is proportionate to the size of the stratum. Advantages of this procedure include representativeness with respect to the variables used as the basis for classification and increased chances of being able to make comparisons between strata. Its disadvantages are that the proportion of the population in each category may not be known and that units may be classified wrongly.

• Optimum allocation stratified sampling aims to be more representative and comprehensive than other stratified samples. The number of units selected from each stratum is in proportion to that stratum's share of the population and, in the usual (Neyman) formulation, to the variability within the stratum. The sample thus obtained is known as an optimum allocation stratified sample.
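The proportionate case described above can be sketched as follows. This is an illustration under assumptions, not a prescribed procedure: the strata ("urban", "suburban", "rural"), their sizes, and the largest-remainder rounding rule are all hypothetical choices made so that the stratum sample sizes add up to the requested total.

```python
import random

def proportionate_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    exact = {name: n * size / total for name, size in stratum_sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}  # round down first
    leftover = n - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample of the allocated size within each stratum."""
    rng = random.Random(seed)
    alloc = proportionate_allocation({k: len(v) for k, v in strata.items()}, n)
    return {k: rng.sample(units, alloc[k]) for k, units in strata.items()}

# Hypothetical population of 1,000 pupils stratified by school location.
strata = {
    "urban": [f"u{i}" for i in range(500)],
    "suburban": [f"s{i}" for i in range(300)],
    "rural": [f"r{i}" for i in range(200)],
}
sample = stratified_sample(strata, 50, seed=1)
print({k: len(v) for k, v in sample.items()})
```

A disproportionate design would simply replace the allocation step with sizes chosen by the researcher.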

(22)

Probability Sampling

Stratified Random Sampling; Example

(23)

(24)

Stratified Random Sampling: Advantages

• It gives a good representation of the population, most precisely with the third type (optimum allocation).

• It is an improvement over simple random sampling.

• It is an objective method of sampling.

• Observations can be used for inferential purposes.

(25)

Stratified Random Sampling: Disadvantages

• A serious disadvantage of this method is that it is difficult for the researcher to decide on the relevant criterion for stratification.

• Only one criterion can be used for stratification, although more than one criterion often seems relevant.

• It is a costly and time-consuming method.

• The selected sample may be representative with respect to the criterion used but not with respect to others.

• There is a risk in generalization.

(26)

Systematic Sampling

(27)

Systematic Sampling

• Systematic sampling is an improvement over simple random sampling. This method requires complete information about the population.

• There should be a list of all the individuals in the population, arranged in some systematic order.

• Now we decide the size of the sample:

Let sample size = n and population size = N

• We then select every k-th individual from the list, where k = N/n, which gives a sample of the desired size, known as a systematic sample.

• Thus, for this sampling technique, the population must be arranged in some systematic way.

(28)

Probability Sampling

Systematic Sampling

Step one: Define the target population (the audience) and obtain a structured list of its members.

Step two: As a researcher, figure out the ideal size of the sample, i.e., how many people from the entire population to choose to be a part of the sample.

Step three: Once you decide the sample size, assign a number to every member of the population list.

Step four: Define the sampling interval. This will be the standard distance between the selected elements.
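The interval idea can be sketched in a few lines. This is a minimal illustration, assuming the interval is the integer part of N/n and that the first unit is chosen at random within the first interval; the random start is a common convention and an assumption here, and the frame of 100 numbered units is hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th unit from an ordered list, with k = N // n."""
    N = len(frame)
    k = N // n                      # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    start = rng.randrange(k)        # random start within the first interval (assumption)
    return [frame[start + i * k] for i in range(n)]

# Hypothetical ordered list of 100 population members, numbered 1..100.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=1)
print(sample)
```

With N = 100 and n = 10 the interval is 10, so consecutive selected units are always 10 positions apart.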

(29)

Probability Sampling

Systematic Sampling: Advantages

• This is a simple method of selecting a sample.

• It reduces the field cost.

• Inferential statistics may be used.

• The sample may be comprehensive and representative of the population.

• Observations of the sample may be used for drawing conclusions and generalizations.

(30)

Probability Sampling

Systematic Sampling: Disadvantages

• It is not free from error, since different individuals may arrange the list in different ways, which introduces subjectivity.

• Knowledge of the population is essential.

• Information about each individual is essential.

• This method cannot ensure representativeness.

• There is a risk in drawing conclusions from the observations of the sample.

(31)

Cluster Sampling

(32)

Cluster Sampling

• Selecting intact groups as a whole is known as cluster sampling.

• In cluster sampling the sampling units are groups of elements (clusters) instead of individual members or items in the population.

• Rather than listing all elementary school children in a given city and randomly selecting 15 per cent of these students for the sample, a researcher lists all of the elementary schools in the city, selects at random 15 per cent of these clusters of units, and uses all of the children in the selected schools as the sample.
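The school example can be sketched as a one-stage cluster sample. The city of 20 schools with 30 children each is entirely hypothetical, as are the function and variable names; the point is only that randomization happens at the level of schools, not children.

```python
import random

def cluster_sample(clusters, fraction, seed=None):
    """Randomly select a fraction of clusters and keep every unit inside them."""
    rng = random.Random(seed)
    names = sorted(clusters)
    m = max(1, round(fraction * len(names)))   # number of clusters to select
    chosen = rng.sample(names, m)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical city: 20 elementary schools with 30 children each.
schools = {
    f"school_{s:02d}": [f"school_{s:02d}_child_{c:02d}" for c in range(30)]
    for s in range(1, 21)
}
chosen, children = cluster_sample(schools, 0.15, seed=7)
print(chosen, len(children))
```

Only a list of schools is needed as the sampling frame, which is why the method is economical.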

(33)

Cluster Sampling

(34)

Cluster Sampling: Advantages

• It may be a good representative of the population.

• It is an easy method.

• It is an economical method.

• It is practicable and highly applicable in education.

• Observations can be used for inferential

purpose.

(35)

Cluster Sampling: Disadvantages

• Cluster sampling is not free from error.

• It is not comprehensive.

(36)

Stratified Sample vs Simple random sample

(37)

Stratified Sample vs Cluster sample

(38)

Non Probability Sampling

(39)

Characteristics of Non Probability Sampling

• Non-probability sampling does not require a well-defined population or sampling frame.

• The probability of selecting any individual is unknown.

• A non-probability sample is distribution-free.

• The observations from a non-probability sample are not used for generalization.

• Non-parametric or non-inferential statistics are used with non-probability samples.

• Because no generalization is made, there is no inferential risk in drawing conclusions from a non-probability sample.

(40)

Types of Non Probability Sampling

(41)

Convenience Sampling

(42)

• The terms convenience, incidental, or accidental are applied to samples that are taken because they are most readily available, i.e. groups that are used as samples of a population because they are easy to reach or because the researcher is unable to employ more acceptable sampling methods.

(43)

Advantages

• It is a very easy method of sampling.

• It is frequently used in behavioural sciences.

• It reduces time, money and energy, i.e. it is an economical method.

Disadvantages

• It is not representative of the population.

• It is not free from error.

• Parametric statistics cannot be used.

(44)

Purposive Sampling

(45)

Purposive Sampling

• A purposive sample is selected by some arbitrary method because it is believed to be representative of the total population, or because it is known that it will produce well-matched groups.

• The idea is to pick out the sample in relation to some criterion that is considered important for the particular study.

• This method is appropriate when the study places special emphasis upon the control of certain specific variables.

(46)

Purposive Sampling

(47)

Purposive Sampling

(48)

Quota Sampling

(49)

Non Probability Sampling

Quota Sampling

• This combines judgement sampling with elements of probability sampling.

• The population is classified into several categories. On the basis of judgement, assumption, or previous knowledge, the proportion of the population falling into each category is decided.

• Thereafter a quota of cases to be drawn from each category is fixed, and the observer is allowed to sample as he or she likes.

(50)

Quota Sampling

Advantages

• It is an improvement over the judgement sampling.

• It is an easy sampling technique.

• It is most frequently used in social surveys.

Disadvantages

• It is not a representative sample.

• It is not free from error.

• It is influenced by regional, geographical, and social factors.

(51)

Non Probability Sampling

Snowball Sampling

(52)

Snowball Sampling

• Snowball sampling, or chain-referral sampling, is a non-probability sampling technique used when the subjects have traits that are rare or hard to find.

• In this technique, existing subjects provide referrals to recruit the further subjects required for a research study.

• For example, if you are studying the level of customer satisfaction among the members of an elite country club, you will find it extremely difficult to collect primary data unless a member of the club agrees to have a direct conversation with you and provides the contact details of the other members of the club.

(53)

Snowball Sampling

• This sampling method involves a primary data source nominating other potential data sources that will be able to participate in the research study.

• The snowball sampling method is based purely on referrals, and that is how a researcher is able to generate a sample.

• Snowball sampling is a popular method in business studies. It is used extensively where the population is unknown or rare and it is hard to find subjects to assemble into a sample for research.

(54)

Snowball Sampling

Types of Snowball Sampling

• Linear Snowball Sampling: The formation of a sample group starts with one individual subject providing information about just one other subject, and the chain then continues with only one referral from each subject. This pattern continues until enough subjects are available for the sample.

(55)

Snowball Sampling

• Exponential Non-Discriminative Snowball Sampling: In this type, the first subject is recruited and then he/she provides multiple referrals. Each new referral then provides further referrals, and so on, until there are enough subjects for the sample.

(56)

Snowball Sampling

• Exponential Discriminative Snowball Sampling: In this technique, each subject gives multiple referrals; however, only one new subject is recruited from among them. The choice of the new subject depends on the nature of the research study.
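The difference between the linear and the exponential non-discriminative variants can be illustrated with a small simulation. This is only a sketch: the referral network, the subject labels, and the `max_referrals` parameter are hypothetical, and the discriminative variant (where the researcher picks one referral by judgement) is not modelled.

```python
from collections import deque

def snowball_sample(referrals, first_subject, target_size, max_referrals=None):
    """Recruit subjects by following referrals until target_size is reached.

    max_referrals=1 follows one referral per subject (linear);
    max_referrals=None recruits every referral (exponential non-discriminative).
    """
    sample = [first_subject]
    seen = {first_subject}
    queue = deque([first_subject])
    while queue and len(sample) < target_size:
        current = queue.popleft()
        new = [p for p in referrals.get(current, []) if p not in seen]
        if max_referrals is not None:
            new = new[:max_referrals]
        for person in new:
            if len(sample) >= target_size:
                break
            seen.add(person)
            sample.append(person)
            queue.append(person)
    return sample

# Hypothetical referral network: each subject names the people they can refer.
referrals = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "D": ["G"]}

linear = snowball_sample(referrals, "A", target_size=4, max_referrals=1)
exponential = snowball_sample(referrals, "A", target_size=5)
print(linear)
print(exponential)
```

In the linear run the chain passes from one subject to the next, while in the exponential run the sample fans out across every referral.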

(57)

Snowball Sampling

Advantages

• It’s quicker to find samples: Referrals make it easy and quick to find subjects, as they come from reliable sources. The researcher is spared an additional task, and the time saved can be used in conducting the study.

• Cost effective: This method is cost effective because the referrals are obtained from a primary data source. It is convenient and less expensive than other methods.

• Sample hesitant subjects: Some people do not want to come forward and participate in research studies because they don’t want their identity to be exposed. Snowball sampling helps in this situation because referrals come from people the subjects already know. It also helps reach sections of the target population that are hard to contact.

(58)

Snowball Sampling

Disadvantages

• Sampling bias and margin of error: Since people refer those whom they know and who have similar traits, this sampling method is prone to sampling bias and a larger margin of error. A researcher might only be able to reach a small group of people and may not be able to complete the study with conclusive results.

• Lack of cooperation: Even after referrals, there is a fair chance that people will be uncooperative and refuse to participate in the research study.

(59)

Characteristics of a Good Sample Design

• The sample design should yield a truly representative sample;

• The sample design should be such that it results in small sampling error;

• The sample design should be viable in the context of budgetary constraints of the research study;

• The sample design should be such that the systematic bias can be controlled; and

• The sample must be such that the results of the sample study would be applicable, in general, to the universe at a reasonable level of confidence.

(60)

Potential Source of Error

Sampling Error vs Non-sampling Error

(61)

Sampling Error

(62)

Sampling Error

• Population specification error: A population specification error occurs when researchers don’t know precisely whom to survey. For example, imagine a research study about kids’ apparel. Who is the right person to survey? It can be both parents, only the mother, or the child. The parents make purchase decisions, but the kids may influence their choice.

• Sample frame error: Sampling frame errors arise when researchers target the wrong sub-population while selecting the sample. For example, a sampling frame taken from the telephone white pages may contain erroneous inclusions because people move to other cities. Erroneous exclusions occur when people prefer to un-list their numbers. Wealthy households may have more than one connection, leading to multiple inclusions.

(63)

Sampling Error

• Selection error: A selection error occurs when respondents self-select to participate in the study, so that only the interested ones respond. You can control selection errors by going the extra step to request responses from the entire sample. Pre-survey planning, follow-ups, and a neat and clean survey design will boost respondents’ participation rate. Also, try methods like CATI (computer-assisted telephone interviewing) surveys and in-person interviews to maximize responses.

• Sampling errors: Sampling errors occur when the respondents are not representative of the population. This mostly happens when the researcher does not plan the sample carefully. Sampling errors can be controlled and reduced by creating a careful sample design, having a large enough sample to reflect the entire population, or using an online sample or survey audience to collect responses.

(64)

Sampling Error

What are the steps to reduce sampling errors?

• Increase sample size: A larger sample gives a more accurate result because the sample gets closer to the actual population.

• Divide the population into groups: Sample groups according to their share of the population instead of drawing one unrestricted random sample. For example, if people of a specific demographic make up 20% of the population, make sure that they make up about 20% of your sample to reduce sampling bias.

• Know your population: Study your population and understand its demographic mix. Know which demographics use your product or service and ensure you only target the sample that matters.

(65)

Non-Sampling Error


(66)

How to Calculate Sample Size

• Scientific studies often rely on surveys distributed among a sample of some total population.

• Your sample will need to include a certain number of people, however, if you want it to accurately reflect the conditions of the overall population it's meant to represent.

• To calculate your necessary sample size, you'll need to determine several set values and plug them into an appropriate formula.

(67)

Determining Key Values

Know your population size

• Population size refers to the total number of people within your demographic. For larger studies, you can use an approximated value instead of the precise number.

• Precision has a greater statistical impact when you work with a smaller group. For instance, if you wish to perform a survey among members of a local organization or employees of a small business, the population size should be accurate within a dozen or so people.

• Larger surveys allow for a greater deviation in the actual population. For example, if your demographic includes everyone living in the United States, you could estimate the size at roughly 320 million people, even though the actual value may vary by hundreds of thousands.

(68)

Determining Key Values

Determine your margin of error

• The margin of error is the amount of error you wish to allow in your results. It is the half-width of the confidence interval, so the two terms are related but not interchangeable.

• The margin of error is a percentage that indicates how close your sample results will be to the true value of the overall population discussed in your study.

• Smaller margins of error give more precise answers, but choosing a smaller margin of error also requires a larger sample.

• When the results of a survey are presented, the margin of error usually appears as a plus or minus percentage. For example: "35% of people agree with option A, with a margin of error of +/- 5%"

• In this example, the margin of error essentially indicates that, if the entire population were asked the same poll question, you are "confident" that somewhere between 30% (35 - 5) and 40% (35 + 5) would agree with option A.

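The 35% +/- 5% example can be reproduced numerically. This is a minimal sketch, assuming the usual large-sample formula for a proportion from a simple random sample, e = z * sqrt(p(1 - p) / n); that formula, the sample size of 350, and the default z of 1.96 (95% confidence) are assumptions made for illustration.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Margin of error for an observed proportion p from a sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

def interval(p, e):
    """Range implied by a reported result p with margin of error e."""
    return (p - e, p + e)

low, high = interval(0.35, 0.05)
print(f"between {low:.0%} and {high:.0%}")

# Hypothetical survey: 35% agree in a sample of 350 respondents.
moe = margin_of_error(0.35, 350)
print(f"margin of error: {moe:.3f}")
```

Under these assumptions a sample of about 350 respondents is what produces a margin of roughly five percentage points around 35%.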
(69)

Determining Key Values

Specify your standard deviation

• The standard deviation indicates how much variation you expect among your responses. Extreme (lopsided) results are more likely to be accurate than moderate ones.

• Plainly stated, if 99% of your survey responses answer "Yes" and only 1% answer "No," the sample probably represents the overall population very accurately.

• On the other hand, if 45% answer "Yes" and 55% answer "No," there is a greater chance of error.

• Since this value is difficult to determine before you give the actual survey, most researchers set it at 0.5 (50%). This is the worst-case percentage, so sticking with this value ensures that your calculated sample size is large enough to represent the overall population within your margin of error and confidence level.

(70)

Determining Key Values

Find your Z-score

• The Z-score is a constant value automatically set based on your confidence level. It indicates the "standard normal score," or the number of standard deviations between any selected value and the average/mean of the population.

• You can calculate z-scores by hand, look for an online calculator, or find your z-score on a z-score table. Each of these methods can be fairly complex, however.

• Since confidence levels are fairly standardized, most researchers simply memorize the necessary z-score for the most common confidence levels:

– 80% confidence => 1.28 z-score

– 85% confidence => 1.44 z-score

– 90% confidence => 1.65 z-score

– 95% confidence => 1.96 z-score

– 99% confidence => 2.58 z-score
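The tabulated values can be checked with the Python standard library instead of a printed z-table. This sketch assumes two-sided confidence levels, so the z-score is the standard normal quantile at 1 - (1 - level) / 2; the function name is illustrative.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided critical value: `level` of the standard normal lies between -z and +z."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.80, 0.85, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence => z = {z_for_confidence(level):.3f}")
```

The computed value for 90% confidence is closer to 1.645, which the list above rounds to 1.65.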

(71)

Sample Size

Using the Standard Formula
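A minimal sketch, assuming the standard formula meant here is the commonly used finite-population form n = [z^2 * p(1 - p) / e^2] / [1 + z^2 * p(1 - p) / (e^2 * N)], where N is the population size, z the z-score, p the standard deviation expressed as a proportion, and e the margin of error. The function name, the default values, and the population of 10,000 are illustrative.

```python
import math

def sample_size_finite(N, z=1.96, p=0.5, e=0.05):
    """Sample size for a known population size N, rounded up to a whole person."""
    n0 = z**2 * p * (1 - p) / e**2      # size needed for a very large population
    return math.ceil(n0 / (1 + n0 / N)) # shrink it for the finite population

# Hypothetical population of 10,000 at 95% confidence and a 5% margin of error.
print(sample_size_finite(10_000))
```

The result is rounded up because a fraction of a respondent cannot be surveyed.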

(72)

Sample Size

(73)

Sample Size

(74)

Sample Size

Creating a Formula for Unknown or Very Large Populations
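A minimal sketch, assuming the formula meant here is the usual large-population form n = z^2 * p(1 - p) / e^2 (often attributed to Cochran), with z the z-score, p the standard deviation expressed as a proportion, and e the margin of error. The function name and defaults are illustrative.

```python
import math

def sample_size_large_population(z=1.96, p=0.5, e=0.05):
    """Sample size when the population is unknown or very large, rounded up."""
    return math.ceil(z**2 * p * (1 - p) / e**2)

print(sample_size_large_population())          # 95% confidence, 5% margin
print(sample_size_large_population(z=2.58))    # 99% confidence, 5% margin
```

The population size does not appear in this form, which is why it suits unknown or very large populations.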

(75)

Sample Size

(76)

Sample Size

(77)

Sample Size

Using Slovin's Formula
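A minimal sketch, assuming the usual statement of Slovin's formula, n = N / (1 + N * e^2), where N is the population size and e the margin of error. The population of 1,000 and the function name are hypothetical.

```python
import math

def slovin_sample_size(N, e=0.05):
    """Slovin's formula: sample size from population size and margin of error only."""
    return math.ceil(N / (1 + N * e**2))

# Hypothetical population of 1,000 with a 5% margin of error.
print(slovin_sample_size(1_000))
```

Slovin's formula uses no z-score or standard deviation, so it is typically applied when little is known about how the population behaves.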

(78)

Sample Size

(79)

Sample Size

(80)

THANK YOU!