
J Adv Nurs. 2023;79:4521–4541. wileyonlinelibrary.com/journal/jan

DOI: 10.1111/jan.15786

REVIEW

The Medical Outcome Study Social Support Survey (MOS-SSS): A psychometric systematic review

Tiet-Hanh Dao-Tran¹ | Le-Trinh Lam²,³ | Namal N. Balasooriya¹ | Tracy Comans¹

¹Centre of Health Services Research, Faculty of Medicine, University of Queensland, Brisbane, Australia

²University of Medicine and Pharmacy, Ho Chi Minh City, Vietnam

³Department of Nursing, College of Medicine, National Cheng Kung University, Tainan, Taiwan

Correspondence

Tiet-Hanh Dao-Tran, Centre of Health Services Research, Faculty of Medicine, University of Queensland, Level 5, Health Science Building, Herston, Brisbane, Australia.

Email: [email protected];

[email protected]

Abstract

Aims: To evaluate and synthesize psychometric properties of the MOS-SSS and to identify quality versions of the MOS-SSS for use in future research and practice.

Design: A psychometric systematic review.

Data Sources: Articles about the translation, adaptation, or validation of the MOS-SSS in Medline, PubMed, CINAHL, and Web of Science, and their reference lists, published before 11 November 2022.

Review Methods: The review followed the Consensus Standards for the Selection of Health Measurement Instruments guidelines.

Results: The review included 35 articles. Eleven versions of the MOS-SSS (3, 4, 5, 6, 8, 12, 13, 16, 18, 19, and 22 items) have been validated in various populations and 13 languages. Of 14 studies developing a translated version of the MOS-SSS, four performed both an experts' evaluation of content validity and a face validity test; two reported translation evaluation in the form of a content validity index. Of 35 studies, six performed both exploratory factor analysis and confirmatory factor analysis for structural validity; hypotheses and measurements for construct validity testing were often not clearly stated; two examined criterion validity; and four assessed cross-cultural validity. Internal consistency reliabilities were commonly examined by calculating Cronbach's alpha and were reported as satisfactory. Five studies analysed test–retest reliabilities using the intraclass correlation coefficient. Methodological concerns exist.

Conclusion: The English 19-item, Farsi/Persian 19-item, and Vietnamese 19-item versions are recommended for future use in research and practice. The Italian 19-item and Malaysian 13-item versions are not recommended for use in future research and practice. All other versions considered in this review have potential use in future research and practice. Proper procedures for developing a translated version of the MOS-SSS and validating the scale are recommended.

Impact: The review identified quality versions of the MOS-SSS to measure social support in future research and practice. The study also indicated methodological issues in current validation studies. Application of the study findings and recommendations can be useful to improve outcome measurement quality and maximize the efficiency of resource use in future research and practice.

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.

1 | INTRODUCTION

Quality outcome measurement is important because it can minimize measurement bias, improve measurement accuracy, and maximize the efficiency of resource use (Lam et al., 2022). The quality of an outcome measurement can be determined by its psychometric properties, including validities and reliabilities (Mokkink et al., 2016). Validities include content validity, structural validity, construct validity, and cross-cultural validity (Mokkink et al., 2016). Content validity should be evaluated by experts and participants from the target population (Mokkink et al., 2016). Sufficient content validity means the content is relevant, comprehensive, and comprehensible to the target population (Mokkink et al., 2016).

Structural validity is about the dimensions of the concept (Mokkink et al., 2016). Structural validity is assessed by exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) (Mokkink et al., 2016). EFA explores the number of factors for the construct (Izquierdo et al., 2014), while CFA confirms whether the formatted structure of the construct is supported (Mokkink et al., 2016).

Construct validities include convergent validity, divergent validity, known group validity, discriminant validity, concurrent validity, and predictive validity (Mokkink et al., 2016). Convergent and divergent validity consider the positive and negative correlations between an outcome measurement and its theoretically correlated measures (Mokkink et al., 2016). The correlation between two theoretically correlated constructs that were measured at the same time indicates concurrent validity (Mokkink et al., 2016). The correlation between two theoretically correlated constructs, in which one was measured before the other, indicates predictive validity (Mokkink et al., 2016). Known group validity considers differences in the outcome measurement between groups (Mokkink et al., 2016). Discriminant validity considers correlations between two different constructs (Rönkkö & Cho, 2022).

Criterion validity considers the correlation between the outcome measured by using the validating measurement and the gold standard measurement (Mokkink et al., 2016). Cross-cultural validity is the degree to which item measurement outcomes are similar across different groups (Mokkink et al., 2016). Groups can be languages, cultures, genders, education levels, etc. (Mokkink et al., 2016). One example of cross-cultural validity is similar measurement outcomes for the same participants while using different language versions of the measurement.

Reliabilities include internal consistency and test–retest reliability (Mokkink et al., 2016). Internal consistency reliability is about the correlation of different items measuring a construct (Mokkink et al., 2016). Test–retest reliability is about the stability of the outcome over time (Mokkink et al., 2016).
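As an illustration of internal consistency, the sketch below computes Cronbach's alpha, the statistic most of the reviewed studies report. It uses the standard formula (number of items, item variances, and variance of the total score); the function name and the example responses are hypothetical and are not taken from any reviewed study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows, each holding k item scores."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point responses from four respondents on three items.
example = [
    [1, 2, 1],
    [2, 3, 3],
    [4, 4, 5],
    [5, 5, 4],
]
alpha = cronbach_alpha(example)
print(round(alpha, 2))
```

Alpha rises as items covary more strongly relative to their individual variances, which is why it is read as an index of how consistently the items measure one construct.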

Social support, which is defined as a process in which individuals exchange resources to enhance the recipients' health and well-being (Shumaker & Brownell, 1984), is a common measure in health and social science. The Medical Outcome Study Social Support Survey (MOS-SSS) is a widely used scale to measure social support because it is brief and easy to administer, yet it measures multiple dimensions of social support, including information support, tangible support, positive interaction, and affection, regardless of the source of support (Sherbourne & Stewart, 1991).

Synthesis and evaluation of the psychometric properties of a measurement are useful to identify whether a measurement, or a version of a measurement, is of sufficient quality to use (Mokkink et al., 2016). Yet, the psychometric properties of the MOS-SSS have not been systematically reviewed.

2 | THE REVIEW

2.1 | Aims

This review aimed to evaluate and synthesize the psychometric properties of the MOS-SSS and to identify quality versions of the MOS-SSS for use in future research and practice.

2.2 | Design

A psychometric systematic review. The review followed the Consensus Standards for the Selection of Health Measurement Instruments (COSMIN) guideline (Prinsen et al., 2018).

2.3 | Search methods

2.3.1 | Information sources

Comprehensive searches of the Medline, PubMed, CINAHL, and Web of Science databases for peer-reviewed articles about translation, adaptation, or validation of the MOS-SSS were conducted. Reference lists in identified articles were also inspected for additional relevant articles that may have been missed during the database searches.

No Patient or Public Contribution: This systematic review synthesized the evidence from previous research and did not involve any human participation.

KEYWORDS

COSMIN, instrument development, nursing, psychometric testing, research in practice, systematic review

2.3.2 | Search strategies and reference management

Keywords used for the database searches included (“Medical Outcome Study Social Support” OR “MOS social support” OR “MOS-SSS”) AND (translation OR reliability OR validity OR validation OR psychometric OR adaptation). All articles found were imported into reference management software (Zotero) to remove duplicates. Searches were completed on 11 November 2022.

2.3.3 | Inclusion and exclusion criteria

Peer-reviewed quantitative articles published in full text in English about the translation, adaptation, or validation of the MOS-SSS were included in this review. Articles without full-text publications or published in languages other than English were excluded. Commentaries, reviews, studies that did not report a psychometric test of the MOS-SSS, and grey literature were also excluded.

2.4 | Search outcome

To report the search outcome, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 update guidelines were followed (Page et al., 2021). The database searches found 144 articles. After removing duplicates, 68 retrieved articles were imported into systematic review management software (Covidence) for two manuscript authors (THDT and NB) to independently screen titles and abstracts and identify whether articles satisfied the inclusion and exclusion criteria. When differences between the two reviewers could not be resolved in consensus discussions, a third reviewer (TC) was included to make a final decision by majority. After screening the titles and abstracts, 35 articles were retained, and retrieval of full texts was attempted. A further six articles were excluded as they did not have full texts or the full texts were not in English. The hand search of the reference lists of the remaining 29 articles found another six articles, resulting in a final tally of 35 articles included in the review (see Figure 1).
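The selection counts can be checked with simple arithmetic. The sketch below re-derives each stage from the numbers reported in the text and in Figure 1; the variable names are illustrative only.

```python
# Counts as reported in the text and in Figure 1.
identified = {"Medline": 43, "PubMed": 51, "CINAHL": 34, "Web of Science": 16}
duplicates_removed = 76
excluded_at_screening = 33
excluded_no_english_full_text = 6
added_from_reference_lists = 6

total_identified = sum(identified.values())
screened = total_identified - duplicates_removed
retained_after_screening = screened - excluded_at_screening
remaining_after_full_text = retained_after_screening - excluded_no_english_full_text
included = remaining_after_full_text + added_from_reference_lists

print(total_identified, screened, retained_after_screening,
      remaining_after_full_text, included)
# prints: 144 68 35 29 35
```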

2.5 | Data extraction

Extracted data included language, the number of items, research design, sample size and sampling method, the translation process, translators, and translation equivalence evaluation where applicable. Details of experts' evaluation of content validity (the number of experts, content validity index [CVI]), pilot test for examining face validity, and the tests to examine structural validity (EFA, CFA) were extracted. For EFA, the principle of analysis, rotation methods, and the number of extracted factors were extracted. For CFA, details of the data analysis approach, criteria for evaluating the goodness of fit, and the number of factors were extracted. Details about tests of convergent validity, divergent validity, criterion validity, known group validity, discriminant validity, predictive validity, and cross-cultural validity were also extracted. Finally, the review extracted data on internal consistency reliability (analysis methods, overall value, and all subscales' values) and test–retest reliability tests (analysis methods and duration between the two tests).

FIGURE 1 PRISMA flowchart of the study search and selection.

Articles identified: Medline (n = 43), PubMed (n = 51), CINAHL (n = 34), Web of Science (n = 16); total identified (n = 144)
Duplicates from different databases removed (n = 76)
Studies screened by title and abstract (n = 68); irrelevant studies excluded (n = 33)
Full text retrieved (n = 35); articles excluded, not published in full text in English (n = 6)
Manual search of reference lists (n = 29); relevant studies included (n = 6)
Articles included in this review (n = 35)

2.6 | Appraisal process

The appraisal process was performed independently by two manuscript authors (THDT and LTL). Similar to the screening process, if differences between the two appraisers could not be resolved in consensus meetings, a third researcher (TC) was involved in a final decision by the majority.

For each study, the researchers used the COSMIN risk of bias checklist to rate the methodological risk of bias as very good (V), adequate (A), doubtful (D), or inadequate (I) (Mokkink et al., 2018).

The researchers also used the updated criteria for quality measurement properties to assess each psychometric property as sufficient (+), insufficient (−), or indeterminate (?) (Prinsen et al., 2018).

For each version of the MOS-SSS, which was classified by language and the number of items, the researchers used the modified Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach to rate the overall quality of evidence about psychometric properties as high, moderate, low, or very low. Four criteria for this grading included the risk of bias, imprecision (small sample size), indirectness (different population from target population), and inconsistency of the results. The quality of evidence was downgraded (−1), (−2), or (−3) for serious, very serious, or extremely serious risk of bias, respectively. Evidence of serious risk of bias is based upon multiple studies of doubtful quality or one study of adequate quality. Evidence of a very serious risk of bias is based upon multiple studies of inadequate quality or one study of doubtful quality. Evidence of an extremely serious risk of bias is based on studies of inadequate quality only.

If the validation was done on a sample size between 50 and <100, or <50, the quality of evidence was downgraded (−1) or (−2), respectively. The quality of evidence was also downgraded (−1) for having participants different from the population of interest or having inconsistent findings (Terwee et al., 2012; Terwee et al., 2018).
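The grading logic can be sketched as a count of downgrade steps across the four criteria. The example below assumes that evidence starts at "high" before any downgrading, which is the usual GRADE convention but is not stated explicitly in the text; the function name and arguments are hypothetical.

```python
# Ordered from best to worst; index 0 is the assumed starting level.
LEVELS = ["high", "moderate", "low", "very low"]


def grade_evidence(risk_of_bias=0, imprecision=0, indirectness=0, inconsistency=0):
    """Return the evidence level after downgrading.

    Each argument is the number of levels to downgrade for that criterion
    (for example, 1 for serious and 2 for very serious risk of bias).
    """
    steps = risk_of_bias + imprecision + indirectness + inconsistency
    # The rating cannot drop below "very low".
    return LEVELS[min(steps, len(LEVELS) - 1)]


# Serious risk of bias (-1) combined with a small sample (-1).
print(grade_evidence(risk_of_bias=1, imprecision=1))  # prints: low
```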

The researchers also synthesized the ratings on each psychometric property from all relevant studies to formulate an overall quality of each psychometric property. The highest rating from relevant studies was applied to rate the survey version. Rating options for each psychometric property included sufficient (+), insufficient (−), or indeterminate (?) (Terwee et al., 2012; Terwee et al., 2018).

The researchers finally used the modified GRADE approach to classify the quality of measurement of each available version of the MOS-SSS into Category A, B, or C. Category A, which contained versions to be recommended for use in future research and practice, included measurements with sufficient content validity and at least low-quality evidence of sufficient internal consistency reliability.

Category C, which contained versions not to be recommended for use in future research and practice, included measurements with high-quality evidence of insufficient quality for a measurement property. Category B contained versions having potential use in future research and practice and included measurements that did not fall in either Category A or C (Prinsen et al., 2018).
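The three categories can be expressed as a small decision rule. The sketch below is one possible reading of the definitions above; checking Category C before Category A is an assumption, as the text does not say which takes precedence when both conditions hold, and all names are hypothetical.

```python
QUALITY_ORDER = ["very low", "low", "moderate", "high"]


def classify_version(content_validity, internal_consistency,
                     ic_evidence_quality, insufficient_with_high_quality):
    """Return 'A', 'B', or 'C' for one version of the scale.

    content_validity, internal_consistency: '+', '-', or '?'
    ic_evidence_quality: one of QUALITY_ORDER
    insufficient_with_high_quality: True if any measurement property is
        rated insufficient on high-quality evidence
    """
    # Assumption: Category C is checked first.
    if insufficient_with_high_quality:
        return "C"
    at_least_low = (QUALITY_ORDER.index(ic_evidence_quality)
                    >= QUALITY_ORDER.index("low"))
    if content_validity == "+" and internal_consistency == "+" and at_least_low:
        return "A"
    return "B"


print(classify_version("+", "+", "moderate", False))  # prints: A
```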

2.7 | Evidence synthesis

Findings of the review are presented in both tables and narrative texts. Tables outline extracted data or the evaluation, while narrative texts report a synthesis or summary of the findings.

3 | RESULTS

3.1 | Study characteristics

The review included 35 studies. The original MOS-SSS is a 19-item English scale developed to measure social support among adults with chronic conditions (Sherbourne & Stewart, 1991). There were eleven versions of the MOS-SSS (3, 4, 5, 6, 8, 12, 13, 16, 18, 19, or 22 items). The 19-item version was the most common, used across 31 studies. The MOS-SSS has also been validated in 13 other languages, including Chinese, Taiwanese, French, Portuguese, Italian, Spanish, Turkish, Malaysian, Arabic, Farsi/Persian, Vietnamese, Myanmar, and Greek. One study mentioned data collected in Pakistan but did not describe the language used.

The MOS-SSS has been validated in various populations, including community adults, patients with chronic or infectious disease (heart disease, human immunodeficiency virus [HIV], or tuberculosis), methadone users, cancer patients or cancer survivors, caregivers, women at various stages of life (such as postpartum women, mothers of malnourished children, mothers of children with cancer, and detainees), and students. Several studies did not mention details about research designs and sampling methods. Where described, cross-sectional design and convenience sampling were commonly used. Sample sizes used for validation ranged from 63 to 19,593 (see Table 1).

3.2 | Translation and translation evaluation

As seen in Table 2, 11 studies used the original English version of the MOS-SSS, 14 studies translated the MOS-SSS from English to another language, and 10 studies used an available translated version of the instrument. Among the 14 studies that translated the MOS-SSS from English to another language, 13 described their translators; one did not (Yu et al., 2004). Details about translators are limited. Eleven studies reported their back translation process; three did not (Khazaee-Pool et al., 2018; Norhayati et al., 2015; Yilmaz & Bozo, 2019). The number of people involved in the translation and back-translation process varied from two individuals to two groups of translators. Ten studies mentioned whether the translations were performed independently; four did not (Alaloul et al., 2021; Mahmud et al., 2004; Nicolaou et al., 2015; Shyu et al., 2006).

Three studies did not report their translation equivalence evaluation (Alaloul et al., 2021; Nicolaou et al., 2015; Norhayati et al., 2015). Three studies had one person compare two English versions (Huang et al., 2021; Khazaee-Pool et al., 2018; Yilmaz

TABLE 1 Study characteristics.

| References | Language, number of items | Research design | Sample, sampling methods |
|---|---|---|---|
| 1. Sherbourne and Stewart (1991) | English, 19 items | Longitudinal | 2987 community adult patients in the United States of America (US), stratified random |
| 2. Gjesfjeld et al. (2008) | English, 4, 12, 18 items | Longitudinal | 330 mothers with a child in mental health treatment in US, convenience |
| 3. Moser et al. (2012)* | English, 8 items | Longitudinal and cross-sectional | 3241 women in the US, ? sampling method |
| 4. Kim and Mazza (2014)* | English, 19 items | ? design | 271 incarcerated women in the US, ? sampling method |
| 5. Kim et al. (2017)* | English, 19 items | ? design | 411 female detainees with HIV in the US, ? sampling method |
| 6. Holden et al. (2014)* | English, 6, 19 items | Cross-sectional | 10,616 + 8977 Australian women, random |
| 7. Levine et al. (2015)* | English, 22 items | Cross-sectional | 135 breast cancer survivors in US, convenience |
| 8. Conte et al. (2015) | English, 19 items | Cross-sectional | 390 American Indian community people aged 55+, random |
| 9. Margolis et al. (2019)* | English, 19 items | Baseline data of RCT | 199 caregivers of African American children with asthma, ? sampling method |
| 10. Yu et al. (2004) | Chinese, 19 items | Prospective | 110 patients with heart failure in Hong Kong, convenience |
| 11. Wang et al. (2013) | Chinese, 19 items | Cross-sectional | 200 patients with chronic heart disease in China, convenience |
| 12. Thompson et al. (2014)* | Chinese, 19 items | ? design | 200 patients with coronary heart disease in China, ? sampling method |
| 13. Yu et al. (2015) | Chinese, 8, 19 items | Cross-sectional | 200 people with HIV in China, convenience |
| 14. Shyu et al. (2006) | Taiwanese, 19 items | ? design | 265 family caregivers of patients with cancer in Taiwan, ? sampling method |
| 15. Robitaille et al. (2011)* | English, French, 19 items | Longitudinal | 3113 community residents aged 55+ in Canada, stratified two stages |
| 16. Soares et al. (2012) | Portuguese, 19 items | ? design | 200 Hodgkin's lymphoma survivors 16+ in Brazil, ? sampling method |
| 17. Zucoloto et al. (2019) | Portuguese, 19 items | ? design | 454 older Brazilian patients waiting for a medical appointment, ? sampling method |
| 18. Giangrasso and Casale (2014) | Italian, 19 items | ? design | 485 undergraduate students in Italy, convenience |
| 19. Gómez-Campelo et al. (2014) | Spanish, 8 items | Cross-sectional | 903 outpatients in Spain, simple random |
| 20. Priede et al. (2018)* | Spanish, 6, 8, 19 items | Longitudinal | 128 newly diagnosed cancer patients in Spain, ? sampling method |
| 21. Gálvez-Hernández et al. (2020) | Spanish, 13, 19 items | ? design | 300 adult women with cancer in Mexico, convenience |
| 22. Dumitrache et al. (2021) | Spanish, 19 items | Cross-sectional | 406 community older people (65+) in Spain, ? sampling method |
| 23. Yilmaz and Bozo (2019) | Turkish, 19 items | Cross-sectional | 241 university students in Turkey, convenience |
| 24. Mahmud et al. (2004) | Malaysian, 19 items | ? design | 215 post-partum women in Malaysia, ? sampling method |
| 25. Norhayati et al. (2015) | Malaysian, 13, 19 items | Cross-sectional | 144 post-partum mothers in Malaysia, convenience |
| 26. Saddki et al. (2017) | Malaysian, 19 items | ? design | 120 patients with HIV in Malaysia, systematic random |
| 27. Din et al. (2020) | Malaysian, 19 items | Cross-sectional | 296 community older people in Malaysia, ? sampling method |
| 28. Dafaalla et al. (2016) | Arabic, 19 items | Cross-sectional | 487 medical students in Sudan, cluster random |
| 29. Alaloul et al. (2021) | Arabic, 19 items | Cross-sectional | 63 cancer survivors, ? sampling method |
| 30. Khazaee-Pool et al. (2018) | Farsi, 19 items | ? design | 204 women in a healthcare centre in Iran, cluster random |
| 31. Bavarsad et al. (2021) | Persian, 5 items | Cross-sectional | 420 Iranian older adults, ? sampling method |

& Bozo, 2019). Seven studies reported that a group compared two English versions (Bavarsad et al., 2021; Dafaalla et al., 2016; Giangrasso & Casale, 2014; Khuong et al., 2018; Saddki et al., 2017; Shyu et al., 2006; Yu et al., 2004). One study used parallel forms to collect data from 30 bilingual participants over a period of a week and calculated a Kappa correlation coefficient of 0.98 to illustrate a high translation equivalence (Mahmud et al., 2004).

3.3 | Validities and reliabilities of the MOS-SSS

Table 2 provides details of the experts' evaluation of content validity and face validity. Table 3 provides details of structural validity, hypothesis testing for construct validities (including convergent, divergent, known group, and discriminant validity), cross-cultural validity, internal consistency reliability, and test–retest reliability.

3.3.1 | Experts' evaluation of content validity, and face validity

Expert evaluation of content validity and face validity were examined in the study that originally developed the scale (Sherbourne & Stewart, 1991). Six of the 14 studies that developed a translated version of the MOS-SSS also reported an expert evaluation of content validity (Bavarsad et al., 2021; Khuong et al., 2018; Mahmud et al., 2004; Norhayati et al., 2015; Saddki et al., 2017; Yu et al., 2004). However, only two studies calculated an experts' content validity index (CVI) and indicated sufficient content validity (CVI > 0.70) (Bavarsad et al., 2021; Yu et al., 2004). Adaptation for cultural appropriateness was reported in two studies (Khuong et al., 2018; Norhayati et al., 2015). Nine studies conducted pilots to evaluate the face validity of the MOS-SSS.
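For readers unfamiliar with the CVI, the sketch below shows one common way to compute it. It assumes a 4-point relevance scale on which ratings of 3 or 4 count as "relevant", and it averages the item-level indices into a scale-level value; the reviewed studies may have used a different variant, and the ratings shown are hypothetical.

```python
def item_cvi(ratings, relevant_from=3):
    """Proportion of experts rating the item as relevant (assumed: 3 or 4 of 4)."""
    return sum(r >= relevant_from for r in ratings) / len(ratings)


def scale_cvi_average(ratings_by_item):
    """Mean of the item-level CVIs across all items."""
    cvis = [item_cvi(ratings) for ratings in ratings_by_item]
    return sum(cvis) / len(cvis)


# Hypothetical ratings: three items, each rated by five experts.
ratings_by_item = [
    [4, 4, 3, 4, 2],
    [3, 4, 4, 4, 4],
    [2, 3, 4, 2, 4],
]
cvi = scale_cvi_average(ratings_by_item)
# Threshold taken from the text above (CVI > 0.70).
sufficient = cvi > 0.70
print(round(cvi, 2), sufficient)
```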

3.3.2 | Structural validity

Thirty-four studies performed structural validity tests. Six studies performed both EFA and CFA (Bavarsad et al., 2021; Gálvez-Hernández et al., 2020; Gómez-Campelo et al., 2014; Moser et al., 2012; Sherbourne & Stewart, 1991; Zucoloto et al., 2019). Twelve studies performed EFA only, of which two performed Rasch analysis (Kim et al., 2017; Kim & Mazza, 2014). Sixteen studies performed CFA only.

Of 18 studies performing EFA, nine used principal component analysis (PCA), two used principal axis factoring (PAF; Gálvez-Hernández et al., 2020; Mahmud et al., 2004), two used the maximum likelihood principle (Bavarsad et al., 2021; Zucoloto et al., 2019), and five did not report a specific extraction principle (Gómez-Campelo et al., 2014; Kim et al., 2017; Saddki et al., 2017; Yilmaz & Bozo, 2019; Yu et al., 2015). Ten studies used an orthogonal rotation method (varimax [9]; unspecified [1] [Nicolaou et al., 2015]). Three studies used an oblique rotation method (oblimin [2] [Gálvez-Hernández et al., 2020; Mahmud et al., 2004]; unspecified [1] [Yu et al., 2015]). One study stated that the solution was unrotated (Sherbourne & Stewart, 1991), and four studies did not report a specific rotation method (Gómez-Campelo et al., 2014; Kim et al., 2017; Saddki et al., 2017; Yilmaz & Bozo, 2019). EFA found that the MOS-SSS 8-item version had two factors in its construct, and the MOS-SSS 19-item version had from three to five factors in its structure.

Of 22 studies performing CFA, most used the classical test theory approach; a few used Rasch analysis (Huang et al., 2021) or Mokken scale analysis (Thompson et al., 2014). One study did not clearly state the criteria for model fit (Sherbourne & Stewart, 1991). Criteria for model fit varied across these studies; some studies used criteria that differed from those in the COSMIN guidelines (Gjesfjeld et al., 2008; Moser et al., 2012). Because of the different criteria for identifying model fit, conclusions from three studies about model fit were inconsistent with COSMIN guidelines (Gjesfjeld et al., 2008; Moser et al., 2012). To be more specific, while the studies concluded that their models did not fit, the models were considered to fit according to COSMIN guidelines. For example, Gjesfjeld et al. (2008) concluded that their 18-item model did not fit because not all of their criteria for model fit were met. However, according to COSMIN guidelines, their model fitted because it had a Comparative Fit Index (CFI) of 0.96.
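The difference between requiring every fit index to pass and accepting any one passing index can be sketched as follows. Only the CFI comparison appears in the text; the cut-offs below (CFI or TLI > 0.95, RMSEA < 0.06, SRMR < 0.08) are the values commonly cited from the COSMIN criteria and should be treated as assumptions to verify against the guideline. The function name is hypothetical.

```python
def fits_by_any_index(cfi=None, tli=None, rmsea=None, srmr=None):
    """Return True if at least one reported fit index meets its assumed cut-off."""
    checks = [
        cfi is not None and cfi > 0.95,      # assumed cut-off
        tli is not None and tli > 0.95,      # assumed cut-off
        rmsea is not None and rmsea < 0.06,  # assumed cut-off
        srmr is not None and srmr < 0.08,    # assumed cut-off
    ]
    return any(checks)


# A model reporting CFI = 0.96, as in the Gjesfjeld et al. (2008) example,
# passes under an any-index rule even if other indices miss their cut-offs.
print(fits_by_any_index(cfi=0.96, rmsea=0.10))  # prints: True
```

A study that requires all of its chosen indices to pass would reject the same model, which is how a study's own conclusion and the COSMIN-based rating can diverge.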

CFA found a one- or two-factor construct for the MOS-SSS 3-, 4-, 5-, 6-, and 8-item versions. The MOS-SSS 12-, 16-, and 22-item versions had four factors. The MOS-SSS 19-item version appeared to have from two to five factors in different contexts, even though it was commonly reported as having four factors.

TABLE 1 (Continued)

| References | Language, number of items | Research design | Sample, sampling methods |
|---|---|---|---|
| 32. Khuong et al. (2018) | Vietnamese, 19 items | Cross-sectional | 300 methadone maintenance patients in Vietnam, convenience |
| 33. Huang et al. (2021) | Myanmar, 19 items | Cross-sectional | 250 people living with HIV in Myanmar, convenience |
| 34. Saqib et al. (2019) | ? language, 19 items | Cross-sectional | 269 people with TB in Pakistan, multistage sampling |
| 35. Nicolaou et al. (2015) | Greek, 19 items | Cross-sectional | 260 mothers of children with and without cancer in Cyprus, purposive |

Note: (*) secondary data analysis; (?) indeterminate.

Abbreviation: RCT, randomized control trial.

TABLE 2 Translation and content validity. References

TranslationContent validity ProcessTranslatorsTranslation equivalenceExpert evaluationFace validityOverall 1. Sherbourne and Stewart (1991)nananaSix behavioural scientists.Pilot on patients with chronic conditions in rural clinic

A/+ 2. Gjesfjeld et al. (2008)nanana 3. Moser et al. (2012)nanana 4. Kim and Mazza (2014)nanana 5. Kim et al. (2017)nanana 6. Holden et al. (2014)nanana 7. Levine et al. (2015)nanana 8. Conte et al. (2015)nanana 9. Margolis et al. (2019)nanana 10. Yu et al. (2004)Translation, back translation (Brislin, 1986)20 bilingual health professionals compare two English versions

CVI (20 bilingual health professionals) = .82D/? 11. Wang et al. (2013)Used an available translated version (Yu et al.,2004) 12. Thompson et al. (2014)Used an available translated version (Yu et al.,2004) 13. Yu et al. (2015)Used an available translated version (D. S. Yu et al.,2004) 14. Shyu et al. (2006)Translation, back translation (Brislin, 1970)Two bilingual- speaking expertsresearch team compare two English versions 15. Robitaille et al. (2011)Used an available translated version (Anderson et al.,2005) 16. Soares et al. (2012) 17. Zucoloto et al.,2019Used an available translated version (published in Portuguese [2014]) 18. Giangrasso and Casale (2014)Translation, back translation processOne bilingual translator did the translation, and one bilingual translator did back translation

Colleagues and researchers compare two English versions 19. Gómez- Campelo et al. (2014)Used an available translated version (Revilla et al.,2005) 20. Priede et al. (2018)Used an available translated version (published in Spanish [2005]) 21. Gálvez- Hernández et al. (2020)Used an available translated version (published in Spanish [2007]) (Continues)

(8)

References

TranslationContent validity ProcessTranslatorsTranslation equivalenceExpert evaluationFace validityOverall 22. Dumitrache et al. (2021)Used an available translated version (published in Spanish [2005]) 23. Yilmaz and Bozo (2019)Translation, back translation process.Three bilinguals independently did forward translation, missing details on the back translation.

One bilingual psychologist chose the best translation! 24. Mahmud et al. (2004)Translation, backward translation (Brislin, 1970)A group of bilingual teachers did forward translation. A group of bilingual primary care doctors did the backward translation.

A psychiatrist, a physician, and two medical practitioners evaluate semantic, and conceptual equivalence (Flaherty et al.,1988). Parallel form reliability (30 bilingual participants, a week) = 0.98 A psychiatrist, a physician, and two medical practitioners evaluate content.

a psychiatrist, a physician, and two medical practitioners evaluated

D/? 25. Norhayati et al. (2015)Translation, backward translationA Family Health specialist and a public health physician with English proficiency did forward translation independently. Missing details on back translation.

Not providedevaluated by a panel of experts, rephrasing for cultural adaptation was reported.

10 female staff evaluatedA/+ 26. Saddki et al. (2017)Translation, backward translation (Beaton et al.,2000)Two translators did translation independently; two translators did back translation independently.

Panel and expert compare two English versions (Beaton et al.,2000)

Panel and expert (Beaton et al.,2000)Not providedD/? 27. Din et al. (2020)Used an available translated version (Saddki et al.,2017) 28. Dafaalla et al. (2016)Translation, back translation process.A certified translator did the translation, and a bilingual speaker did the backward translation.

Two authors compare two English versionsPilotD/? 29. Alaloul et al. (2021)Forward, backward translation.Two bilingual doctoral degree- qualified nurses and one registered nurse were involved in the process. No further details.

Bilingual doctoral student, RN, and monolingual layperson evaluated

D/?

30. Khazaee-Pool et al. (2018) | Translation, back translation process | A bilingual professional translator who was a master's-prepared health promotion expert translated to Farsi. No details in the back translation. | A second bilingual translator compares two English versions | Pilot on 29 Iranian women | D/?

31. Bavarsad et al. (2021) | Translation, back translation process | Two bilingual translators independently did the translation. Two bilingual translators independently did back translation. | The research team compare two English versions | CVI (19 reviewers) = 0.79, kappa (inter-rater correlation) = 0.74 | Pilot on 10 older people | A/+

32. Khuong et al. (2018) | Translation, back translation process | A bilingual English teacher did the translation, and another bilingual English teacher did the back translation. | Two translators and a researcher compare two English versions | Adaptation (item 10) for cultural appropriateness | Pilot on 10 patients | A/+

33. Huang et al. (2021) | Translation, back translation process | A bilingual translator did the translation. A bilingual researcher did the back translation | A researcher compared two English versions | Pilot on 10 people living with HIV | D/?

34. Saqib et al. (2019) | na | na | na

35. Nicolaou et al. (2015) | Translation, back translation process. | Two translators translated into Greek, and two translators did back translations. | Pilot on 10 mothers of healthy children | D/?

Note: Risk of bias assessment: Very good (V); Adequate (A); Doubtful (D); Inadequate (I). Quality of psychometric properties compared to the gold criteria: sufficient (+); insufficient (−); indeterminate (?). Abbreviations: CVI, content validity index; na, not applicable.


TABLE 3 Risk of bias and quality of psychometric properties for each study.

Columns: References | Structural validity: Overall, EFA, CFA | Cross-cultural validity | Construct validities: Convergent/divergent, Criterion, Known group/discriminant, Predictive | Internal consistency reliability | Test–retest reliability

1. Sherbourne and Stewart (1991) | V/+ | 19 items: PCA, unrotated, 4 factors | 19 items: V/?, 4 factors | V/+ | V/+ | V/+ | V/+ | V (1 year)/?

2. Gjesfjeld et al. (2008) | V/+ | 4 items: V/+ 1 factor. | V/+ | V/+
 | V/+ | 12 items: V/+ 4 factors | V/+ | V/+ | V/+
 | V/− | 18 items: V/+ | V/+ | I/?

3. Moser et al. (2012) | V/+ | 8 items: V/+, PFA, varimax, 2 factors | 8 items: V/−, 2 factors | V/+ | I/+ | V/+ | V/+

4. Kim and Mazza (2014) | V/+ | 19 items: A/+, PCA, varimax | V/+

5. Kim et al. (2017) | A/+ | 19 items: A/+, PCA? rotation | V/+ | V/+

6. Holden et al. (2014) | V/+ V/? | 6 items: V/+ 1 factor 19 items: V/+ | V/+ V/+ | I/+ | V/+ V/+ | V/+

7. Levine et al. (2015) | A/? | 22 items: A/+, PCA, varimax, 4 factors | V/+ | D/−

8. Conte et al. (2015) | V/+ | 19 items: V/+, 4 factors | V/+ | V/+

9. Margolis et al. (2019) | 18 items: V/+ | V/+ | I/?

10. Yu et al. (2004) | A/− | 19 items: V/−, 4 factors | V/+ | V/+ | V (2 weeks)/+

11. Wang et al. (2013) | V/− | 19 items: V/−, 4 factors | V/+ | V/+ | V (2 weeks)/+

12. Thompson et al. (2014) | V/? | V/?, 1 factor

13. Yu et al. (2015) | A/? | 8 items: A/+? principle, oblique, 2 factors | V/+ | V/+
 | A/+ | 19 items: A/+? principle, oblique, 5 factors | V/+

14. Shyu et al. (2006) | A/? | 19 items: A/+, PCA, varimax 2 factors | V/+ | V/+ | V/+

15. Robitaille et al. (2011) | V/+ | 19 items: V/+, 4 factors | V/+ | V/+

16. Soares et al. (2012) | A/? | 19 items: A/+, PCA, varimax, 3 factors | V/+ | V/+ | V/+

17. Zucoloto et al. (2019) | V/+ | 19 items: Maximum likelihood, varimax, 3–4 factors | 19 items: V/+, 3 factors V/+, 4 factors | V/+

18. Giangrasso and Casale (2014) | V/− | 19 items: V/−, 4 factors | V/+ | V/+ | V/+ | A (10 weeks)/?

19. Gómez-Campelo et al. (2014) | V/+ | 8 items: ? principle, rotation, 1 factor | 8 items: V/+, 1 factor | A/+ | V/+ | V/+

20. Priede et al. (2018) | V/+ | 6 items: V/+ 1 factor | V/+ | V/+
 | V/+ | 8 items: V/+ 2 factors | V/+ | V/+
 | A/− | 19 items: A/−, 3–5 factors | V/+ | V/+

21. Gálvez-Hernández et al. (2020) | V/+ V/− | 13 items: PAF, oblimin. 3 factors | 13 items: V/+ 3 factors 19 items: V/− | V/+

22. Dumitrache et al. (2021) | V/+ | V/+ 19 items: 5 factors | V/+

23. Yilmaz and Bozo (2019) | A/? | 19 items: A/+? principle, rotation, 4 factors | V/+ | V/+ | V/+ | D (1 month)/?

24. Mahmud et al. (2004) | A/? | 19 items: A/+, PAF, oblimin, 3 factors | V/+ | V/+ | A/? V (Parallel form)/+

25. Norhayati et al. (2015) | V/+ | 13 items: V/+13, 3 factors | V/−

26. Saddki et al. (2017) | A/? | 19 items: A/+, PCA? rotation, 4 factors | V/+ | V/+ | V (1–2 weeks)/+

27. Din et al. (2020) | V/− | 19 items: V/+ 4 factors | V/+ | V/+

28. Dafaalla et al. (2016) | A/? | 19 items: A/+, PCA, varimax, 4 factors | D/+ | V/+ | A (split half, 10 days)/?

29. Alaloul et al. (2021) | I/+ | 19 items: I/+ 4 factors | V/+ | V/+

30. Khazaee-Pool et al. (2018) | V/+ | 19 items: PCA, varimax. 3 factors | 19 items: V/+, 3 factors | V/+ | V/+ | D/?

31. Bavarsad et al. (2021) | V/+ | 19 items: Maximum likelihood, varimax. 2 factors. | 19 items: V/+, 2 factors | V/+ | V/+ | V/+ | V (2 weeks)/+

32. Khuong et al. (2018) | V/+ | 19 items: V/+, 4 factors | V/+ | V/+ | V/+ | V (2 weeks)/−

33. Huang et al. (2021) | V/+ | 19 items: V/+, 4 factors | V/+ | V/+ | V/+

34. Saqib et al. (2019) | V/+

35. Nicolaou et al. (2015) | A/? | 19 items: A/+? principle, orthogonal, 3 factors | V/+ | V/+

Note: Risk of bias assessment: Very good (V); Adequate (A); Doubtful (D); Inadequate (I). Quality of psychometric properties compared to the gold criteria: sufficient (+); insufficient (−); indeterminate (?). Abbreviations: CFA, confirmatory factor analysis; EFA, exploratory factor analysis; PAF, principal axis factoring; PCA, principal component analysis.


3.3.3 | Convergent, divergent validity

Four studies claimed convergent validity by assessing the correlations between items and their subscales or the correlations between subscales, which is not in line with the COSMIN guidelines (Din et al., 2020; Dumitrache et al., 2021; Gálvez-Hernández et al., 2020; Norhayati et al., 2015). Their reports of this “convergent validity” were not included in the review. The 25 studies that examined convergent or divergent validity used comparator measures including the 36-item Short Form Health Survey (SF-36), Spirituality Well-being Scale, Ryff's Scales of Psychological Well-being, and World Health Organization Quality of Life-Brief version (WHOQOL-BREF); measures of depression, anxiety and stress, namely the Beck Depression Inventory-II (BDI-II), Post-Natal Depression Scale, Depression, Anxiety and Stress Scale 21 items (DASS-21), Hospital Anxiety and Depression Scale, and 10-item Perceived Stress Scale (PSS-10); and measures of optimism (LOT-R), life satisfaction, and loneliness. However, several studies did not state their hypotheses clearly, and several gave limited information on the validity and reliability of these comparator measures.

3.3.4 | Known group, discriminant validity

One study claimed discriminant validity by comparing the correlation (within two standard errors) between each item and its own subscale with the correlation (within two standard errors) between that item and any other subscale (Dafaalla et al., 2016). Four studies compared the correlations between factors with the square root of the average variance extracted (AVE) to assess discriminant validity (Din et al., 2020; Dumitrache et al., 2021; Gálvez-Hernández et al., 2020; Norhayati et al., 2015). These approaches were not in line with the COSMIN guidelines, so these reports of “discriminant validity” were not included in this review. Fourteen studies examined known-group or discriminant validity and found significant differences in social support among people who differed in marital status, level of education, income category, employment status, and the presence of mood and anxiety disorders.
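For readers unfamiliar with the AVE comparison described above, the following is a minimal sketch of that check, often called the Fornell-Larcker criterion: the square root of each factor's AVE is compared with that factor's correlations with the other factors. The factor names, loadings, and correlation matrix are hypothetical and are not taken from any included study. The sketch only illustrates what those four studies computed; it is not a test recommended by COSMIN.

```python
import numpy as np

def ave(loadings):
    """Average variance extracted: mean of squared standardized loadings."""
    l = np.asarray(loadings, dtype=float)
    return float((l ** 2).mean())

def fornell_larcker(loadings_by_factor, factor_corr):
    """Return sqrt(AVE) per factor and whether it exceeds the factor's
    largest absolute correlation with any other factor."""
    names = list(loadings_by_factor)
    r = np.asarray(factor_corr, dtype=float)
    root_ave = {n: ave(loadings_by_factor[n]) ** 0.5 for n in names}
    passes = {}
    for i, n in enumerate(names):
        others = np.abs(np.delete(r[i], i))  # drop the diagonal entry
        passes[n] = bool(root_ave[n] > others.max())
    return root_ave, passes

# Hypothetical standardized loadings for two subscales (illustration only).
loadings = {
    "emotional": [0.8, 0.7, 0.9],
    "tangible": [0.6, 0.5, 0.7],
}
# Hypothetical correlation between the two factors.
corr = [[1.0, 0.65],
        [0.65, 1.0]]

root_ave, passes = fornell_larcker(loadings, corr)
for name in loadings:
    print(name, round(root_ave[name], 2), passes[name])
```

In this made-up example the first factor's square-root AVE exceeds the inter-factor correlation and the second does not, which is the pattern such studies would report as supported and unsupported discriminant validity, respectively.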

3.3.5 | Predictive validity

One study tested correlations between social support and theoretically related measures in a longitudinal study and found a significant association between the MOS-SSS and the expected outcome (Sherbourne & Stewart, 1991).

3.3.6 | Criterion validity

Three studies examined correlations between the MOS-SSS-19 and the Multidimensional Scale of Perceived Social Support (MSPSS) (Din et al., 2020; Khuong et al., 2018; Yilmaz & Bozo, 2019) and presented this as a test of criterion validity, which is not in line with the COSMIN guideline. One found a strong correlation between the MOS-SSS and the MSPSS (correlation > .7, p < .05; Khuong et al., 2018), and two found significant correlations between the two measures (Din et al., 2020; Yilmaz & Bozo, 2019). Their reports on “criterion validity” were not included in this review. In line with the COSMIN guideline, two studies examined the correlations between the short-form MOS-SSS-6 and the MOS-SSS-19 and found strong correlations, supporting criterion validity (Holden et al., 2014; Moser et al., 2012).
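As a rough illustration of the short-form comparison, the sketch below correlates a short-form total score with the full-form total score and checks the result against 0.70, the correlation threshold used for criterion validity in the COSMIN criteria for good measurement properties. The simulated responses, the choice of the first six columns as the short form, and the function name are assumptions made for illustration only; they do not reproduce any included study.

```python
import numpy as np

def criterion_correlation(full_items, short_cols):
    """Pearson correlation between a short-form total and the full-form total.

    full_items: 2-D array, rows = respondents, columns = items.
    short_cols: column indices of the items kept in the short form.
    """
    x = np.asarray(full_items, dtype=float)
    full_total = x.sum(axis=1)
    short_total = x[:, short_cols].sum(axis=1)
    return float(np.corrcoef(short_total, full_total)[0, 1])

# Simulated 1-5 Likert responses for 200 people on 19 items (hypothetical data):
# one shared latent trait plus item-level noise, rounded and clipped to 1..5.
rng = np.random.default_rng(0)
n_people, n_items = 200, 19
latent = rng.normal(size=(n_people, 1))
noise = rng.normal(scale=0.8, size=(n_people, n_items))
responses = np.clip(np.rint(3 + latent + noise), 1, 5)

short_cols = list(range(6))  # hypothetical 6-item short form
r = criterion_correlation(responses, short_cols)
print(f"r = {r:.2f}; meets 0.70 threshold: {r >= 0.70}")
```

Because the short-form items are also part of the full form, the two totals share content, so a high correlation is expected by construction.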

3.3.7 | Cross-cultural validity

Four studies explored cross-cultural validity: one explored measurement invariance in data collected with the questionnaire in different languages (Robitaille et al., 2011), and three examined differential item functioning (DIF) across two groups (Huang et al., 2021; Kim et al., 2017; Soares et al., 2012). When DIF was examined, the studies found that respondents from different groups responded similarly to the measurement items (no DIF), indicating sufficient cross-cultural validity.

3.3.8 | Internal consistency reliability

Thirty-two studies tested internal consistency reliability using Cronbach's alpha; one study calculated item reliability using Rasch analysis (Kim et al., 2017). Thirty-one studies reported sufficient internal consistency reliabilities (above the desirable value of 0.70), and two studies had a subscale of the MOS-SSS with a Cronbach's alpha slightly below the desirable value (0.61–0.65 vs. 0.70) (Levine et al., 2015; Norhayati et al., 2015). Three studies reported the overall internal consistency reliability but not the internal consistency reliabilities of all subscales (Gjesfjeld et al., 2008; Mahmud et al., 2004; Margolis et al., 2019).
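To make the 0.70 benchmark concrete, the following is a minimal sketch of how Cronbach's alpha can be computed from an item-response matrix, using the standard formula based on item variances and the variance of the total score. The response data and the function name are hypothetical and serve only as an illustration.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a 2-D array (rows = respondents, columns = items)."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]                                # number of items
    item_var_sum = x.var(axis=0, ddof=1).sum()    # sum of item variances
    total_var = x.sum(axis=1).var(ddof=1)         # variance of total scores
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical responses of five people to a four-item subscale (1-5 scale).
demo = np.array([
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 5, 4, 5],
    [1, 2, 1, 2],
])

alpha = cronbach_alpha(demo)
print(f"alpha = {alpha:.2f}; sufficient (>= 0.70): {alpha >= 0.70}")
```

The same function could be applied to each subscale separately, which is the level of reporting that three of the included studies omitted.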

3.3.9 | Test–retest reliability

Ten studies performed test–retest analysis. Two did not report the method used to examine test–retest reliability (Khazaee-Pool et al., 2018; Yilmaz & Bozo, 2019). Five studies calculated test–retest reliability using the intraclass correlation coefficient (ICC); their ICCs ranged from 0.50 to 0.97. Three studies used Pearson's r or Spearman's rho (ρ) (Dafaalla et al., 2016; Giangrasso & Casale, 2014; Mahmud et al., 2004). Of these three studies, one further calculated kappa for the correlation between two parallel forms and found a satisfactory result (Mahmud et al., 2004). The interval between the two tests ranged from 1 week to 1 year.
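The included studies did not always state which form of the ICC they used. As a minimal sketch, the code below computes one common form, ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single measurement), from a respondents-by-occasions matrix. The choice of this form, the function name, and the scores are assumptions for illustration only.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: 2-D array, rows = respondents, columns = measurement occasions.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between respondents
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between occasions
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical total scores for six respondents at test and retest.
test_retest = np.array([
    [70, 72],
    [55, 50],
    [88, 90],
    [62, 65],
    [45, 48],
    [80, 78],
])

print(f"ICC(2,1) = {icc_2_1(test_retest):.2f}")
```

Unlike Pearson's r or Spearman's rho, this form of the ICC is lowered by systematic shifts between occasions, which is one reason the choice of statistic matters when comparing test–retest results across studies.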

3.4 | The quality of psychometric properties for MOS-SSS versions

Table 4 describes findings on the quality of psychometric properties for different MOS-SSS versions based on four evaluation criteria