A Scoping Review of Rankings, Smileys, and Other Survey Item Formats


  • Jacob Brauner Copenhagen Center for Arthritis Research, Center for Rheumatology and Spine Diseases, Rigshospitalet, Glostrup, Denmark https://orcid.org/0000-0002-6533-6336




Technological development has allowed researchers to apply numerous item formats in web-based surveys. A growing body of research suggests that formats other than multiple choice, on both web and paper, such as ranking, sorting, and questions with pictures, may offer relevant alternatives that can strengthen data quality. These formats are referred to as Innovative Item Formats (IIFs). The existing literature does not offer a systematic overview or functional typology of IIFs and their impact on data quality. A review of the research on each IIF is therefore needed.

This review aims to identify which types of IIF exist and what evidence there is about the data quality they yield. Based on a scoping review, this article presents the existing research literature on specific IIFs. A total of 62 research articles with data from 89,365 participants were identified. A more extensive typology of IIFs than previously used, comprising 23 IIFs and 13 subcategories, is suggested. Researchers designing questionnaires can use this knowledge to obtain higher-quality data.

