Gatherplot: A Non-Overlapping Scatterplot

Deokgun Park
Sung-Hee Kim
Niklas Elmqvist
https://orcid.org/0000-0001-5805-5301

Abstract






Introduction. Scatterplots are a common tool for exploring multidimensional datasets, especially in the form of scatterplot matrices (SPLOMs). However, scatterplots suffer from overplotting when categorical variables are mapped to one or two axes, or the same continuous variable is used for both axes. Previous methods such as histograms or violin plots use aggregation, which makes brushing and linking difficult.


Conclusion. We propose gatherplots, an extension of scatterplots designed to manage the overplotting problem. Gatherplots are a form of unit visualization, which avoids aggregation and maintains the identity of individual objects to ease visual perception. In gatherplots, every visual mark that maps to the same position coalesces to form a packed entity, thereby making it easier to see the overview of data groupings. The size and aspect ratio of marks can also be changed dynamically to make it easier to compare the composition of different groups. In the case of a categorical variable vs. a categorical variable, we propose a heuristic to decide bin sizes for optimal space usage. Results from a crowdsourced user study show that gatherplots enable people to assess data distribution more accurately than when using jittered scatterplots.
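
To make the idea of coalescing marks concrete, the following is a minimal sketch of a gather-style layout for two categorical axes. It is not the authors' implementation and does not reproduce their bin-size heuristic: the function name `gather_layout`, the `cell_size` parameter, and the simple square-grid packing are all assumptions chosen for illustration. Marks that would share a single scatterplot position are instead packed into distinct slots inside their category cell, so each record stays individually visible.

```python
from collections import defaultdict
from math import ceil, sqrt


def gather_layout(records, x_key, y_key, cell_size=1.0):
    """Give every record its own slot inside its (x, y) category cell.

    Hypothetical helper: a plain square-grid packing, not the paper's heuristic.
    Returns a dict mapping record index -> (x, y) mark centre.
    """
    x_cats = sorted({r[x_key] for r in records})
    y_cats = sorted({r[y_key] for r in records})

    # Group record indices by the cell they would all overplot in.
    groups = defaultdict(list)
    for i, r in enumerate(records):
        groups[(r[x_key], r[y_key])].append(i)

    # Use one grid resolution for all cells, sized for the largest group,
    # so mark size is the same everywhere and group sizes stay comparable.
    largest = max(len(members) for members in groups.values())
    per_row = ceil(sqrt(largest))
    step = cell_size / per_row

    positions = {}
    for (xc, yc), members in groups.items():
        origin_x = x_cats.index(xc) * cell_size
        origin_y = y_cats.index(yc) * cell_size
        for k, idx in enumerate(members):
            col, row = k % per_row, k // per_row
            positions[idx] = (origin_x + (col + 0.5) * step,
                              origin_y + (row + 0.5) * step)
    return positions
```

Because every cell shares the same slot size, this sketch corresponds loosely to comparing absolute counts; a normalized variant would instead stretch the marks of each group to fill its cell, which is not shown here.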


Materials. Source code for Gatherplots can be found at https://github.com/intuinno/gatherplot. Research materials associated with the crowdsourced user study can be found on OSF at https://osf.io/bk9cx/.


Data Collection. We conducted a crowdsourced user study on Amazon Mechanical Turk involving participants drawn from the general population. We collected completion time, accuracy, and confidence for five different retrieval, ranking, and comparison tasks under four conditions: scatterplots with jittering, gatherplots with absolute mode, gatherplots with normalized mode, and gatherplots with a toggle to switch between absolute and normalized mode.


Data Analysis. Data collected from the crowdsourced study were analyzed with respect to accuracy (correct or incorrect), time spent, and confidence of estimation. Based on our hypotheses, we analyzed the different layout modes for each type of question: retrieve-value, absolute-value, and relative-value tasks.


Analysis Results. Gatherplots outperform jittered scatterplots in accuracy as well as in the subjective confidence measure.


Implementation. We implemented a web-based demonstration of Gatherplots using D3, as well as an Observable JS version embedded in this article.


Demonstration. Beyond the embedded demonstration in this article, you can also find a live demo of Gatherplots at https://www.journalovi.org/2023-park-gatherplots/.






Article Details

How to Cite
Park, D. et al. 2024. Gatherplot: A Non-Overlapping Scatterplot. Journal of Visualization and Interaction. 1, 1 (Nov. 2024). DOI:https://doi.org/10.54337/jovi.v1i1.8540.
