This event will be held via Zoom, and you need to sign up using Eventbrite: https://www.eventbrite.co.uk/e/visualizing-the-scale-complexity-of-data-quality-tickets-127981589379
24th November 2020, 12:00-13:00 (online)
Abstract: Descriptive statistics are typically presented as text, but that quickly becomes overwhelming when datasets contain many variables or when analysts need to compare multiple datasets. In this seminar, I will describe visualization designs for three categories of descriptive statistics (cardinalities, distributions and patterns). These designs scale to more than 100 variables and use multiple channels to encode important semantic differences (e.g., zero vs. 1+ missing values). I will also describe a novel tool that exploits set visualization techniques to allow users to explain patterns of missing values that involve many fields. The visualizations were evaluated using large (multi-million record) datasets of electronic health records (EHRs), and provided users with a variety of important insights.
Bio: Roy Ruddle is a Professor of Computing at the University of Leeds, and Deputy Director (Research Technology) of the Leeds Institute for Data Analytics (LIDA). He has worked in both academia and industry, and researches visualization, visual analytics and human-computer interaction in spaces that range from high-dimensional data to virtual reality. In a 12-year collaboration with pathologists at the Leeds Teaching Hospitals NHS Trust (LTHT), he developed the Leeds Virtual Microscope (LVM) for visualizing tera-pixel image collections on Powerwall and ultra-high definition displays, leading to its use for pathology training in NHS hospitals and commercialisation by Roche. (from https://www.turing.ac.uk/people/researchers/roy-ruddle)