SIGKDD Explorations Special Issue on Visual Analytics and Knowledge Discovery

The papers were published in the December 2009 issue (Volume 11, Issue 2) of SIGKDD Explorations.

Introduction to the Special Issue on Visual Analytics and Knowledge Discovery

Abstract:

The papers in this Special Issue present the state of the art in Visual Analytics and in Knowledge Discovery and Data Mining (KDD), and propose potential extensions and research questions to further advance and integrate these two fields.

Invited paper

Visual Analytics: How Much Visualization and How Much Analytics?

Abstract:

The term Visual Analytics has been around for almost five years, but there are still ongoing discussions about what it actually is and, in particular, what is new about it. At the core of our view on Visual Analytics are the new enabling and accessible analytic reasoning interactions supported by the combination of automated and visual analysis. In this paper, we outline the scope of Visual Analytics using two problem classes and three methodological classes in order to work out the need for and purpose of Visual Analytics. Using examples of analytic reasoning interaction, we explain the respective advantages and disadvantages of automated and visual analysis methods, leading to a glimpse into the future of how Visual Analytics methods will enable us to go beyond what is possible when using the two methods separately.

Regular papers

Investigating and Reflecting on the Integration of Automatic Data Analysis and Visualization in Knowledge Discovery

Abstract:

The aim of this work is to survey and reflect on the various ways visualization and data mining can be integrated to achieve effective knowledge discovery by involving the best of human and machine capabilities. Following a bottom-up bibliographic research approach, the article categorizes the observed techniques into classes, highlighting current trends, gaps, and potential future directions for research. In particular, it looks at the strengths and weaknesses of information visualization (infovis) and data mining, at the purposes for which researchers in infovis use data mining techniques, and conversely at how researchers in data mining employ infovis techniques. The article then proposes, on the basis of the extracted patterns, a series of potential extensions not found in the literature. Finally, we use this information to analyze the discovery process by comparing the analysis steps from the perspective of information visualization and data mining. The comparison brings to light new perspectives on how mining and visualization can best employ human and machine strengths. This activity leads to a series of reflections and research questions that can help to further advance the science of visual analytics.

Interactive Cluster Analysis of Diverse Types of Spatiotemporal Data

Abstract:

We suggest an approach to exploratory analysis of diverse types of spatiotemporal data with the use of clustering and interactive visual displays. We can apply the same generic clustering algorithm to different types of data owing to the separation of the process of grouping objects from the process of computing distances between the objects. In particular, we apply the density-based clustering algorithm OPTICS to events (i.e., objects having spatial and temporal positions), trajectories of moving entities, and spatial distributions of events or moving entities in different time intervals. Distances are computed in a specific way for each type of object; moreover, it may be useful to have several different distance functions for the same type of object. Thus, multiple distance functions available for trajectories support different analysis tasks. We demonstrate the use of our approach on two datasets from the VAST Challenge 2008: evacuation traces (trajectories of moving entities) and landings and interdictions of migrant boats (events).
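The separation of grouping from distance computation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: for brevity it uses a simplified DBSCAN-style procedure as a stand-in for OPTICS, and the names `density_cluster`, `event_distance`, and `end_point_distance`, as well as the sample data and thresholds, are hypothetical.

```python
from math import hypot

NOISE = -1

def density_cluster(objects, distance, eps, min_pts):
    """Simplified DBSCAN-style grouping (a stand-in for OPTICS).

    The grouping logic never inspects the objects themselves; it only
    calls `distance(a, b)`, so any object type can be clustered.
    """
    labels = [None] * len(objects)

    def neighbours(i):
        # All objects (including i itself) within eps of object i.
        return [j for j in range(len(objects))
                if distance(objects[i], objects[j]) <= eps]

    cluster = 0
    for i in range(len(objects)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border object
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border object joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= min_pts: # core object: keep expanding
                queue.extend(nearby)
        cluster += 1
    return labels

def event_distance(a, b, time_weight=1.0):
    """Hypothetical distance for events given as (x, y, t) tuples."""
    return hypot(a[0] - b[0], a[1] - b[1]) + time_weight * abs(a[2] - b[2])

def end_point_distance(a, b):
    """Hypothetical trajectory distance: compares only the destinations."""
    return hypot(a[-1][0] - b[-1][0], a[-1][1] - b[-1][1])

# Made-up sample data: events as (x, y, t), trajectories as lists of (x, y).
events = [(0, 0, 0), (1, 0, 1), (0, 1, 2),
          (10, 10, 0), (11, 10, 1), (10, 11, 1), (50, 50, 9)]
trajectories = [[(0, 0), (5, 5)], [(9, 0), (5, 6)],
                [(0, 9), (6, 5)], [(0, 0), (20, 20)]]

event_labels = density_cluster(events, event_distance, eps=3, min_pts=3)
trajectory_labels = density_cluster(trajectories, end_point_distance,
                                    eps=2, min_pts=2)
print(event_labels)       # [0, 0, 0, 1, 1, 1, -1]
print(trajectory_labels)  # [0, 0, 0, -1]
```

Swapping `end_point_distance` for another trajectory distance (for example, one comparing start points or whole routes) would change which trajectories are grouped without touching the clustering code, which is the point of the separation.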

Visual Analysis of Mixed Data Sets Using Interactive Quantification

Abstract:

It is often difficult to analyse data sets that include a combination of categorical and numerical variables (mixed data sets), since no similarity measure exists which is as straightforward and general as the numerical distance between numerical items. Quantification of categorical variables enables analysis using commonly used visual representations and analysis techniques for numerical data. This paper presents a tool for exploratory analysis of categorical and mixed data which uses a quantification process introduced in [17]. The application enables analysis of mixed data sets by providing an environment for exploratory analysis using common visual representations in multiple coordinated views and algorithmic analysis that facilitates detection of potentially interesting patterns within combinations of categorical and numerical variables. The generality and usefulness of the quantification process and of the features of the application are demonstrated through a case scenario using a data set from the IEEE VAST 2008 Challenge [13].

FpVAT: A Visual Analytic Tool for Supporting Frequent Pattern Mining

Abstract:

As frequent pattern mining plays an essential role in many knowledge discovery and data mining (KDD) tasks, numerous algorithms for finding frequent patterns have been proposed over the past 15 years. However, most of these algorithms return the mining results in the form of textual lists of frequent patterns, i.e., frequently occurring sets of items. It is well known that "a picture is worth a thousand words". The use of visual representation can enhance the user's understanding of the inherent relations in a collection of frequent patterns. In this paper, we develop a simple yet useful visual analytic tool, called FpVAT, for supporting frequent pattern mining. The tool consists of two modules: one module gives users an overview so that they can derive insight from a massive amount of raw data; the other module enables users to perform analytical reasoning on the mining results via interactive visual interfaces so that users can detect the expected frequent patterns and discover the unexpected frequent patterns. As a visual analytic tool, our FpVAT is equipped with several interactive features for effective visual support in the data analysis and KDD process for various real-life applications.
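To make concrete what a "textual list of frequent patterns" looks like, the following minimal sketch counts itemsets by brute force. It is not FpVAT or any algorithm from the paper; the function name `frequent_itemsets`, the toy transactions, and the support threshold are assumptions chosen for illustration, and exhaustive enumeration is only practical for tiny inputs.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets in at least min_support transactions.

    Brute force: enumerates every non-empty subset of every transaction.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

# Hypothetical toy transactions.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
patterns = frequent_itemsets(transactions, min_support=2)

# The kind of textual list that most mining algorithms return.
for itemset, count in sorted(patterns.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print("{" + ", ".join(itemset) + "}", count)
```

Even for four transactions the output is a flat list of six patterns whose subset relations the reader has to reconstruct mentally, which is the gap a visual representation aims to close.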

Hierarchical Difference Scatterplots: Interactive Visual Analysis of Data Cubes

Abstract:

Data cubes as employed by On-Line Analytical Processing (OLAP) play a key role in many application domains. The analysis typically involves comparing categories of different hierarchy levels with respect to size and pivoted values. Most existing visualization methods for pivoted values, however, are limited to single hierarchy levels. The main contribution of this paper is an approach called Hierarchical Difference Scatterplot (HDS). An HDS allows for relating multiple hierarchy levels and explicitly visualizes differences between them in the context of the absolute position of pivoted values. We discuss concepts for tightly coupling HDS to other types of tree visualizations and propose its integration in a setup of multiple views, which are linked by interactive queries on the data. We evaluate our approaches by analyzing social survey data in collaboration with a domain expert.