VAKD '09 Home Page | Special Issue | Proceedings | Program | CfP (Special Issue) | CfP (Workshop) | Challenge | Organizers | Contact


Proceedings

You can find all papers in the proceedings, published in the ACM Digital Library.

Proceedings as a pdf file (ISBN 978-1-60558-670-0) - 12.7 MB

Front matter as a pdf file (Cover, Title page, Committees, TOC) - 75 KB

Contents

Oral Presentation

Interactive Spatio-Temporal Cluster Analysis of VAST Challenge 2008 Datasets

Gennady Andrienko & Natalia Andrienko

Abstract:

We describe a visual analytics method supporting the analysis of two different types of spatio-temporal data: point events and trajectories of moving agents. The method combines clustering with interactive visual displays, in particular, a map and a space-time cube. We demonstrate the use of the method by applying it to two datasets from the VAST Challenge 2008: evacuation traces (trajectories of people movement) and landings and interdictions of migrant boats (point events).
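
The abstract does not name the clustering algorithm, so the following is only a generic illustration of clustering point events such as boat landings, not the authors' method. As an assumption, it uses DBSCAN-style density-based clustering with separate spatial and temporal thresholds; the function names st_neighbors and st_dbscan and the parameters eps_space, eps_time and min_pts are hypothetical.

```python
import math

def st_neighbors(events, i, eps_space, eps_time):
    """Indices of events within eps_space (x/y distance) and eps_time of event i, including i."""
    xi, yi, ti = events[i]
    return [j for j, (x, y, t) in enumerate(events)
            if math.hypot(x - xi, y - yi) <= eps_space and abs(t - ti) <= eps_time]

def st_dbscan(events, eps_space, eps_time, min_pts):
    """Hypothetical DBSCAN-style clustering of (x, y, t) events; -1 marks noise."""
    labels = [None] * len(events)
    cluster = -1
    for i in range(len(events)):
        if labels[i] is not None:
            continue
        nbrs = st_neighbors(events, i, eps_space, eps_time)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point, not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_nbrs = st_neighbors(events, j, eps_space, eps_time)
            if len(j_nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_nbrs)
    return labels
```

For example, three landings close together in space and time form one cluster, while isolated events are labelled -1. Trajectories would need a trajectory-level distance function instead of the point distance used here.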

Surveying the complementary role of automatic data analysis and visualization in knowledge discovery

Enrico Bertini & Denis Lalanne

Abstract:

The aim of this work is to survey and reflect on the various ways to integrate visualization and data mining techniques toward mixed-initiative knowledge discovery that takes the best of human and machine capabilities. Following a bottom-up bibliographic research approach, the article categorizes the observed techniques into classes, highlighting current trends, gaps, and potential future directions for research. In particular, it looks at the strengths and weaknesses of information visualization and data mining, at the purposes for which infovis researchers use data mining techniques, and, conversely, at how data mining researchers employ infovis techniques. The article further uses this information to analyze the discovery process by comparing the analysis steps from the perspectives of information visualization and data mining. The comparison brings to light new perspectives on how mining and visualization can best employ human and machine skills.

Visual Exploration of Categorical and Mixed Data Sets

Sara Johansson

Abstract:

For categorical data there is no similarity measure as straightforward and general as the numerical distance between numerical items. Because of this, it is often difficult to analyse data sets that include categorical variables or a combination of categorical and numerical variables (mixed data sets). Quantification of categorical variables enables analysis using the visual representations and analysis techniques commonly used for numerical data. This paper presents a tool for exploratory analysis of categorical and mixed data, which uses a quantification process introduced in [Johansson2008]. The application enables analysis of mixed data sets by providing an environment for exploratory analysis using common visual representations in multiple coordinated views, together with algorithmic analysis that facilitates detection of potentially interesting patterns within combinations of categorical and numerical variables. The effectiveness of the quantification process and of the features of the application is demonstrated through a case scenario.
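
To make the idea of quantification concrete, here is a deliberately simple stand-in. It is not the quantification process of [Johansson2008]; it simply replaces each category by the mean of one chosen numerical variable over the items in that category (sometimes called mean or target encoding). The function name quantify_by_numeric_mean is hypothetical.

```python
from collections import defaultdict
from statistics import mean

def quantify_by_numeric_mean(categories, numbers):
    """Map each category to the mean of the numeric values observed with it.

    Returns the quantified column and the category -> value lookup table.
    """
    groups = defaultdict(list)
    for c, x in zip(categories, numbers):
        groups[c].append(x)
    lookup = {c: mean(xs) for c, xs in groups.items()}
    return [lookup[c] for c in categories], lookup
```

Once quantified, the categorical column can be fed to the same scatterplots or parallel coordinates used for numerical variables, at the cost of depending on which numerical variable was chosen.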

Multiple Coordinated Views Supporting Visual Analytics

Bianchi Meiguins & Aruanda Gonçalves Meiguins

Abstract:

This paper proposes the use of multiple coordinated views to support visual analytics. The Information Visualization tool PRISMA includes the implementation of three coordinated visualization techniques (treemap, parallel coordinates and scatterplot) and other auxiliary charts. The coordination is supported by filtering, brushing, visual representation and other mechanisms. Mini-challenge 4 (Evacuation Traces) from the IEEE VAST 2008 Challenge was used as a case study in this paper.
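
Brushing across coordinated views is commonly built around a shared selection that every view observes. The sketch below shows that pattern in its simplest form; it is an assumption about how such coordination can work in general, not PRISMA's implementation, and the class and method names are hypothetical.

```python
class SelectionModel:
    """Hypothetical shared selection that coordinated views subscribe to."""

    def __init__(self):
        self.selected = set()
        self._views = []

    def subscribe(self, view):
        self._views.append(view)

    def brush(self, ids, source=None):
        """Record a brushed set of item ids and notify every other view."""
        self.selected = set(ids)
        for v in self._views:
            if v is not source:  # the brushing view already shows its own brush
                v.on_selection(self.selected)


class View:
    """Hypothetical stand-in for a treemap, scatterplot or parallel-coordinates view."""

    def __init__(self, name, model):
        self.name = name
        self.highlighted = set()
        model.subscribe(self)

    def on_selection(self, ids):
        self.highlighted = set(ids)  # a real view would redraw here
```

For example, brushing items 1 and 2 in a treemap view highlights the same items in a scatterplot view subscribed to the same model.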

Hierarchical Difference Scatterplots - Interactive Visual Analysis of Data Cubes

Harald Piringer & Matthias Buchetics & Helwig Hauser & Eduard Gröller

Abstract:

Data cubes as employed by On-Line Analytical Processing (OLAP) play a key role in many application domains. The analysis typically involves comparing categories of different hierarchy levels with respect to size and pivoted values. Most existing visualization methods for pivoted values, however, are limited to single hierarchy levels. The main contribution of this paper is an approach called the Hierarchical Difference Scatterplot (HDS). An HDS relates multiple hierarchy levels and explicitly visualizes the differences between them in the context of the absolute position of pivoted values. We discuss concepts for tightly coupling HDS to other types of tree visualizations and propose their integration into a setup of multiple views, which are linked by interactive queries on the data. We evaluate our approach by analyzing social survey data in collaboration with a domain expert.
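
One plausible reading of the HDS idea can be sketched as follows: each child category becomes a point whose first coordinate is its own pivoted value and whose second coordinate is its difference from the parent category's value. The choice of the mean as the pivoted value, the record layout, and the name hds_points are assumptions for illustration; the paper's exact mapping may differ.

```python
from collections import defaultdict
from statistics import mean

def hds_points(records, parent_key, child_key, value_key):
    """For each (parent, child) category return (child_mean, child_mean - parent_mean).

    records: list of dicts; the mean is assumed as the pivoted value.
    """
    by_parent = defaultdict(list)
    by_child = defaultdict(list)
    for r in records:
        by_parent[r[parent_key]].append(r[value_key])
        by_child[(r[parent_key], r[child_key])].append(r[value_key])
    parent_mean = {p: mean(v) for p, v in by_parent.items()}
    # Absolute position (x) together with the difference to the parent level (y)
    return {pc: (mean(v), mean(v) - parent_mean[pc[0]]) for pc, v in by_child.items()}
```

Points near zero on the difference axis behave like their parent category, while points far from zero stand out against it regardless of their absolute value.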

Algebraic Visual Analysis: The Catalano Phone Call Data Set Case Study

Anna Shaverdian & Hao Zhou & H. Jagadish & George Michailidis

Abstract:

While many clever techniques have been proposed for visual analysis, most of these are "one-off" solutions, and it is not easy to see how to combine multiple techniques. We propose an algebraic model capable of representing a large class of visual analysis operations on graph data. We demonstrate the value of this model by showing how it can simulate the analyses performed by several groups on the Catalano family cell phone call record data set as part of the VAST 2008 challenge.
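
The appeal of an algebra is that analysis steps become composable operators. The sketch below illustrates only that general idea on a call graph stored as (caller, callee, time) edges; the operators select, neighborhood and compose are hypothetical and do not reproduce the operators defined in the paper.

```python
from functools import reduce

def select(pred):
    """Operator keeping only the edges that satisfy pred."""
    return lambda edges: {e for e in edges if pred(e)}

def neighborhood(node):
    """Operator keeping edges incident to node (as caller or callee)."""
    return lambda edges: {e for e in edges if node in (e[0], e[1])}

def compose(*ops):
    """Apply operators left to right, so each step's output feeds the next."""
    return lambda edges: reduce(lambda acc, op: op(acc), ops, edges)
```

For example, compose(neighborhood("a"), select(lambda e: e[2] < 3)) returns the calls involving "a" made before time 3, and the same building blocks can be recombined into other queries.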

Poster Presentation with a Poster Spotlight Talk

FpViz: A Visualizer for Frequent Pattern Mining

Carson Kai-Sang Leung & Christopher L. Carmichael

Abstract:

Over the past 15 years, numerous algorithms have been proposed for frequent pattern mining, as it plays an essential role in many knowledge discovery and data mining (KDD) tasks. Most of these frequent pattern mining algorithms return the mined results as textual lists of frequent patterns, that is, frequently occurring sets of items. It is well known that "a picture is worth a thousand words". The use of visual representation can enhance the user's understanding of the inherent relations in a collection of frequent patterns. A few visualizers have been developed to visualize the input data or the mined results. However, most of these visualizers were not designed for visualizing the mined frequent patterns. In this paper, we develop a visualizer for frequent pattern mining. This visualizer, called FpViz, gives users insight into the data, allows them to zoom in and zoom out, and provides details on demand. Moreover, FpViz is equipped with several interactive features for effective visual support in the data analysis and KDD process for various real-life applications.
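
For readers unfamiliar with the textual output that FpViz is meant to replace, the sketch below produces such a list with a small level-wise (Apriori-style) search. It is a generic textbook approach chosen for illustration, not the mining algorithm used by FpViz; the function name and the absolute min_support count are assumptions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {frozenset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:  # level 1: count single items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)
    k = 2
    while current:
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            # Keep size-k unions whose (k-1)-subsets are all frequent (Apriori pruning)
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result
```

With five small transactions and min_support = 3, the result lists each single item and each pair, while the triple {a, b, c} is pruned because it occurs only twice. A flat list like this is exactly what becomes hard to read as the number of patterns grows.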

Exploration and Visualization of OLAP Cubes with Statistical Tests

Carlos Ordonez & Zhibo Chen

Abstract:

In On-Line Analytical Processing (OLAP), users explore a database cube with roll-up and drill-down operations in order to find interesting results. Most approaches rely on simple aggregations and value comparisons to validate findings. In this work, we propose to combine OLAP dimension lattice traversal and statistical tests to discover significant metric differences between highly similar groups. A parametric statistical test allows pairwise comparison of neighboring cells in cuboids, providing statistical evidence about the validity of findings. We introduce a two-dimensional checkerboard visualization of the cube that allows interactive exploration to understand significant measure differences between two cuboids differing in one dimension, along with associated image data. Our system is tightly integrated into a relational DBMS by dynamically generating SQL code, which incorporates several optimizations to efficiently explore the cube, to visualize discovered cell pairs and to view associated images. We present an experimental evaluation with medical data sets focusing on finding significant relationships between risk factors and disease.
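
The pairing of neighboring cells with a parametric test can be sketched in a few lines. The abstract does not say which test is used or how significance is decided; as an assumption, this sketch uses Welch's two-sample t statistic and a fixed hypothetical cut-off |t| >= 2.0 (roughly a 5% two-sided level for moderately large samples), and it works on in-memory lists rather than generated SQL. Each cell needs at least two values with non-zero combined variance.

```python
import math
from itertools import combinations
from statistics import mean, variance

def welch_t(xs, ys):
    """Welch's two-sample t statistic (unequal variances, sample variances)."""
    return (mean(xs) - mean(ys)) / math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))

def neighbor_pairs(cells):
    """Yield pairs of cell keys (tuples of dimension values) differing in exactly one dimension."""
    for a, b in combinations(sorted(cells), 2):
        if sum(x != y for x, y in zip(a, b)) == 1:
            yield a, b

def significant_pairs(cells, threshold=2.0):
    """cells: {key tuple: list of measure values}; returns (a, b, t) with |t| >= threshold."""
    out = []
    for a, b in neighbor_pairs(cells):
        t = welch_t(cells[a], cells[b])
        if abs(t) >= threshold:
            out.append((a, b, t))
    return out
```

In a cube keyed by, say, (gender, smoker), a cell is compared only with cells that change a single dimension value, which keeps the compared groups "highly similar" in the sense of the abstract.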

Visual Analysis of Documents with Semantic Graphs

Delia Rusu & Blaž Fortuna & Dunja Mladenić & Marko Grobelnik & Ruben Sipoš

Abstract:

In this paper, we present a technique for visual analysis of documents based on the semantic representation of text in the form of a directed graph, referred to as a semantic graph. This approach can aid data mining tasks such as exploratory data analysis, data description and summarization. To derive the semantic graph, we use natural language processing in a pipeline of operations, as follows. First, named entities are identified and co-reference resolution is performed; moreover, pronominal anaphors are resolved for a subset of pronouns. Second, subject-predicate-object triplets are automatically extracted from the Penn Treebank parse tree obtained for each sentence in the document. The triplets are further enhanced by linking them to their corresponding co-referenced named entities, as well as by attaching the associated WordNet synsets, where available. Thus we obtain a directed semantic graph composed of connected triplets, in which the nodes represent subjects and objects and the edges represent predicates. The document's semantic graph is a starting point for automatically generating the document summary. The model for summary generation is obtained by machine learning, where the features are extracted from the semantic graph structure and content. The summary also has an associated semantic representation. The size of the semantic graph, as well as the summary length, can be manually adjusted for an enhanced visual analysis. We also show how to employ the proposed technique for the Visual Analytics challenge.
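
The graph-building step after triplet extraction can be illustrated as follows. This sketch assumes the triplets are already extracted and uses a plain dictionary as a hypothetical stand-in for co-reference resolution; parsing, WordNet linking and the learned summarization model are not reproduced. Node degree is shown only as one example of a structural feature such a model might use.

```python
from collections import defaultdict

def build_semantic_graph(triplets, coref=None):
    """Build {subject: [(predicate, object), ...]} from (subject, predicate, object) triplets.

    coref: optional dict mapping a mention to its canonical entity
    (a hypothetical stand-in for co-reference resolution).
    """
    coref = coref or {}
    graph = defaultdict(list)
    for subj, pred, obj in triplets:
        s = coref.get(subj, subj)
        o = coref.get(obj, obj)
        graph[s].append((pred, o))  # directed edge s -> o labelled by the predicate
    return dict(graph)

def node_degrees(graph):
    """Total (in + out) degree per node, one simple structural feature."""
    deg = defaultdict(int)
    for s, edges in graph.items():
        for _pred, o in edges:
            deg[s] += 1
            deg[o] += 1
    return dict(deg)
```

For example, the triplets ("Tom", "likes", "apples") and ("He", "eats", "apples") with "He" resolved to "Tom" yield a single "Tom" node with two outgoing edges to "apples".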

Heidi Matrix: Nearest Neighbor Driven High Dimensional Data Visualization

Soujanya Vadapalli & Kamalakar Karlapalem

Abstract:

Identifying patterns in large high dimensional data sets is a challenge. As the number of dimensions increases, patterns in the data tend to be more prominent in subspaces than in the original full-dimensional space. A system that facilitates the presentation of such subspace-oriented patterns in high dimensional data sets is required to understand the data.

Heidi is a high dimensional data visualization system that captures and visualizes the closeness of points across various subspaces of the dimensions, thus helping users understand the data. The core concept behind Heidi is the prominence of patterns within the nearest neighbor relations between pairs of points across the subspaces.

Given a d-dimensional data set as input, the Heidi system generates a 2-D matrix represented as a color image. This representation gives insight into (i) how the clusters are placed with respect to each other, (ii) characteristics of the placement of points within a cluster in all the subspaces and (iii) characteristics of overlapping clusters in various subspaces.

The sample of results displayed and discussed in this paper illustrates how a Heidi visualization can be interpreted.
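
The core idea of a nearest-neighbor-driven matrix can be sketched as follows. This is a simplified, hypothetical reconstruction: entry M[i][j] is a bitmask whose bit s is set when point j is among the k nearest neighbors of point i in subspace s (Euclidean distance, ties broken by index). The paper's ordering of points by cluster and its mapping of bitmasks to colors are not reproduced, and the function names are assumptions.

```python
import math
from itertools import combinations

def subspaces(d):
    """All non-empty subsets of the dimension indices 0..d-1."""
    return [c for r in range(1, d + 1) for c in combinations(range(d), r)]

def knn(points, dims, i, k):
    """Indices of the k nearest neighbors of point i using only the given dims."""
    def dist(j):
        return math.sqrt(sum((points[i][d] - points[j][d]) ** 2 for d in dims))
    others = [j for j in range(len(points)) if j != i]
    return set(sorted(others, key=lambda j: (dist(j), j))[:k])

def heidi_like_matrix(points, k):
    """M[i][j] has bit s set if j is a k-nearest neighbor of i in subspace s."""
    n, d = len(points), len(points[0])
    subs = subspaces(d)
    M = [[0] * n for _ in range(n)]
    for s, dims in enumerate(subs):
        for i in range(n):
            for j in knn(points, dims, i, k):
                M[i][j] |= 1 << s
    return M, subs
```

Because there are 2^d - 1 subspaces, this brute-force version is only practical for a handful of dimensions; it is meant to show what each matrix cell encodes, not how to compute it efficiently.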



http://www.hiit.fi/vakd09/papers.html
2009-11-26

vakd09@hiit.fi