# Programme

#### Monday, September 19

Venue: Minerva Building, Siltavuorenpenger 5 A, Room K213

12.45 - 13.00: Opening of the workshop: Jorma Rissanen

13.00 - 14.00: Plenary Talk: Andrew Barron

14.00 - 15.30: Set 1:
15.30 - 16.00: Coffee break

16.00 - 17.30: Set 2:
• Jean Honorio: On the Sample Complexity of Learning Sparse Graphical Games
• Prasad Santhanam: Data derived estimation in slow mixing Markov processes and applications to community detection
• Paavo Nevalainen: Heuristicallly targeted Minimum Description Length test for stone detection from public point cloud data
19.00 - 21:30: Welcoming Reception at Restaurant Suomenlinnan Panimo

#### Tuesday, September 20

Aurora Building, Siltavuorenpenger 10, Auditorium 229

09.00 - 09.30: Coffee break

09.30 - 11.00: Set 3: 11.00 - 13.00: Lunch

13.00 - 14.30: Set 4:
14.30 - 15.00: Coffee break

15.00 - 17.00: Set 5:
18.30: Banquet Dinner at Restaurant Savu

#### Wednesday, September 21

Before lunch: Aurora Building, Siltavuorenpenger 10, Auditorium 229
After lunch: Athena Building, Siltavuorenpenger 3 A, Room 302
09.00 - 09.30: Coffee break

09.30 - 10.30: Plenary Talk: John Shawe-Taylor

10.30 - 11.30: Plenary Talk: Wojciech Szpankowski

11.30 - 13.30: Lunch

13.30 - 15.00: Set 6:
15.00 - 15.30: Coffee break

15.30 - 17.00: Set 7:

## Invited speakers (confirmed):

• Dmitri Pavlichin, Stanford, USA
• David R. Bickel, University of Ottawa, Canada
• Ivo Grosse, Martin-Luther-Universität Halle-Wittenberg, Germany
• Jean Honorio, Purdue, USA
• Ryota Tomioka, Microsoft Research Cambridge, England
• Kenji Yamanishi, University of Tokyo, Japan
• Ioan Tabus, Tampere University of Technology, Finland
• Joe Suzuki, Osaka University, Japan
• Marc Boullé, Orange Labs, France
• Ralf Eggeling, University of Helsinki, Finland
• Kazuho Watanabe, Toyohashi University of Technology, Japan
• Tjalling Tjalkens, Eindhoven University of Technology, the Netherlands
• Peter Grünwald, Centrum voor Wiskunde en Informatica, the Netherlands
• Jukka Corander, University of Oslo, Norway
• Kohei Miyaguchi, University of Tokyo, Japan
• Samuel Kaski, Aalto University, Finland
• Boris Ryabko, Institute of Computational Technologies of the SB RAS, Russia
• Tara Javidi, UC San Diego, USA
• Narayana Santhanam, University of Hawaii at Manoa, USA
• Tomi Silander, Xerox Research Centre Europe, France
• Michael Mahoney, UC Berkeley, USA
• Paavo Nevalainen, University of Turku, Finland

## Plenary speakers

Andrew Barron, Yale:
Probability Models for Normalized Maximum Likelihood Calculation

• Abstract: We show how a probability model in which the counts are independent with probability function $p(k)$ proportional to the Stirling ratio $k^k e^{-k} /k!$ produces exactly the Normalized Maximum Likelihood (NML) Shtarkov distribution upon conditioning on the total counts. This Stirling ratio distribution leads to clean and fast algorithms for computing the conditional distributions needed for arithmetic coding of data according to the NML. Also, it allows for simple computation of the Shtarkov constant analogous to what was obtained by Petri Kontkanen and Petri Myllymaki (2005). Bayes mixtures with possibly negative weights (Barron, Roos, Watanabe 2014) provide an alternative method of computing the joint and conditional Shtarkov distribution. However, the Stirling ratio method is superior when the alphabet size is large. This is joint work with Xiao Grace Yang.

John Shawe-Taylor, University College London:
Learning to Detect an Immune Response from T-Cell Sequences

• Abstract: Adaptive immunity can be viewed as a computational system that learns to respond to different pathogens. The response results in changes in the frequencies of T-Cells. The problem of detecting this response requires detecting these changes. The information is distributed across many different T-cells and an approach using Fisher kernels to highlight the relevant features is presented. Links between Fisher kernels and string kernels from finite state automata suggest extensions to Fisher kernel feature sets that allow biologically significant features to be extracted efficiently.

Wojciech Szpankowski, Purdue:
Analytic Pattern Matching: From DNA to Twitter

• Abstract: Repeated patterns and related phenomena in words are known to play a central role in many facets of computer science, telecommunications, coding, data compression, data mining, and molecular biology. One of the most fundamental questions arising in such studies is the frequency of pattern occurrences in a given string known as the text. Applications of these results include gene finding in biology, executing and analyzing tree-like protocols for multiaccess systems, discovering repeated strings in Lempel-Ziv schemes and other data compression algorithms, evaluating string complexity and its randomness, synchronization codes, user searching in wireless communications, and detecting the signatures of an attacker in intrusion detection. In this talk after a brief motivation, we review several pattern matching problems (e.g., exact string matching, constrained pattern matching, generalized pattern matching, and subsequence pattern matching), and then we discuss a few applications (e.g., spike trains of neuronal data, Google search, Lempel-Ziv'77 and Lempel-Ziv'78 data compression schemes, and string complexity used in Twitter classification). Finally, we illustrate our approach to solve these problems using tools of analytic combinatorics, which we discuss in some depth. We also present several open problems.