### Description

Here's the metric.

Say we have two samples $X = (x_1, \dots, x_n)$ and $Y = (y_1, \dots, y_m)$, and we want to have a distance between them.

First, just run an SVM on the two samples, considering each $x_i$ as a class-0 example and each $y_j$ as a class-1 example. Train the SVM and measure the number of (training) examples of class 0 classified as 0; call this $T_X$. Then measure the number of (training) examples of class 1 classified also (!) as 0; call this $T_Y$.
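As a toy illustration of this counting step, here is a sketch using scikit-learn's `SVC` in place of the SVM above (the variable names `Tx` and `Ty` for the two counts are mine):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
x = rng.uniform(size=(30, 1))          # sample X -> class 0
y = rng.normal(size=(30, 1))           # sample Y -> class 1
data = np.vstack([x, y])
classes = np.r_[np.zeros(30), np.ones(30)]

clf = SVC(kernel="linear").fit(data, classes)
pred = clf.predict(data)               # predictions on the training set itself

Tx = np.sum(pred[:30] == 0)            # class-0 points classified as 0
Ty = np.sum(pred[30:] == 0)            # class-1 points *also* classified as 0
print(Tx, Ty)
```

If the two samples are easy to separate, `Tx` is close to 30 and `Ty` close to 0; if they are indistinguishable, the two counts tell a different story.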

Then take

$$d_1 = \left| \frac{T_X}{n} - \frac{T_Y}{m} \right|.$$

OK, now we'll construct $d_k$ in the same way, for each $k$: take the $k$-tuples $(x_i, \dots, x_{i+k-1})$ and call them class-0 examples, and do the same with the $k$-tuples of $Y$, which we call class-1 examples.

Train an SVM, get $T_X$ and $T_Y$ in the same way, and obtain

$$d_k = \left| \frac{T_X}{n-k+1} - \frac{T_Y}{m-k+1} \right|.$$
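The $k$-tuples are just a sliding-window embedding of the series; a minimal numpy sketch (the function name `ktuples` is mine):

```python
import numpy as np

def ktuples(x, k):
    """Row i is observations i..i+k-1 of x, flattened by rows.

    x has shape (n, d); the result has shape (n - k + 1, k * d).
    """
    n = x.shape[0]
    return np.stack([x[i:i + k].ravel() for i in range(n - k + 1)])

x = np.arange(10).reshape(5, 2)   # 5 observations in dimension 2
print(ktuples(x, 3).shape)        # (3, 6)
print(ktuples(x, 3)[0])           # [0 1 2 3 4 5]
```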

Finally,

$$d(X, Y) = \sum_{k=1}^{\lfloor \sqrt{n} \rfloor} w_k \, d_k, \qquad \text{where} \quad w_k = \frac{1}{k^2}$$

(taking $n \le m$, i.e. $X$ is the shorter sample; otherwise swap the two).
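Putting the pieces together, a rough Python sketch of the whole metric, with scikit-learn's `SVC` standing in for e1071's `svm` (all names here are mine; this is a sketch under those assumptions, not the reference implementation below):

```python
import numpy as np
from sklearn.svm import SVC

def ktuples(x, k):
    """Sliding-window embedding: row i is observations i..i+k-1, flattened."""
    n = x.shape[0]
    return np.stack([x[i:i + k].ravel() for i in range(n - k + 1)])

def svm_distance(x, y, **svm_args):
    if x.shape[0] > y.shape[0]:
        x, y = y, x                      # ensure n <= m
    n, m = x.shape[0], y.shape[0]
    K = int(np.sqrt(n))                  # k ranges over 1..floor(sqrt(n))
    total = 0.0
    for k in range(1, K + 1):
        X, Y = ktuples(x, k), ktuples(y, k)
        data = np.vstack([X, Y])
        classes = np.r_[np.zeros(len(X)), np.ones(len(Y))]
        clf = SVC(**svm_args).fit(data, classes)
        pred = clf.predict(data)         # classify the *training* points
        Tx = np.sum(pred[classes == 0] == 0)
        Ty = np.sum(pred[classes == 1] == 0)
        total += (1.0 / k**2) * abs(Tx / (n - k + 1) - Ty / (m - k + 1))
    return total

rng = np.random.default_rng(0)
x = rng.uniform(size=(20, 1))
y = rng.normal(size=(25, 1))
print(svm_distance(x, y, kernel="linear"))
```

Note the swap at the top makes the function symmetric by construction, and since each $d_k \le 1$ the distance is bounded by $\sum_k 1/k^2 < \pi^2/6$.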

### Implementation core

```r
library("e1071")  # for svm()

# Build the k-tuple representation of a sample:
# row i contains observations i..(i+k-1), flattened by rows.
myreshape <- function(x, k) {
  n = nrow(x)
  dimension = ncol(x)
  X = matrix(0, nrow = n - k + 1, ncol = k * dimension)
  for (i in 1:nrow(X)) {
    X[i, ] = as.vector(t(x[i:(k + i - 1), ]))  # flatten by rows
  }
  return(X)
}

# Computation of the distance
distance <- function(x, y, ...) {
  # Ensure x is the shorter sample (n <= m)
  if (nrow(x) > nrow(y)) {
    tmp = y
    y = x
    x = tmp
  }
  n = nrow(x)
  m = nrow(y)
  K = floor(sqrt(n))
  d = rep(0, K)
  w = 1 / (1:K)^2
  for (k in 1:K) {
    X = myreshape(x, k)
    Y = myreshape(y, k)
    mydata = rbind(X, Y)
    myclasses = c(rep(0, nrow(X)), rep(1, nrow(Y)))
    model = svm(mydata, myclasses, type = "C", ...)
    classification = predict(model, mydata)
    Tx = sum(classification[myclasses == 0] == 0)
    Ty = sum(classification[myclasses == 1] == 0)
    d[k] = abs(Tx / (n - k + 1) - Ty / (m - k + 1))
  }
  return(sum(w * d))
}
```

### Tests on artificial data

Now a quick test, with 10 observations of a U(0,1) and 10 observations of a N(0,1), each over 200 time steps.

With a linear kernel, here is a plot of the distance matrix (red means a high difference, white no difference):

For fun, a classic hierarchical clustering on it (of course the clustering method doesn't matter with such an easy matrix).
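A minimal sketch of that clustering step with scipy, assuming a symmetric distance matrix `D` (the toy values below are made up, not the matrix from the plot):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy symmetric distance matrix for 4 samples: two tight pairs far apart
# (a stand-in for the SVM-based distance matrix).
D = np.array([[0.0, 0.1, 0.9, 0.8],
              [0.1, 0.0, 0.85, 0.9],
              [0.9, 0.85, 0.0, 0.1],
              [0.8, 0.9, 0.1, 0.0]])

Z = linkage(squareform(D), method="average")  # linkage expects condensed form
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # samples 0,1 in one cluster, 2,3 in the other
```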

With an RBF kernel (no tuning of parameters):

### On real data

We compare with a last-year ICML paper on clustering for human gesture (the CLDS approach).

Here are the conditional entropies S of the clusterings (lower is better).

We have an implementation of a parallel version of the distance computation, plus launchers (use `oar.sh`). Computation takes 7 minutes with 40 cores for the MOCAPOS dataset.

We only keep the right foot, as in the paper (we should also try with everything). I also included the "jumps" class, which seems to have been removed in the ICML paper.

```
real.classes
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 1
```

#### Linear kernel

```
                  real.classes
predicted.classes   0  1  2
                1   5  1  1
                2   0 18 12
                3   5  7  9
```

This gives a conditional entropy of 0.80 (jumps included).

Without the jumps (class 0):

```
                  real.classes
predicted.classes   1  2
                1  18 14
                2   8  8
```

**Conditional entropy of 0.65. We are beaten (0.37 for the best).**

#### Gaussian kernel (default parameters)

With jumps:

```
                  real.classes
predicted.classes   0  1  2
                1   8  0  0
                2   1 20  4
                3   1  6 18
```

Conditional entropy = 0.5666671.

Without jumps (2 clusters):

```
                  real.classes
predicted.classes   1  2
                1   0 10
                2  26 12
```

Conditional entropy, empirical estimator of (predicted.classes | real.classes): 0.3157959.
Empirical estimator of (real.classes | predicted.classes): 0.4937268 <- the one used in the paper.

With 3 clusters and no jumps:

```
                  real.classes
predicted.classes   1  2
                1   0 10
                2  20  4
                3   6  8
```

Conditional entropy = 0.4244621.
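For reference, these values can be reproduced from the contingency tables as the empirical conditional entropy H(real.classes | predicted.classes) in nats; a quick sketch (the function name is mine):

```python
import numpy as np

def cond_entropy(table):
    """Empirical conditional entropy H(column label | row label), in nats.

    table[i, j] = number of points with row-label i and column-label j;
    here rows are predicted clusters and columns are real classes.
    """
    table = np.asarray(table, dtype=float)
    n = table.sum()
    h = 0.0
    for row in table:
        s = row.sum()
        if s == 0:
            continue                      # empty cluster contributes nothing
        p = row[row > 0] / s              # class distribution inside the cluster
        h += (s / n) * -(p * np.log(p)).sum()
    return h

# The 3-clusters / no-jumps table above:
print(cond_entropy([[0, 10], [20, 4], [6, 8]]))  # ~0.4244621
```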

**So we still lose with this kernel (0.37 for the best).**

So we start tuning parameters… gamma = 2 and C = 16 (not really optimized, but after a very few tuning tests it was looking good):

With jumps:

```
                  real.classes
predicted.classes   0  1  2
                1   9  0  0
                2   1 26  7
                3   0  0 15
```

Conditional entropy = 0.3717996.

Without jumps:

```
                  real.classes
predicted.classes   1  2
                1   0 15
                2  26  7
```

Conditional entropy = 0.3552681.

**V for Victory. It is even good with the removed class included.**