Telescopic Distance

Description

Here is the metric.

Say we have samples

$$X=(X_1, \ldots, X_n)$$ and $$Y=(Y_1, \ldots, Y_m)$$

and we want a distance between them.

First, just run an SVM on the two samples, treating each $$X_i, i=1,\ldots,n$$ as a class-0 example and each $$Y_i, i=1,\ldots,m$$ as a class-1 example. Train the SVM and count the number of (training) examples of class 0 $$(X_i)$$ classified as 0; call this $$T^1_x$$. Then count the number of (training) examples of class 1 $$(Y_i)$$ that are also (!) classified as 0; call this $$T^1_y$$.

Then take $$d_1 = |T^1_x/n - T^1_y/m|$$.
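For instance (illustrative numbers, not from the original): if $$n = m = 10$$ and the SVM classifies 9 of the $$X_i$$ and 3 of the $$Y_i$$ as class 0, then $$d_1 = |9/10 - 3/10| = 0.6$$.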

Now we construct $$d_k$$ in the same way for each $$k=2,\ldots, \sqrt n$$: take the k-tuples $$(X_1, \ldots, X_k), (X_2, \ldots, X_{k+1}), \ldots, (X_{n-k+1}, \ldots, X_n)$$ as class-0 examples, and likewise the k-tuples $$(Y_1, \ldots, Y_k), (Y_2, \ldots, Y_{k+1}), \ldots, (Y_{m-k+1}, \ldots, Y_m)$$ as class-1 examples.
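(An illustrative example: with $$n=6$$ and $$k=2$$ there are $$n-k+1=5$$ class-0 examples, the overlapping sliding windows $$(X_1,X_2), (X_2,X_3), \ldots, (X_5,X_6)$$.)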

Train an SVM, get $$T^k_x$$ and $$T^k_y$$ in the same way, and obtain

$$d_k= |T^k_x/(n-k+1) - T^k_y/(m-k+1)|$$

Finally,

$$d= \sum_{k=1}^{\sqrt n} w_k d_k$$
where $$w_k= 1/k^2$$.
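For example (illustrative), with $$n = 9$$ the sum runs over $$k = 1, 2, 3$$, giving $$d = d_1 + d_2/4 + d_3/9$$.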

Implementation core

library("e1071") #for svm


#Computation of the distance 

myreshape <- function(x,k){
		n = nrow(x)
		dimension = ncol(x)
		X = matrix(0, nrow=n-k+1, ncol=k*dimension)
		for (i in 1:nrow(X) ) {
			X[i,] = as.vector(t(x[i:(k+i-1),])) #flatten by rows
			}
		return(X) 	
	}
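A quick sanity check of myreshape (the toy matrix below is just for illustration):

x = matrix(1:10, ncol=2) # 5 observations in dimension 2
myreshape(x, 2)          # 4 rows; row i is rows i and i+1 of x, flattened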

distance <- function(x, y, ...) {
	# make x the shorter sample, so that k ranges over 1:floor(sqrt(n))
	if (nrow(x) > nrow(y)) {
		tmp = y
		y = x
		x = tmp
	}

	n = nrow(x)
	m = nrow(y)
	kmax = floor(sqrt(n))

	d = rep(0, kmax)
	w = 1/(1:kmax)^2 # weights w_k = 1/k^2

	for (k in 1:kmax) {
		X = myreshape(x, k)
		Y = myreshape(y, k)

		mydata    = rbind(X, Y)
		myclasses = factor(c(rep(0, nrow(X)), rep(1, nrow(Y))))

		# train on the pooled k-tuples, then classify the training set itself
		model = svm(mydata, myclasses, type="C-classification", ...)
		classification = predict(model, mydata)
		Tx = sum(classification[myclasses == 0] == 0) # class-0 tuples labelled 0
		Ty = sum(classification[myclasses == 1] == 0) # class-1 tuples labelled 0
		d[k] = abs(Tx/(n-k+1) - Ty/(m-k+1))
	}

	return(sum(w*d))
}
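A minimal usage sketch (the seed, lengths, and kernel choice are arbitrary; series are passed as matrices with one row per time step):

set.seed(1)
x = matrix(runif(200), ncol=1) # univariate series, 200 time steps
y = matrix(rnorm(200), ncol=1)
distance(x, y, kernel="linear") # extra arguments go straight to svm()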

Tests on artificial data

Now a quick test: 10 series drawn from U(0,1) and 10 series drawn from N(0,1), each with 200 time steps.
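A sketch of how this test can be reproduced (the object names and the plotting call are ours, not the original script):

series = c(lapply(1:10, function(i) matrix(runif(200), ncol=1)), # U(0,1) series
           lapply(1:10, function(i) matrix(rnorm(200), ncol=1))) # N(0,1) series
D = matrix(0, 20, 20)
for (i in 1:19) for (j in (i+1):20) {
	D[i,j] = distance(series[[i]], series[[j]], kernel="linear")
	D[j,i] = D[i,j]
}
image(D) # heatmap of the distance matrix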

With a linear kernel, here is a plot of the distance matrix (red means high difference, white means no difference):

[R plot: distance matrix heatmap]

For fun, a classic hierarchical clustering on it (of course the clustering method doesn't matter with such an easy matrix):
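Assuming the distance matrix D from the sketch above, this is a one-liner:

hc = hclust(as.dist(D)) # default (complete) linkage
plot(hc)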

[R plot: hierarchical clustering dendrogram]

With an RBF kernel (no parameter tuning):

[R plot: distance matrix heatmap, RBF kernel]

[R plot: hierarchical clustering dendrogram, RBF kernel]

On real data

We compare against a last-year ICML paper on clustering human gestures (the CLDS approach).

Here are the conditional entropies S of the clusterings (lower is better).
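A sketch of such an estimator, assuming predicted classes in rows, real classes in columns, and natural logarithm (the log base is an assumption; it reproduces the two estimator values reported for the Gaussian kernel below):

# empirical conditional entropy H(columns | rows) of a confusion table
cond.entropy <- function(tab) {
	p = tab / sum(tab)                   # joint distribution
	pcond = sweep(p, 1, rowSums(p), "/") # p(column | row)
	-sum(p * log(pcond), na.rm=TRUE)     # 0*log(0) terms dropped
}
# H(real | predicted): cond.entropy(table(predicted.classes, real.classes))
# H(predicted | real): transpose the table first, i.e. cond.entropy(t(...))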

We have an implementation of a parallel version of the distance computation, plus launchers (use oar.sh). The computation takes 7 minutes on 40 cores for the MOCAPOS dataset.
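The launchers themselves are not shown here; below is a minimal sketch of the same idea with the base parallel package (mclapply; the series list and the core count are assumptions):

library(parallel)
N = length(series) # list of matrices, one per gesture
idx = which(upper.tri(matrix(0, N, N)), arr.ind=TRUE) # all pairs i < j
vals = mclapply(1:nrow(idx),
                function(r) distance(series[[idx[r,1]]], series[[idx[r,2]]]),
                mc.cores=40)
D = matrix(0, N, N)
D[upper.tri(D)] = unlist(vals) # same column-major order as idx
D = D + t(D)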

As in the paper, we only keep the right foot (we should also try with everything). I also included the "jumps" class, which seems to have been removed in the ICML paper.

real.classes
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 1

Linear kernel

                 real.classes
predicted.classes  0  1  2
                1  5  1  1
                2  0 18 12
                3  5  7  9

This gives a conditional entropy of 0.80 (jumps included).

Without the jumps class (class 0):

 predicted.classes  1  2
                1 18 14
                2  8  8

  

Conditional entropy of 0.65. We are beaten (0.37 for the best).

Gaussian kernel (default parameters)

predicted.classes  0  1  2
                1  8  0  0
                2  1 20  4
                3  1  6 18
Conditional Entropy = 0.5666671
                 
predicted.classes  1  2
                1  0 10
                2 26 12

Conditional entropy:
 empirical estimator H(predicted.classes | real.classes) = 0.3157959
 empirical estimator H(real.classes | predicted.classes) = 0.4937268 <- used in the paper

With 3 clusters and no jumps:

predicted.classes  1  2
                1  0 10
                2 20  4
                3  6  8

Conditional Entropy 0.4244621
So we still lose with this kernel (0.37 for the best).

So we start tuning parameters… gamma = 2 and C = 16 (not really optimized, but after a few tuning tests it was looking good).
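These parameters are passed straight through distance()'s dots to svm(); for instance:

distance(x, y, kernel="radial", gamma=2, cost=16)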

predicted.classes  0  1  2
                1  9  0  0
                2  1 26  7
                3  0  0 15
Conditional Entropy 0.3717996
                 
predicted.classes  1  2
                1  0 15
                2 26  7
Conditional Entropy 0.3552681

V for Victory. It is even good with the removed class included.
