
What Spectral Hashing sets out to do is actually quite simple: for each dataset, learn a good codeword representation that satisfies three conditions: (1) easily computed for a novel input, (2) requires a small number of bits, and (3) maps similar items to similar binary codewords.

To get there, start from the following objective from the paper, where W(i, j) is the affinity between points i and j, and y_i in {-1, 1}^k is the codeword of point i:

    minimize    sum_{ij} W(i, j) * ||y_i - y_j||^2
    subject to  y_i in {-1, 1}^k
                sum_i y_i = 0
                (1/n) sum_i y_i y_i^T = I

Minimizing this function under the constraints means each bit is balanced: half the data gets 1 and half gets -1 (this is what sum_i y_i = 0 enforces). For a single bit this is exactly balanced graph partitioning, which is a known NP-hard problem.

The problem is therefore relaxed: drop the constraint that the codes take values in {-1, 1} and let them be real-valued vectors. The solutions of the relaxed problem are then the eigenvectors of the graph Laplacian with the smallest eigenvalues.
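The relaxed problem can be sketched in a few lines of numpy. This is an illustrative toy (the Gaussian-kernel affinity, bandwidth sigma, and data are all assumptions, not from the post): build the affinity W, form the Laplacian L = D - W, and take the eigenvectors with the smallest non-trivial eigenvalues as the real-valued codes.

```python
import numpy as np

# Toy data: n points in 2-D (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))

# Affinity matrix W with a Gaussian kernel (sigma is an assumed bandwidth).
sigma = 1.0
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq_dists / (2 * sigma ** 2))

# Graph Laplacian L = D - W.
D = np.diag(W.sum(axis=1))
L = D - W

# After relaxation, the minimizers are the eigenvectors of L with the
# smallest eigenvalues; skip the trivial constant eigenvector at index 0.
eigvals, eigvecs = np.linalg.eigh(L)
k = 3
Y = eigvecs[:, 1:k + 1]          # relaxed real-valued codes, one column per bit
codes = (Y > 0).astype(np.int8)  # threshold at zero to recover binary bits
```

Note that this spectral solution only gives codes for the training points; the algorithm below is what extends it to novel inputs.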

After a further series of derivations, this yields the final Spectral Hashing algorithm:

Given a training set of points {xi} and a desired number of bits k:
1. Finding the principal components of the data using PCA.
2. Calculating the k smallest single-dimension analytical eigenfunctions of Lp using a rectangular approximation along every PCA direction. This is done by evaluating the analytical eigenvalues of the eigenfunctions Φk(x) for each direction, creating a list of dk eigenvalues, and then sorting this list to find the k smallest.
3. Thresholding the analytical eigenfunctions at zero, to obtain binary codes.
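The three steps above can be sketched as follows. This is a minimal sketch under the paper's uniform-distribution approximation along each PCA axis; the helper name `train_spectral_hashing`, the frequency-based eigenvalue ordering (the analytical eigenvalue grows with the frequency ω = mπ/r, so sorting by ω sorts by eigenvalue), and the toy data are assumptions for illustration.

```python
import numpy as np

def train_spectral_hashing(X, k):
    """Sketch of the three algorithm steps (hypothetical helper)."""
    # 1. PCA: center the data and project onto the principal components.
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    proj = (X - mean) @ Vt.T              # n x d projected data

    # 2. Enumerate single-dimension eigenfunction modes along each direction.
    mins = proj.min(axis=0)
    ranges = proj.max(axis=0) - mins
    modes = []                            # (frequency, direction, mode index)
    for j in range(proj.shape[1]):
        for m in range(1, k + 1):
            omega = m * np.pi / ranges[j]
            modes.append((omega, j, m))   # smaller omega => smaller eigenvalue
    modes.sort()
    modes = modes[:k]                     # keep the k smallest

    # 3. Threshold the analytical eigenfunctions at zero to get bits.
    def encode(Z):
        Zc = (Z - mean) @ Vt.T
        bits = []
        for omega, j, m in modes:
            y = np.sin(np.pi / 2 + omega * (Zc[:, j] - mins[j]))
            bits.append(y > 0)
        return np.stack(bits, axis=1).astype(np.int8)

    return encode

# Usage on assumed toy data: 100 points in 4-D, 8-bit codes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
encode = train_spectral_hashing(X, k=8)
codes = encode(X)                         # 100 x 8 binary codewords
```

Because the eigenfunctions are analytical, `encode` works on any novel input, which is exactly condition (1) from the start of the post.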

And with that, we know how to find good codewords for semantic hashing. Simple yet effective; it really is a good paper.
