Saturday, December 20, 2008

Compiling NCA as a mixed MATLAB/C++ build

Debugging the code for Neighborhood Component Analysis:

http://www.cs.berkeley.edu/~fowlkes/software/nca/

Inside there are an nca.cc file, a mexit.m, and an nca.mexglx. The contents of mexit.m:
mex nca.cc CC=g++ COPTIMFLAGS=-O3 CDEBUGFLAGS=-g CFLAGS='-fPIC -ansi -D_GNU_SOURCE -pthread -Wall' CXX=g++ CXXOPTIMFLAGS=-O3 CXXDEBUGFLAGS=-g CXXFLAGS='-fPIC -ansi -D_GNU_SOURCE -pthread -Wall'

At first I tried to set up a g++ compiler on Windows and register it with mex -setup, but MATLAB could not recognize it. After checking the book, I found that MATLAB does not support Dev-C++, and that nca.mexglx is a mex file already compiled under Linux.

Later I installed VC6.0 on Windows, renamed nca.cc to nca.cpp, adjusted the places where the two compilers differ in accepted syntax, and the build succeeded, producing nca.mexw32. This mex file effectively makes nca available as a MATLAB built-in function.

Notes:

1. The code contains a function called minimize whose purpose I didn't understand. Looked it up briefly; will come back to it later: http://www.kyb.tuebingen.mpg.de/bs/people/carl/code/minimize/

2. The original paper, http://www.cs.toronto.edu/~hinton/absps/nca.pdf , is a bit hard to follow.

Prof. Sam Roweis's presentation slides and talk about NCA:

http://videolectures.net/sam_roweis/


3. A paper by Prof. Songcan Chen that helps with understanding:
Discriminant common vectors versus neighbourhood components analysis and Laplacianfaces: A comparative study in small sample size problem
http://parnec.nuaa.edu.cn/jliu/papers/comparative.pdf

PCA & SVD

PCA references:
http://en.wikipedia.org/wiki/Principal_components_analysis

It discusses the limitations of PCA, which lie mainly in its assumptions.

SVD references:
http://www.uwlax.edu/faculty/will/svd/index.html
It writes a matrix A as the product (hanger)(stretcher)(aligner), which is just the SVD viewed as a matrix action. I found this very useful.

http://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm
It explains where the eigenvectors and eigenvalues come from.

http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/4000/pdf/imm4000.pdf
It covers the relationship between PCA and SVD.
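That PCA/SVD relationship can be checked numerically. The sketch below is my own illustration (in Python rather than MATLAB, for brevity, and pure Python so the 2x2 eigenproblem can be solved with the quadratic formula): for centered data X, the covariance eigenvalues are the squared singular values divided by n - 1.

```python
import math

# Tiny centered data matrix X (n x 2); columns are features.
X = [[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]]
n = len(X)

# Gram matrix G = X^T X (2x2, symmetric).
a = sum(r[0] * r[0] for r in X)
b = sum(r[0] * r[1] for r in X)
d = sum(r[1] * r[1] for r in X)

# Eigenvalues of [[a, b], [b, d]] via the quadratic formula.
tr, det = a + d, a * d - b * b
disc = math.sqrt(tr * tr - 4 * det)
g1, g2 = (tr + disc) / 2, (tr - disc) / 2

# Singular values of X are the square roots of the eigenvalues of G ...
s1, s2 = math.sqrt(g1), math.sqrt(g2)

# ... and PCA's covariance eigenvalues are eig(G) / (n - 1),
# so lambda_i = sigma_i^2 / (n - 1): both decompositions agree.
lam1, lam2 = g1 / (n - 1), g2 / (n - 1)
print(lam1, s1 * s1 / (n - 1))  # same number
```

This is why, in practice, PCA is usually computed through an SVD of the centered data matrix instead of forming the covariance matrix explicitly.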

http://www.cc.gatech.edu/~sangmin/pubs/Oh_svd.pdf

The SVD-Fundamental Theorem of Linear Algebra:
http://www.lana.lt/journal/21/Akritas.pdf

Thursday, December 4, 2008

Some machine learning concepts

From: http://www.cs.utexas.edu/

Ensemble Learning:

Ensemble Learning combines multiple learned models under the assumption that "two (or more) heads are better than one." The decisions of multiple hypotheses are combined in ensemble learning to produce more accurate results. Boosting and bagging are two popular approaches. Our work focuses on building diverse committees that are more effective than those built by existing methods, and, in particular, are useful for active learning.
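The bagging idea above can be reduced to a toy sketch (my own illustration, not UT Austin's system): train each committee member on a bootstrap resample of the data, then combine their decisions by majority vote. Here the base learner is a hypothetical 1-D threshold "stump":

```python
import random
from collections import Counter

random.seed(0)

# Toy 1-D dataset: the true label is 1 exactly when x > 5.
data = [(x, 1 if x > 5 else 0) for x in range(11)]

def train_stump(sample):
    """Base learner: pick the threshold that best separates the sample."""
    best_t, best_acc = 0, -1.0
    for t in range(11):
        acc = sum((x > t) == (y == 1) for x, y in sample) / len(sample)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Bagging: each committee member sees a different bootstrap resample.
committee = []
for _ in range(15):
    boot = [random.choice(data) for _ in data]
    committee.append(train_stump(boot))

def predict(x):
    # Combine the members' decisions by majority vote.
    votes = Counter(1 if x > t else 0 for t in committee)
    return votes.most_common(1)[0][0]

print(predict(8), predict(2))  # prints 1 0
```

Boosting differs in that members are trained sequentially, each reweighting the examples the previous ones got wrong, rather than on independent resamples.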

Active Learning:

Active learning differs from passive "learning from examples" in that the learning algorithm itself attempts to select the most informative data for training. Since supervised labeling of data is expensive, active learning attempts to reduce the human effort needed to learn an accurate result by selecting only the most informative examples for labeling.
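A common concrete instance of "selecting the most informative examples" is uncertainty sampling: query the pool example the current model is least confident about. A minimal sketch of my own (the model here is a hypothetical 1-D threshold classifier whose confidence is distance from the decision boundary):

```python
# Current model: classify x as positive when x > threshold.
threshold = 5.0

# Pool of cheap unlabeled examples.
pool = [0.5, 2.0, 4.8, 5.3, 9.0]

def uncertainty(x):
    """Least-confident score: small distance to the boundary = high uncertainty."""
    return -abs(x - threshold)

# The active learner queries a label only for the most uncertain example,
# instead of paying a human to label the whole pool.
query = max(pool, key=uncertainty)
print(query)  # prints 4.8, the point nearest the boundary
```

Points far from the boundary (0.5 or 9.0) would be labeled the same way by almost any consistent model, so labeling them adds little information.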

Transfer Learning:(*)

Traditional machine learning algorithms operate under the assumption that learning for each new task starts from scratch, thus disregarding any knowledge they may have gained while learning in previous domains. Naturally, if the domains encountered during learning are related, this tabula rasa approach would waste both data and computer time to develop hypotheses that could have been recovered by simply examining and possibly slightly modifying previously acquired knowledge. Moreover, the knowledge learned in earlier domains could capture generally valid rules that are not easily recoverable from small amounts of data, thus allowing the algorithm to achieve even higher levels of accuracy than it would if it started from scratch.
The field of transfer learning, which has witnessed a great increase in popularity in recent years, addresses the problem of how to leverage previously acquired knowledge in order to improve the efficiency and accuracy of learning in a new domain that is in some way related to the original one. In particular, our current research is focused on developing transfer learning techniques for Markov Logic Networks (MLNs), a recently developed approach to statistical relational learning.


Reinforcement Learning:

Reinforcement learning consists of a set of machine learning methods that address a particular kind of learning task, in which the learner is placed in an unknown environment and is allowed to take actions that bring it rewards and can change its state in the environment. In general terms, the goal of the agent is to develop a policy, or a mapping from states to actions, that maximizes the reward it obtains while interacting with the environment.
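A standard small example of learning such a policy (my own sketch, using tabular Q-learning rather than any particular group's method) is a 5-state chain where only reaching the right end pays a reward; the learned greedy policy should point right everywhere:

```python
import random

random.seed(0)

# Tabular Q-learning on a 5-state chain; reaching state 4 pays reward 1.
N, GOAL = 5, 4
ALPHA, GAMMA = 0.5, 0.9        # learning rate, discount factor
Q = {(s, a): 0.0 for s in range(N) for a in (-1, +1)}  # a: step left/right

for _ in range(500):           # episodes of interaction
    s = 0
    while s != GOAL:
        a = random.choice((-1, +1))        # explore randomly
        s2 = min(max(s + a, 0), GOAL)      # environment transition
        r = 1.0 if s2 == GOAL else 0.0     # reward from the environment
        # Q-learning update: move Q(s,a) toward r + gamma * max_b Q(s',b)
        best_next = max(Q[(s2, b)] for b in (-1, +1)) if s2 != GOAL else 0.0
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# The policy maps each state to the greedy (argmax) action.
policy = {s: max((-1, +1), key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(policy)  # every state should map to +1 (move right)
```

Note the agent is never told the transition or reward structure; the policy emerges purely from the rewards observed while acting, which is exactly the setting the paragraph describes.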

Unsupervised and Semi-Supervised Learning and Clustering:

In many learning tasks, there is a large supply of unlabeled data but insufficient labeled data since it can be expensive to generate. Semi-supervised learning combines labeled and unlabeled data during training to improve performance. Semi-supervised learning is applicable to both classification and clustering. In supervised classification, there is a known, fixed set of categories and category-labeled training data is used to induce a classification function. In semi-supervised classification, training also exploits additional unlabeled data, frequently resulting in a more accurate classification function. In unsupervised clustering, an unlabeled dataset is partitioned into groups of similar examples, typically by optimizing an objective function that characterizes good partitions. In semi-supervised clustering, some labeled data is used along with the unlabeled data to obtain a better clustering.
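One simple way this plays out is self-training, a basic semi-supervised scheme: fit on the labeled data, pseudo-label the unlabeled points with the current model, and refit on both. A toy 1-D sketch of my own (the midpoint-of-means classifier is a hypothetical stand-in for a real learner):

```python
# Labeled data is scarce: one example per class.
labeled = [(2.0, 0), (9.0, 1)]
# Unlabeled data is plentiful and fills in the two clusters.
unlabeled = [0.5, 1.5, 2.0, 2.5, 7.5, 8.0, 8.5, 9.5]

def fit(points):
    """Threshold classifier: midpoint between the two class means."""
    m0 = [x for x, y in points if y == 0]
    m1 = [x for x, y in points if y == 1]
    return (sum(m0) / len(m0) + sum(m1) / len(m1)) / 2

t = fit(labeled)                       # supervised fit on labels alone
# Self-training: pseudo-label the unlabeled points with the current model,
# then refit on labeled + pseudo-labeled data.
pseudo = [(x, 1 if x > t else 0) for x in unlabeled]
t2 = fit(labeled + pseudo)
print(t, t2)  # prints 5.5 5.1
```

The refit boundary moves toward the midpoint of the underlying clusters, illustrating how unlabeled data can sharpen a decision learned from very few labels; the same pseudo-label idea, applied to cluster assignments, gives semi-supervised clustering.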