Thursday, September 25, 2008

(paper for fun) An Introduction to Quantum Computing

Title: An Introduction to Quantum Computing.
Author: Noson S. Yanofsky

It offers a taste of quantum computing, targeting computer science undergraduates and even advanced high school students.

Hilbert space: like a regular vector space, except that the coordinate along each axis is a complex number.

One of the key points of quantum computing.
: a qubit can exist in SEVERAL states AT THE SAME TIME (superposition).
: when a qubit is measured, it collapses to either 0 or 1 (in the case of a single qubit).
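To make the superposition and measurement points concrete, here is a minimal simulation sketch (my own illustration, not from the book; assumes NumPy): a qubit is a unit-length complex 2-vector, and measuring it yields 0 or 1 with probability equal to each amplitude's squared magnitude.

import numpy as np

# A qubit in an equal superposition of |0> and |1>.
state = np.array([1, 1j], dtype=complex) / np.sqrt(2)

def measure(state, rng=np.random.default_rng()):
    probs = np.abs(state) ** 2          # Born rule: |amplitude|^2
    return rng.choice(len(state), p=probs)

counts = [measure(state) for _ in range(1000)]
print(sum(counts) / 1000)  # close to 0.5: about half the outcomes are 1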

(Paper) Information-Theoretic Definition of Similarity

Title: Information-Theoretic Definition of Similarity.
Conference: ICML 1998

The paper provides a general similarity measure applicable across many domains.
Previous similarity measures were each specific to a particular domain.
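The paper's taxonomy instantiation can be sketched roughly as follows (my reading of the definition, not the author's code; the toy probabilities and taxonomy are made up): the similarity of two classes is twice the log-probability of their most specific common ancestor, normalized by the sum of their own log-probabilities.

import math

# Toy taxonomy: P(c) is the probability of meeting an instance of class c.
P = {"animal": 1.0, "bird": 0.4, "fish": 0.3, "hawk": 0.1, "sparrow": 0.15}
parent = {"hawk": "bird", "sparrow": "bird", "bird": "animal", "fish": "animal"}

def ancestors(c):
    chain = [c]
    while c in parent:
        c = parent[c]
        chain.append(c)
    return chain

def lin_sim(c1, c2):
    common = next(a for a in ancestors(c1) if a in ancestors(c2))
    # sim(A, B) = 2 * log P(common ancestor) / (log P(A) + log P(B))
    return 2 * math.log(P[common]) / (math.log(P[c1]) + math.log(P[c2]))

print(lin_sim("hawk", "sparrow"))  # shared 'bird' ancestor -> fairly similar
print(lin_sim("hawk", "fish"))     # only 'animal' in common -> 0.0 here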

Tuesday, March 4, 2008

Report for 3/4

What I did:
Collected over 60 UCI data sets.
Currently, 105 UCI data sets are available; however, only about 70 of them are applicable to classification.

Very fortunately, I think, I've found some interesting patterns among 3 or 4 meta-features.

The problem I got:
For some data sets, MLP took too much time. Even though I used three computers, I still haven't gotten MLP results for 6-7 data sets.

What I will do next time:
There are many things to be done.
1. Cluster the data sets based on their vectors of accuracies (a rough sketch follows this list).
2. Examine the meta-features within each cluster.
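A rough sketch of step 1 (my own illustration; assumes scikit-learn, and the accuracy numbers are made up): each data set becomes one row of per-algorithm accuracies, and k-means groups data sets with similar accuracy profiles.

import numpy as np
from sklearn.cluster import KMeans

# Rows: UCI data sets. Columns: one learner's accuracy each
# (e.g., C4.5, MLP, SVM). Values are made up for illustration.
accuracies = np.array([
    [0.91, 0.89, 0.90],
    [0.55, 0.60, 0.58],
    [0.93, 0.88, 0.91],
    [0.52, 0.63, 0.57],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(accuracies)
print(kmeans.labels_)  # e.g., [0 1 0 1]: data sets grouped by accuracy profile
# Step 2 would then examine the meta-features of each cluster separately.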

Tuesday, February 19, 2008

Report for 2/19

The things I did:
I implemented 16 meta-attributes, plus some more from the STATLOG and METAL projects (a sketch of two typical ones appears below).
I found some bugs in my code, and it took time to track them down.
I will report preliminary results ASAP this week.
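For reference, here is a minimal sketch of two classic StatLog/METAL-style meta-attributes (my own illustration; the post doesn't say which 16 were implemented): class entropy, and the mean mutual information between each attribute and the class.

import math
from collections import Counter

def entropy(values):
    # Shannon entropy (in bits) of a sequence of discrete values.
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    # I(X; Y) = H(X) + H(Y) - H(X, Y) for paired discrete sequences.
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Toy data set: two discrete attributes and a class label per instance.
attrs = [["a", "b", "a", "b"], ["x", "x", "y", "y"]]
labels = ["+", "-", "+", "-"]

class_entropy = entropy(labels)                                  # H(C)
mean_mi = sum(mutual_information(a, labels) for a in attrs) / len(attrs)
print(class_entropy, mean_mi)  # 1.0 and 0.5 on this toy data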

The problem I got:
I was busy with some odd jobs last weekend. They ate a lot of my time, so I couldn't follow my original research schedule. I learned that I have to protect my research time at all costs.

My previously coded meta-attribute functions were spread across diverse projects,
and it took time to integrate all of them into one project.

The thing I plan to do:
Get some preliminary positive results ASAP (no later than the next report).

Monday, February 11, 2008

Report for 2/13

** The things I did this week:

1. Additional results on the 5-attribute data comparison
Data sets: 100 random 5-attribute data sets.
Number of instances: about 4950 (= all possible pairs out of 100)
1.1
Training accuracy from ID3
Similarity accuracy: 91.0303% (C4.5), 90.9293% (RandomCommittee), 89.63% (SVM), 89.5758% (MLP)

1.2
Training accuracy from MLP
Similarity accuracy: 89.4545% (C4.5), 89.798% (RandomCommittee), 90.1212% (SVM), 88.5859% (MLP), 89.9394% (Bagging)

2. I implemented a deterministic Q-learning algorithm as a starting point for future research (a minimal sketch appears after this list).

3. Reading
Kate A. Smith-Miles (Cross-Disciplinary Perspectives on Meta-Learning for Algorithm Selection): a good paper for reviewing how diverse disciplines approach selecting the best algorithm for various problem domains.
Question: how is our research goal different from landmarking? According to her paper, landmarking predicts the performance of one algorithm based upon the performance of cheaper but effective algorithms.
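Here is a minimal sketch of the deterministic case on a toy one-dimensional world (my own illustration, not the actual implementation). Because transitions and rewards are deterministic, no learning rate is needed: Q(s, a) = r + gamma * max over a' of Q(s', a').

import random

GAMMA, GOAL, N = 0.9, 3, 4
Q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}

def step(s, a):
    # Deterministic transition on states 0..3; reward 1 for reaching the goal.
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

for _ in range(500):                       # training episodes
    s = random.randrange(N - 1)
    while s != GOAL:
        a = random.choice((-1, 1))         # pure exploration
        s2, r = step(s, a)
        # Deterministic update: direct assignment, no learning rate.
        Q[(s, a)] = r + GAMMA * max(Q[(s2, b)] for b in (-1, 1))
        s = s2

print({s: max((-1, 1), key=lambda a: Q[(s, a)]) for s in range(N - 1)})
# Expected greedy policy: +1 (move right) in every state, toward the goal.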

** Problem I confronted.
Generating random data with a broad range of accuracies is hard.
When I generated the 100 5-attribute data sets, the accuracy range was only between 2.24% and 24.20%.

** Plan for next week.

1. Experiment with more data sets having different numbers of attributes, and experiment with different algorithms.

2. Read 5 papers (the Q-Decomposition paper, ICML 2003; Task Decomposition, IEEE 1997; Recognizing Environmental Change, IEEE 1999; Environmental Adaptation, IEEE 1999) and write summaries.

3. Extend the deterministic Q-learning into 'non-deterministic' Q-learning (update rule sketched below).
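For item 3, the standard non-deterministic extension mainly replaces the direct assignment above with a learning-rate-weighted average (textbook form, sketched here; alpha is the learning rate):

def td_update(Q, s, a, r, max_q_next, alpha=0.1, gamma=0.9):
    # Stochastic case: average over noisy outcomes instead of assigning.
    #   deterministic: Q[(s, a)] = r + gamma * max_q_next
    Q[(s, a)] += alpha * (r + gamma * max_q_next - Q[(s, a)])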

Tuesday, January 29, 2008

Report for 1/30

1. What you have done
Steve implemented a random-arff-file generator for me.

With that code, I generated twenty 4-attribute arff files.
I generated one file consisting of 180 data instances, where each instance contains the entropy, information gain, chi-square, and difference in DT accuracy for a pair of arff files (a reconstruction sketch appears below).

I obtained 64.7368% (MLP), 42.1053% (SVM), and 57.3684% (DT).
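A rough sketch of how such pairwise instances could be assembled (my reconstruction; the per-file statistics below are made up, and the actual classifier runs were presumably done in Weka):

from itertools import combinations

# Each arff file is summarized as (entropy, information gain, chi-square,
# decision-tree accuracy); the values here are invented for illustration.
stats = {
    "data00.arff": (0.92, 0.31, 5.2, 0.71),
    "data01.arff": (0.88, 0.05, 1.1, 0.55),
    "data02.arff": (0.95, 0.28, 4.7, 0.69),
}

instances = []
for a, b in combinations(sorted(stats), 2):
    ea, ga, ca, acc_a = stats[a]
    eb, gb, cb, acc_b = stats[b]
    # One training instance per pair of files: feature differences plus the
    # DT-accuracy difference that the learners try to predict.
    instances.append((abs(ea - eb), abs(ga - gb), abs(ca - cb),
                      abs(acc_a - acc_b)))

print(len(instances), instances[0])  # n*(n-1)/2 pairs of files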

2. What problems you have encountered
For data similarity, I don't have problems, just a lack of accuracy.

3. Next week's plan.
This week, I plan to implement standard Q-learning as a starting point and refine the idea.

Wednesday, January 16, 2008

Conference Information


Report for 1/23

1. What you have done
1. I examined the tree depth for each data set and calculated the correlations.
It turned out that the adult+stretch and soybean data have the same tree height, which was good. HOWEVER, sometimes another pair of data sets that shows a huge difference in accuracy also has a very similar height.
2. I ran our old sort-data-and-area-comparison method, and the correlation for the adult+stretch and soybean data was around 0.45.
3. I reran the entropy-based-comparison method with different comparison methods. It hasn't shown good results yet.
2. What problems you have encountered
My problem is not knowing which property of the data should be compared in order to find some correlation for a pair of data sets. For example, as we discussed, the adult+stretch and soybean data produce very similar accuracies even though so many differences exist between them.
3. What possible solutions you are considering for these problems
The entropy-based-comparison method produced correlations of about 0.6. That means the entropy-based method can somehow reflect the accuracy. But the problem is that it is not enough and not robust across arbitrary pairs of data sets. So I think we may need to push this method a little further: group the data by some property and run the entropy-based-comparison method again.
4. What you plan on doing in the coming week
I will spend one or two days thinking about the above idea (how to cluster data sets) and run the entropy-based-comparison method with diverse vector comparison methods (a rough sketch of such a comparison appears below).
5. New ideas, specific topics/issues you wish for us to focus on in our discussion
Well, I have spent almost a year thinking about data comparison, and the results haven't been very satisfactory yet. But I don't want to drop this, since you and I have put so much effort into it. I may want to continue this work as a side project, although practically I would be doing the same thing as I do now.
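The post doesn't spell out the entropy-based-comparison method, so the following is only a heavily hedged guess at its shape (the per-attribute entropy vectors are made up): summarize each data set as a vector of per-attribute entropies, then try diverse vector comparison methods on those vectors.

import numpy as np

def compare(u, v, method="cosine"):
    # Different vector comparison methods; higher always means more similar.
    u, v = np.asarray(u, float), np.asarray(v, float)
    if method == "cosine":
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    if method == "euclidean":
        return -np.linalg.norm(u - v)
    if method == "correlation":
        return np.corrcoef(u, v)[0, 1]
    raise ValueError(method)

entropies_a = [0.9, 0.7, 0.8]    # made-up per-attribute entropy vectors
entropies_b = [0.85, 0.75, 0.8]
for m in ("cosine", "euclidean", "correlation"):
    print(m, compare(entropies_a, entropies_b, m))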

I have no specific or solid idea of what to do for the next project. My mind is leaning toward the reinforcement learning area.
Here are two very abstract ideas for agent learning.
1. The agent reaches the goal by following the direction of the goal instead of checking the reward at every state.
or
2. The agent learns two obstacles, where we assume each obstacle consists of two subcomponents in a source task (say, obstacle 1 = A + B and obstacle 2 = C + D).
In typical learning, the learner (or agent) can identify other obstacles similar to obstacle 1 or obstacle 2. What happens if a new obstacle consists of A + C or B + D? In other words, the new, unseen obstacle consists of components derived from each learned obstacle. Here is the scenario.
A frog tries to reach home from a remote place by crossing some obstacles.
One obstacle is a rat (represented by its moving motion and its stench).
The other obstacle is a snake (identified by its temperature and its nasty sound).
After learning, the frog is very good at avoiding the rat and the snake.
One day, the frog encounters a new object that is moving but produces a similar nasty sound (say, a raven).
I want the frog to decompose its obstacle experience and learn new obstacles if they consist of components from learned obstacles.
I don't know whether this is possible. It's just an idea.
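One way to make idea 2 concrete (purely my sketch of a data structure, not a learning algorithm): represent each learned obstacle as a set of perceptual components, and treat a new object as explainable when all of its components come from previously learned obstacles.

# Obstacles as sets of perceptual components, following the frog scenario.
learned = {
    "rat":   {"motion", "stench"},           # obstacle 1 = A + B
    "snake": {"temperature", "nasty_sound"}, # obstacle 2 = C + D
}

def explainable(observation):
    # True if every observed component belongs to some learned obstacle.
    known = set().union(*learned.values())
    return observation <= known

raven = {"motion", "nasty_sound"}         # a rat component + a snake component
print(explainable(raven))                 # True: built from learned pieces
print(explainable({"motion", "color"}))   # False: 'color' was never learned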