Tuesday, January 29, 2008

Report for 1/30

1. What you have done
Steve implemented a random ARFF file generator for me.

With that code, I generated twenty 4-attribute ARFF files.
I then generated one file consisting of 180 data instances, where each instance contains the entropy, information gain, chi-square, and difference (DT accuracy) for a pair of ARFF files.
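
As a rough illustration of the measures named above, here is a minimal Python sketch, assuming nominal attributes and a class label column. The function names and the toy columns are hypothetical. They are not taken from Steve's generator or from the generated files.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Label entropy minus the expected entropy after splitting on the attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def chi_square(values, labels):
    """Pearson chi-square statistic of the attribute-by-class contingency table."""
    n = len(labels)
    value_counts = Counter(values)
    label_counts = Counter(labels)
    observed = Counter(zip(values, labels))
    stat = 0.0
    for v, nv in value_counts.items():
        for y, ny in label_counts.items():
            expected = nv * ny / n
            stat += (observed[(v, y)] - expected) ** 2 / expected
    return stat


# Hypothetical toy columns, only for illustration.
attr = ["a", "a", "b", "b"]
cls = ["yes", "yes", "no", "no"]
print(entropy(cls), information_gain(attr, cls), chi_square(attr, cls))
```

Computing these per file and then taking the difference for each pair of files would be one way to build the instances described above, though the exact construction used here may differ.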

I obtained MLP 64.7368%, SVM 42.1053%, DT 57.3684%.

2. What problems you have encountered
For data similarity, I don't have any problems, just a lack of accuracy.

3. Next week's plan
This week, I plan to implement standard Q-learning as a starting point and then refine the idea.
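
For reference, a minimal sketch of standard tabular Q-learning might look like the following. The five-state chain environment, the reward of 1 at the goal, and all parameter values are assumptions made up for this example. They are not part of the planned project.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Deterministic move along the chain; reward 1 only when the goal is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            nxt, reward, done = step(state, ACTIONS[a])
            # Q-learning update: bootstrap from the best action in the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q


q_table = q_learning()
# Greedy action for every non-goal state.
greedy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(greedy)
```

On this toy chain the greedy policy ends up moving right in every state, which is the shortest path to the goal.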

Wednesday, January 16, 2008

Conference Information

Click this

Report for 1/23

1. What you have done
1. I examined the tree depth for each data set and calculated the correlation.
It turned out that the adult+stretch and soybean data have the same tree height, which was good. However, other pairs of data sets that show a huge difference in accuracy sometimes also have very similar heights.
2. I ran our old sort-data-and-area-comparison method, and the correlation for the adult+stretch and soybean data was around 0.45.
3. I reran the entropy-based-comparison method with different comparison methods. It hasn't shown good results yet.
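
As a small illustration of the correlation figures mentioned above, here is a sketch that assumes the correlation is Pearson's r between a per-pair similarity measure and the per-pair accuracy difference. That pairing is an assumption, and the numbers below are made up for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical values: tree-depth difference and accuracy difference per pair.
depth_diff = [0, 1, 2, 3]
accuracy_diff = [1.0, 3.0, 5.0, 7.0]
r = pearson(depth_diff, accuracy_diff)
print(r)
```

A value near 1 or -1 would mean the measure tracks the accuracy difference closely. Values around 0.45 to 0.6 indicate only a moderate linear relationship.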
2. What problems you have encountered
1. My problem is that I don't know which properties of the data should be compared to find a correlation for a pair of data sets. For example, as we discussed, the adult+stretch and soybean data produce very similar accuracy even though so many differences exist between them.
3. What possible solutions you are considering for these problems
The entropy-based-comparison method produced correlations of about 0.6. That means the entropy-based method can reflect the accuracy to some extent. But the problem is that this is not enough and not robust across arbitrary pairs of data sets. So I think we may need to push this method a little more.
We could group the data by some property and run the entropy-based-comparison method again.
4. What you plan on doing in the coming week
I will spend one or two days thinking about the above idea (how to cluster data sets) and run the entropy-based-comparison method with diverse vector comparison methods.
5. New ideas, specific topics/issues you wish for us to focus on in our discussion
Well, I have spent almost a year thinking about data comparison, and the results haven't been very satisfactory yet. But I don't want to drop this, since you and I have put so much effort into it. However, I may want to continue this as side work, although in practice I would keep doing the same thing as I do now.

I have no specific or solid idea of what to do for the next project. My other interest is in the reinforcement learning area.
Here are two very abstract ideas for agent learning.
1. The agent reaches the goal by following the direction of the goal instead of checking the reward at every state.
or
2. The agent learns two obstacles, where we assume that each obstacle consists of two subcomponents in a source task (say, obstacle 1 = A + B, obstacle 2 = C + D).
In typical learning, the learner (or agent) can identify other obstacles similar to obstacle 1 or obstacle 2. What happens if a new obstacle consists of A + C or B + D? In other words, a new, unseen obstacle consists of components derived from each learned obstacle. Here is the scenario.
A frog tries to reach home from a remote place by crossing some obstacles.
One obstacle is a rat (the rat is represented by its movement and its stench).
The other obstacle is a snake (the snake is identified by its temperature and its nasty sound).
After learning, the frog is now very good at avoiding the rat and the snake.
One day, the frog encounters a new object that is moving but produces a similar nasty sound (say, a raven).
I want the frog to decompose its obstacle experience and learn new obstacles if they consist of components from learned obstacles.
I don't know whether this is possible. It's just an idea.