Friday, February 22, 2013

Lab work on the 22nd

Today we had a really tough time.
This week we tried to use Java to do sentiment analysis on some tweets downloaded from websites.

The first idea was to use a keyword-searching method. We made two lists of keywords: one contains positive words and the other contains negative words. We stored the tweets in a separate CSV file.
Our method is quite straightforward.
We compare each tweet against the two keyword lists.
If the tweet contains negative words, the tweet is negative.
If the tweet contains positive words, the tweet is positive.
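
The keyword rule above could be sketched in Java roughly like this. It is only a minimal sketch, not the program we wrote in the lab: the class name, the method name and the tiny word lists are made up for illustration, and counting matches (with "neutral" for a tie or no match) is an extra assumption, added so that a tweet containing both kinds of words still gets one label.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class KeywordSentiment {
    // Hypothetical keyword lists; the real lists would be much longer.
    static final Set<String> POSITIVE = new HashSet<>(Arrays.asList("good", "great", "love"));
    static final Set<String> NEGATIVE = new HashSet<>(Arrays.asList("bad", "awful", "hate"));

    // Labels a tweet by counting keyword matches.
    // Assumption: a tie (or no match at all) is reported as "neutral".
    public static String classify(String tweet) {
        int score = 0;
        // Lowercase the tweet and split on anything that is not a letter or apostrophe.
        for (String token : tweet.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (POSITIVE.contains(token)) {
                score++;
            }
            if (NEGATIVE.contains(token)) {
                score--;
            }
        }
        if (score > 0) {
            return "positive";
        }
        if (score < 0) {
            return "negative";
        }
        return "neutral";
    }

    public static void main(String[] args) {
        System.out.println(classify("I love this movie, it is great"));
        System.out.println(classify("I hate this awful film"));
        System.out.println(classify("It was not good"));
    }
}
```

The last call shows one weakness of the approach: "It was not good" is labelled positive, because the sketch only sees the keyword "good" and knows nothing about negation.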

This idea is simple, but it has a lot of weaknesses.
And although the method is simple, it was not easy to implement.

Here is what we did:






(Well, apparently it was wrong, because our idea was so buggy. :()

Well, we did not carry on with the project using this method.

Maybe we could eventually improve our program by adding more complex ideas to this method, but...

In the end, we decided to give up on this idea.

We decided to go back to using Weka.

We learned that Weka has machine learning functions that can achieve our aim easily.

Monday, February 18, 2013

Lab work on the 15th

We had another lab day on 15 Feb. This is the third lab day since the beginning of the project.

We solved several questions remaining from last week:


1. The problem of how to convert data stored in an xls file into an arff file is completely solved.
The method is to save the xls data as a CSV file, in which the fields of every message are separated by commas. A header is then added to the CSV file. The header declares each data column (attribute) and its type.
The date and time in the original data should be merged into a single form so that Weka can recognize them.

2. We found that only a few tweets contain location data. Under the current circumstances, we cannot write advanced algorithms and programs to collect and filter the messages, so none of the suggested topics involving location are feasible from now on.


3. Some other topics were suggested this week. Movie classification is more practical than the original topics. The original topic is very creative, but given the time and our capabilities, it is not practical.
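
The CSV-to-arff conversion from item 1 above might look roughly like this in Java. It is only a sketch under assumptions: the attribute names (created, text), the date pattern and the three-column row layout (date, time, message) are hypothetical, not our real data layout, and the quoting helper is a simplified guess at what string values need. The `@relation`, `@attribute` and `@data` lines are the standard arff header keywords.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvToArffSketch {
    // Wraps a value in single quotes and escapes backslashes and quotes,
    // so that commas and spaces inside a tweet do not break the row.
    public static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Builds the arff text. Each row is assumed to be {date, time, message}.
    public static String toArff(String relation, List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        // The header declares every attribute and its type.
        sb.append("@attribute created date \"yyyy-MM-dd HH:mm:ss\"\n");
        sb.append("@attribute text string\n\n");
        sb.append("@data\n");
        for (String[] row : rows) {
            // Merge the separate date and time columns into one value.
            String created = row[0] + " " + row[1];
            sb.append(quote(created)).append(",").append(quote(row[2])).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"2013-02-15", "10:30:00", "hello, world"});
        System.out.print(toArff("tweets", rows));
    }
}
```

Weka also ships its own CSV loader, so writing the header by hand like this is only one possible route.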

These are the problems we have to solve next week:

1. We have to learn how to use Weka. Weka is a very powerful data mining tool, but its operation and algorithms are very complicated. Certain filters and classifiers have to be used to ensure the accuracy of the results.

2. We have to learn to combine some online services, Weka results and human analysis to give the right answer.

(We finally managed to import the arff file into Weka.)