Saturday, February 9, 2013

8th Feb lab

Last week we e-mailed Dr Coenen in the CS department. He suggested that we use the Twitter API to collect Twitter messages, then preprocess the text and prepare it for data mining.
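The preprocessing step could look roughly like the minimal Python sketch below. The cleaning rules are our own assumptions for illustration, not something Dr Coenen prescribed: lower-case the text, strip links and @usernames, keep hashtag words without the '#', and split into word tokens. The function name preprocess_tweet is hypothetical.

```python
import re

# Hypothetical cleaning rules, chosen for illustration only.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")  # words, allowing one apostrophe


def preprocess_tweet(text):
    """Return a list of lower-cased word tokens from a raw tweet."""
    text = text.lower()
    text = URL_RE.sub(" ", text)      # drop links
    text = MENTION_RE.sub(" ", text)  # drop @usernames
    text = text.replace("#", " ")     # keep the hashtag word, drop the '#'
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(preprocess_tweet("Stuck in bed with the #flu again @friend http://t.co/abc"))
```

Stop-word removal or stemming could be added later, once we know which Weka algorithms we will use.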

Before the lab session on Friday, we had already agreed on a few points. We should collect Twitter messages (tweets) through the Twitter API [1]. To search for and save messages with the API, we have to write a small program that calls it and retrieves the messages we want. We have not decided on a programming language yet. Yu suggested Python, because many existing programs are written in it and it is not very hard to learn. Jana and Boning suggested Java, because we have learnt Java before.

Twitter users post tweets about a wide variety of topics. For our project we need to concentrate on a specific topic. We were thinking about how users react to having the flu, but we haven't settled on that yet.
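Whatever topic we choose, tweets that are not about it have to be filtered out. A simple keyword match is one possible first pass, sketched below in Python. The keyword list is a hypothetical example for the flu topic we are considering; keyword matching will miss tweets that mention the flu indirectly and will let through some irrelevant ones (for example news about bird flu).

```python
import re

# Hypothetical keyword list for the flu topic; not final.
FLU_KEYWORDS = {"flu", "influenza", "fever", "sore throat"}


def mentions_topic(text, keywords=FLU_KEYWORDS):
    """True if any keyword appears as a whole word or phrase in the tweet."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(k) + r"\b", lowered) for k in keywords)


def filter_tweets(tweets, keywords=FLU_KEYWORDS):
    """Keep only the tweets that mention at least one keyword."""
    return [t for t in tweets if mentions_topic(t, keywords)]


if __name__ == "__main__":
    sample = [
        "Got the flu, staying home today",
        "Flying to London tomorrow",
        "Influenza season is starting early this year",
    ]
    print(filter_tweets(sample))
```

The word-boundary check (`\b`) stops words such as "fluffy" from counting as a mention of "flu".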

We also decided to use Weka as our data mining tool [3]; uClassify is an alternative [4].

However, the Twitter documentation [1] says that the search API only returns tweets from roughly the last 6-7 days. This limits our original plan "to search the tweets between Oct 2012 and Jan 2013 and do the data mining".
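A small date calculation shows how far short the search window falls. This Python sketch assumes a fixed 7-day window (the upper end of the 6-7 days quoted in [1]) and takes the date of our lab, 8 Feb 2013, as "today"; the real index depth may vary, and the function names are hypothetical.

```python
from datetime import date, timedelta

SEARCH_WINDOW_DAYS = 7  # assumption: upper end of the 6-7 days quoted in [1]


def earliest_searchable(today, window_days=SEARCH_WINDOW_DAYS):
    """Oldest date the search API can be expected to reach."""
    return today - timedelta(days=window_days)


def range_is_searchable(start, end, today):
    """True only if the whole [start, end] range lies inside the window."""
    return start >= earliest_searchable(today) and end <= today


if __name__ == "__main__":
    lab_day = date(2013, 2, 8)
    print(earliest_searchable(lab_day))  # 2013-02-01
    print(range_is_searchable(date(2012, 10, 1), date(2013, 1, 31), lab_day))  # False
```

In other words, by the time of our lab even the end of January was already outside the window.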

We found some code on GitHub [2] that uses the Twitter search API. We tried to run it, but it failed.

An online text mining tool, DiscoverText [6], can use the API without any coding. It costs 49 dollars per month, but its mining functions seem weak.

We found another website (offered by another group) that provides tools for fetching and mining tweets [5]. It costs 14.99 dollars to monitor each archive, and the results are updated hourly. Most importantly, we can download the matching tweets as an xls file, which makes it possible to use the data in Weka [6].

After downloading a test data file from Tweet Archivist, we tried to mine it with Weka. However, an error occurred when we tried to convert the xls file into an ARFF file, which we attempted because ARFF is the format Weka works with natively.
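We have not found the cause of the error yet, but one possible workaround is sketched below in Python. It assumes the xls sheet is first saved as CSV from a spreadsheet program (reading xls directly would need a third-party library). Broken ARFF files often come from text containing commas, quotes or line breaks, so the sketch quotes every string value and flattens line breaks. The function names and the sample column names are hypothetical, not the real Tweet Archivist export layout.

```python
import csv
import io


def arff_quote(value):
    """Quote a value for ARFF: escape backslashes and single quotes, flatten line breaks."""
    value = value.replace("\\", "\\\\").replace("'", "\\'")
    value = value.replace("\r", " ").replace("\n", " ")
    return "'" + value + "'"


def is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation="tweets"):
    """Convert CSV text (header row first) into ARFF text.

    Columns whose non-empty values all parse as numbers become numeric,
    everything else becomes string. Empty cells become '?' (missing).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    numeric = []
    for i in range(len(header)):
        values = [r[i] for r in data if i < len(r) and r[i] != ""]
        numeric.append(bool(values) and all(is_number(v) for v in values))

    lines = ["@relation " + arff_quote(relation), ""]
    for name, is_num in zip(header, numeric):
        kind = "numeric" if is_num else "string"
        lines.append("@attribute %s %s" % (arff_quote(name), kind))
    lines += ["", "@data"]
    for r in data:
        cells = []
        for i, is_num in enumerate(numeric):
            v = r[i] if i < len(r) else ""
            if v == "":
                cells.append("?")
            elif is_num:
                cells.append(v)
            else:
                cells.append(arff_quote(v))
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = ("From User,Text,Followers\n"
              "alice,\"Got the flu, staying home\",120\n"
              "bob,It's just a cold,\n")
    print(csv_to_arff(sample))
```

Weka's Explorer can also open CSV files directly, which may be worth trying first; a converter like this mainly helps when the text columns need extra cleaning before loading.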

Our next steps are to fix this conversion problem and to decide on an interesting topic for collecting tweets.



Related websites:
[1]: https://dev.twitter.com/docs/using-search
[2]: https://github.com/search?q=twitter+search&ref=commandbar
[3]: http://www.yorku.ca/jhuang/files/weka.pdf
     http://www.robertomarchetto.com/www/data_mining_weka_twitter
[4]: http://www.uclassify.com/Default.aspx
[5]: http://www.tweetarchivist.com/
[6]: https://discovertext.com/app/selectDatasetToCode.aspx
[6]: http://www.cs.waikato.ac.nz/~ml/publications/2010/Twitter-crc.pdf
