Thursday, October 2, 2014

Second Summary of the Social Network Analytics Course. Come and Leave Your Comments!

Time flies: it has been over four weeks since I started the class. I have learned new things since my last blog post, so it is time to summarize what the past two weeks covered.

I have picked out four key topics from the last two classes: “document comparison”, “text classification”, “text clustering”, and “sentiment analysis and opinion mining”. Let’s go through them in order. When it comes to the similarity between documents, in my view we only need to pay attention to a handful of keywords. If two documents share several frequently occurring words, there is a good chance that they deal with the same research area. This is only a rule of thumb, though; there are quantitative methods that are more accurate.

TF*IDF is one of these methods, and it is simple and effective. TF (term frequency) is simply the number of times a given keyword appears in a specific document. IDF (inverse document frequency) is obtained by dividing the number of documents in the whole collection by the number of documents containing that keyword; in practice the logarithm of this ratio is usually taken. The larger the IDF, the fewer documents in the collection contain the keyword, so a keyword with a high IDF is highly specific, that is, distinctive. If two documents share several such distinctive keywords, they are likely to be similar and can reasonably be placed in the same group.
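To make the TF*IDF idea concrete, here is a minimal sketch in Python. It is my own illustration, not code from the class: the function names `tf_idf` and `cosine`, the whitespace tokenization, the use of the logarithm of the document ratio for IDF, and the three toy documents are all assumptions. Cosine similarity is one common way to compare the resulting weight vectors; the course may have used a different measure.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Assumption: lowercase and split on whitespace; real text needs better tokenizing.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw count of each term in this document
        # Assumption: IDF is the log of (total documents / documents with the term).
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy documents made up for this example.
docs = [
    "graph clustering of social network users",
    "community clustering in a social graph",
    "recipe for tomato soup",
]
w = tf_idf(docs)
print(cosine(w[0], w[1]))  # the two network-related documents share keywords
print(cosine(w[0], w[2]))  # no shared keywords with the recipe
```

With these toy documents, the first two share the keywords "graph", "clustering" and "social", so their similarity is greater than zero, while the recipe shares no keyword with either and scores zero.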

In fact, the first three topics can be discussed together. In this era of information explosion, documents are generated at an amazing speed. Under these circumstances, people have realized the importance of comparing documents and grouping similar ones together, which improves efficiency and makes work easier.

Apart from text classification, analyzing people’s moods is also useful in social network analytics. People like to share their feelings with friends on platforms such as Facebook and Twitter. Some express themselves explicitly, while others prefer implicit expressions. In most cases, moods can be divided into three types: positive, negative, and objective (neutral). These three types correspond to three classes, and people’s posts are the text documents, so the analyst’s task is text classification, as discussed above. We can build a dictionary for each class containing typical keywords for that class. By comparing a person’s post against the keywords of each class, we can estimate that person’s current feeling.
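The dictionary approach can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the word lists `POSITIVE` and `NEGATIVE` are invented for the example, the function name `classify_mood` is hypothetical, and a post with no clear majority is simply treated as neutral. It also cannot handle implicit expressions, negation, or sarcasm.

```python
import re

# Hypothetical toy dictionaries; a real system would need far larger lists.
POSITIVE = {"happy", "love", "great", "wonderful", "excited"}
NEGATIVE = {"sad", "hate", "terrible", "awful", "angry"}


def classify_mood(text):
    """Label a post as 'positive', 'negative', or 'neutral' by keyword counts."""
    # Lowercase the post and keep only runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    # Assumption: no matches, or a tie, counts as neutral (objective).
    return "neutral"


print(classify_mood("I love this great class!"))
print(classify_mood("What a terrible, awful day"))
print(classify_mood("The lecture starts at nine"))
```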

All in all, the content of the last two classes is interesting. Although the ideas are easy to understand, they still need to be backed by rigorous mathematical methods.

3 comments:

  1. Clearly, you have understood the key points of what we learned in class, and it is a good summary as well.

    1. Hi, Weibin, thanks for your praise. I just recorded the important points after class. We could share notes with each other sometime.

  2. Anyhow, text clustering remains immature for the time being. Hoping for an update.
