Reading a paper

2007/03/27
Today I read a paper (Extracting Topics From Weblogs Through Frequency Segments, WWW’06 Workshop on WWE) that is somewhat relevant to my research. Here is a brief introduction to it:
Goal: Use terms that appear in blogs to extract topics and their related terms.
Steps:
First: identify a collection of main terms that are characteristic of particular topics.
Second: track the relations between the terms that describe a topic by following the flow of terms over time.
Input: a blog corpus (provided by Intelliseek)
Output: topics, each with a set of related terms.
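The two steps above can be sketched roughly as follows. I don't reproduce the paper's exact formulas here, so this is only a minimal sketch under my own assumptions: the corpus is already split into time segments, each holding a list of extracted nouns; "term sum" is taken as a term's total frequency across segments and "term deviation" as the population standard deviation of its per-segment frequency; main terms are the ones whose frequency varies most over time; and Pearson correlation of frequency series is a stand-in I chose for "tracking the flow of terms over time". The function names and the 0.8 threshold are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def segment_frequencies(segments):
    """segments: one list of extracted nouns per time segment (assumed input).
    Returns {term: [freq in segment 0, freq in segment 1, ...]}."""
    counts = [Counter(seg) for seg in segments]
    vocab = set().union(*counts)
    return {t: [c[t] for c in counts] for t in vocab}


def main_terms(freqs, top_k=3):
    """Step 1 (assumed scoring): rank terms by deviation of their per-segment
    frequency, then by term sum, then alphabetically for a stable order."""
    scored = {t: (pstdev(f), sum(f)) for t, f in freqs.items()}
    return sorted(scored, key=lambda t: (-scored[t][0], -scored[t][1], t))[:top_k]


def pearson(x, y):
    """Pearson correlation of two frequency series; 0.0 if either is flat."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def related_terms(freqs, topic, threshold=0.8):
    """Step 2 (stand-in): terms whose frequency flow over time tracks the topic's."""
    target = freqs[topic]
    return sorted(t for t, f in freqs.items()
                  if t != topic and pearson(target, f) >= threshold)


segments = [
    ["election", "vote", "weather", "music"],
    ["election", "election", "vote", "vote", "weather", "music"],
    ["election", "election", "election", "vote", "vote", "vote", "weather", "music"],
]
freqs = segment_frequencies(segments)
print(main_terms(freqs, top_k=2))        # ['election', 'vote']
print(related_terms(freqs, "election"))  # ['vote']
```

In this toy corpus, "election" and "vote" rise together while "weather" and "music" stay flat, so the deviation score picks out the first two and the correlation step links them.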
My opinion:
1. They extracted only single nouns (unigrams) as candidate topic terms. What about noun phrases?
2. The paper uses only term frequency to calculate the term sum and the term deviation. What about also using the links between weblogs?
3. Because the paper was published at a workshop with limited space, it has no dedicated section describing the experiments in detail.
4. Actually, I am not sure how the final results are evaluated in this kind of study. Is there a benchmark, or a standard defined by human judges?
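To make point 2 a bit more concrete, here is one hypothetical way links could enter the frequency counts. This is my own illustration, not something from the paper: it assumes each post has a known number of inbound links from other weblogs, and counts every term occurrence with weight 1 plus the post's inbound links, so terms in widely linked posts weigh more. The function name and the weighting scheme are assumptions.

```python
from collections import Counter


def link_weighted_frequency(posts, inlinks):
    """posts: {post_id: list of terms}; inlinks: {post_id: inbound link count}.
    Hypothetical weighting: each occurrence counts 1 + inbound links of its post."""
    weighted = Counter()
    for post_id, terms in posts.items():
        weight = 1 + inlinks.get(post_id, 0)
        for term in terms:
            weighted[term] += weight
    return weighted


posts = {"a": ["election", "vote"], "b": ["weather"]}
weights = link_weighted_frequency(posts, {"a": 2})
print(weights["election"], weights["weather"])  # 3 1
```

The weighted counts could then replace plain term frequency in the term sum and term deviation calculations.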