Making the Slides

2007/04/02

1. Wrote the assembly chapter 5 overview.
2. I made the slides for one of my chosen papers, titled “A study on automatically extracted keywords in text categorization”. The goal of the paper is to investigate whether automatically extracted keywords can be used to improve text categorization. The authors run experiments on different input features, each with different assigned feature values. In summary, the results indicate that the full-text representation combined with the keywords achieves higher performance on text categorization.
Comparing this paper with my own research, there are some differences:
(1) In this paper, they extract keywords for each document. My research will extract technical terms.
(2) They combine keywords and full text with many kinds of feature values for text categorization. My research will use technical terms as features, with the weight (from the scoring function) and tf*idf as feature values for clustering.
My opinion:
Although they did not propose a new approach to automatic text categorization, they ran a series of experiments with many kinds of input features (such as keyword only, title only, and keyword + full text) and many kinds of feature values (boolean, tf*idf, etc.).
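To make the difference between the feature values concrete, here is a minimal sketch of boolean and tf*idf values for a toy corpus. It is only an illustration under my own assumptions: the function names, the toy documents, and the plain log(N/df) form of idf are hypothetical choices, not the exact setup used in the paper.

```python
import math
from collections import Counter

def boolean_features(tokens, vocabulary):
    """Return 1 if the term occurs in the document, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]

def tfidf_features(docs, vocabulary):
    """Return tf*idf vectors, with idf = log(N / df) (one common variant)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term counts in this document
        vectors.append([
            tf[term] * math.log(n_docs / df[term]) if df[term] else 0.0
            for term in vocabulary
        ])
    return vectors

# Toy documents (hypothetical), already tokenized.
docs = [
    ["keyword", "extraction", "text"],
    ["text", "categorization", "text"],
]
vocabulary = sorted({term for tokens in docs for term in tokens})

print(vocabulary)
print(boolean_features(docs[0], vocabulary))
print(tfidf_features(docs, vocabulary))
```

With this idf variant, a term that appears in every document (here "text") gets a tf*idf value of zero, while the boolean value only records that the term is present.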
