Extracting Topic Words and Discussing the Project

2007/04/09

In the morning, I wrote some comments for my presentation, "A Study on Automatically Extracted Keywords in Text Categorization".

Charis, Tom and I discussed our project in the afternoon and reached the following conclusions. Firstly, we will use English web pages as our data source; in the future we may consider Chinese pages as well. Secondly, in order to get two web pages for Kayed’s IE system, we will rewrite the original query submitted by the end user. Lastly, we will generate a check box for each XML Schema ID that contains content.

In the evening, I continued working on my topic-words program. However, there is a problem: not all of the papers have an ACM categorization. In my test set of 656 documents, only 582 have a category.
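
To make the gap concrete, here is a minimal sketch of how the uncategorized papers could be separated out before extracting topic words. It is not my actual program: the `category` field name, the `split_by_category` and `coverage` helpers, and the sample records are all assumptions made for illustration.

```python
def split_by_category(docs):
    """Split documents into those with a category label and those without.

    Assumes each document is a dict with an optional "category" key
    (hypothetical field name); a missing, None, or empty value counts
    as "no category".
    """
    labeled = [d for d in docs if d.get("category")]
    unlabeled = [d for d in docs if not d.get("category")]
    return labeled, unlabeled


def coverage(labeled_count, total):
    """Fraction of documents that carry a category label."""
    return labeled_count / total if total else 0.0


if __name__ == "__main__":
    # Illustrative records only, not real data from the test set.
    sample = [
        {"id": 1, "category": "H.3.3"},
        {"id": 2, "category": None},
        {"id": 3},
    ]
    labeled, unlabeled = split_by_category(sample)
    print(len(labeled), len(unlabeled))

    # Counts reported above: 582 of 656 documents have a category.
    print(f"{coverage(582, 656):.1%} labeled, {656 - 582} unlabeled")
```

With the counts above, 74 documents have no category, so roughly 89% of the test set is usable if the unlabeled papers are simply skipped.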
