Supervised Learning results.

2007/07/27
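Today I spent quite a lot of time running experiments. This time I mainly used a supervised learning method (Linear SVM) to classify the technical terms produced by the ATR step of this thesis:
1. Feature selection: a larger feature space than before, built from nouns, adjectives, and verbs. After stemming and filtering out low-frequency words, it has about 60,000 dimensions.
2. Feature value assignment: two variants, one with boolean values and one with TF-IDF values.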
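The two feature-value schemes can be sketched as follows. This is a minimal illustration, not the code used in the experiment: it assumes each technical term is represented by a bag of already-stemmed nouns, adjectives, and verbs, and the helper names and the `min_freq` threshold are hypothetical.

```python
import math
from collections import Counter

def build_vocab(docs, min_freq=2):
    """Keep words occurring at least min_freq times overall (hypothetical low-frequency filter)."""
    counts = Counter(w for doc in docs for w in doc)
    return sorted(w for w, c in counts.items() if c >= min_freq)

def boolean_vector(doc, vocab):
    """1.0 if the word appears in the term's bag of words, else 0.0."""
    present = set(doc)
    return [1.0 if w in present else 0.0 for w in vocab]

def tfidf_vector(doc, vocab, docs):
    """Raw term frequency times log(N / df); words present in every bag get weight 0."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    tf = Counter(doc)
    vec = []
    for w in vocab:
        df = sum(1 for s in doc_sets if w in s)
        idf = math.log(n / df) if df else 0.0
        vec.append(tf[w] * idf)
    return vec

# Toy bags of words for three hypothetical technical terms
docs = [["learn", "model", "learn"], ["model", "parse"], ["learn", "parse"]]
vocab = build_vocab(docs)
print(vocab)                              # ['learn', 'model', 'parse']
print(boolean_vector(docs[0], vocab))     # [1.0, 1.0, 0.0]
print(tfidf_vector(docs[0], vocab, docs))
```

The boolean vector only records presence, while the TF-IDF vector also weights repeated and rarer words more heavily, which is the difference between the two models compared below.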

Experimental setting:
1. Number of technical terms: 8624.
2. Gold standard: labels assigned by domain experts.
3. Training and Testing: Four-fold cross validation.
4. The number of categories: 4 (machine learning, natural language processing, statistics probability, and world wide web)
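Four-fold cross validation splits the 8624 terms into four disjoint folds; each fold serves once as the test set while the other three are used for training. A minimal sketch, assuming a plain random (unstratified) split; the function name and seed are hypothetical:

```python
import random

def k_fold_indices(n, k=4, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test

splits = list(k_fold_indices(8624, k=4))
print([len(test) for _, test in splits])  # [2156, 2156, 2156, 2156]
```

A stratified variant, which keeps the proportion of the four categories equal across folds, would be a reasonable alternative when the categories are unbalanced.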

Results:
(1): Boolean value model: Precision (85.25%), Recall (68.5%) and F-measure (72.75%)
(2): TF-IDF value model: Precision (92.75%), Recall (48.25%) and F-measure (62.5%)
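For reference, per-category precision, recall, and F-measure can be computed as below. This is a sketch, not the evaluation script used here; the labels are hypothetical. Note that under macro averaging the averaged F-measure need not equal the harmonic mean of the averaged precision and recall.

```python
def prf(gold, pred, label):
    """Precision, recall, and F1 for one category treated as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

def macro_average(gold, pred, labels):
    """Unweighted mean of per-category precision, recall, and F1."""
    scores = [prf(gold, pred, label) for label in labels]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))

gold = ["ML", "ML", "NLP", "WWW"]
pred = ["ML", "NLP", "NLP", "ML"]
print(prf(gold, pred, "ML"))                             # (0.5, 0.5, 0.5)
print(macro_average(gold, pred, ["ML", "NLP", "WWW"]))
```

In this toy case the macro-averaged precision and recall are 1/3 and 0.5, whose harmonic mean is 0.4, while the macro-averaged F-measure is about 0.389.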

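Next, I will combine the results produced by the boolean model and the TF-IDF model with time-stamps and research subjects (taken from the ACM hierarchical classification, e.g., Information Systems, Artificial Intelligence, etc.) to analyze and compare trends.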
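One simple starting point for the trend analysis is to count classified terms per year within each (ACM subject, category) pair. A minimal sketch, assuming a hypothetical record format of (term, year, predicted category, ACM subject); the real data layout may differ:

```python
from collections import Counter, defaultdict

def trend_table(records):
    """Map (subject, category) to a year-sorted list of (year, term count)."""
    counts = defaultdict(Counter)
    for term, year, category, subject in records:
        counts[(subject, category)][year] += 1
    return {key: sorted(c.items()) for key, c in counts.items()}

# Hypothetical classified terms
records = [
    ("svm", 2005, "machine learning", "Artificial Intelligence"),
    ("svm", 2006, "machine learning", "Artificial Intelligence"),
    ("kernel", 2006, "machine learning", "Artificial Intelligence"),
    ("crawler", 2006, "world wide web", "Information Systems"),
]
for key, series in trend_table(records).items():
    print(key, series)
```

Running the same table over the boolean-model and TF-IDF-model predictions would give two series per pair that can be compared directly.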
