Introducing Stemming and Uni-gram

2007/04/23
Because the technical-term space had too many dimensions, I introduced stemming and uni-gram methods into my programs to reduce it. My programs were originally written for intact data (without stemming or paragraphing), so I spent some time rewriting my system. After the change, the number of technical-term dimensions dropped from 11848 to 3838. Next, I will use this reduced feature set as input to Auto-Class for clustering.
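To illustrate why stemming shrinks a uni-gram vocabulary, here is a minimal Python sketch. It is not the code from my system: the suffix list, the function names (toy_stem, unigrams, vocabulary) and the two sample documents are all hypothetical, and the toy stemmer is a crude suffix stripper standing in for a real stemming algorithm such as Porter's.

```python
import re

# Hypothetical toy suffix list, checked in order (longest variants first).
# A real system would more likely use a full stemmer such as Porter's.
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "es", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def unigrams(text):
    """Lowercase the text and split it into single-word (uni-gram) tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vocabulary(docs, stem=False):
    """Collect the distinct terms (one dimension each) across all documents."""
    vocab = set()
    for doc in docs:
        for token in unigrams(doc):
            vocab.add(toy_stem(token) if stem else token)
    return vocab


# Made-up sample documents, only for demonstration.
docs = [
    "Clustering clusters of clustered terms",
    "Stemming reduces term dimensions",
]

raw = vocabulary(docs)
stemmed = vocabulary(docs, stem=True)
print("dimensions without stemming:", len(raw))
print("dimensions with stemming:", len(stemmed))
```

Variants such as "clustering", "clusters" and "clustered" collapse into a single term, so each group of inflected forms costs one dimension instead of several. The same effect, on a much larger vocabulary, is what reduced my term count.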
