Stemming

Stemming reduces a word to just its root portion, or stem.  For example, the stem of "running" is "run".  The English language has numerous exceptions, which makes this process less than straightforward.  Stemming also remains questionable as a method of simplifying text because, by its nature, it loses information that can be important in understanding the sentence.  When comparing text, however, being able to compare in a tense-indifferent way can be very useful, and as a result selective stemming can improve matching.
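
To make the trade-off concrete, here is a minimal sketch of a suffix-stripping stemmer in Python.  It is a toy written for illustration, not the Porter algorithm and not the C# stemmer mentioned in this post; the names naive_stem and same_stem and the suffix list are assumptions made up for this example.

```python
# Toy suffix-stripping stemmer: an illustration only, NOT the Porter algorithm.
# The suffix list and both function names are hypothetical.
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def naive_stem(word: str) -> str:
    """Strip at most one common suffix from a lower-cased word."""
    w = word.lower()
    for suffix in SUFFIXES:
        if not w.endswith(suffix):
            continue
        # Leave words such as "glass" alone: their final "s" is not a plural.
        if suffix == "s" and w.endswith("ss"):
            continue
        stem = w[: -len(suffix)]
        # Keep at least three characters so short words like "red" survive.
        if len(stem) < 3:
            continue
        # "runn" -> "run": undo consonant doubling after "ing" or "ed",
        # except for l, s and z, so that "falling" still gives "fall".
        if (
            suffix in ("ing", "ed")
            and stem[-1] == stem[-2]
            and stem[-1] not in "aeioulsz"
        ):
            stem = stem[:-1]
        return stem
    return w


def same_stem(a: str, b: str) -> bool:
    """Compare two words in a tense-indifferent way."""
    return naive_stem(a) == naive_stem(b)


print(naive_stem("running"))            # run
print(same_stem("walked", "walks"))     # True: tense and number are ignored
print(same_stem("ran", "run"))          # False: irregular forms are missed
print(same_stem("hoping", "hopping"))   # True: two different verbs collapse
```

The last two lines show both problems described above: irregular forms such as "ran" slip past simple rules, and "hoping" and "hopping" end up with the same stem, which is exactly the kind of information stemming throws away.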

Alski has written an English language stemmer in C#, based upon the Porter Stemmer.  The code can be found here
