While working on a Naive Bayes classifier in PHP, I needed to do some stemming. In particular, I needed Porter stemming for Swedish, but most libraries only support English.
PECL stem to the rescue!
It is easily installed using:
pecl install stem
Then add the following line to your php.ini:
extension=stem.so
(Don't forget to restart Apache!)
The stem package is based on the Snowball API. The PECL package currently supports a number of languages, including:
- English
- Russian (UTF8)
- Swedish
- Turkish (UTF8)
Using it is as simple as stem_LANGUAGE($word).
For example, to stem an English word:
echo stem_english('judges'); // Returns the stem, "judg"
Stemming a Swedish word is just as easy:
echo stem_swedish('affärscheferna'); // Returns the stem, "affärschef"
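For a classifier you usually want to stem every word of a document, not a single word. The sketch below relies only on the stem_LANGUAGE($word) naming pattern described above; the helper name stem_words() is hypothetical and not part of the extension. It also assumes the mbstring extension is available for multibyte-safe lowercasing, and that each Snowball stemmer accepts the text encoding you pass it (note that some languages are listed with UTF8 variants).

```php
<?php
// Hypothetical helper (not part of PECL stem): tokenize a text and stem
// each word via the extension's stem_LANGUAGE($word) functions.
function stem_words(string $text, string $language): array
{
    // Build the function name from the language, e.g. "swedish" -> "stem_swedish".
    $stemmer = 'stem_' . strtolower($language);
    if (!function_exists($stemmer)) {
        throw new RuntimeException(
            "No stemmer for '$language': is the stem extension loaded?"
        );
    }

    // Lowercase multibyte-safely (assumes mbstring), then split on any run of
    // non-letter characters so punctuation and digits are dropped.
    $words = preg_split(
        '/[^\p{L}]+/u',
        mb_strtolower($text, 'UTF-8'),
        -1,
        PREG_SPLIT_NO_EMPTY
    );

    // Apply the stemmer to every token, e.g. stem_swedish($word).
    return array_map($stemmer, $words);
}
```

With the extension loaded, stem_words('Affärscheferna är här', 'swedish') passes each lowercased word to stem_swedish(). Resolving the function name at runtime keeps the tokenizer language-agnostic, so the same classifier code can be reused for any language the extension supports.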
If you are looking for a pure-PHP solution that does not require installing an additional PHP extension, I can recommend the Porter Stemmer by Cam Spiers.