While working with a Naive Bayes classifier in PHP, I needed to do some stemming. In particular, I needed Porter-style stemming for Swedish, but most libraries support only English.
PECL stem to the rescue!
It is easily installed using:
pecl install stem
Then add the following line to your php.ini:
extension=stem.so
(Don’t forget to restart Apache!)
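To confirm the extension was actually picked up, a quick check along these lines may help. It is only a sketch: missing_functions() is a hypothetical helper name, and it tests only for the two functions used later in this post. Keep in mind that the command-line PHP and the web server can read different php.ini files, so run it wherever you plan to do the stemming.

```php
<?php
// Functions used later in this post; both should exist once stem.so is loaded.
$expected = ['stem_english', 'stem_swedish'];

// Hypothetical helper: returns the names from $functions that PHP does not know.
function missing_functions(array $functions): array
{
    return array_values(array_filter($functions, function ($name) {
        return !function_exists($name);
    }));
}

$missing = missing_functions($expected);
echo $missing === []
    ? "stem extension looks loaded\n"
    : "missing: " . implode(', ', $missing) . "\n";
```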
The stem package is based on the Snowball API. Currently the PECL package supports the following languages:
- Danish
- Dutch
- English
- Finnish
- French
- German
- Hungarian
- Italian
- Norwegian
- Portuguese
- Romanian
- Russian
- Russian (UTF8)
- Spanish
- Swedish
- Turkish (UTF8)
Using it is as simple as calling stem_LANGUAGE($word), where LANGUAGE is the lowercase name of the language.
For example, to stem an English word:
echo stem_english('judges'); //Returns the stem, "judg"
Stemming a Swedish word is just as easy:
echo stem_swedish('affärscheferna'); //Returns the stem, "affärschef"
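For something like a classifier you usually want to stem every token of a text, not a single word. The sketch below is one way to do that, assuming only the stem_LANGUAGE() naming shown above. stem_word() and stem_tokens() are hypothetical helpers of my own, not part of the extension, and the whitespace tokenizer is deliberately naive. If the matching function is missing, the word is returned unchanged.

```php
<?php
// Hypothetical helper: calls stem_<language>() by name and returns the word
// unchanged if the language name is malformed or the function is missing.
function stem_word(string $word, string $language): string
{
    $language = strtolower($language);
    if (!preg_match('/^[a-z]+$/', $language)) {
        return $word; // refuse anything that is not a plain language name
    }
    $function = 'stem_' . $language;
    if (!function_exists($function)) {
        return $word; // extension not loaded or language not supported
    }
    return $function($word);
}

// Hypothetical helper: stems every whitespace-separated token of $text.
function stem_tokens(string $text, string $language): array
{
    $tokens = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $stems = [];
    foreach ($tokens as $token) {
        $stems[] = stem_word($token, $language);
    }
    return $stems;
}

print_r(stem_tokens('judges judge judging', 'english'));
```

The fallback keeps the rest of the code running on a machine without the extension, at the cost of silently skipping the stemming, so you may prefer to throw an exception there instead.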
Alternatives
If you are looking for a PHP-only solution that does not need an additional PHP extension, I can recommend the Porter Stemmer by Cam Spiers.