Useful Snippets

Welcome!


This blog is used to collect useful snippets related to Linux, PHP, MySQL and more. Feel free to post comments with improvements or questions!


Stemming different languages in PHP

Stanislav Khromov

While working with a Naive Bayes classifier in PHP, I needed to do some stemming. In particular, I needed Porter stemming for Swedish, but most libraries only support English.

PECL stem to the rescue!

It is easily installed using:

pecl install stem

And adding the following to your php.ini:

extension=stem.so

(Don’t forget to restart Apache!)

The stem package is based on the Snowball API. Currently the PECL package supports the following languages:

Using it is as simple as stem_LANGUAGE($word).

For example, to stem an English word:

echo stem_english('judges'); // Returns the stem, "judg"

Stemming a Swedish word is just as easy:

echo stem_swedish('affärscheferna'); // Returns the stem, "affärschef"
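
Because every language is a separate stem_LANGUAGE() function, calling one for a language that was not compiled in ends in a fatal "undefined function" error. A small wrapper can check for the function first. This is a minimal sketch: stem_word() is a hypothetical helper name, not part of the extension, and it assumes the stem_<language>() naming shown above holds for every compiled-in language.

```php
<?php
// Hypothetical helper (not part of PECL stem): stems $word with the
// stem_<language>() function, e.g. stem_english() or stem_swedish(),
// assuming the extension follows that naming for every compiled-in language.
function stem_word(string $word, string $language): string
{
    $function = 'stem_' . strtolower($language);

    // A language that was not selected during "pecl install stem" has no
    // function, which is what triggers "Call to undefined function ...".
    if (!function_exists($function)) {
        throw new InvalidArgumentException(
            "No stemmer for '$language': $function() is not defined"
        );
    }

    return $function($word);
}

// Usage: stem every word of a short phrase, e.g. before feeding the tokens
// to a classifier. Skipped when the English stemmer is unavailable.
if (function_exists('stem_english')) {
    $tokens = preg_split('/\s+/', 'judges judging judged', -1, PREG_SPLIT_NO_EMPTY);
    $stems = array_map(function (string $token): string {
        return stem_word($token, 'english');
    }, $tokens);
    echo implode(' ', $stems), PHP_EOL;
}
```

Throwing an exception instead of letting PHP die lets the calling code fall back to another stemmer, or skip stemming, for unsupported languages.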

Alternatives

If you are looking for a PHP-only solution that does not need an additional PHP extension, I can recommend the Porter Stemmer by Cam Spiers (English only).

Web Developer at Aftonbladet (Schibsted Media Group)
Any opinions on this blog are my own and do not reflect the views of my employer.

Comments 16
  • Himanshu

    Hi,

    Thanks for the tutorial. It was extremely helpful, since there is no documentation for PECL stem.
    I tried stemming using stem_english($word), but I am getting the following error:


    PHP Fatal error: Call to undefined function stem_english()

    However, when I use stem($word), it does stem the word (but not very well).

    Am I doing something wrong?

    Regards,
    Himanshu Joshi


    • Stanislav Khromov

      Hi Himanshu,

      After you run “pecl install stem”, you get asked a series of questions about which languages you would like to compile into your stemmer. Make sure you select “yes” for the English stemmer. It looks like this:

      ...
      Compile English stemmer? [yes] : yes
      ...
      

      Afterwards, you can use the stem_english function.

      If you only require English stemming, you may also use the porter-stemmer written by Cam Spiers, which requires no additional modules: https://github.com/camspiers/porter-stemmer

      Edit: You can see which languages are available in your stemmer by checking the “stem support” section of phpinfo().
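
The same check can be done from a script. This is a minimal sketch, assuming the extension registers under the module name "stem" (as in extension=stem.so); list_stem_functions() is a hypothetical helper. The stem_<language>() entries it prints show which languages were compiled in; the list may also contain other functions the extension exposes, such as stem().

```php
<?php
// Hypothetical helper: lists the functions exported by the stem extension,
// assuming it is registered under the module name "stem".
function list_stem_functions(string $module = 'stem'): array
{
    if (!extension_loaded($module)) {
        return []; // the extension is not loaded in this PHP at all
    }

    // get_extension_funcs() returns false if the module exports no functions.
    $functions = get_extension_funcs($module) ?: [];
    sort($functions);
    return $functions;
}

$functions = list_stem_functions();
if ($functions === []) {
    echo 'stem extension not loaded', PHP_EOL;
} else {
    // One name per line, e.g. the stem_<language> functions you selected.
    echo implode(PHP_EOL, $functions), PHP_EOL;
}
```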


  • Benjamin Intal

    Very helpful article! I had to install PECL first, since my server didn’t have it yet:

    apt-get install php-http
    pecl install pecl_http

    I’ll be using this to build my search index.


  • Xavier

    Hi,

    After installing PEAR and placing the file “stem-1.5.1.tgz” under the “php5.4.3” folder, I ran “pecl install stem”, but got the error “The DSP stem.dsp does not exist”. See the log below. It is strange, because the “stem.dsp” file is included in “stem-1.5.1.tgz”. Any idea how to fix that? Also, how can I uninstall it? Thanks for your help.

    C:\wamp\bin\php\php5.4.3>pecl install stem
    downloading stem-1.5.1.tgz …
    Starting to download stem-1.5.1.tgz (82,665 bytes)
    ………………..done: 82,665 bytes
    43 source files, building
    WARNING: php_bin C:\wamp\bin\php\php5.4.3\php.exe appears to have a suffix \php5.4.3\php.exe, but config variable php_suffix does not match
    ERROR: The DSP stem.dsp does not exist.


  • Xavier

    Hi Stanislav,

    Thanks for your reply. I saved a local copy of the zip file “php_stem-1.5.1-5.5-ts-vc11-x86” and placed the DLL “php_stem” under the ext folder. I also added the line “extension=php_stem.dll” to the php.ini file.
    I probably missed a step, since calling the stem function still produces an error when I run my PHP code: Fatal error: Call to undefined function stem_english(). Do you have any idea what is wrong here?


    • Stanislav Khromov

      Hi Xavier,

      Can you check whether the extension appears in the output of phpinfo()? (Search the page for “stem”.)

      If it does not appear, the extension was not installed correctly. Many LAMP stacks on Windows include multiple PHP versions, so make sure you added the extension to the folder of the PHP version you are actually running.
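
When several PHP versions are installed, a short diagnostic script shows which PHP is actually running and whether it loaded the extension. This is a minimal sketch; stem_diagnostics() is a hypothetical helper. Open it through the web server rather than running it from the command line, because the CLI and Apache can read different php.ini files.

```php
<?php
// Hypothetical helper: collects the settings that decide whether an
// extension such as stem is actually loaded by the PHP serving this script.
function stem_diagnostics(): array
{
    return [
        'PHP version'    => PHP_VERSION,
        'Thread safety'  => PHP_ZTS ? 'TS' : 'NTS',
        'Loaded php.ini' => php_ini_loaded_file() ?: '(none)',
        'Extension dir'  => (string) ini_get('extension_dir'),
        'stem loaded'    => extension_loaded('stem') ? 'yes' : 'no',
    ];
}

// Plain text keeps the aligned columns readable in a browser.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/plain');
}

foreach (stem_diagnostics() as $label => $value) {
    printf("%-15s %s\n", $label . ':', $value);
}
```

On Windows, an extension DLL only loads in a PHP that matches the version, thread-safety mode (TS/NTS), compiler and architecture it was built for. By the usual naming of Windows PECL builds, a file like php_stem-1.5.1-5.5-ts-vc11-x86 targets PHP 5.5, thread-safe, VC11, 32-bit, so compare those parts against the first two lines of the output.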


  • Alex

    It looks like stem_russian() is not working correctly:

    echo stem_russian("букеты");

    It always returns the same string, unchanged.


    • Stanislav Khromov

      If I recall correctly, you get to pick which languages are included during the pecl install. Make sure you selected Y(es) for Russian.


  • Gokhan

    I installed stem with PECL and selected English and Turkish. But when I call stem_turkish(), PHP gives the error “Fatal error: Call to undefined function stem_turkish() in /var/www/servis/srvc/index.php on line 7”.

    stem_english() works fine, though. What is the problem?


  • GÖKHAN ÇANCILAR

    I checked. I have already installed 1.5.1. I don’t understand :(


  • GÖKHAN ÇANCILAR

    Any help with stem_turkish()?