ezSearch - mySQL full text search

For a project I recently worked on I needed to make a full text search engine -- a prospect I know 'all too well' having built a number of these in the past.

I usually end up making full text search with separate word indexe tables (so that I can add special weightings to different words found in different parts of the document and also so that I can do porter stemming etc). Along with that comes things like cron driven search index update scripts, etc. (to maintain the system).

This time I decided to do a little research into mySQL to see if the full text search was any faster than the last time I looked (about 4 years ago), last time I looked at it was too slow and not a viable option.

This time round, after a bit of digging, I came up with the following fully weighted full text search engine with hardly any lines of code at all :)

Lovely if your lazy!!!

Best thing is it seems to be returning results from a million row table in less than a second (on my local machine).

So here it is. A quick, weighted, full text search engine, for lazy sods (in mySQL). Enjoy.

(Requires ezSQL)


    // Table structure
    CREATE TABLE stories
    (
    story_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    category VARCHAR(255),
    title VARCHAR(255),
    body TEXT,
    FULLTEXT (title,category,body),
    FULLTEXT (title),
    FULLTEXT (category),
    FULLTEXT (body),
    PRIMARY KEY (story_id)
    )

    // Incoming search
    $query = "this is my search";

    // Main query
    $db->get_results("
    SELECT
    *,
    (MATCH (category) AGAINST ('$query' IN BOOLEAN MODE)*100) +
    (MATCH (title) AGAINST ('$query' IN BOOLEAN MODE)*10) +
    MATCH (body) AGAINST ('$query' IN BOOLEAN MODE) AS rating
    FROM
    stories
    WHERE
    MATCH (title,category,body) AGAINST ('$query' IN BOOLEAN MODE)
    ORDER BY
    rating DESC
    LIMIT 0,10
    ");

Note 1: 'category' is weighted as 100 times more important than 'body'. 'title' is weighted as 10 times more important than 'body'. Tweak as required.

Note 2: It requires a 'big' data set to work nicely

Note 3: mySQL ignores search terms of 4 chars or less by default

Note 4: Try removing 'IN BOOLEAN MODE' (all instances). The results 'might' be more accurate.

Note 5: Does not include porter stemming.