Setting up Full Text Search

For a recent project, we needed to install a full text search engine.

After doing some research and speaking with colleagues, we found the combination of Sphinx and Thinking_Sphinx to be a great way to get started.

Thinking_Sphinx also has a great Google Group for research and questions.

Install Sphinx

To install Sphinx, it's best to build it from source. If you are on Leopard, Clinton Nixon's tut is the one we followed.

Otherwise, check out the Sphinx homepage for info on other platforms.

Clinton also details installing Iconv and Expat before installing Sphinx.

It is a pretty straightforward walkthrough of how to install Sphinx in /usr/local. Just be sure to visit the Sphinx home page to get the latest version number for when you make the download call:

curl -O http://www.sphinxsearch.com/downloads/sphinx-version-here.tar.gz

Clinton's tut will save a lot of time versus trying to install it via MacPorts.

Install Thinking_Sphinx

Now that Sphinx is installed locally, it is time to integrate Sphinx with your app.

Download the plugin to your app:

script/plugin install git://github.com/freelancing-god/thinking-sphinx.git

You can also install it for Merb or as a gem:

#For Merb:
require 'thinking_sphinx'

From here it's pretty easy to get Sphinx searching your app.

Index your model

Telling Sphinx what to index is simple:

#Define Index for Sphinx
define_index do
  indexes :name, :sortable => true
  indexes description
  indexes body1
  indexes reviews.content, :as => :reviews_content    
end

Just put the define_index block in your model file after any validations and associations.

There are a lot of options you can use including:

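  • :sortable makes the field available for sorting the results.
  • :as gives the field its own name, which is handy when you index a column from an associated model (reviews.content becomes reviews_content above).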

I would recommend heading to the Thinking_Sphinx site to reference them all.

When you have the properties you want indexed, run the rake command to build the index.

rake ts:index #will index all the properties
rake ts:start #starts the searchd daemon
rake ts:stop #for when you want to shut it down

Call the Search from your Controller

To view the results, run a search on your model like:

@businesses = Business.search params[:search], :include => :reviews, :field_weights => { :name => 20, :body1 => 15, :description => 10, :reviews_content => 5 }

:field_weights is a great option that lets you set which properties are considered a better match.

The higher a property's number, the earlier results that match in that property will be listed.
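
To get a feel for what the weights do, here is a toy sketch in plain Ruby. It is not Sphinx's actual ranking algorithm, which is more involved. It simply assumes a score equal to weight times number of hits, summed over the fields. The toy_score helper and the sample hit counts are made up for illustration.

```ruby
# Toy illustration only: Sphinx's real ranker is more involved than this.
# Assumption: score = sum over fields of (field weight * hits in that field).
FIELD_WEIGHTS = { :name => 20, :body1 => 15, :description => 10, :reviews_content => 5 }

# hits is a hash of field name => number of times the query matched there.
# In this toy model, fields without an explicit weight count as 1.
def toy_score(hits, weights = FIELD_WEIGHTS)
  hits.inject(0) { |sum, (field, count)| sum + weights.fetch(field, 1) * count }
end

# Hypothetical match data for two businesses.
results = {
  "Matched in name"    => { :name => 1 },
  "Matched in reviews" => { :reviews_content => 3 }
}

# Highest score first.
ranked = results.sort_by { |_, hits| -toy_score(hits) }.map { |title, _| title }
puts ranked.inspect
```

With these weights, a single hit in the name outranks three hits in the reviews, which is the kind of effect the option is meant to give you.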

:include eager-loads the associated models, the same way it does in a regular find.

Pagination is built in and will interface with Will_Paginate.

Just use:

:page => params[:page], :per_page => 42

in the options. Then you can use the will_paginate call in your view.
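
Once the options start piling up, it can be tidier to build them in one place. Here is a minimal sketch in plain Ruby. The search_options helper and the PER_PAGE constant are hypothetical names, not part of Thinking_Sphinx or Will_Paginate, and the page clamping is an assumption about what you would want for bad input.

```ruby
# Hypothetical helper: gathers the search options from the request params.
# Plain Ruby, so it could live in a controller or a helper module.
PER_PAGE = 42

def search_options(params)
  page = params[:page].to_i
  page = 1 if page < 1 # fall back to the first page for missing or bad input
  {
    :include       => :reviews,
    :field_weights => { :name => 20, :body1 => 15, :description => 10, :reviews_content => 5 },
    :page          => page,
    :per_page      => PER_PAGE
  }
end

# In the controller this would be used roughly as:
#   @businesses = Business.search params[:search], search_options(params)
# and in the view:
#   <%= will_paginate @businesses %>
puts search_options({ :page => "3" }).inspect
```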

This is just the tip of the iceberg with Sphinx. It is very powerful and can do much more.

Check out Delta Indexes if you have data that changes regularly between scheduled indexes.

Running in Production

A few tips for running Sphinx in production:

  • Ignore the db/sphinx folder, searchd logs, and production.sphinx.conf in your .gitignore
  • Create a sphinx folder in your shared folder, outside of your releases, and symlink your app's db/sphinx to it.
  • Pass RAILS_ENV=production when you run the rake tasks yourself, for example rake ts:index RAILS_ENV=production.
  • If you are having path issues, specify the Sphinx path by creating a sphinx.yml in your app's config folder:
production:
  bin_path: 'path/to/searchd'
#bin_path is the directory that contains searchd; find it by running which searchd on your server
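
If the setting does not seem to take effect, it can help to check what the file actually parses to. Here is a minimal sketch using Ruby's standard YAML library. The sphinx_settings helper is a hypothetical name, and the path is the same placeholder as above, not a real location.

```ruby
require 'yaml'

# Same shape as the sphinx.yml fragment above; the path is a placeholder.
SPHINX_YML = <<-EOS
production:
  bin_path: 'path/to/searchd'
EOS

# Returns the settings for one environment, or an empty hash if none are set.
def sphinx_settings(yaml_text, env)
  YAML.load(yaml_text)[env] || {}
end

puts sphinx_settings(SPHINX_YML, "production")["bin_path"]
puts sphinx_settings(SPHINX_YML, "development").inspect
```

The settings are keyed by environment name, so an environment you have not listed simply comes back empty here.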

Whenever

The only downside to Thinking_Sphinx is that it does not index itself automatically.

Whenever makes it easy to create cron jobs to index Sphinx.

To get started, just install the gem:

gem install javan-whenever

Include it in your app:

config.gem 'javan-whenever', :lib => false, :source => 'http://gems.github.com'

or

require 'javan-whenever'

cd to your app and run:

wheneverize .

This adds a schedule.rb to your app's config folder.

From there, you can use ruby code to create any number of cronjobs like:

every 2.hours do
  rake "thinking_sphinx:index"
end

every :reboot do
  rake "thinking_sphinx:start"
end

You can also change where the log is sent by:

set :cron_log, "/path/to/your/whenever_log"

Whenever defaults to your app's root path, but you can override it with:

set :path, "/path/new/.../"

Update Deploy

If you want to have the crontab generated on deploy, Whenever has this recipe:

after "deploy:symlink", "deploy:update_crontab"

namespace :deploy do
  desc "Update the crontab file"
  task :update_crontab, :roles => :db do
    run "cd #{release_path} && whenever --update-crontab #{application}"
  end
end

Troubleshooting Whenever

If you are getting path issues with rake, try updating to the latest version of the gem, or explicitly stating the rake path using the command helper instead of rake.

command "cd /your/app/path && your/rake/path/rake ts:index RAILS_ENV=production"
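
Since schedule.rb is just Ruby, the long command string can be built from its parts so the paths only appear once. This is a rough sketch: rake_command is a hypothetical helper, not something Whenever provides, and the paths are the same placeholders as above.

```ruby
# Hypothetical helper: builds the shell line for the command helper to run.
# The paths passed in below are placeholders.
def rake_command(app_path, rake_bin, task, env = "production")
  "cd #{app_path} && #{rake_bin} #{task} RAILS_ENV=#{env}"
end

puts rake_command("/your/app/path", "your/rake/path/rake", "ts:index")

# In schedule.rb it would be used roughly like this (assuming the helper is
# visible inside the block):
#   every 2.hours do
#     command rake_command("/your/app/path", "your/rake/path/rake", "ts:index")
#   end
```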

Again, there is more info to reference on the Whenever site.