Simple Search

Since converting to a Rake-powered site, I've missed having search, so I've started experimenting with search on the client side.

My (naive) implementation builds an index of all words, where each word maps to a list of the articles that contain it, given as positions in the articles array (in JSON):


  var words = {
    'hello': [0, 1],
    'world': [1]
  };
  var articles = [
    ['/article/first', 'Title of post with hello world in it'],
    ['/article/second', 'Title of post about world of svg']
  ];

The articles.json file is generated by a Rake task:


  require 'yaml'
  require 'erb'

  file 'html/search/articles.json' => ['html/articles/index.html', 'html/search', 'theme/articles.json'] do
    # Load each article's YAML data and derive its permalink from the filename
    articles = FileList['data/articles/*.yml'].collect { |fn|
      data = YAML.load(File.read(fn))
      data['permalink'] = File.basename(fn, '.yml')
      data
    }
    # Newest articles first
    articles = articles.sort_by { |d| d['created_at'] }.reverse

    # Map every word to the articles that contain it
    all_words = {}
    articles.each do |article|
      # Strip HTML tags and punctuation without modifying the article itself
      body = article['body'].to_s.gsub(/<[^>]*>/m, ' ').gsub(/[^a-zA-Z0-9]/, ' ')
      words = body.downcase.split.uniq
      words.each do |w|
        all_words[w] ||= []
        all_words[w] << article
      end
    end

    # Render the JSON through an ERB template, which sees articles and all_words via binding
    article_json_template = ERB.new(File.read('theme/articles.json'))
    File.open('html/search/articles.json', 'w') { |f| f.write article_json_template.result(binding) }
  end
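For trying out the search code without running Rake, the same indexing rules can be mirrored in JavaScript. This is a minimal sketch, not the code behind this site: the document shape { url, title, body } and the names tokenize and buildIndex are hypothetical, and the documents are assumed to be sorted newest first already. Each word records the article's position in the articles array, which is the form the words lookup table above uses.

```javascript
// Minimal sketch (not the site's actual code): mirrors the Rake task's
// indexing rules, assuming each document looks like { url, title, body }.
function has(obj, key) {
  // Own-property check, so words like "constructor" are handled correctly.
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function tokenize(text) {
  // Same normalization as the Rake task: strip tags, replace
  // non-alphanumerics with spaces, lowercase, split on whitespace.
  return text
    .replace(/<[^>]*>/g, ' ')
    .replace(/[^a-zA-Z0-9]/g, ' ')
    .toLowerCase()
    .split(/\s+/)
    .filter(function (w) { return w.length > 0; });
}

function buildIndex(docs) {
  var words = {};    // word -> positions of the articles containing it
  var articles = []; // position -> [url, title]
  docs.forEach(function (doc, position) {
    articles.push([doc.url, doc.title]);
    var seen = {};
    tokenize(doc.body).forEach(function (w) {
      if (has(seen, w)) return; // one entry per article, like uniq!
      seen[w] = true;
      if (!has(words, w)) words[w] = [];
      words[w].push(position);
    });
  });
  return { words: words, articles: articles };
}
```

Storing positions instead of whole articles keeps the generated file small, since each occurrence costs only a few digits.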

Then I can provide a simple "live search" for a single word via:


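  function search(word) {
    // The index holds lowercase words, so normalize the query the same way
    word = word.toLowerCase();

    // Collect the positions of matching articles
    var matches = {};
    // hasOwnProperty avoids false hits on inherited names like "constructor"
    if (words.hasOwnProperty(word)) {
      var list = words[word];
      for (var i = 0; i < list.length; i++) {
        matches[list[i]] = true;
      }
    }

    // Render one link per matching article
    var results = '';
    for (var article_idx in matches) {
      results += '<a href="' + articles[article_idx][0] + '">' + articles[article_idx][1] + '</a><br />';
    }
    document.getElementById('results').innerHTML = results;
  }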

A somewhat more complicated version runs on my search page (view source). It searches whenever the input field changes and supports queries with multiple words.
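The post doesn't say how the multi-word version combines words. One plausible reading, sketched here under that assumption, treats a query as an AND: only articles containing every word are returned, found by intersecting each word's list of article positions. searchAll is a hypothetical name, and it returns positions instead of writing HTML so it can be tested on its own.

```javascript
// Hypothetical AND search: returns the positions of articles that contain
// every word in the query. `words` maps word -> array of article positions.
function searchAll(words, query) {
  // Split the query the same way the index was built: on non-alphanumerics.
  var terms = query.toLowerCase().split(/[^a-z0-9]+/).filter(function (t) {
    return t.length > 0;
  });
  if (terms.length === 0) return [];

  var result = null;
  terms.forEach(function (term) {
    var postings = Object.prototype.hasOwnProperty.call(words, term) ? words[term] : [];
    if (result === null) {
      result = postings.slice(); // start from the first word's articles
    } else {
      // Keep only articles that also contain this word
      result = result.filter(function (pos) { return postings.indexOf(pos) !== -1; });
    }
  });
  return result;
}
```

The positions map back to links the same way as in search(), with articles[i][0] for the URL and articles[i][1] for the title.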

It would be nice to add substring matching, boolean operations on multiple words, and a more compact representation (Bloom filters?). Is anyone else doing search on the client side?
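Substring matching may not need a new index format. A sketch, assuming the same words object as above, scans every indexed word for the fragment and takes the union of the article positions of every word that contains it. searchSubstring is a hypothetical name.

```javascript
// Sketch of substring matching: any indexed word containing `fragment`
// counts as a hit, and hits from different words are merged (a union).
function searchSubstring(words, fragment) {
  fragment = fragment.toLowerCase();
  if (fragment.length === 0) return [];
  var hits = {};
  for (var w in words) {
    if (!Object.prototype.hasOwnProperty.call(words, w)) continue;
    if (w.indexOf(fragment) === -1) continue;
    var postings = words[w];
    for (var i = 0; i < postings.length; i++) hits[postings[i]] = true;
  }
  // Object keys are strings; convert back to numbers and sort ascending.
  return Object.keys(hits).map(Number).sort(function (a, b) { return a - b; });
}
```

The scan is linear in the number of distinct words, which is probably fine for a small blog's vocabulary; a larger index would want something like a sorted word list for prefix lookups.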



Published Sun, 01 Oct 2006
