Since converting to a rake-powered site, I've missed having search. To that end, I've started experimenting with coding search on the client side.
My (naive) implementation involves creating an index of all words, each with a list of the documents that contain it (in JSON):
// word -> indices into the articles array below
var words = {
  'hello': [0, 1],
  'world': [1]
};
// [url, title] pairs
var articles = [
  ['/article/first', 'Title of post with hello world in it'],
  ['/article/second', 'Title of post about world of svg']
];
You can see my articles.json, which I create with a rake task:
require 'yaml'
require 'erb'

file 'html/search/articles.json' => ['html/articles/index.html', 'html/search', 'theme/articles.json'] do
  # Load every article; the permalink is the file name without ".yml".
  articles = FileList['data/articles/*.yml'].collect { |fn|
    data = YAML.load(File.read(fn))
    data['permalink'] = File.basename(fn)[0...-4]
    data
  }
  # Newest first.
  articles = articles.sort_by { |d| d['created_at'] }.reverse

  # Map each word to the articles whose body contains it.
  all_words = {}
  articles.each do |article|
    body = article['body']
    # Note: gsub! modifies article['body'] in place.
    body.gsub!(/<[^>]*>/m, ' ')      # strip HTML tags
    body.gsub!(/[^a-zA-Z0-9]/, ' ')  # strip punctuation
    words = body.downcase.split(' ')
    words.uniq!
    words.each do |w|
      all_words[w] ||= []
      all_words[w] << article
    end
  end

  # The ERB template sees articles and all_words through the binding.
  article_json_template = ERB.new File.read('theme/articles.json')
  File.open('html/search/articles.json', 'w') { |f| f.write article_json_template.result(binding) }
end
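To make the shape of the index concrete, here is a rough JavaScript sketch of the same tokenizing steps (strip tags, strip punctuation, lowercase, split, de-duplicate). It is not the code that runs on this site: the sample `posts`, the `tokenize` and `buildIndex` names, and the '/article/' URL prefix are all assumptions made for illustration.

```javascript
// Hypothetical sample data; the field names mirror the YAML keys used above.
var posts = [
  { permalink: 'first', title: 'Title of post with hello world in it', body: '<p>Hello, world!</p>' },
  { permalink: 'second', title: 'Title of post about world of svg', body: '<p>The world of <b>SVG</b></p>' }
];

// Strip tags and punctuation, lowercase, split on whitespace, drop duplicates.
function tokenize(html) {
  var text = html.replace(/<[^>]*>/g, ' ').replace(/[^a-zA-Z0-9]/g, ' ');
  var seen = {};
  var out = [];
  text.toLowerCase().split(/\s+/).forEach(function (w) {
    if (w && !Object.prototype.hasOwnProperty.call(seen, w)) {
      seen[w] = true;
      out.push(w);
    }
  });
  return out;
}

// Build { words: word -> [article indices], articles: [[url, title], ...] }.
function buildIndex(posts) {
  var words = {};
  var articles = [];
  posts.forEach(function (post, i) {
    // Assumed URL scheme: '/article/' + permalink.
    articles.push(['/article/' + post.permalink, post.title]);
    tokenize(post.body).forEach(function (w) {
      if (!Object.prototype.hasOwnProperty.call(words, w)) {
        words[w] = [];
      }
      words[w].push(i);
    });
  });
  return { words: words, articles: articles };
}
```

Calling `buildIndex(posts)` returns an object whose `words` and `articles` properties have the same layout as the hand-written example at the top of this post.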
Then I can provide a simple "live search" for a single word via:
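function search(word) {
  // Collect the indices of matching articles (object keys de-duplicate them).
  var matches = {};
  // hasOwnProperty avoids false hits on inherited names such as "constructor".
  if (Object.prototype.hasOwnProperty.call(words, word)) {
    for (var idx = 0; idx < words[word].length; idx++) {
      matches[words[word][idx]] = true;
    }
  }
  // Build one link per matching article.
  var results = '';
  for (var article_idx in matches) {
    results += "<a href=\"" + articles[article_idx][0] + "\">" + articles[article_idx][1] + "</a>" + "<br />";
  }
  document.getElementById('results').innerHTML = results;
}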
A somewhat more complicated version is in use on my search page (view source). This version searches whenever the input field changes and allows searching for multiple words.
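For readers who don't want to dig through the page source, a multi-word search over the same index could look roughly like the sketch below. It is an illustration rather than the code on the search page: it assumes every word must match (AND semantics), and the `searchAll` name and the sample data are hypothetical.

```javascript
// Sample index in the same shape as the example at the top of the post.
var words = { 'hello': [0, 1], 'world': [1] };
var articles = [
  ['/article/first', 'Title of post with hello world in it'],
  ['/article/second', 'Title of post about world of svg']
];

// Return the [url, title] pairs of articles that contain every word in the query.
function searchAll(query) {
  // Same normalization as the index: lowercase, split on non-alphanumerics.
  var terms = query.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  var result = null;
  terms.forEach(function (term) {
    var hits = Object.prototype.hasOwnProperty.call(words, term) ? words[term] : [];
    if (result === null) {
      result = hits.slice();            // first term: start from its hit list
    } else {
      result = result.filter(function (i) {
        return hits.indexOf(i) !== -1;  // later terms: keep only shared indices
      });
    }
  });
  return (result || []).map(function (i) { return articles[i]; });
}
```

To run it on every keystroke, `searchAll` could be called from the input field's `input` event handler, with the returned pairs rendered the same way as in `search` above.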
It would be nice to add substring matching, support for boolean operations on multiple words, and a more compact representation (Bloom filters?). Anyone else doing search on the client side?
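As a starting point for the substring idea, one naive approach is to scan every key in the index and merge the hit lists of the words that contain the fragment. This is only a sketch: `searchSubstring` and the sample data are hypothetical, and a linear scan over all words may well be too slow for a large index.

```javascript
// Sample index in the same shape as the example at the top of the post.
var words = { 'hello': [0, 1], 'world': [1] };

// Return the sorted, de-duplicated article indices whose words contain the fragment.
function searchSubstring(fragment) {
  var needle = fragment.toLowerCase();
  var seen = {};
  var hits = [];
  if (!needle) {
    return hits;
  }
  Object.keys(words).forEach(function (w) {
    if (w.indexOf(needle) === -1) {
      return; // this word does not contain the fragment
    }
    words[w].forEach(function (i) {
      if (!seen[i]) {
        seen[i] = true;
        hits.push(i);
      }
    });
  });
  hits.sort(function (a, b) { return a - b; });
  return hits;
}
```

With the sample index, the fragment "wor" only matches "world", while "l" matches both "hello" and "world".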
