While moving this site back to Rails (edge) from a rake-based static site, I added HTML validation to the articles model.
The first step is to install libxml-ruby. This can be done via RubyGems:
gem install libxml-ruby
To check whether a string is valid HTML, wrap it in a div first. A fragment with more than one top-level element is not a well-formed XML document, so without the wrapper you will get:
parser error : Extra content at the end of the document
parser = XML::Parser.new
parser.string = "<div>#{html}</div>"
parser.parse
If you run the previous code in an IRB session, parser.parse returns an XML::Document even if the document has problems. When it does, the errors are written to stderr, with a caret pointing at each one. In a web app, sending the errors to stderr is probably not what you want. To show them to the user, capture them with a custom error handler.
parser = XML::Parser.new
parser.string = "<div>#{self.body}</div>"
msgs = []
XML::Parser.register_error_handler lambda { |msg| msgs << msg }
begin
  parser.parse
rescue Exception => e
  # Escape '<' so the offending markup is displayed literally inside the <pre>
  errors.add("body", '<pre>' + msgs.collect { |c| c.gsub('<', '&lt;') }.join + '</pre>')
end
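The error-formatting step in the rescue block could also be pulled into a small helper. What follows is only a rough sketch, not what the articles model actually does: it uses ERB::Util.html_escape from the Ruby standard library, which escapes &, >, and quotes as well as <. The helper name format_parser_errors and the sample input are made up for illustration.

```ruby
require "erb"

# Hypothetical helper (not part of the original model code): joins the
# collected parser messages and HTML-escapes them so they can be placed
# inside a <pre> element without being interpreted as markup.
def format_parser_errors(msgs)
  escaped = msgs.map { |msg| ERB::Util.html_escape(msg) }
  "<pre>#{escaped.join}</pre>"
end

# Illustrative input only: a message plus the line of markup it refers to.
sample = [
  "parser error : Extra content at the end of the document\n",
  "<p>one</p><p>two</p>\n"
]
puts format_parser_errors(sample)
```

With a helper like this, the rescue block would reduce to errors.add("body", format_parser_errors(msgs)).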
I wrapped the error messages in a <pre> so that they can be presented to the user with the standard error_messages_for helper. With a little CSS to render the errors in a fixed-width font, I get useful error reporting on invalid HTML.
.errorExplanation pre {
  font-family: monospace;
}