overstimulate

powered by web.py + appengine

Mon, 28 Apr 2008 appengine webpy comments

I've managed to rewrite my blog again, this time on appengine using web.py.

I started with the demo app Aaron put together the night appengine was released, and with some pointers from Kragen I was quickly 80% done with the new site. The next 15% involved figuring out how to get reCaptcha and HTML sanitization/cleanup working. Once that was done, it took only a few DNS changes to bring the new site live.

To get reCaptcha integrated, I started with recaptcha-client 1.0.1, and modified it to use appengine's urlfetch:

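from urllib import urlencode  # Python 2 location of urlencode
from google.appengine.api import urlfetch

# RecaptchaResponse and VERIFY_SERVER come from recaptcha-client's captcha module.

def submit(recaptcha_challenge_field,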
           recaptcha_response_field,
           private_key,
           remoteip):
    """
    Submits a reCAPTCHA request for verification. Returns RecaptchaResponse
    for the request

    recaptcha_challenge_field -- The value of recaptcha_challenge_field from the form
    recaptcha_response_field -- The value of recaptcha_response_field from the form
    private_key -- your reCAPTCHA private key
    remoteip -- the user's ip address
    """

    if not (recaptcha_response_field and recaptcha_challenge_field and
            len (recaptcha_response_field) and len (recaptcha_challenge_field)):
        return RecaptchaResponse(is_valid = False, error_code = 'incorrect-captcha-sol')

    params = {
        'privatekey': private_key,
        'remoteip' : remoteip,
        'challenge': recaptcha_challenge_field,
        'response' : recaptcha_response_field,
    }

    result = urlfetch.fetch(
        url = "http://%s/verify" % VERIFY_SERVER,
        payload = urlencode(params),
        method = urlfetch.POST,
        headers = {
            "Content-type": "application/x-www-form-urlencoded",
            "User-agent": "reCAPTCHA Python/AppEngine"
        }
    )
    
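    if result.status_code == 200:
        return_values = result.content.splitlines()
        return_code = return_values[0]

        if return_code == "true":
            return RecaptchaResponse(is_valid=True)
        else:
            return RecaptchaResponse(is_valid=False, error_code=return_values[1])

    # Any non-200 reply would otherwise fall through and return None.
    return RecaptchaResponse(is_valid=False, error_code='recaptcha-not-reachable')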

Grabbing the remote IP from web.py (via web.ctx['ip']) now allows a simple query to the reCAPTCHA service to check whether you are human.
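
The verify service replies with a plain-text body: the first line is "true" or "false", and on failure the second line carries an error code. Here is a minimal sketch of just that parsing step, pulled into a standalone function so it can be tried without a network call. The name parse_verify_body and the 'unknown-error' fallback are hypothetical, not part of recaptcha-client:

```python
def parse_verify_body(body):
    """Return (is_valid, error_code) for a reCAPTCHA verify response body."""
    lines = body.splitlines()
    if lines and lines[0].strip() == "true":
        return True, None
    # On failure the second line is expected to hold the error code;
    # fall back to a placeholder if the body is empty or truncated.
    error_code = lines[1].strip() if len(lines) > 1 else "unknown-error"
    return False, error_code
```

Guarding the second line this way avoids an IndexError if the service ever sends back a short or empty body.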

For HTML sanitization, I used Beautiful Soup. My sanitization code runs when a comment is added, since sanitizing comments each time an article was viewed caused appengine CPU utilization warnings. The code is a modification of a django snippet.

First, I only allow absolute URLs that begin with http[s]://, instead of removing javascript: from the URLs, since there are other ways to build bad URLs:

absolute_url_matcher = re.compile("^https?://")

def url(URI):
    if absolute_url_matcher.match(URI):
        return URI

...

        tag.attrs = [(attr, val) for attr, val in tag.attrs
                     if attr in valid_attrs and url(val)]
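
To see the whitelist idea in isolation, here is a small sketch that filters a list of (attribute, value) pairs the same way. The URL_ATTRS tuple and the keep_safe_url_attrs name are assumptions for illustration; the real valid_attrs list lives in the elided code:

```python
import re

absolute_url_matcher = re.compile(r"^https?://")

# Hypothetical whitelist of URL-bearing attributes (an assumption).
URL_ATTRS = ("href", "src")

def keep_safe_url_attrs(attrs):
    """Keep only whitelisted attributes whose value is an absolute http(s) URL."""
    return [(attr, val) for attr, val in attrs
            if attr in URL_ATTRS and absolute_url_matcher.match(val)]
```

Because only values starting with http:// or https:// pass, javascript:, data:, and oddly cased or padded variants are all rejected without having to enumerate them.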

Since comments containing code snippets aren't uncommon, I tweaked how PRE tags are handled:

BeautifulSoup.QUOTE_TAGS['pre'] = None  # don't parse inside of PRE tags

...
        if tag.name == 'pre':
            # convert < into &lt;
            tag.replaceWith('<pre>%s</pre>' % tag.contents[0].replace('<', '&lt;'))
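
The snippet above escapes only "<". A slightly more defensive variant, assuming the PRE contents are available as a plain string, would also escape "&" and ">". The helper name escape_pre is hypothetical and not part of the original code:

```python
def escape_pre(text):
    """Wrap text in a PRE block, escaping the characters HTML treats specially."""
    # Escape '&' first so the entities produced below are not double-escaped.
    escaped = (text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;"))
    return "<pre>%s</pre>" % escaped
```

The trade-off is that a commenter who already typed an entity such as &lt; will see it displayed literally rather than as "<".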

Finally I add a BR tag whenever I see two returns to create "paragraphs."
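
One way to do this, as a rough sketch: treat any run of two or more line breaks as a paragraph break and replace it with a single BR. The regex and the add_breaks name are assumptions, not the code running on this site:

```python
import re

# Two or more consecutive line breaks (with an optional \r) mark a paragraph break.
paragraph_break = re.compile(r"(?:\r?\n){2,}")

def add_breaks(text):
    """Insert a BR tag wherever the commenter left a blank line."""
    return paragraph_break.sub("<br />\n", text)
```

Single line breaks are left alone, so hard-wrapped text still flows as one paragraph.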

Unfortunately, I need to make a few more tweaks, as some of the old comments on my blog aren't formatted nicely. I always prefer to store both the user's original input and the sanitized version, so that I can re-run the conversion and quickly see the offending HTML if an XSS hole is discovered.
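
As a sketch of that idea in plain Python (not an appengine datastore model), a comment can carry both versions and be re-sanitized whenever the cleanup code changes. The Comment class and its field names are hypothetical:

```python
class Comment(object):
    """Holds the commenter's raw input alongside the sanitized HTML."""

    def __init__(self, raw, sanitize):
        self.raw = raw              # exactly what the user typed
        self.html = sanitize(raw)   # what gets rendered on the page

    def resanitize(self, sanitize):
        """Rebuild the rendered HTML from the stored original."""
        self.html = sanitize(self.raw)
```

Because the raw text is never overwritten, a stricter sanitizer can be applied to old comments at any time.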

So, why did I do this? I'm a fan of cloud computing, and have used every Amazon Web Service I could find a use or experiment for. While I prefer Ruby to Python, Google's cloud offering is very enticing, and only by using it can you really know its power and limitations.


Responses to "powered by web.py + appengine"

  1. Mon, 28 Apr 2008 says:
  2. Mon, 28 Apr 2008 says:
  3. Mon, 28 Apr 2008 says:
  4. Mon, 28 Apr 2008 Jesse Andrews says:
    The first three comments left were not recorded properly. I'm not sure why, but I've added some debug code to try to trace the issue down (and not lose data).
  5. Thu, 01 May 2008 Daffy Duck says:
    The cloud ate the comments.


Jesse Andrews
open source, web browsers, web services, web sites & folk dancing. contacts/sites
