Sinatra, Sequel, HAML, PostgreSQL, UTF-8 and Ruby 1.9.1

Lately, my friend and colleague Joseph and I have been experimenting with the Sinatra Ruby framework.

As part of the experiment we chose to stick with our database of choice, PostgreSQL, and our preferred template engine, HAML, but we decided to give the Sequel ORM a try and to run everything on Ruby 1.9.1. We also wanted to store our data as UTF-8 (this turned out to be the most painful part of all).

Our first goal was to have a simple application tying everything together and testable with RSpec and Cucumber.

Using Sinatra with HAML is a snap, since HAML support is a core feature of Sinatra. Just be sure to use HAML version >= 2.2.0, as it includes some work to support the new Ruby 1.9 String encodings.

Then comes Sequel. Using it is as simple as requiring it and feeding it the database connection information; just be sure to set :encoding => 'UTF-8' (cf. the Sequel::Database.connect method).
Sequel is great in that it has adapters for most common connectors. We first tried DataObjects’s do_postgres, as it should support asynchronous queries (and it does!), but we had to fall back to the pg one, and even to a patched version of it.
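As a minimal sketch of the connection setup described above, assuming the sequel and pg gems are installed and a PostgreSQL server is reachable. The host, database name and credentials are placeholders, and db_options and connect_db are names made up for this example, not part of Sequel:

```ruby
# Connection settings for Sequel's postgres adapter.
# Host, database, user and password are placeholder values.
def db_options
  {
    :adapter  => 'postgres',
    :host     => 'localhost',
    :database => 'myapp_development',
    :user     => 'myapp',
    :password => 'secret',
    :encoding => 'UTF-8'   # asks the adapter to set the client encoding
  }
end

# Hypothetical helper: opens the connection on demand, so that loading
# this file does not require a running database.
def connect_db(options = db_options)
  require 'sequel'
  Sequel.connect(options)
end
```

Calling connect_db returns a Sequel::Database object. Keep in mind that the :encoding option only configures the connection; as explained next, it does not by itself tag the returned Strings.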

Let me explain the problem here. Be warned that it is not limited to Sequel: it affects any ORM using the currently available database connectors whenever the charset is something other than ASCII-8BIT under Ruby 1.9.
ORMs do not force any encoding on the Strings returned by the database connectors, even when you have specified an encoding (which is commonly used only to set the connection’s “client_encoding”). And the current Ruby connectors (under Ruby 1.9) do not use the connection’s client encoding as a “hint” to determine and set the encoding of returned values.
This is not a problem on Ruby 1.8, where Strings carry no encoding, but on Ruby 1.9 you get some weird results: the strings returned from the database are tagged with the “ASCII-8BIT” encoding (the one Ruby uses for binary data), and since ORMs do not force an encoding, you end up with a String carrying the wrong Encoding.
Try to display it on a page and you’re greeted with friendly “incompatible character encodings: ASCII-8BIT and UTF-8” messages; try to use Webrat with RSpec matchers and you get “incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)”.
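The problem can be reproduced without any database. The sketch below simulates a row whose String value holds valid UTF-8 bytes but is tagged ASCII-8BIT, then re-tags it with force_encoding. fake_row and tag_row are hypothetical names for this illustration; this is not what Sequel or the patched pg connector actually does internally:

```ruby
# Simulate what the connector hands back under Ruby 1.9: valid UTF-8 bytes
# tagged as ASCII-8BIT (binary).
def fake_row
  { :id => 1, :name => "caf\xC3\xA9".dup.force_encoding('ASCII-8BIT') }
end

# Hypothetical helper: re-tag every String value of a row with the
# connection's client encoding. force_encoding only changes the tag,
# it does not touch the bytes.
def tag_row(row, encoding = 'UTF-8')
  tagged = {}
  row.each do |key, value|
    tagged[key] = value.is_a?(String) ? value.dup.force_encoding(encoding) : value
  end
  tagged
end

# UTF-8 text, as it would come from a template.
template_text = "Dessert: cr\u00E8me br\u00FBl\u00E9e, then "

begin
  template_text + fake_row[:name]
rescue Encoding::CompatibilityError => e
  puts "raised #{e.class}"
end

row = tag_row(fake_row)
puts row[:name].encoding          # UTF-8
puts template_text + row[:name]   # concatenates without error
```

The error only shows up when both strings contain non-ASCII characters, which is why an application can appear to work fine until the first accented character reaches the database.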

So here are the library versions to use for a working Sinatra, Sequel with Postgres, HAML, RSpec and Cucumber stack:

  • Sinatra >= 0.9.4
  • Rack-Test >= 0.4.1
  • Webrat >= 0.5.0
  • kamk-pg >= (
  • Sequel >= 3.0.0
  • RSpec >= 1.2.8
  • Cucumber >= 0.3.94

Then you need to monkey-patch Rack (this is largely untested: it worked for my current app, but it should not be used in a production environment):

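module Rack
  module Utils
    def escape(s)
      # Pick a unicode-aware regexp for UTF-8 Strings under Ruby 1.9
      regexp = case
        when RUBY_VERSION >= "1.9" && s.encoding === Encoding.find('UTF-8')
          /([^ a-zA-Z0-9_.-]+)/u
        else
          /([^ a-zA-Z0-9_.-]+)/n
        end
      s.to_s.gsub(regexp) {
        # Body reconstructed from the bytesize($1) note below; bytesize is
        # assumed to be the Rack::Utils helper.
        '%' + $1.unpack('H2' * bytesize($1)).join('%').upcase
      }.tr(' ', '+')
    end
    module_function :escape
  end
end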

This does 2 things :

  • it’s using bytesize($1) to correctly handle multibyte chars (taken from

  • it’s using a regexp with “u” (unicode) modifier when dealing with UTF-8 Strings under Ruby 1.9.
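To see why the byte length matters, here is a standalone sketch of the same percent-escaping outside of Rack. It is an illustration, not the patch itself: escape_with_bytesize and escape_with_size are made-up names, String#bytesize stands in for the bytesize($1) helper, and the regexp carries no modifier since an ASCII-only pattern already matches against UTF-8 Strings:

```ruby
# Percent-escape using the byte length of each match.
def escape_with_bytesize(s)
  s.to_s.gsub(/([^ a-zA-Z0-9_.-]+)/) {
    '%' + $1.unpack('H2' * $1.bytesize).join('%').upcase
  }.tr(' ', '+')
end

# Same thing using the character length, for comparison: under Ruby 1.9
# String#size counts characters, so the trailing bytes of a multibyte
# character are dropped.
def escape_with_size(s)
  s.to_s.gsub(/([^ a-zA-Z0-9_.-]+)/) {
    '%' + $1.unpack('H2' * $1.size).join('%').upcase
  }.tr(' ', '+')
end

text = "caf\u00E9 cr\u00E8me"
puts escape_with_bytesize(text)   # caf%C3%A9+cr%C3%A8me
puts escape_with_size(text)       # caf%C3+cr%C3me
```

The second output is a broken escape: each accented character takes two bytes in UTF-8, but only the first byte makes it into the URL.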

There you go: you can finally test your Sinatra apps outside-in, even when using non-English languages under Ruby 1.9. Isn’t that great?

Thu, 20 Aug 2009 10:38

  1. By Micah 02/09/2009 at 08h15

    Hi, I can’t seem to find a contact form on your site. I am having trouble viewing the single post of a previous story you wrote: When I try to view this story directly, it just gives me a blank white page. I thought you would want to address this problem.

  2. By Jonathan Tron 02/09/2009 at 10h04

    Thanks a lot, this is corrected.

  3. By Mira 14/07/2011 at 10h43

    A wonderful job. Super helpful information.
