Dave Jones

My Place On The Net

Handling Non-UTF-8 Params In Rails

Today I woke up to the following 500 error notification from mindfulchoices:

invalid byte sequence in UTF-8
  app/controllers/page_controller.rb:21:in `contact'

It seems an SEO bot in New York had submitted a message via the ‘Contact Us’ form, and the message contained a byte that isn't valid UTF-8 (\x92, which is a right single quotation mark in Windows-1252):

"And what is better than traffic? It\x92s recurring traffic!"
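
To see why the controller blew up, here's a minimal reproduction in plain Ruby with no Rails involved. The trace doesn't show exactly what line 21 does with the string, so the regex match below is an assumption standing in for whichever string operation tripped over the byte:

```ruby
# The bot's message: a UTF-8-tagged string carrying a stray Windows-1252 byte (0x92)
body = "And what is better than traffic? It\x92s recurring traffic!"

puts body.encoding         # UTF-8
puts body.valid_encoding?  # false

# A regex match over the string (as format validations and gsub calls do) raises
error_message =
  begin
    body =~ /traffic/
    nil
  rescue ArgumentError => e
    e.message
  end

puts error_message         # invalid byte sequence in UTF-8
```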

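After a little reading around the subject, I found that Rails does everything it can to make browsers encode their submissions as UTF-8: its form helpers add accept-charset="UTF-8" to the form tag and, in older versions, a hidden utf8=✓ field. None of that binds a client that builds its own request, though.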

What can you do if a client refuses to conform and sends bytes that aren't valid UTF-8 to your server? Surprisingly, I couldn’t find an idiomatic solution to this issue.

A post from the all-knowing thoughtbot switched me on to the String#encode method.
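
Here's a rough sketch of what that call does to the bot's string, again in plain Ruby. The helper name strip_non_ascii is hypothetical, purely for illustration. Passing 'binary' as the source encoding makes Ruby treat every byte above 0x7F as having no UTF-8 mapping, so with undef: :replace and replace: '' each one is simply dropped:

```ruby
# Two UTF-8-tagged inputs: one with the stray Windows-1252 byte, one with a legitimate accent
bad  = "It\x92s recurring traffic!"
good = "Jos\u00e9"

# Hypothetical helper: read the bytes as binary (ASCII-8BIT) and transcode to UTF-8.
# Bytes 0x80-0xFF are undefined in that conversion, so undef: :replace swaps each for ''.
def strip_non_ascii(str)
  str.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

puts strip_non_ascii(bad)   # Its recurring traffic!
puts strip_non_ascii(good)  # Jos
```

Note the second result: treating the input as binary also throws away perfectly valid multibyte characters, such as the é in José.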

Since my code was already using Rails strong parameters, I bolted some extra code onto the end of the method to strip out the offending bytes (strictly speaking, it strips every non-ASCII byte, valid UTF-8 included):

  # Sanitise user-submitted data using strong parameters and enforcement of UTF-8 encoding
  def message_params
    params_hash = params.require(:message).permit(:name, :email, :phone, :body).to_hash
    # Read each value's bytes as binary and transcode to UTF-8,
    # dropping every byte outside plain ASCII (including the stray \x92)
    params_hash.transform_values do |value|
      value.to_s.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
    end
  end
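
An alternative worth considering is String#scrub (available since Ruby 2.1), which replaces only invalid byte sequences and leaves valid UTF-8 alone. A minimal sketch, assuming the plain string-keyed hash that message_params builds; scrub_params is a hypothetical helper name, not a Rails API:

```ruby
# Hypothetical helper: drop only invalid byte sequences, keeping valid UTF-8 such as accented letters
def scrub_params(hash)
  hash.transform_values { |value| value.to_s.scrub('') }
end

submitted = {
  "name" => "Jos\u00e9",
  "body" => "It\x92s recurring traffic!"
}

clean = scrub_params(submitted)
puts clean["name"]  # José
puts clean["body"]  # Its recurring traffic!
```

Unlike the binary round-trip, this keeps the é in José while still discarding the stray \x92.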

I’m curious, do you know of a better (more “Rails-y”) way to mitigate this problem?