Développement Web à Toulouse: Rails validates_length_of and byte encoding

Why you should NEVER create a Postgres database with the default sql_ascii encoding for use with Rails

Ruby prior to 1.9 treats strings as sequences of bytes, and that can cause some maddening issues. In my experience, all encoding issues are maddening...

Anyway, I had a validates_length_of like so:

A customer sent me a problematic lead (it included special Word characters that are multi-byte), so I put it in a test and started a debug session:

What???

After some digging, I found that Rails, in order to count characters rather than bytes, calls split(//).size instead of size directly. split(//) is the default tokenizer of validates_length_of, and it can be changed with the :tokenizer option passed to the validation. See http://railspikes.com/2009/7/20/validates_length_of-gotcha for more details.
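To make the character/byte difference concrete, here is a minimal sketch in plain Ruby, without Rails. It assumes Ruby 1.9 or later (where String#bytesize exists); the helper names char_length and byte_length and the sample string are made up for illustration and are not the customer's lead or Rails' actual code.

```ruby
# Hypothetical helper: counts characters the way the split(//) tokenizer does.
def char_length(str)
  str.split(//).size
end

# Hypothetical helper: counts the raw bytes of the string.
def byte_length(str)
  str.bytesize
end

# Made-up sample with "Word-style" characters: é (2 bytes in UTF-8)
# and curly quotes (3 bytes each in UTF-8).
sample = "caf\u00e9 \u201cquoted\u201d"

puts "characters: #{char_length(sample)}"  # 13
puts "bytes:      #{byte_length(sample)}"  # 18
```

On Ruby 1.8, a plain size call would report the byte count, which is why the tokenizer matters there.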

My original problem, though, was that the lead was 499 characters long but took 517 bytes, and my Postgres database had been created with the default sql_ascii encoding.

Therefore, the "character varying(500)" constraint on my Postgres lead column translated to 500 bytes maximum, because under sql_ascii every byte counts as one character...
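The mismatch can be reproduced with a rough sketch in plain Ruby (1.9 or later). The string below is synthetic, built only to have the same counts as my lead (499 characters, 517 bytes), and fits_column? is a hypothetical helper that merely imitates how the column limit is measured; it does not talk to Postgres.

```ruby
# Synthetic stand-in for the lead: 490 ASCII letters plus 9 curly quotes
# (3 bytes each in UTF-8) gives 499 characters and 517 bytes.
def synthetic_lead
  ("a" * 490) + ("\u201c" * 9)
end

# Hypothetical helper: would the string fit a "character varying(limit)" column?
# byte_semantics = true imitates a sql_ascii database, where each byte
# counts as one character; false imitates a utf8 database.
def fits_column?(str, limit, byte_semantics)
  length = byte_semantics ? str.bytesize : str.length
  length <= limit
end

lead = synthetic_lead
puts fits_column?(lead, 500, false)  # utf8: 499 <= 500, so true
puts fits_column?(lead, 500, true)   # sql_ascii: 517 > 500, so false
```

The Rails validation agrees with the first line, while a sql_ascii database behaves like the second.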

Conclusion: never ever create a Postgres database with the default sql_ascii encoding for use with Rails. Always use the utf8 encoding, even if you deal only with English: special characters exist in all languages...

25 September 2010
