28 November 2011

spk, or, rspec for lazies

My tests fail. Fortunately: if they never failed, I couldn't be sure they were really working. Anyway, my full rspec suite is over 1600 examples now, and it takes about 90 seconds to run (including 20 seconds of overhead booting Rails). After tweaking my code, I want to re-run only the failing tests, and I hate typing pathnames; I'd like to re-run a very narrow subset of my specs without typing out the entire path to each spec in question.

For example:

$ spk

/Users/conan/myproject/spec/controllers/admin/invoices_controller_spec.rb:96 Expected 'this', got 'that'

1 failed, etc etc

$ spk in/invoices_controller_s

And that last command will run just the spec file that previously failed, matched on a fragment of its path.

What's more, using rspec, I can easily choose a directory of specs to run, like this:

$ rspec spec/controllers/

but occasionally I would like to run only a specific subset of specs, for example invoice-related specs, because I've made big changes to my invoice class, and I want to check for ripples in neighbouring code:

$ rspec spec/controllers/admin/invoices_controller_spec.rb spec/models/invoice_spec.rb spec/views/admin/invoices/* spec/helpers/admin/invoices_helper_spec.rb

without having to type all that stuff.

In other words, I want to select a horizontal subset of my specs. For example, if I just want to run invoice-related specs:

$ spk invoice


The good news is: here's a wee script to do just that:


if [ "$1" = "" ]
  time rspec spec
  SPK_FILES=`find spec -type f -name "*_spec.rb" | grep $1`
  echo $SPK_FILES
  time rspec $SPK_FILES  --format documentation

Call it "spk" or whatever you want, put it in your path, chmod it 755, and use it as indicated above.


12 November 2011

Lexmark considered evil

Right, you'll tell me, they make printers, what was I expecting? Well, I stumbled on this only today: I can't print a black/white-only document using my Lexmark S600 if the colour cartridges are low. There is no technical justification for this. Black/white printers have been printing black/white documents for many years without needing colour cartridges. A Lexmark S600 could do the same. But Lexmark clearly cares less about you than about its bank account.

I'd love to say, never buy a Lexmark again, but I don't know of another manufacturer that's less crappy. Suggestions welcome.

26 September 2011

Incestuous Sed, or, 's_..*_s/\\"&\\"/\\\\\\"&\\\\\\"/g_g'

I have a CSV file to import; I have no control over the producer of this file, and its output is unfortunately non-conforming: it encloses every non-numeric field in double quotes, and fails to escape double quotes within the field. In other words, I have something like this:

1,"Foo","A "Lord of the Rings" expert","Blah",123.45

While it is possible to imagine a parser that might be able to cope with this, in my case I'm importing into MySQL ("load data infile ...") and MySQL has no plans to accommodate this kind of CSV any time soon. In order to import this, I need to transform it to the following:

1,"Foo","A \"Lord of the Rings\" expert","Blah",123.45

One solution is to detect quotes that are not part of the pattern /","/, but that gets tricky for first, last, and numeric columns. Given that the data is finite and changes slowly, I decided to write down the expressions that needed fixing, and write a sed expression to fix them.

To fix the above, all I need is

cat data.csv | sed 's/\"Lord\ of\ the\ Rings\"/\\\"Lord\ of\ the\ Rings\\\"/g' > clean-data.csv

But given a list of expressions to fix, I don't want to go the error-prone way of typing out all these sed commands line-by-line, making sure to escape all the spaces and other special chars, and counting backslashes. What can I use to transform these expressions into sed commands? Why, sed, of course! Here's how I transform a list of expressions into a list of corresponding quote-escaping sed commands, for use immediately afterwards in the same script:

cat quoted_terms.txt | sed -e 's/[ ?]/\\&/g' -e 's_..*_s/\\"&\\"/\\\\\\"&\\\\\\"/g_g' > clean.sed
cat data.csv | sed -f clean.sed > clean-data.csv

Yes, it's Backslash Hell!! The first line transforms this:

Lord of the Rings
The Canterbury Tales
Is Anybody Home?

into this:

s/\"Lord\ of\ the\ Rings\"/\\\"Lord\ of\ the\ Rings\\\"/g
s/\"The\ Canterbury\ Tales\"/\\\"The\ Canterbury\ Tales\\\"/g
s/\"Is\ Anybody\ Home\?\"/\\\"Is\ Anybody\ Home\?\\\"/g

Then the second line transforms this:

1,"Foo","A "Lord of the Rings" expert","Blah",123.45
2,"Bar","Read all of "The Canterbury Tales"","Blah",234.56
4,"Titi","Asked "Is Anybody Home?"","Blah",456.78

into this:

1,"Foo","A \"Lord of the Rings\" expert","Blah",123.45
2,"Bar","Read all of \"The Canterbury Tales\"","Blah",234.56
4,"Titi","Asked \"Is Anybody Home?\"","Blah",456.78

And voilà, clean csv all ready for import ... all thanks to the power of sed to mate with itself and generate more sed ...
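If you want to convince yourself of what one of the generated commands does, apply it by hand; here the spaces are left unescaped for readability, since sed matches them literally either way:

```shell
# escape the double quotes around one known phrase
echo '1,"Foo","A "Lord of the Rings" expert","Blah",123.45' \
  | sed 's/"Lord of the Rings"/\\"Lord of the Rings\\"/g'
# → 1,"Foo","A \"Lord of the Rings\" expert","Blah",123.45
```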

Now, you can go and enjoy Sed - An Introduction and Tutorial by Bruce Barnett, because I'm not going to try to explain all those backslashes.

03 March 2011

Convert Your MySQL Database from Latin-1 to UTF-8

It all started when I tried upgrading to ruby 1.9.2 and learned more than I ever wanted to know about character encodings. All of a sudden, my site was showing text humans were never supposed to read, with gibberish in place of recognisably foreign accented letters.

I tried using the mysql2 gem and setting Encoding.default_external = 'UTF-8' in my environment.rb; these steps were necessary, but not sufficient.

After much googling, it became evident that I had to go through each text field in each row in each table, and convert each latin-1 character to utf-8.

You would think that alter table #{table} convert to character set utf8 would do the trick, but no. You would be wrong. At least, I was.

Many authors have chimed in on this topic, but my hat goes off to Derek Sivers who showed the light in an O'Reilly article on converting latin1 to utf-8 in mysql.

I didn't want to do all the work he did, and figured a rails/activerecord migration might ease the pain somewhat. Below you'll find what I came up with. Re-use as you please. You'll need to specify the table/column names that need converting, and you might want to make sure I've covered all the characters that matter to you.

Basically, all this does is iterate over the tables and columns you specify, then over all the shady latin-1 byte sequences you need to fix, asking mysql to replace each one with its utf-8 equivalent. Someone with stronger mysql-fu might find a cleverer way to do this; in the meantime, here goes:

# encoding: UTF-8

class ConvertMySqlLatin1ColumnsToUtf8 < ActiveRecord::Migration
  def self.up

    execute("set names utf8")

    # change this hash for your application. This example here is for a
    # totally original blog application concept.
    keys = {
      :authors  => %w{first_name last_name},
      :blogs    => %w{name description},
      :entries  => %w{title content tags},
      :comments => %w{content}
    }

    conversions = {
      'C383C2A1'         => 'á', 'C383C2A0'       => 'à', 'C383C2A4'       => 'ä', 'C383C2A2'   => 'â',
      'C383C2A9'         => 'é', 'C383C2A8'       => 'è', 'C383C2AB'       => 'ë', 'C383C2AA'   => 'ê',
      'C383C2AD'         => 'í', 'C383C2AC'       => 'ì', 'C383C2AF'       => 'ï', 'C383C2AE'   => 'î',
      'C383C2B3'         => 'ó', 'C383C2B2'       => 'ò', 'C383C2B6'       => 'ö', 'C383C2B4'   => 'ô',
      'C383C2BA'         => 'ú', 'C383C2B9'       => 'ù', 'C383C2BC'       => 'ü', 'C383C2BB'   => 'û',
      'C383C281'         => 'Á', 'C383E282AC'     => 'À', 'C383E2809E'     => 'Ä', 'C383E2809A' => 'Â',
      'C383E280B0'       => 'É', 'C383CB86'       => 'È', 'C383E280B9'     => 'Ë', 'C383C5A0'   => 'Ê',
      'C383C28D'         => 'Í', 'C383C592'       => 'Ì', 'C383C28F'       => 'Ï', 'C383C5BD'   => 'Î',
      'C383E2809C'       => 'Ó', 'C383E28099'     => 'Ò', 'C383E28093'     => 'Ö', 'C383E2809D' => 'Ô',
      'C383C5A1'         => 'Ú', 'C383E284A2'     => 'Ù', 'C383C593'       => 'Ü', 'C383E280BA' => 'Û',
      'C385C2B8'         => 'Ÿ', 'C385E2809C'     => 'œ', 'C383C2B8'       => 'ø', 'C383C2BF'   => 'ÿ',
      'C3A2E282ACC593'   => '“', 'C3A2E282ACC29D' => '”', 'C3A2E282ACCB9C' => '‘',
      'C3A2E282ACE284A2' => '’', 'C382C2AB'       => '«', 'C382C2BB'       => '»',
      'C383C2A5'         => 'å', 'C383E280A6'     => 'Å', 'C383C5B8'       => 'ß', 'C383E280A0' => 'Æ', 
      'C383C2A7'         => 'ç', 'C383E280A1'     => 'Ç', 'C383C2B1'       => 'ñ', 'C383E28098' => 'Ñ', 
      'C383C2A3'         => 'ã', 'C383C2B5'       => 'õ', 'C383C692'       => 'Ã', 'C383E280A2' => 'Õ'
    }

    keys.each { |table, columns|
      execute "alter table #{table} convert to character set utf8"
      columns.each { |column|
        conversions.each { |hex, utf8|
          execute("update #{table} set #{column} = replace(#{column}, unhex('#{hex}'), '#{utf8}') where #{column} regexp unhex('#{hex}');")
        }
      }
    }
  end

  def self.down
    # left as an exercise for the reader :)
  end
end

The # encoding comment at the beginning is important: don't leave it out, or ruby 1.9.2 will complain.
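In case you're wondering where the hex keys in the conversions hash come from: each one is the byte sequence you get when a utf-8 character is misread as latin-1 and re-encoded as utf-8. You can reproduce one from the shell (assuming a utf-8 terminal, with iconv and xxd installed):

```shell
printf 'é' | xxd -p                            # é in utf-8 is two bytes
# → c3a9
printf 'é' | iconv -f latin1 -t utf8 | xxd -p  # the same bytes, double-encoded
# → c383c2a9
```

That c383c2a9 is exactly the 'C383C2A9' => 'é' entry in the table above.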

Use this to check you've covered all the relevant text columns:

mysql> use information_schema
mysql> select table_name, column_name from columns where table_schema = '__YOUR_DB_NAME__' and (data_type = 'varchar' or data_type = 'text');

(There might be other relevant data types, like 'mediumtext', that you have to deal with; don't just take my word for it.)


14 February 2011

AWS SES RequestExpired

SendGrid tells me I need a reseller account, and Postmark won't let me send newsletter-style messages; so it's time to try Amazon's Simple Email Service. All I need is a big machine that takes care of delivering mail, the rest is fluff.

I'm using drewblas/aws-ses. Somewhere between ActionMailer and AWS::SES, errors are swallowed and your application fails to let you know that emails aren't getting sent. By the time I had broken the "fetch mail" button on my mail client, it was time to run rails console on the server to figure out what was going on:

$ RAILS_ENV=staging rails console
Loading staging environment (Rails 3.0.3)
irb(main):001:0> require "aws/ses"
irb(main):003:0> ses = AWS::SES::Base.new :access_key_id => "your_access_key", :secret_access_key => "not_telling_you"
=> #<AWS::SES::Base:0x7fd3607d0308 etc... >
irb(main):003:0> ses.send_email :to => ['me@my.domain'], :source => 'test@other.domain', :subject => 'Testing', :text_body => 'Yes, testing!'
AWS::SES::ResponseError: AWS::SES Response Error: RequestExpiredRequest timestamp: Mon, 14 Feb 2011 10:13:32 GMT expired.  It must be within 300 secs/ of server time.

It turns out that my server's clock was racing into the future. I like how Slicehost moves fast, but I wasn't expecting observable relativistic effects. My server was 8 whole minutes ahead of the rest of the world. If I wasn't busy building my cool new site I could have used it to game the stock market or something wicked like that ...

Anyway, thanks to Code Ghar here's the solution:

$ date
Mon Feb 14 10:16:24 UTC 2011
$ sudo ntpdate pool.ntp.org
14 Feb 10:08:55 ntpdate[25724]: step time server offset -639.622800 sec
$ date
Mon Feb 14 10:09:01 UTC 2011
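To keep the clock from drifting off again, you can run an ntp daemon, or settle for a periodic re-sync from cron; something along these lines (the path to ntpdate and the schedule are assumptions, adjust for your system):

```shell
# hypothetical crontab entry: re-sync the clock nightly at 04:00
0 4 * * * /usr/sbin/ntpdate pool.ntp.org >/dev/null 2>&1
```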

Happy mailing!

03 February 2011

Paperclip, S3, and European Buckets

UPDATE @englandpost points out that newer versions of paperclip support the :s3_host_name option, see http://rubydoc.info/gems/paperclip/Paperclip/Storage/S3. Thanks @englandpost

So you have your European S3 bucket, thinking how cool it is that you can select buckets near where your customers live; you gem install paperclip aws-s3, do the dances and the rails and the rituals and the cap production deploy, and you get:

The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.

Not Fair!

The interblags recommend you use gem install s3 instead, with appropriate monkeypatches for Paperclip::Storage::S3, but you might end up with this:

The request signature we calculated does not match the signature you provided. Check your key and signing method.

I tried and I tried, honest, s3 just wouldn't play the signature game by amazon's rules ... couldn't get anything to work, until I stumbled on http://www.mail-archive.com/heroku@googlegroups.com/msg05407.html in which the great and goodly Dan Croak recommends you put this in config/environment.rb:

AWS::S3::DEFAULT_HOST = "s3-eu-west-1.amazonaws.com"

Well lo and behold I was finally able to upload stuff, my pretty pictures are showing up in my AWS console.

But you're not done yet: you still need to generate the correct URL (my_model.my_attachment.url) for your pictures and mp3s and videos and Large Objects and whatever you're pushing up to the clouds there ... Paperclip::Storage::S3 kindly hard-codes "s3.amazonaws.com" for you, and it doesn't work.

Here's the fix:

# in config/initializers/something.rb
Paperclip.interpolates(:s3_eu_url) { |attachment, style|
  "#{attachment.s3_protocol}://s3-eu-west-1.amazonaws.com/#{attachment.bucket_name}/#{attachment.path(style).gsub(%r{^/}, "")}"
}

# in your model
has_attached_file :image, 
  :storage => :s3,
  :s3_credentials => "#{Rails.root}/config/s3.yml",
  :path => "for/example/:id/:style.:extension",
  :url  => ":s3_eu_url"

This is a big song and dance simply to tell paperclip how to construct the s3 url. Left to itself, paperclip will replace your url setting with one of its own (":s3_path_url") if it doesn't match /^:s3.*url$/. Hence the interpolation above is called "s3_eu_url"; you can write your own for Singapore or whatever far-flung place you've dumped your bucket.
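With made-up values plugged in, the interpolation assembles a URL of this shape (the bucket name and path here are hypothetical):

```shell
# simulate the pieces the interpolation concatenates
protocol=https; bucket=my-euro-bucket; path="for/example/42/original.jpg"
echo "$protocol://s3-eu-west-1.amazonaws.com/$bucket/$path"
# → https://s3-eu-west-1.amazonaws.com/my-euro-bucket/for/example/42/original.jpg
```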