Tag Archives: Dealesque

Nokogiri vs Crack & Hashie

Recently I wrote about using the Vacuum gem to access the Amazon Product Advertising API. Its API calls return the vanilla XML response from the Amazon API. There are gems that return POROs, but the “bare metal” access I get from Vacuum made it a great gem to use for several reasons (a breakdown of those reasons and of other existing gems is a topic for another post). Parsing the Amazon responses is a bit of a funny issue, since the information I need is located at various points in the document. Here’s an example using Nokogiri:

  require 'nokogiri'
  require 'ostruct'

  # response is the Vacuum response; the helper methods are defined below
  root = Nokogiri::XML(response.body).remove_namespaces!
  items = parse_items(root)

  def parse_items(node)
    node.xpath('//Items/Item').map do |item_node|
      create_item_from(item_node)
    end
  end

  def create_item_from(node)
    attributes = {}
    attributes[:id] = parse_value(node, './ASIN')
    attributes[:title] = parse_value(node, './ItemAttributes/Title')
    attributes[:url] = parse_value(node, './DetailPageURL')
    attributes[:group] = parse_value(node, './ItemAttributes/ProductGroup')
    attributes[:images] = parse_item_images(node)
    # OpenStruct is a stand-in: the domain class built from these attributes is not shown in this post
    OpenStruct.new(attributes)
  end

  def parse_item_images(node)
    image_sets = node.xpath('./ImageSets/ImageSet')
    return if image_sets.children.size == 0

    # prefer the primary image set, fall back to the first one
    image_set = image_sets.find { |set| set['Category'] == 'primary' } || image_sets.first
    image_set.xpath('./*').inject(Hash.new) do |images, image_node|
      image = create_item_image_from(image_node)
      images[image.type] = image
      images
    end
  end

  def create_item_image_from(node)
    attributes = {}
    attributes[:url] = parse_value(node, './URL')
    attributes[:height] = parse_value(node, './Height', :to_i)
    attributes[:width] = parse_value(node, './Width', :to_i)
    attributes[:type] = node.name.gsub("Image", "").downcase
    # OpenStruct is a stand-in for the image domain object (not shown in this post)
    OpenStruct.new(attributes)
  end

  # returns the stripped content of the first matching node, or nil if nothing matches
  def parse_value(node, path, apply_method = nil)
    nodes = node.xpath(path)
    if nodes.first
      value = nodes.first.content
      value = value.respond_to?(:strip) ? value.strip : value
      apply_method ? value.send(apply_method) : value
    end
  end

As you can see, my domain objects don’t map exactly to Amazon’s structure. But that’s what this parser is all about: it translates API responses into what I need in my domain. What bothered me was that I needed to use hard-coded paths to specific pieces of information, so I went looking for another solution. This is where the Crack and Hashie gems come in. The first one converts the received XML to a Hash, and the second one enables nicer Hash access through method calls. The code seems nicer:

  require 'crack'
  require 'hashie'
  require 'ostruct'

  # response is the Vacuum response; the helper methods are defined below
  data = Hashie::Mash.new(Crack::XML.parse(response.body))
  items = parse_items(data)

  def parse_items(data)
    items_data = data.ItemSearchResponse.Items.Item
    if items_data.kind_of?(Array)
      items_data.map { |item_data| create_item_from(item_data) }
    else
      # a single item arrives as a direct child, not as an Array of one
      [create_item_from(items_data)]
    end
  end

  def create_item_from(data)
    attributes = {}
    attributes[:id] = data.ASIN
    attributes[:title] = data.ItemAttributes.Title
    attributes[:url] = data.DetailPageURL
    attributes[:group] = data.ItemAttributes.ProductGroup
    attributes[:images] = parse_item_images(data)
    # OpenStruct is a stand-in: the domain class built from these attributes is not shown in this post
    OpenStruct.new(attributes)
  end

  def parse_item_images(data)
    return unless data.respond_to?(:ImageSets)

    image_set = data.ImageSets.ImageSet
    # prefer the primary image set, fall back to the first one
    image_set = image_set.find { |set| set.Category == 'primary' } || image_set.first if image_set.kind_of?(Array)
    image_set.keys.select { |key| key =~ /.*Image/ }.inject(Hash.new) do |images, key|
      image = create_item_image_from(image_set.send(key), key)
      images[image.type] = image
      images
    end
  end

  def create_item_image_from(item_data, type)
    attributes = {}
    attributes[:url] = item_data.URL
    attributes[:height] = item_data.Height.to_i
    attributes[:width] = item_data.Width.to_i
    attributes[:type] = type.gsub("Image", "").downcase
    # OpenStruct is a stand-in for the image domain object (not shown in this post)
    OpenStruct.new(attributes)
  end

So the result, although similar, seems nicer to me. It reads more like code, uses fewer strings, and is even a bit shorter. Definitely a win so far. But there are some issues.

The first one is that with XML parsing I don’t have to think much about collections: a query always returns the same kind of result, regardless of the number of child nodes. With Crack/Hashie I suddenly needed to think about it, since the combination converts a collection of one into a direct child. Hence the Array check in the parse_items method. I don’t like making such checks, but OK, it was limited and specific enough not to hurt me later.
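The Array check can be pulled into one small helper so it doesn't spread through the parser. This is only a sketch: wrap_in_array is a hypothetical name, and the plain hashes below are an assumed stand-in for the shape Crack produces (a Hash for one Item, an Array for several).

```ruby
# Wrap a value in an Array unless it already is one.
# Kernel#Array is avoided on purpose: it would turn a Hash into key/value pairs.
def wrap_in_array(value)
  return [] if value.nil?
  value.is_a?(Array) ? value : [value]
end

# Assumed shapes of the parsed data (stand-ins, not real API output)
single = { 'Items' => { 'Item' => { 'ASIN' => 'A1' } } }
many   = { 'Items' => { 'Item' => [{ 'ASIN' => 'A1' }, { 'ASIN' => 'B2' }] } }

# Both cases can now be handled by the same code path
single_ids = wrap_in_array(single['Items']['Item']).map { |item| item['ASIN'] }
many_ids   = wrap_in_array(many['Items']['Item']).map { |item| item['ASIN'] }
```

Since a Hashie::Mash is a Hash and not an Array, the same check should work on Mash objects too.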

The second issue was performance. Even while testing, everything suddenly seemed slower. At first I attributed this to my fatigue, but just to be sure I made a small performance test. It consisted of parsing a predefined XML document (a recorded Amazon API response) 100 times. The XML response has 9 items, and you can see an example here. The test routine can be found here. The results were more than interesting:

Seconds        1st   2nd   3rd   4th   5th   Average
Nokogiri       1.18  1.26  1.27  1.33  1.23  1.25
Crack/Hashie   7.06  7.87  7.56  7.58  7.44  7.50

It seems the extra baggage from Crack and Hashie makes that solution about 6 times slower (7.50 versus 1.25 seconds on average).
This was more than enough reason to abandon the approach and just live with plain XML and Nokogiri.
But, at least now I know why 🙂
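For anyone who wants to repeat the comparison, the measurement itself can be sketched with Ruby's Benchmark module. This is not the original test routine: time_parsing and average_seconds are hypothetical helpers, and the lambda is a trivial stand-in for a real parser call.

```ruby
require 'benchmark'

# Time `runs` consecutive calls of the given block; returns elapsed seconds.
def time_parsing(runs = 100)
  Benchmark.realtime { runs.times { yield } }
end

# Repeat the measurement `samples` times and average the results.
def average_seconds(samples = 5, runs = 100, &parser)
  timings = Array.new(samples) { time_parsing(runs, &parser) }
  timings.sum / timings.size
end

# Stand-in for a real parser call (assumption: swap in the Nokogiri or
# Crack/Hashie parsing of the recorded response here)
xml = '<Items><Item><ASIN>A1</ASIN></Item></Items>'
stand_in_parser = -> { xml.scan(%r{<ASIN>(.*?)</ASIN>}).flatten }

average = average_seconds(5, 100, &stand_in_parser)
puts format('average: %.4f s', average)
```

Running each parser through the same helper keeps the two measurements comparable.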


Stubbing Amazon API calls using VCR

When integrating with external services, it is wise to test those interactions. But then again, it soon becomes tedious and slow if you need to repeat tests and wait for those external services to respond. So the VCR gem comes to the rescue! Much has been written about it so I won’t go into details, but what put a smile on my face was the solution for an integration test where I needed to test searching Amazon item listings using the Vacuum gem. To test my code I needed a way to:

  • tell VCR to record the Vacuum gem request and Amazon API response
  • reuse it for subsequent test runs

Two things bothered me:

  • Vacuum uses Excon for the HTTP layer
  • Amazon API calls are signed, so two identical search calls have different URIs, differing for example in the Timestamp part

So how does one hook into these layers for test purposes? Fortunately, VCR comes with solutions for both issues.

There is a hook_into VCR configuration option for Excon. Essentially this means VCR can intercept Vacuum calls, great! Configuration is simple: just add the :excon hook in the spec helper, like in the gist below.

For signed Amazon API requests, VCR magic was needed 🙂 As you probably know, VCR saves the request and response to the appropriate cassettes. For tests within a cassette, it tries to match the request from the test to the saved ones by comparing the HTTP method and URI, as explained here. I couldn’t use that since Amazon API requests are signed, remember? And the other existing matchers were of no help either. But VCR also allows for custom matchers. So I created a custom matcher that compares the search keywords from the request URI, and that was it!
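The comparison behind such a matcher might look roughly like the sketch below. It is not the original gist: search_keywords and same_keywords? are hypothetical helpers, the query parameter is assumed to be called Keywords, and the URLs are made up. The commented part shows how it could be wired into VCR with register_request_matcher.

```ruby
require 'uri'

# Extract the values of the Keywords query parameter from a request URI.
# Assumption: the search term travels in a parameter named "Keywords".
def search_keywords(uri)
  query = URI.parse(uri).query.to_s
  URI.decode_www_form(query)
     .select { |name, _value| name == 'Keywords' }
     .map { |_name, value| value }
end

# Two requests match when they search for the same keywords,
# no matter how the timestamp or signature differ.
def same_keywords?(uri_1, uri_2)
  search_keywords(uri_1) == search_keywords(uri_2)
end

# With VCR loaded, this could be registered along these lines:
#
#   VCR.configure do |c|
#     c.cassette_library_dir = 'spec/fixtures/vcr_cassettes'
#     c.hook_into :excon
#     c.register_request_matcher :keywords do |request_1, request_2|
#       same_keywords?(request_1.uri, request_2.uri)
#     end
#   end

first  = 'http://example.com/onca/xml?Keywords=ruby+book&Timestamp=2012-01-01T10%3A00%3A00Z&Signature=abc'
second = 'http://example.com/onca/xml?Keywords=ruby+book&Timestamp=2012-01-02T11%3A30%3A00Z&Signature=xyz'
other  = 'http://example.com/onca/xml?Keywords=rails&Timestamp=2012-01-01T10%3A00%3A00Z&Signature=abc'
```

A cassette would then be used with something like match_requests_on: [:method, :keywords], so the signed parts of the URI no longer take part in matching.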

Now, when running tests, on the first run VCR records the Amazon API request and response to the configured location (spec/fixtures/vcr_cassettes in my case). Subsequent runs reuse those calls. This is more than OK for development tasks. If one needs to refresh the Amazon API response, just delete the saved cassette(s) and the sequence is repeated. Another choice I had to make was whether or not to store those responses in SCM. In the end I decided not to save them. The search action is not destructive or otherwise dangerous, so any developer can repeat the process without cost. Mind you, in some other use case, e.g. when billing some action through a payment gateway, it would probably be wise to store the response.


Deploying Rails with Twitter Bootstrap on Heroku

Lately I’ve been playing with deploying a Rails application to Heroku. A most pleasurable experience, but I’ve had some issues with Twitter Bootstrap. The reason is that Heroku discourages the use of therubyracer gem, see https://devcenter.heroku.com/articles/rails3x-asset-pipeline-cedar#therubyracer for details. In my development environment I used the twitter-bootstrap-rails gem, and I wanted to keep using it along with its LESS support. There are some other, non-LESS solutions, as described at the Ruby Source here, but they didn’t seem appealing. So the only solution I could find was to precompile the assets and push them to Heroku. Aside from that, there were a few more issues to resolve:

  • The bootstrap_flash helper doesn’t work when precompiling assets; the solution was to copy the helper into my Rails project, as described here
  • To be able to keep using therubyracer gem in development, I had to move it to the development group in the Gemfile

Now, the procedure is pretty simple, just precompile the assets and push to Heroku and that’s it? Well, almost 🙂

I also didn’t want to use the precompiled assets in development, since that requires a bit of manual work (one needs to precompile for each run), and at the same time I didn’t want to forget to precompile them before pushing to Heroku. Hence, I put together a small shell script that:

  • creates a separate branch from the current one
  • precompiles the assets
  • pushes that branch to Heroku – it needs to be pushed to master
  • deletes the new branch
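
The original shell script is not reproduced here. As a rough sketch of the same four steps, written in Ruby rather than shell, the commands could be assembled like this. The branch name, the heroku remote name and the commit message are assumptions, and heroku_deploy_commands is a hypothetical helper.

```ruby
# Build the list of shell commands for a one-off deploy branch.
# Assumptions: the Heroku git remote is called "heroku" and the
# precompiled assets end up in public/assets.
def heroku_deploy_commands(branch = 'heroku-deploy')
  [
    "git checkout -b #{branch}",                 # create a separate branch from the current one
    'bundle exec rake assets:precompile',        # precompile the assets
    'git add -f public/assets',                  # stage them even if they are ignored
    "git commit -m 'Precompiled assets'",
    "git push -f heroku #{branch}:master",       # Heroku deploys what lands on master
    'git checkout -',                            # go back to the previous branch
    "git branch -D #{branch}"                    # delete the temporary branch
  ]
end

commands = heroku_deploy_commands
puts commands
```

To actually run it, each command could be passed to system, stopping at the first failure, e.g. commands.each { |command| system(command) or abort("failed: #{command}") }.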


In memory session store for Rails 3

For the past two days I’ve been playing with the idea of having an in-memory session store in a Rails application. The 4 KB cookie limit soon became an issue, and I didn’t want to use Redis, Memcached or even a database for this. In production those services have their place for sure, but for development it seems silly to me to ask every developer on the team to have them installed and running just to be able to run the application. A development environment is by its nature far from production, even when intentionally similar (e.g. same OS), so it makes sense to use as few dependencies as possible.

So I’ve been searching for an in-memory session store. At first I found some references to memory_store, but unfortunately those were all Rails 2 related. There were even some attempts at re-implementing it in Rails 3, but those of course failed.

Finally, this idea of implementing it for Rails 3 led me to reading the session-related code in Rails. And bingo, there it was! The session store in Rails 3 has a couple of implementations, one of them being the CacheStore. And since there is an in-memory cache store in Rails, the solution was pretty much straightforward. You can read up on cache store options in Rails here.

In the project, the following changes were needed:

# /config/initializers/session_store.rb
CrazyApp::Application.config.session_store :cache_store

# /config/environments/development.rb
config.cache_store = :memory_store

This will in effect put all your session data in the memory of the Rails process. After you kill Rails, it all goes away. Ideal for development 🙂

In case you need the data to persist, you can use the file store, which by default saves the session data under the application’s tmp directory (tmp/cache) so you can inspect it. The file store is actually the default cache method.

There is one issue with this setup: you can’t have separate stores for session and cache data. If you need that, a new session store implementation would be required, but let’s cross that bridge when the time comes.
