Nokogiri vs Crack & Hashie

Recently I wrote about using Vacuum gem for accessing Amazon Product Advertising API. As the result of API calls, it returns vanilla XML response from Amazon API. There are gems that returns PORO’s, but the “bare metal” access I get from Vacuum made it a great gem to use for several reasons (breakdown of those reasons and other existing gems is a topic for another post). Parsing the Amazon responses is a bit of a funny issue since information is located at various points. Here’s an example:

  root = Nokogiri::XML(response.body).remove_namespaces!
  items = parse_items(root)

  def parse_items(node)
    node.xpath('//Items/Item').map do |item_node|
      create_item_from(item_node)
    end
  end

  def create_item_from(node)
    attributes = {}
    attributes[:id] = parse_value(node, './ASIN')
    attributes[:title] = parse_value(node, './ItemAttributes/Title')
    attributes[:url] = parse_value(node, './DetailPageURL')
    attributes[:group] = parse_value(node, './ItemAttributes/ProductGroup')
    attributes[:images] = parse_item_images(node)
    Item.new(attributes)
  end

  def parse_item_images(node)
    image_sets = node.xpath('./ImageSets/ImageSet')
    return if image_sets.children.size == 0

    image_set = image_sets.find {|image_set| image_set.attribute('Category').value == 'primary'} || image_sets.first
    image_set.xpath('./*').inject(Hash.new) do |images, image_node|
      image = create_item_image_from(image_node)
      images[image.type] = image
      images
    end
  end

  def create_item_image_from(node)
    attributes = {}
    attributes[:url] = parse_value(node, './URL')
    attributes[:height] = parse_value(node, './Height', :to_i)
    attributes[:width] = parse_value(node, './Width', :to_i)
    attributes[:type] = node.name.gsub("Image", "").downcase
    ItemImage.new(attributes)
  end

  def parse_value(node, path, apply_method = nil)
    nodes = node.xpath(path)
    if nodes.first
      value = nodes.first.content
      value = value.respond_to?(:strip) ? value.strip : value
      apply_method ? value.send(apply_method) : value
    end
  end

As you can see, my domain objects don’t map exactly to Amazon structure. But, that’s what this parser is all about. It translates API responses to what I need in my domain. What bothered me was that I needed to use hard-coded paths to specific information, so I went looking for another solution. This is where Crack & Hashie gems come in. The first one converts received XML to a Hash, and the second one enables nicer Hash parsing. The code seems nicer:

  data = Hashie::Mash.new(Crack::XML.parse(response.body))
  items = parse_items(data)

  def parse_items(data)
    items_data = data.ItemSearchResponse.Items.Item
    if items_data.kind_of?(Array)
      items_data.map {|item_data| create_item_from(item_data)}
    else
      [create_item_from(items_data)]
    end
  end

  def create_item_from(data)
    attributes = {}
    attributes[:id] = data.ASIN
    attributes[:title] = data.ItemAttributes.Title
    attributes[:url] = data.DetailPageURL
    attributes[:group] = data.ItemAttributes.ProductGroup
    attributes[:images] = parse_item_images(data)
    Item.new(attributes)
  end

  def parse_item_images(data)
    return unless data.respond_to?(:ImageSets)

    image_set = data.ImageSets.ImageSet
    image_set = image_set.find { |image_set| image_set.Category == 'primary' } || image_set.first if image_set.kind_of?(Array)
    image_set.keys.select { |key| key =~ /.*Image/ }.inject(Hash.new) do |images, key|
      image = create_item_image_from(image_set.send(key), key)
      images[image.type] = image
      images
    end
  end

  def create_item_image_from(item_data, type)
    attributes = {}
    attributes[:url] = item_data.URL
    attributes[:height] = item_data.Height.to_i
    attributes[:width] = item_data.Width.to_i
    attributes[:type] = type.gsub("Image", "").downcase
    ItemImage.new(attributes)
  end

So, the result although similar seems nicer to me. A bit more code like, and less strings, even a bit less code. Definitely a win so far. But, there are some issues.

First one is that with XML parsing I don’t think much about collections, they are always the same, regardless of the number of child nodes. With Crack/Hashie, suddenly I needed to think about it since the combination converts a collection of 1 into direct child. Hence the Array check in parse_items method. I don’t like making such checks but OK, it was a very limited and specific enough not to hurt me later.

The second issue was performance. Even while testing everything suddenly seemed slower. At first I attributed this to my fatigue, but just to be sure, I made a small performance test. It consisted of parsing a predefined XML document (with recorded Amazon API response) for 100 times. The XML response has 9 items, and you can see an example here. The test routine can be found here. The results were more than interesting:

Seconds: 1st 2nd 3rd 4th 5th Average
Nokogiri 1.18 1.26 1.27 1.33 1.23 1.25
Crack/Hashie 7.06 7.87 7.56 7.58 7.44 7.50

It seems extra baggage from Crack and Hashie makes that solution about 5 times slower.
This was more than enough reason to abandon the approach and just live with plain XML and Nokogiri.
But, at least now I know why 🙂

Advertisements
Tagged , ,

2 thoughts on “Nokogiri vs Crack & Hashie

  1. nalsky says:

    Hey – cool post. What version of Nokogiri were you using here?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: