Kafka & Storm integration talk

This June I had the pleasure of attending the annual DORS/CLUC conference in Zagreb. It’s a convention of the Croatian Linux users group devoted to open systems in general. There were a number of interesting talks, but my favorite of the day was definitely the one by Mario Borna Mjertan, an elementary school student who has been using Linux (as a user and as a developer) for 6 years already. Talk about a real interest in computers :-)

Also, I had the pleasure of talking about integrating Kafka & Storm at my dev shop. The basic idea was to build a system that can process an incoming data stream (about 5,000 events per second) and build end-user dashboard data (trends, current submit rate etc.) in near real-time. In the talk I gave a short intro to the technology stack and basic concepts, as well as an overview of the bottlenecks and issues we experienced along the way.

Below you can find the video of the talk:

And the related slides:

Video can also be found here.


jQuery UI slider for dynamic content

On one of the projects I’ve worked on, we used the jQuery UI slider for dynamic content. The slider’s handle needed to resize according to that dynamically loaded content.

The main issue was that, with the usually suggested ways of doing it, the handle jumped on the first mouse-initiated movement of the slider. That rather ugly behavior is related to the slider’s requirement that the slider bar and the handle parent must be of the same width[1]. That, in combination with the requirement to give the handle a half margin offset[2] due to handle sizing, causes the jump.

This is what the default setup looks like:


You can see that the bar and the handle parent are of the same size. The slider is initialized on the bar element.

The fix is rather simple. Instead of initializing the slider on the bar element, do it on ui-handle-helper-parent. Then you can size that element as needed, and the slider does not exhibit the ugly jump. The ui-handle-helper-parent element needs to be centered in the bar and a bit of custom styling is needed, but in the end it works nicely. This is what the setup looks like after the fix:


Due to peculiar sizing of the ui-handle-helper-parent element, another fix was needed as well. When clicking on the area of bar element that the ui-handle-helper-parent element doesn’t cover, the slider value needs to be recalculated and set accordingly. All of these changes and fixes are packaged in the DynamicSlider[4] component and there is a working example[3].
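The recalculation for clicks outside the inner element boils down to a bit of arithmetic. Here is a minimal sketch of that idea, written in Ruby only to show the math. The function name, the parameters and the 0 to 100 value range are my own assumptions for illustration and are not taken from the DynamicSlider component.

```ruby
# Maps a click on the bar to a slider value when the slider itself
# lives on a narrower element that is centered inside the bar.
# All names and the default value range are hypothetical.
def slider_value_for_click(click_x, bar_width:, inner_width:, min: 0, max: 100)
  offset = (bar_width - inner_width) / 2.0   # left edge of the centered inner element
  ratio  = (click_x - offset) / inner_width  # click position relative to the inner element
  ratio  = ratio.clamp(0.0, 1.0)             # clicks outside snap to the nearest end
  (min + ratio * (max - min)).round
end

puts slider_value_for_click(10,  bar_width: 400, inner_width: 300) # click left of the inner element
puts slider_value_for_click(200, bar_width: 400, inner_width: 300) # click in the middle of the bar
puts slider_value_for_click(390, bar_width: 400, inner_width: 300) # click right of the inner element
```

Clamping the ratio is what makes a click on the uncovered part of the bar behave like a click on the nearest end of the slider.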


  1. Jquery UI Scroll Bar not fitting the contents
  2. jQuery UI slider jumps
  3. A working example @ CodePen
  4. The gist of the example

Karma & Browserify

Recently I had the pleasure of setting up Angular tests within a project. For testing, we decided on the Karma test runner by the Angular team. You can find details about the reasoning at the end of the post, but basically the idea was that we’d like to stick with the runner built and used by the Angular team itself.

The project is currently a Rails gem and there are a bunch of nice tutorials on how to integrate the two (see resources), but this project has aspirations to become a full-blown Angular app without any direct Rails dependencies. So, solutions like the rails_karma gem or rails assets were not an option.

The good news is that the project in question used Browserify and sprockets-browserify gem to integrate it into Rails. After a bit of investigation, we came upon karma-browserify node module. This way, the resources under test were still served by browserify in the same manner as they would be served with the live application, but Rails was actually not needed to run the tests. The module in question essentially adds Karma preprocessors that make Karma aware of browserify served resources. In the end, this made our tests run fast, without involving Rails, which is a nice bonus.

The CoffeeScript configuration that ended up running the test suite:

module.exports = (config) ->
  config.set
    basePath: '..'

    frameworks: [
      'browserify' # required by karma-browserify
      # plus the test framework in use, e.g. 'jasmine'
    ]

    preprocessors:
      'app/assets/javascripts/index.coffee': ['browserify'] # index contains all requires
      'spec/**/*.coffee': ['coffee']

    browserify:
      extensions: ['.coffee']
      transform: ['coffeeify']
      watch: true
      debug: true

    files: [
      'app/assets/javascripts/index.coffee' # load the application, have browserify serve it
      # watch application files, but do not serve them from Karma since they are served by browserify
      {
        pattern: 'app/assets/javascripts/*.+(coffee|js)'
        watched: true
        included: false
        served: false
      }
      'spec/support/**/*.+(coffee|js)' # load specs dependencies
      'spec/javascripts/**/*_spec.+(coffee|js)' # load the specs
    ]

    exclude: []
    reporters: ['progress']
    port: 9876
    colors: true
    logLevel: config.LOG_INFO
    autoWatch: true
    browsers: ['PhantomJS']
    captureTimeout: 60000
    singleRun: false

Just run karma with karma start spec/karma.config.coffee and that’s it.



Two column PDF to eReader format

Lately I’ve come across a few old PDFs from my faculty days. Some of them are quite interesting even today, but reading a PDF on anything other than a large screen is painful, at least to me. So, I tried to figure out how to convert them to a tablet or eReader friendly format. In my case, that is AZW3 for the Kindle reader app on my tablet.

When the source PDF is well formatted, you’re in luck because Calibre will do an excellent job. The conversion page in the Calibre manual pretty much explains it. One thing that bothered me was the page numbers: I couldn’t get rid of them in the target format. So, Search & Replace and some regex magic to the rescue.


When in Search & Replace, you can start the wizard and browse the text to be converted. It is shown in HTML format, so it is easy to read the tags, and it is shown in the way it will get converted. Thus, you can see all the mistakes Calibre made out of the box. The Search & Replace rules apply before other conversion phases kick in. So, for example, you can mark up headers that were not recognized, and Calibre will then use them to create the table of contents. In the above example, I used two replacements:

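  • <p>([A-Z\s]+)</p> => <h1>\1</h1>, this will convert e.g. <p>SOME TITLE</p> to <h1>SOME TITLE</h1> (the repetition sits inside the capture group so that \1 holds the whole title, not just its last character)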
  • <p>\d+</p> => nothing, which will essentially delete page numbers

You can think of Search & Replace as a sort of preprocessor and depending on the quality of the PDF, you can tweak the conversion to your liking.
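To make the preprocessor idea concrete, here is a small sketch that applies two such rules to a made-up HTML fragment. It is written in Ruby purely for illustration; Calibre applies its rules with its own regex engine, and the sample text, the RULES constant and the preprocess helper are all assumptions of mine. The title pattern keeps the whole run of characters inside the capture group so that \1 holds the full title.

```ruby
# A made-up fragment of what the conversion wizard might show.
html = <<~HTML
  <p>SOME TITLE</p>
  <p>First paragraph of the chapter.</p>
  <p>42</p>
  <p>Second paragraph.</p>
HTML

# Search & Replace rules as [pattern, replacement] pairs (hypothetical names).
RULES = [
  [%r{<p>([A-Z\s]+)</p>}, '<h1>\1</h1>'], # all-caps paragraphs become headers
  [%r{<p>\d+</p>\n?}, '']                 # digit-only paragraphs (page numbers) are dropped
].freeze

# Applies every rule in order, feeding each result into the next rule.
def preprocess(html, rules = RULES)
  rules.reduce(html) { |text, (pattern, replacement)| text.gsub(pattern, replacement) }
end

puts preprocess(html)
```

Note that the page number rule would also remove any genuine paragraph that consists of digits only, so it is worth browsing the result before converting.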

Now, there are kinds of PDFs that are downright impossible to convert in this manner. One of the issues is a PDF with multiple columns per page. I didn’t find an easy way to tell Calibre about the columns. The text would get scrambled, with pieces of the left column being mixed with the right one etc.

So, paperCrop to the rescue! paperCrop is a tool that splits the columns, go figure :-) It actually does quite a bit more, but I used it for that purpose. It suggests reasonably well what to split and where, but you can tweak the parameters easily. It also offers manual clipping if you like that approach. In the example below, I used only parts of the given PDF.


Once the single column PDF was produced, converting it with Calibre was again a straightforward job.

If the source PDF is text only, there is another option you can use. Just copy the PDF content into a TXT file. There you can edit the content easily and effectively prepare it for a conversion in Calibre with default parameters. E.g. I tend to remove the table of contents and let Calibre generate it from the rest of the text. This solution came in handy with the word splitting often seen in PDFs. Words near the right margin in a PDF tend to be split across two rows with a dash in between, e.g. seren-dipity. So, you can rejoin those with your favourite editor, again using regexps. Or, if you prefer, you can try that in Calibre.
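As an illustration of the word joining, here is a minimal Ruby sketch. The join_hyphenated helper and the sample sentence are made up, and the pattern is only one possible choice: it rejoins a word when a dash at the end of a row is followed by a lowercase letter on the next row.

```ruby
# Rejoins words that were split across two rows with a dash,
# e.g. "seren-\ndipity" becomes "serendipity".
def join_hyphenated(text)
  text.gsub(/(\p{Alpha})-\n(\p{Lower})/, '\1\2')
end

sample = "It was pure seren-\ndipity that the file survived.\n"
puts join_hyphenated(sample)
```

The caveat is that a genuinely hyphenated compound that happens to break at the end of a row loses its dash as well, so a quick review of the result is still a good idea.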

In the end, the result is a very usable eReader formatted book. I haven’t tried Calibre conversion for PDFs with graphs or other images, but hopefully it will work just as well.

And a friendly note: for Linux boxes, the Calibre install instructions strongly suggest using their binary install instead of distribution packaged versions.


  1. Calibre
  2. Calibre blog
  3. Calibre conversion manual
  4. Calibre regex manual
  5. Paper Crop
  6. Briss, another tool for column splitting, haven’t tried it

Jenkins Gitlab Hook Plugin reorganized

The Jenkins Gitlab Hook Plugin received a major refactoring. The goal was to separate concerns from existing modules and to make the project testable. The GitHub repo now contains the Java binaries needed to run the rspec tests, but hopefully you’ll find the new organisation a bit more intent revealing and easier to follow.

I’ve used the use case approach and extracted the related services, so now all the domain knowledge is contained within the models subfolders:


The remaining models in the root models folder are all directly Jenkins related and left there so Jenkins can load them first and register the plugin and the related web hook correctly.

The entire domain knowledge is now also testable. I chose rspec to run the tests and have created the related spec helper that loads all the Java dependencies and models from the root folder. To run the specs, you’ll need to set up JRuby so it runs in Ruby 1.9 compatibility mode. Just add the following switches to your JRUBY_OPTS environment variable: --1.9 -Xcext.enabled=true -X+O.

The v1.0.0 release has all the goodies, so feel free to upgrade your Jenkins environments.


Jenkins Gitlab Hook plugin updates

There have been a few changes to Jenkins and the related Ruby runtime as of late. This has caused a few issues with the Jenkins Gitlab Hook plugin, which have finally been resolved.

I recommend upgrading the plugin to the latest version, v0.12.2, and Jenkins to the latest available version if possible. Otherwise, I would stay away from Jenkins v1.519 to v1.521 and plugin versions v0.2.7 to v0.2.11. If you are not currently experiencing any issues, that’s OK; this concerns only those who want to upgrade some part of the system for whatever reason.

Also, the upgrade is recommended if you have any of these symptoms:

  • Failed to load HAML message – problem with Ruby Runtime on Windows, details in issue #9
  • Failed to install the plugin – problem with Ruby Runtime and Jenkins v1.519 and v1.520, details in issues #10, #11, #12 and #13
  • Undefined method ‘getDefaultParametersValues’ – method gone private in Jenkins, details in issue #14
  • Build no longer triggering – plugin was not building non parametrized Jenkins projects, details in issue #15

  • Case insensitive repo URL matching




Parametrized Jenkins releases from non master branches

This is related to the Jenkins & Git branches post. If you want to make it play nicely with the M2 Release plugin, just configure the Jenkins project to checkout / merge the code to a local branch that has the same name as the branch that is currently being built, like this:


This will enable the release plugin to work even with non master branches.

You can then start the release build as usual:

m2 release with parameters


Deploying Discourse to Heroku

Recently I stumbled upon Discourse. Finally someone tackled that problem. Forums, while rich in content, have been so dull and unfriendly for so long. Anyway, I wanted to get it up and running for myself, preferably on some cloud infrastructure, to play around. I’ve had previous experience with Heroku, so I chose that setup, for no other reason. Discourse’s own forum lists plenty of other options, so feel free to investigate.

As for the Heroku setup, there are existing guides, see [4] and [5]. The difference between them is that when using Discourse’s default configuration samples, you get Open Redis support, while I wanted to use Redis Cloud add-on, similar to Swrobel’s take on it. Also, I wanted to use Autoscaler for Sidekiq to reduce total cost on Heroku, as noted in Discourse’s document. All in all, I ended up with a mix of ideas from both sources. Add-ons used on Heroku:

Currently free plans are used for all of them as you can see below.

Heroku Configuration for a Discourse instance

To be able to repeat the process and easily deploy updates using Discourse’s Github repo, I’ve created a small Bash script for this. It basically performs the following tasks:

  • goes into your local Discourse git repo
  • creates a new branch, based on the current branch you’re on (all changes must be committed)
  • creates Redis configuration using provided sample and tweaked for Redis Cloud
  • adds mail configuration to production environment
  • removes clockwork from Procfile
  • sets Ruby version to 2.0 in Gemfile
  • configures Sidekiq for Autoscaler
  • creates temporary database and migrates it
  • precompiles assets using the above database
  • drops the temporary database
  • commits all configuration changes along with precompiled assets
  • pushes it all to Heroku
  • migrates the database on Heroku
  • and finally deletes the deployment branch

You can find the script here, feel free to use it, change it, do to it whatever you see fit :-) The main idea was to do as little manual work as possible, at least for this phase where I follow the default instructions closely. This of course might change, but for now it is quite all right.
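For orientation, the command-line part of the steps listed above might look roughly like the sketch below. This is not the actual script: it is written in Ruby instead of Bash, it only builds and prints the commands without running anything, and the branch name, the remote name and the commit message are placeholders of mine. The configuration edits (Redis, mail, Procfile, Gemfile, Sidekiq) and any environment variables are left out.

```ruby
# Returns the shell commands for one deployment, in order.
# Nothing is executed here; branch and remote names are hypothetical.
def deploy_commands(branch: 'heroku-deploy', remote: 'heroku')
  [
    "git checkout -b #{branch}",             # deployment branch based on the current one
    # (configuration files would be generated and edited at this point)
    'bundle exec rake db:create db:migrate', # temporary database
    'bundle exec rake assets:precompile',    # precompile assets against it
    'bundle exec rake db:drop',              # drop the temporary database
    'git add --all',
    'git add --force public/assets',         # precompiled assets are usually git-ignored
    "git commit -m 'Deployment configuration and precompiled assets'",
    "git push #{remote} #{branch}:master --force",
    'heroku run rake db:migrate',            # migrate the database on Heroku
    'git checkout -',                        # back to the branch we started from
    "git branch -D #{branch}"                # delete the deployment branch
  ]
end

deploy_commands.each { |command| puts command }
```

Keeping the commands in a plain list makes it easy to do a dry run first and to see exactly what would be pushed before anything touches Heroku.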

Prerequisites for executing the script:

  • You need to be able to run Discourse locally or at least to be able to precompile assets
  • This means you need:
    • PostgreSQL
    • Ruby 2.0
    • Cloned Discourse Github repo

When setting up PostgreSQL, take care to enable HStore on it [2] and to set appropriate discourse user permissions [3]. You can change username and password easily to match your setup.

Also, you need to replace the SECRET_TOKEN within the script and match it to your own setup. I took care to match it to the one set for the Heroku instance (heroku config:get SECRET_TOKEN).

Aside from the script, you should still follow the instructions in the mentioned documents to e.g. create the initial user, set up Amazon S3 upload etc.

Finally, the forum is up:

Running Discourse


  1. Installing PostgreSQL on Ubuntu
  2. Enabling HStore for PostgreSQL
  3. Setting PostgreSQL permissions
  4. Swrobel’s take on Heroku deployment
  5. Discourse’s own Heroku deployment instruction
  6. Related topic on Discourse’s forum

Whisper Ruby

When using callbacks in your Ruby objects, there are more than a few ways of doing it. Recently I stumbled upon the Wisper gem, which abstracts the details away in a nice manner. Best do some code comparison.

Previously, I might have written something like this:

class Worker
  def do_some_hard_work
    status = :should_be_sleeping_like_a_log
    notify_listeners(:on_work_done, status)
  end

  def add_listener(listener)
    (@listeners ||= []) << listener
  end

  def notify_listeners(event_name, *args)
    @listeners && @listeners.each do |listener|
      if listener.respond_to?(event_name)
        listener.public_send(event_name, self, *args)
      end
    end
  end
end

class Owner
  def set_things_in_motion
    worker = Worker.new
    worker.add_listener(WorkerDisplay.new)
    worker.do_some_hard_work
  end
end

class WorkerDisplay
  def on_work_done(worker, status)
    display_worker_status(worker, status)
  end
end

Here you have a worker class that has the ability to register listeners and to trigger appropriate events on them, if they respond to the given event. Not too big of a footprint but it does sort of pollute the domain a bit. I guess things could be extracted to a module but let’s try it with Wisper now:

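require 'wisper'

class Worker
  include Wisper # newer Wisper versions use `include Wisper::Publisher`

  def do_some_hard_work
    status = :should_be_sleeping_like_a_log
    publish(:on_work_done, self, status)
  end
end

class Owner
  def set_things_in_motion
    worker = Worker.new
    worker.subscribe(WorkerDisplay.new)
    worker.do_some_hard_work
  end
end

class WorkerDisplay
  def on_work_done(worker, status)
    display_worker_status(worker, status)
  end
end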

So, most of the code didn’t change, but the Worker class did benefit from a more clearly revealed intent. A small one, but still a win. Best thing is, you can use the async Wisper gem extension and turn your listener into a Celluloid Actor.

class Owner
  def set_things_in_motion
    worker = Worker.new
    worker.subscribe(WorkerDisplay.new, async: true)
  end
end

There are other nice features like global listeners, mapping a subscription to a different method or subscription chaining, so if you’re interested go ahead and read the project’s readme.
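For comparison, the hand-rolled version could also have been cleaned up without a gem by extracting the listener handling into a module, as hinted earlier. Here is a minimal plain Ruby sketch of that idea. The Listenable module and the RecordingListener class are names I made up for this example, and none of it is part of Wisper.

```ruby
# Hypothetical mixin holding the listener bookkeeping, so that the
# domain class only states which event it announces.
module Listenable
  def add_listener(listener)
    (@listeners ||= []) << listener
    self # returning self allows chaining
  end

  private

  def notify_listeners(event_name, *args)
    (@listeners || []).each do |listener|
      listener.public_send(event_name, self, *args) if listener.respond_to?(event_name)
    end
  end
end

class Worker
  include Listenable

  def do_some_hard_work
    notify_listeners(:on_work_done, :should_be_sleeping_like_a_log)
  end
end

# Stand-in listener that just records what it was told.
class RecordingListener
  attr_reader :events

  def initialize
    @events = []
  end

  def on_work_done(worker, status)
    @events << [worker.class.name, status]
  end
end

listener = RecordingListener.new
Worker.new.add_listener(listener).do_some_hard_work
p listener.events
```

This gets the Worker class about as clean as the Wisper version, but you would still have to build and maintain things like async delivery or global listeners yourself.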


Scraping Amazon item offers

In my pet project Dealesque, I am trying to compare all offers on a number of Amazon items, the idea being that it can help decide which offers to use to minimize shipping and total cost. Using Amazon Product Advertising API was the logical first step, but it doesn’t return all the offers for an item. It does however return the “more offers URL” for each item. Hence, the old scrapin’ was due, and none too late!

Plain wget-like action would not suffice, since Amazon is taking care to block unwanted traffic. So, mechanize gem to the rescue! It actually allows you to impersonate a real browser:

require 'mechanize'
agent = Mechanize.new { |a| a.user_agent_alias = 'Mac Safari' }

After that, you can navigate the site, click away, read any forms etc.
For scraping, what I actually ended up using was to get the content of the “more offers URL” page and parse it using Nokogiri. Something like:

page = agent.get(more_offers_url)
root = Nokogiri::HTML(page.content.strip)

For the current development stage, this is doing just fine. Unfortunately, for production use it will not suffice. There will probably be some traffic throttling from Amazon, and some benchmarking will be needed to determine the limits. Proxying the requests will probably be required too. But I’ll leave that for another time.
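Until those limits are known, one simple precaution is to space the requests out on the client side. Here is a minimal sketch of that idea in plain Ruby. The Throttle class, its parameters and the interval are assumptions for illustration only; Amazon’s actual limits are not known to me and would have to be measured.

```ruby
# Hypothetical helper that makes sure consecutive calls are at least
# min_interval seconds apart. Clock and sleeper are injectable so the
# behaviour can be checked without really waiting.
class Throttle
  def initialize(min_interval,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @min_interval = min_interval
    @clock = clock
    @sleeper = sleeper
    @last_run = nil
  end

  # Waits if the previous call was too recent, then runs the block
  # and returns its result.
  def call
    if @last_run
      wait = @min_interval - (@clock.call - @last_run)
      @sleeper.call(wait) if wait > 0
    end
    @last_run = @clock.call
    yield
  end
end

throttle = Throttle.new(0.05) # arbitrary short interval for the demo
results = %w[url-1 url-2 url-3].map { |url| throttle.call { "fetched #{url}" } }
puts results
```

In the scraper, the block would wrap the actual request, e.g. throttle.call { agent.get(more_offers_url) }, with the interval tuned once some benchmarking has been done.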

The result of scraping the offers for picked items:

Dealesque picked items

