Thursday, December 31, 2015

Lua scripting with Redis

The last post covered how we're processing real-time events with Clojure, AWS Lambda, and Redis.  Since that post, I decided to structure the keys differently, making it easier to query and organize the data.  Originally the keys were in the format of YYYY-MM-DD:<event>_<listing_id>, which represented a given event for a listing id on a specific date. After some additional thought, I decided that moving the events into a hash for a given listing would be more flexible. So we went from keys with this structure:

"2015-12-12:cart::checkoutnow:impression_2155"

to hash keys with this structure: "2015-12-22_2155". Each hash holds a set of field/value pairs, one per event type for that listing id and date. For example:



127.0.0.1:6379> hgetall "2015-12-22_2155"
 1) "ldp:sellPanel:addToCart:impression"
 2) "6"
 3) "ldp:sellPanel::impression"
 4) "1"
 5) "ldp:::impression"
 6) "1"
 7) "ldp:sellPanel:addToCart:click"
 8) "3"
 9) "cart::checkoutNow:impression"
10) "3"
11) "cart::checkoutNow:click"
12) "1"


Redis hashes are maps between string fields and string values, so they are the perfect data type to represent objects.  A hash with a few fields, where a few is defined as a hundred or so, is stored in a way that takes very little space, making it possible to store millions of objects in a relatively small Redis instance.  A single hash in Redis can store 2^32 - 1 (more than 4 billion) field/value pairs. More information on these memory optimizations can be found on the Redis website.
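As an aside, the per-event counters in a hash like the one above would typically be accumulated with HINCRBY, which creates the field if needed and bumps its integer value. A minimal sketch of that write path in plain Lua (the variable names here are my own, not from the ingestion code):

```lua
-- Build the hash key ("YYYY-MM-DD_<listing_id>") for one incoming event.
local date, listing_id = "2015-12-22", "2155"
local key = date .. "_" .. listing_id

-- Server-side, HINCRBY creates the field if missing and bumps its counter:
--   redis.call("hincrby", key, "cart::checkoutNow:click", 1)
print(key)  -- 2015-12-22_2155
```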

Since our requirements are fuzzy at best, I wanted a way to iterate quickly and prototype different functionality, along with ways to query the data. For this I chose to leverage the powerful scripting language Lua.  Starting with version 2.6, Redis ships with a built-in Lua interpreter. Lua is very powerful and extremely fast, and is a great way to go if you have to do "advanced" data processing with Redis.  A full description of the language can be found on the Lua.org website; I would encourage you to take a quick read through the documentation. One key point to highlight is that the table is the data structure in Lua. All structures that other languages offer---arrays, records, lists, queues, sets---are represented with tables in Lua. At first this may seem like a constraint, but in practice it isn't: you can store pretty much anything in tables.
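To make that concrete, here's a single Lua data structure wearing three different hats (plain Lua, no Redis involved):

```lua
local arr = { "impression", "click" }       -- array: 1-based integer keys
local rec = { listing = 2155, clicks = 3 }  -- record: named fields
local set = {}                              -- set: members are the keys
set["2155"] = true
set["2149"] = true

print(arr[1])       -- impression
print(rec.clicks)   -- 3
print(set["2155"])  -- true
```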

The initial requirements were as follows:
A. Run a predefined calculation against a listing id for a given day
B. Be able to run multiple predefined calculations against a given listing id for a given day
C. Run A and B over a list of listing ids
D. Run C for a given date range

The first part of data wrangling was to get the list of keys we're interested in.  For this I used the SCAN operator.  SCAN provides an iterator allowing for incremental iteration, returning only a small number of elements per call. This is the recommended way to work with keys, since calling KEYS may block the server for long periods of time.  The SCAN call looks something like this and returns a cursor-based iterator:

local cursor = "0"
repeat
  local ok = redis.call("scan", cursor,
                        "match", start_year .. "-" .. start_month .. "-*_*",
                        "count", 100)
  cursor = ok[1]     -- next cursor id returned by the server
  scan_keys = ok[2]  -- this batch of matching keys
  -- ... process scan_keys ...
until cursor == "0"

The ok variable is a Lua table, where the first element is a cursor id returned by the server, and the second element is the batch of keys. Each time we call redis.call(), we provide the cursor id that the server returned from the previous call. This continues until the server returns "0"; at that point we've iterated through all the matching keys. Since the above call gives us back every key matching our starting year and month, we need to whittle the results down further. To filter the keys, a simple higher-order function called "filter" does the trick.

...
    filter = function (func, tbl)
      local t = {}
      for i, v in pairs(tbl) do
        if func(v) then
          table.insert(t, v)
        end
      end
      return t
    end,
    within_date_range = function (k)
      -- pull year, month, day, and listing id out of "YYYY-MM-DD_<listing_id>"
      local _, _, y, m, d, l = string.find(k, "(%d%d%d%d)-(%d%d)-(%d%d)_(%d+)")
      -- summing the parts only orders correctly within a single month,
      -- which is all the SCAN match pattern above will return anyway
      local key_ts = y + m + d
      local start_ts = start_year + start_month + start_day
      local end_ts = end_year + end_month + end_day
      if (key_ts >= start_ts and key_ts <= end_ts and listing_ids[l]) then
        return true
      else
        return false
      end
    end,
...

Calling the filter function prunes the keys down to just the date range we're interested in. Note that there's currently no support for ranges that span months; that's something we can work on later. Once we have the key set of interest, we can iterate over the metrics we're interested in and calculate them.

The initial calculations are all stored as Lua functions in a table.  The first two metrics we want to report on are click-through rates: for example, knowing how many people add an item to their cart and then either proceed to checkout or abandon their cart.  When the need for additional metric calculations arises, we can easily add functions to the metrics table.

metrics = {
  ['ldp_add_to_cart_ctr'] = function (k)
    local r = nil
    local a = redis.call("hget", k, "ldp:sellPanel:addToCart:click")
    local b = redis.call("hget", k, "ldp:sellPanel:addToCart:impression")
    if a and b then
      r = (a/b)*100
    end
    return k, r
  end,
  ['cart_to_checkout_ctr'] = function (k)
    local r = nil
    local a = redis.call("hget", k, "cart::checkoutNow:click")
    local b = redis.call("hget", k, "cart::checkoutNow:impression")
    if a and b then
      r = (a/b)*100
    end
    return k, r
  end
}

...
...
local x = utils.filter(utils.within_date_range, scan_keys)
for _, key in ipairs(x) do
  for metric, _ in pairs(metrics_to_gather) do
    local k, ctr = utils.metrics[metric](key)
    if ctr then
      table.insert(results, metric ..'=>'.. k ..' : ' .. utils.format_percentage(ctr))
    end
  end
end
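One small piece not shown above is utils.format_percentage. A minimal sketch of what it might look like (the exact formatting in the real script is an assumption on my part):

```lua
-- Render a ratio such as 33.333... as "33.33%"
local function format_percentage (v)
  return string.format("%.2f%%", v)
end

print(format_percentage((1/3) * 100))  -- 33.33%
```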

There are a few ways to test this out. You could, of course, run the code against your production Redis instance, but instead I wrote an Ansible playbook that logs into the instance, runs a BGSAVE, scp's the dump.rdb file over to my workstation, and then I run redis-server locally against that snapshot.

MBP-200729:~ dan.young$ redis-cli -a '<snowden-knows>' --eval ~/projects/redis/lua/listing-stats.lua 0 , cart_to_checkout_ctr,ldp_add_to_cart_ctr 2015-12-30 2015-12-31 2149
1) "cart_to_checkout_ctr=>2015-12-31_2149 : 100.00%"
2) "ldp_add_to_cart_ctr=>2015-12-31_2149 : 33.33%"
3) "cart_to_checkout_ctr=>2015-12-30_2149 : 25.00%"
4) "ldp_add_to_cart_ctr=>2015-12-30_2149 : 13.64%"

The first argument to --eval is the script name, followed by the key names (everything before the comma separator), which can be accessed via the global KEYS variable. Since we're generating the list of KEYS within the script itself, and not providing any via the command line, we pass a placeholder 0 for the KEYS argument.  The arguments after the comma should not represent key names and can be accessed via the global ARGV variable. In the above, ARGV[1] would be cart_to_checkout_ctr,ldp_add_to_cart_ctr, ARGV[2] and ARGV[3] would be 2015-12-30 and 2015-12-31 respectively, and ARGV[4] would be 2149.  For testing purposes, using eval is fine, but once you get to a stable point where you want to run this on your production instance, I would recommend caching the code on the Redis server using the SCRIPT LOAD command.
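Inside the script, that comma-separated metric list in ARGV[1] needs to be split into a lookup table before the loop shown earlier can iterate over it. One way to do that in plain Lua (this split helper is my own sketch, not necessarily what the original script does):

```lua
-- Split "cart_to_checkout_ctr,ldp_add_to_cart_ctr" into a set-style table.
local argv1 = "cart_to_checkout_ctr,ldp_add_to_cart_ctr"  -- stands in for ARGV[1]
local metrics_to_gather = {}
for m in string.gmatch(argv1, "[^,]+") do
  metrics_to_gather[m] = true
end
```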

 redis-cli -h 127.0.0.1 -p 6379 -a '<snowden-knows>' \
> SCRIPT LOAD "$(cat ~/projects/redis/lua/listing-stats.lua)"
"64b561bb63a178360e02f8d63a3c1cbf6925e387"

 redis-cli -h 127.0.0.1 -p 6379 -a '<snowden-knows>' \
> EVALSHA "64b561bb63a178360e02f8d63a3c1cbf6925e387" 0 cart_to_checkout_ctr,ldp_add_to_cart_ctr 2015-12-30 2015-12-31 2149
1) "cart_to_checkout_ctr=>2015-12-31_2149 : 100.00%"
2) "ldp_add_to_cart_ctr=>2015-12-31_2149 : 33.33%"
3) "cart_to_checkout_ctr=>2015-12-30_2149 : 25.00%"
4) "ldp_add_to_cart_ctr=>2015-12-30_2149 : 13.64%"

Redis is cool...what else can be said....

So there you have it, Bob's your uncle!
