Laura Diane Hamilton

Technical Product Manager at Groupon

"Tracking All The Things" at Groupon

Chris Powers, a software engineering manager at Groupon, gave a talk today about how Groupon uses JavaScript to track its vast amounts of user, session, and pageview data.

When Chris first joined Groupon, he told us, he found total tracking chaos.

Each web page had 20 or 30 tracking pixels on it. (A tracking pixel is a 1x1 transparent GIF; the request for it sends data about a user to the web server for behavior-tracking purposes.)

"It doesn't seem like a big deal until you try to do it at scale."—Chris Powers, Groupon Engineering

With Groupon's scale, the dozens of tracking pixels were driving its tracking servers to the brink of failure.

Groupon also had multiple different groups tracking different sets of data in different ways. There was no central ownership or oversight of the tracking frameworks. There was very little communication among the different groups about what metrics they were tracking and how they were using the data.

Image credit: Allie Brosh on Hyperbole and a Half

Chris and his team set out to pull the disparate tracking systems together and improve the performance of the tracking system.

The team found that there were four key data stakeholders—each of whom had very different data needs.

The CEO required a set of key business metrics to help him understand the big picture. The definitions of these key business metrics did not change, and it was very important to keep the CEO metrics correct.

The marketing team wanted data one level down. The marketing team wanted to understand how to attribute different users to the proper marketing sources, and they wanted the power to optimize their search engine marketing (SEM) spend.

Product managers and developers had more detailed (and more transient) needs. Typically, PMs and developers would want to understand how each A/B test was performing. They'd also want to understand the impact of each website change or new feature.

Finally, the data scientists wanted all the data they could get their hands on, for the purposes of what Chris calls "Data Spelunking."

Groupon's solution?

The Groupon engineers developed a unified JavaScript-based framework to track all user behavior. The framework had three levels of data: User-level data, session-level data, and page-level data.

At the user level, Chris said, the team tracks the following:

  1. User ID
  2. User Agent
  3. Metadata (e.g., number of visits, user type)
  4. Logged in vs. logged out


One user has many sessions. (A session is basically a single "visit" or "interaction period," and usually has a time-based limit. All Groupon applications have the same session logic, but different companies use different session logic depending on their needs.) Groupon tracks the following data at the session level:

  1. Session ID (a hash of the User ID and timestamp)
  2. Session Expiry Logic
  3. Referrer


Finally, at the pageview level, Groupon tracks the following:

  1. Page ID (a hash of the Session ID and timestamp)
  2. Page Type (e.g., "checkout page" or "deals page")
  3. URL
  4. Country
  5. Locale (e.g., French vs. English in Canada)
  6. Application-specific metadata
  7. Referring Page ID
  8. Referring Click ID
  9. Tracking Library Version (so that data sent by pre-release versions of the tracking library can be identified)
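To make the three levels concrete, here is a minimal sketch of how session and page IDs could be derived by hashing the parent ID with a timestamp. Chris did not say which hash function Groupon uses, so the FNV-1a hash below is only a stand-in, and the function and field names (`newSession`, `newPageview`, `libraryVersion`, and so on) are my own assumptions rather than Groupon's schema.

```javascript
// FNV-1a 32-bit hash, used here only as a stand-in; the talk did not say
// which hash function Groupon actually uses.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Session ID: a hash of the user ID and a timestamp.
function newSession(userId, referrer, now = Date.now()) {
  return { sessionId: fnv1a(`${userId}:${now}`), userId, referrer, startedAt: now };
}

// Page ID: a hash of the session ID and a timestamp.
function newPageview(session, fields, now = Date.now()) {
  return {
    ...fields,
    pageId: fnv1a(`${session.sessionId}:${now}`),
    sessionId: session.sessionId,
    viewedAt: now,
  };
}

// Hypothetical values, for illustration only.
const session = newSession("user-123", "https://www.example.com/", 1000);
const page = newPageview(
  session,
  { pageType: "deals page", url: "/deals", country: "CA", locale: "fr_CA", libraryVersion: "1.0.0" },
  2000
);
console.log(session.sessionId, page.pageId);
```

Because each ID is derived from its parent, a pageview record can always be traced back to its session, and a session back to its user.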


Groupon set up two types of tracking—explicit tracking and implicit tracking.

For explicit tracking, they created a library called TrackingHub (which they promise to open-source) that sends AJAX-based tracking requests with a simple one-liner: TrackingHub.add("msgName", {some: "data"});

For implicit tracking, the Groupon team developed a system called Bloodhound. Essentially, their front-end developers simply use a specific type of markup, and that automagically sends structured data over to the data warehouse:

<div data-bhw="MyContainer">...</div>
<div data-bhc="deal:my-great-deal">...</div>

The neatest part of the Bloodhound system is the browser plugin that the team developed. The plugin highlights the parts of the page where it finds the Bloodhound markup. It also adds a nice picture of a bloodhound to the web page.
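Bloodhound itself was not shown in detail, but the idea of markup-driven tracking can be sketched roughly as follows. I am assuming that `data-bhw` names an enclosing container and `data-bhc` names a trackable item inside it; that reading is inferred from the example markup, and `describeClick` is a hypothetical helper, not part of Bloodhound.

```javascript
// Walk from the clicked element up through its ancestors, collecting the
// nearest data-bhc value and every enclosing data-bhw value.
function describeClick(element) {
  let content = null;
  const containers = [];
  for (let node = element; node; node = node.parentElement) {
    if (typeof node.getAttribute !== "function") continue;
    const bhc = node.getAttribute("data-bhc");
    const bhw = node.getAttribute("data-bhw");
    if (content === null && bhc) content = bhc;
    if (bhw) containers.unshift(bhw); // outermost container ends up first
  }
  // Clicks outside any data-bhc element produce no message.
  return content === null ? null : { content, containers };
}

// In a browser, a single delegated listener could cover the whole page.
if (typeof document !== "undefined") {
  document.addEventListener("click", (event) => {
    const message = describeClick(event.target);
    if (message) console.log("would track", message);
  });
}
```

The appeal of this design is that developers only annotate markup; they never write tracking calls by hand, so the data arrives in a consistent structure.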

Chris went on to give the audience some tips for implementing the JavaScript-based tracking system. First of all, he suggested batching messages to reduce overhead and server load. He also suggested persisting the message cache across page loads in order to minimize data loss.
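As a rough illustration of the batching tip, here is a minimal sketch of a queue that collects messages and hands them to the server in groups. The `createTracker` function, its `maxBatchSize` parameter, and the injected `send` callback are all hypothetical; TrackingHub's real internals were not shown in the talk.

```javascript
// Hypothetical batching queue. `send` is injected so that the transport
// (an AJAX request, for example) stays out of the sketch.
function createTracker(send, maxBatchSize = 10) {
  let queue = [];

  // Hand everything queued so far to `send` as one batch.
  function flush() {
    if (queue.length === 0) return;
    const batch = queue;
    queue = [];
    send(batch);
  }

  // Queue one message; flush automatically once the batch is full.
  function add(name, data) {
    queue.push({ name, data, queuedAt: Date.now() });
    if (queue.length >= maxBatchSize) flush();
  }

  return { add, flush, pending: () => queue.length };
}

const sent = [];
const tracker = createTracker((batch) => sent.push(batch), 3);
tracker.add("msgName", { some: "data" });
tracker.add("msgName", { some: "more data" });
tracker.add("msgName", { some: "even more data" }); // third message triggers a flush
console.log(sent.length, tracker.pending()); // 1 0
```

One request carrying three messages costs the server far less than three separate requests, which is the whole point at Groupon's scale. A real implementation would presumably also flush on a timer so that a partly filled batch does not wait forever.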

He gave the following tips for implementing message persistence on the client:

  1. Version all messages so that a software release does not cause confusion
  2. Create migration functions to upgrade data to the latest version
  3. Isolate top-level localStorage keys to reduce churn
  4. Abstract away the storage engine, which could vary by browser. (A user with a legacy browser might get cookies, whereas a more modern browser could take advantage of localStorage.)
  5. Track storage usage
  6. Purge any data more than 24 hours old—by that point it is more likely to be confusing than helpful
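Several of these tips fit together naturally, so here is a minimal sketch that combines versioned messages, a migration function, an abstracted storage engine, and the 24-hour purge. Everything named here is an assumption made for illustration: the storage key, the `memoryEngine` stand-in, and especially the invented version 1 to version 2 change (renaming `msg` to `name`).

```javascript
const CURRENT_VERSION = 2;
const MAX_AGE_MS = 24 * 60 * 60 * 1000;

// Migration functions keyed by the version they upgrade FROM.
// The v1 -> v2 change (renaming `msg` to `name`) is invented for illustration.
const migrations = {
  1: (m) => ({ version: 2, name: m.msg, data: m.data, savedAt: m.savedAt }),
};

// Upgrade a message step by step; return null if it cannot be upgraded.
function migrate(message) {
  if (!message || typeof message.version !== "number") return null;
  let current = message;
  while (current.version < CURRENT_VERSION) {
    const step = migrations[current.version];
    if (!step) return null; // no upgrade path: drop the message
    current = step(current);
  }
  return current;
}

// Storage engine abstraction: anything with getItem/setItem works, so a
// browser's localStorage could be passed in place of this in-memory engine.
function memoryEngine() {
  const items = new Map();
  return {
    getItem: (key) => (items.has(key) ? items.get(key) : null),
    setItem: (key, value) => { items.set(key, String(value)); },
  };
}

// All messages live under one top-level key.
function createMessageStore(engine, key = "tracking.messages") {
  return {
    save(messages) {
      engine.setItem(key, JSON.stringify(messages));
    },
    // Load, migrate to the current version, and purge anything too old.
    load(now = Date.now()) {
      let parsed;
      try {
        parsed = JSON.parse(engine.getItem(key) || "[]");
      } catch {
        parsed = []; // corrupt data is discarded rather than trusted
      }
      if (!Array.isArray(parsed)) parsed = [];
      return parsed
        .map((m) => migrate(m))
        .filter((m) => m !== null && now - m.savedAt <= MAX_AGE_MS);
    },
  };
}

const store = createMessageStore(memoryEngine());
const now = Date.now();
store.save([
  { version: 1, msg: "old-format", data: {}, savedAt: now },
  { version: 2, name: "stale", data: {}, savedAt: now - 2 * MAX_AGE_MS },
  { version: 2, name: "fresh", data: {}, savedAt: now },
]);
console.log(store.load(now).map((m) => m.name)); // [ 'old-format', 'fresh' ]
```

The old-format message is upgraded on load, and the stale one is dropped because it is more than 24 hours old.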


Chris gave the following tips for verifying the correctness of the data:

  1. Define data-loss thresholds and success metrics up front
  2. Identify points throughout the tracking process where data validation could occur
  3. Use a judiciously-placed tracking pixel to double-check the TrackingHub and Bloodhound data
  4. Use message indexes to identify and count dropped messages
  5. Create unit tests for each component of the tracking system
  6. Use a JavaScript-enabled crawler to trigger the tracking messages, then look for the crawler-generated messages in the data warehouse
  7. Set up realtime alerts for any missing or malformed keys
  8. Proactively develop an escalation and testing process in the event that someone identifies a possible data quality issue
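The message-index tip lends itself to a short example. Here is a minimal sketch, assuming the client numbers its messages 0, 1, 2, and so on within each page view (the talk did not spell out the numbering scheme). The `findDroppedIndexes` helper and the 5% threshold are hypothetical.

```javascript
// Given the message indexes that reached the warehouse for one page view,
// report which indexes are missing. Assumes the client numbers messages
// 0, 1, 2, ... per page view.
function findDroppedIndexes(receivedIndexes) {
  if (receivedIndexes.length === 0) return [];
  const seen = new Set(receivedIndexes);
  const highest = Math.max(...receivedIndexes);
  const dropped = [];
  for (let i = 0; i <= highest; i++) {
    if (!seen.has(i)) dropped.push(i);
  }
  return dropped;
}

const received = [0, 1, 3, 4, 7];
const dropped = findDroppedIndexes(received);
const lossRate = dropped.length / (Math.max(...received) + 1);
console.log(dropped, lossRate); // [ 2, 5, 6 ] 0.375

// Compare against a data-loss threshold that was agreed on up front.
const LOSS_THRESHOLD = 0.05; // hypothetical value
if (lossRate > LOSS_THRESHOLD) {
  console.log("data loss above threshold, escalate");
}
```

Note that gaps can only be detected below the highest index that arrived; messages lost after the last received one would need another check, such as the tracking-pixel comparison mentioned above.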


And that, Chris told us, is how Groupon tamed their tracking chaos and satisfied their many stakeholders.

The slides from Chris's talk are posted here.

The cartoons in this post are from Allie Brosh’s amazing blog, Hyperbole and a Half. If you’ve never read her blog before, you are in for a treat.

Here is a video of Chris's talk:

Tracking All the Things by Chris Powers from Groupon Engineering on Vimeo.
