The phrase “feature complete” is often thrown around as the point at which an engineer’s job is done and the work is handed back to the product manager. From my experience in consumer-facing product development, I have found that taking a feature all the way through to analysis of the impact increases software quality and job satisfaction, and does so with less overall effort.
Product engineers are uniquely suited to diagnosing and fixing UX issues when exposed to analytics, as they understand both how the system is built and how it is meant to be used. Product managers are often exposed to only half of the equation, adding communication cycles to arrive at a solution. Additionally, by seeing a feature “in the wild” through the lens of analytics and user testing, the developer builds a sense of ownership of the product, and empathy for the user, as they react to positive and negative performance metrics.
Where the wild data are
Imagine this scenario: it’s Monday afternoon and you’ve just finished going over the next week’s work with your team and product manager. You pull a ticket off the to-do list and start implementing it. It is launched to production later in the week, and you move on to the next task. At the quarterly company meeting it is announced that revenue is up 30% and customer satisfaction up 50%. This is great, but do you know how much you contributed to it?
As developers it is all too easy to disconnect the code we write from the experiences our users have. A craftsman in a trade where physical products are created depends on real feedback to know how their work is impacting people. Why should we not have this same feedback as developers? While we may not be able to see this interaction in the physical world, it is possible to get a fairly accurate proxy through numbers. Here’s an example:
If you were tasked with implementing a feature that didn’t have mobile designs, but were then presented with data showing a sizeable share of mobile traffic, you might be inclined to spend a little more time adapting the solution to this section of the user base. If after doing so you notice mobile usage increase, then you’re starting to get into the positive feedback loop where data drives a better design for the end user, while also providing quantitative validation of your work.
Data provides a lens into user behaviour that can be tied directly to engineering effort. Building software from the ground-up with analytics in mind will enable much richer analysis, unlocking even more insight as the product evolves.
This blog post will provide an overview of different types of analytics data, the different tools and methods of analysis, tips on making the most of the data you collect, and how to build habits around using product analytics wherever appropriate. It is not intended as a comprehensive how-to guide, but rather a toe-dip into the vast but rewarding world of product analytics.
Different types of data
At the broadest level, product analytics covers any data generated as a result of customers using your product. Clearly this could be a large amount of information, so let’s define a few categories to help us narrow down the discussion. For simplification, we’ll also assume our product in question is a reasonably complex interactive application.
Perhaps the original metric, a page view occurs when a user views a specific page (or screen, for mobile) of your product. These are likely recorded in server logs, and when enriched with browser information such as language and operating system they can enable much more sophisticated cuts of the data, such as the most popular regions for viewers of your site, or what percentage of traffic comes from a mobile device. If the version number of the application is recorded with page-level metrics, it becomes a powerful tool for debugging issues specific to a release.
Interaction events are generated by the user interacting with the product. For websites, this might be a click of the mouse, or a keypress. On mobile, these are often referred to as taps, and would also cover gestures such as scroll and pinch. By connecting interaction events to the page they were generated on, via page URL or some other indicator, one can determine how interaction differs between pages. If you’re interested in how a specific page is used, such as one with a calculator or other complex set of inputs/outputs, you might want to define very granular events such as “income text field change” or “age dropdown selected”. This makes it very clear which page elements you’re referencing when pulling analytics numbers. If you want to analyse engagement across a broad range of pages, you can opt for a higher-level naming convention such as “button click” or “scroll”, where not every page has the same exact elements, but rather shares classes of interaction modes. Starting with the type of questions you want answered, and thus the type of analysis required, will help inform how to structure your events.
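The two naming conventions can be sketched as below. This is a minimal illustration, not any particular vendor’s SDK: the `make_event` helper, the payload schema, and the property names are all hypothetical.

```python
def make_event(name, **properties):
    """Build a hypothetical analytics event payload (schema is an assumption)."""
    return {"event_type": name, "event_properties": properties}

# Granular style: the page element is encoded in the event name itself,
# which is unambiguous when analysing one complex page.
granular = make_event("income text field change", page="/calculator")

# High-level style: a shared event class, with the element as a property,
# so one event name can span many pages.
high_level = make_event("button click", page="/calculator",
                        element="income_field")
```

Either style works; the high-level one trades per-element clarity for the ability to query engagement across the whole product with a single event name.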
Metrics involving time can be considered implicit events. By knowing the time a user first lands on a page, and keeping track of the time between subsequent page views, you can derive a proxy for how long they spend on a given page. A page with a longer time might be indicative of more engagement, or potentially more confusion! A website like Google probably wants to minimise time spent on its search results page, as a short visit suggests that the result served in the first position was the most relevant.
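Deriving time on page from ordered page-view timestamps looks roughly like this (a sketch with made-up data; note the last page in a session has no following view, so its duration is unknowable by this method):

```python
from datetime import datetime

def time_on_page(page_views):
    """Given one user's page views as (timestamp, url) tuples in order,
    derive time spent on each page as the gap until the next view."""
    durations = []
    for (t0, url), (t1, _) in zip(page_views, page_views[1:]):
        durations.append((url, (t1 - t0).total_seconds()))
    return durations

views = [
    (datetime(2019, 1, 7, 9, 0, 0), "/home"),
    (datetime(2019, 1, 7, 9, 0, 40), "/product"),
    (datetime(2019, 1, 7, 9, 5, 40), "/checkout"),
]
# /home: 40s, /product: 300s; /checkout has no subsequent view.
```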
Similar to time, retention is an implicit measure of engagement. The basic question retention metrics answer is: “for users that visit today, how many are coming back tomorrow?”. Retention, also referred to as “stickiness”, is a sign that users are getting repeat value out of your product. While it is easy to get bogged down by variants such as trailing n-day and unbounded retention, they all answer this same question under the surface.
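The basic next-day retention question can be computed directly from visit logs. A minimal sketch, assuming we have each user’s visit dates (the data shape is an assumption; analytics platforms compute this for you):

```python
from datetime import date, timedelta

def next_day_retention(visits, day):
    """visits maps user_id -> set of dates visited. Returns the share of
    users active on `day` who also came back the following day."""
    active = {u for u, days in visits.items() if day in days}
    if not active:
        return 0.0
    returned = {u for u in active if day + timedelta(days=1) in visits[u]}
    return len(returned) / len(active)

visits = {
    "alice": {date(2019, 1, 7), date(2019, 1, 8)},
    "bob":   {date(2019, 1, 7)},
    "carol": {date(2019, 1, 8)},
}
# Of the two users active on Jan 7, only alice returned on Jan 8 -> 0.5
```

Variants like trailing n-day retention just widen the “came back” window; the shape of the question is the same.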
Different analytics tools
The most ubiquitous of analytics tools, Google Analytics (GA for short) is a platform many are familiar with. Given away for free in order to collect rich data about the web and improve Google’s algorithms, GA exposes a wealth of information. GA operates largely on page-level data, but it is possible to send interaction events with its SDKs. Because its cookies are present on a large percentage of sites on the internet, GA’s killer feature is the ability to provide insights into visitor demographics such as age, gender, and even interests such as “Interested in food and dining”.
When you want to start digging into more fine-grained metrics like interaction events, a dedicated product analytics platform like Amplitude or Mixpanel is where you’ll want to look. Unlike the GA SDK, which starts collecting data simply by being on a page, these services require events to be fired explicitly. Arbitrary information can be attached to each event, making very custom and rich analysis possible. A rundown of how to define a useful event schema is beyond the scope of this post, but this Amplitude article is a great primer.
Different types of analysis
Now that we have a better sense of both data types and platforms, we can dig into the actual analysis. While Google Analytics has a lot of rich information, we’ll focus on looking at interaction data in a product like Amplitude.
Navigation options on a page present not only a path forwards, but also a stopping point where the user gives up and leaves the site. If a series of these steps is required to turn a new visitor into a customer, funnel analysis will show you at which point you’re losing them. Funnel analysis can combine different event types, such as page views and interactions. For example, you could define a 3-step conversion funnel as a page view of the landing page, a page view of product details, then a click on a product. If 80% of users make it from the first step to the second, and 50% of those make it from the second to the third, you would have a total funnel conversion of 40% (0.8*0.5). This is commonly represented as a bar chart where the height of each bar is the number of users remaining at each step.
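The arithmetic behind the 3-step example can be sketched as follows, using made-up step counts chosen to reproduce the 80%/50% figures:

```python
def funnel_conversion(step_counts):
    """step_counts: number of users remaining at each funnel step, in order.
    Returns (per-step conversion rates, overall funnel conversion)."""
    per_step = [b / a for a, b in zip(step_counts, step_counts[1:])]
    overall = step_counts[-1] / step_counts[0]
    return per_step, overall

# 1000 users land, 800 view product details, 400 click on a product.
per_step, overall = funnel_conversion([1000, 800, 400])
# per_step == [0.8, 0.5]; overall == 0.4
```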
For a more nuanced view of a particular event or set of users, you can segment them by a specific property. For example, you might want to segment page views on the home page by the type of device and browser used, be it desktop or mobile, Chrome or Firefox, or even language. Segmentation can be useful for determining how much to invest in a particular area. If you notice less than 5% of users on a certain device or browser, then it might not be worth optimizing for. Conversely, if you see a large group of events coming from a specific segment, that could be a signal to invest resources.
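At its core, segmentation is just grouping events by a property and looking at each group’s share of the total. A minimal sketch with invented traffic numbers:

```python
from collections import Counter

def segment_share(events, prop):
    """Group events by a property value and return each segment's share."""
    counts = Counter(e[prop] for e in events)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

# Hypothetical home-page views segmented by device type.
page_views = (
    [{"device": "desktop"}] * 90
    + [{"device": "mobile"}] * 8
    + [{"device": "tablet"}] * 2
)
shares = segment_share(page_views, "device")
# tablet comes out at 2% of traffic, below the 5% rule of thumb.
```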
Combining segmentation and funnels, we can create some very sophisticated cuts of our data. A cohort is simply a group of people: we use segmentation to select a subset of our users and label it as a cohort. This could be the set of users who have signed up for an account, or users that have opened 1 or more marketing emails. Advanced platforms will enable funnel analysis cut by cohort, so you can determine how different groups of users behave. A particularly interesting metric to cut is retention. Do registered users come back more often than guests? While you could back into this via page views, cohort analysis is a lot more explicit and therefore easier to interpret.
Make the most of your data
You’ve instrumented your application, started collecting rich data and are analysing it in the appropriate platform. Two days after releasing a new feature, conversion is up 60%! Is it too good to be true? Learning a few basic statistical rules of thumb is essential to help you avoid misreading your data and drawing the wrong conclusions.
Precision and error
If there is one concept to aid in interpreting results, it is that of precision and error. Here’s an example to set up what I mean by this. A page has a button on it, and after a few days of collecting metrics there have been 300 page views and 11 clicks. Converting this to a percentage yields 3.7% conversion. The following week, someone changes the button to blue and you notice 15 clicks in the same period of time – now a 5% conversion rate, which is a 35% (5 / 3.7) improvement, right? Not so fast – this is an example of false precision.
If you line up each click on a timeline, there will be certain points at which dividing the series can skew results. For example, let’s say we got 8 clicks in the first day, and then the final 3 clicks in the last hour of the second day. If we were to take our final measurement 1 hour earlier, we’d lose those 3 clicks and the conversion rate would be substantially lower. However, if in the same time period we had 1100 clicks, losing a few here or there to a different time period would not have such an outsized impact. When dealing with small magnitudes or small time periods, it is crucial not to be overly precise with your conversion rates. More prudent than claiming 3.7% conversion would be to round and then provide a margin of error, such as 3% ± 2%, and to do the same with 5% ± 2%. Now we can see that the two error ranges overlap (both include 4%), so statistically they are the same result (within our defined margin of error). There are many useful tools, such as this one from Thumbtack, that can help you compute these errors.
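One common way to compute such a margin is the normal-approximation confidence interval for a proportion. This is a rule-of-thumb sketch (it degrades for very small counts, which is exactly when you should be most cautious), applied to the 11-clicks-in-300-views example:

```python
from math import sqrt

def conversion_with_error(clicks, views, z=1.96):
    """Conversion rate with an approximate 95% margin of error,
    using the normal approximation to the binomial."""
    p = clicks / views
    margin = z * sqrt(p * (1 - p) / views)
    return p, margin

p, m = conversion_with_error(11, 300)
# p is about 0.037 and m about 0.021, i.e. roughly 3.7% +- 2%:
# the 11-click and 15-click weeks overlap and are indistinguishable.
```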
As a rule of thumb, we look for at least 40 data points before taking a result seriously; however, the actual number you need for a significant result scales with the size of the change you’re trying to measure. Optimizely has a great post with a lot more in-depth information on this. Additionally, you’ll want to collect data over at least a 7-day period to account for behavioural differences throughout the week. Let’s illustrate this with an example: visitors that come on a Sunday morning could be at home with plenty of spare time to dig into what you have to offer. They might be more tolerant of an experience that is slightly clunky and slow because they’re not in a rush, and so you may see a higher conversion rate. Traffic at 11 AM on a Wednesday could be people visiting during their coffee break at work, where delay is less acceptable, resulting in lower conversion. Releasing a feature on Thursday and seeing engagement pick up on Friday, relative to Wednesday, might purely be a function of this differing traffic rather than any meaningful product improvement. By ensuring you always compare the same days of the week, you will average out these daily behavioural fluctuations.
Swing for big changes
The corollary of being wary of false precision could be stated as: only bother with changes large enough to measure. While companies at the scale of Facebook might be very happy with a fraction of a percent increase in engagement across their billion+ users, for the most part we’re dealing with a vastly smaller user base. It is easy to get trapped running from one sub-percent micro-optimisation to another, missing the big picture of a double-digit increase in page views that could be had from re-thinking the overall design. Being stuck in a local optimum is a common problem in A.I., but it is also highly relevant to analytics.
Validate early, validate often
There’s nothing worse than pushing out a big change, only to discover two weeks later that half the data is either missing or incorrectly attributed. A common practice employed here at NerdWallet is to create analytics dashboards before releasing a feature, which serves two purposes. Firstly, it ensures you’re collecting all of the data you need to perform the analysis later; secondly, it is another way to verify that the feature is actually working correctly. If you are building a multi-step flow and one of the steps does not seem to be reporting, that’s an indication something might be awry. If you have a staging environment for your product, be sure to treat the analytics there like production data. It is much easier to catch an instrumentation error in staging before release than to manually clean up data in production.
The last, and arguably most important, aspect of measurement is to be honest. Be vigilant about stating a hypothesis up front, such as “moving registration later into the funnel will increase the overall number of registered users”, and use the data to either prove or disprove it. Without a prior reason for making a change, it is easy to read too much into the data to retroactively make yourself look right. This practice is especially harmful if you put more trust in a positive outcome than a negative one. Leaving underperforming tests to run longer than those that trend positive will introduce a large amount of bias into your analysis and decision making. Take a negative metric as a chance to learn: why something didn’t behave as expected can teach you as much about your users’ intent and behavior as a positive test.
Forming the habit
They say it’s easier to make a habit than break one, but that doesn’t mean it’s effortless. Start small by looking at metrics once a week and see how they change over time. This will help you build up a feeling for seasonality in your industry. With a feel for that, start building funnels to see how your users are exploring your product or website. As mentioned earlier, building funnels before launching a feature is a great way to get in the habit of viewing analytics. If you bake it into your release process, there’s no excuse for not knowing how your product is doing.
More important than any behavioral change, however, is the cultural one. Effective use of product analytics requires buy-in at every level, so the more champions a company has, the more likely it is to become part of the day-to-day. Ask your business and product partners for metrics to back up hypotheses and features, and work with them to analyse any impact. While not everything great can be quantified beforehand, it can still be a useful sanity check before investing considerable resources into a project. As data itself has no opinions, it is also a useful tool for settling disagreements. If you can’t agree on a feature decision, shipping one option (or both, via an A/B test) and resolving the question with numbers can be an effective approach. For a little more fun, placing bets on the outcome of tests and experiments is a great way to increase engagement with metrics within a team.
At this point, there are perhaps more questions than answers: which platform to use, how to structure your events, how to retrofit existing apps with correct instrumentation, and so on. Hopefully, however, you now have some intention to delve deeper into the world of product analytics. Start a conversation with people in your organization who already use analytics, visit the websites of some of the products mentioned in this article such as Amplitude (this is what we use), or talk with your product manager to see how you can get involved in the post-release analysis of the projects you’re working on.
Want to help us develop data-driven products? We have a wide range of openings here at NerdWallet.