How we implemented kid-safe analytics with Amplitude

PopJam is a social platform that is built from the ground up to be safe, appropriate and fully anonymous for kids. It’s a safe, moderated community for kids to engage with their favourite content and brands, designed specifically for the safety and data privacy requirements (COPPA, GDPR-K) of the under-13 audience. Because it is aimed at an audience of 7-12 year olds, we take both privacy and compliance extremely seriously.

A critical concern for any platform is the product analytics you use to measure and learn. This is never a trivial thing to get right. Deciding what to measure, and how, requires careful thought, but at least you have a massive array of products and solutions to choose from, often with fully-featured SDKs to make implementation quicker and easier.

Not so in the kids digital ecosystem.

When making apps for kids you have an additional set of concerns and constraints. Were you to simply plug in an off-the-shelf analytics SDK, you would very likely find it broadcasting personally identifying data (as defined by COPPA) from your client application and recording it server side. This is standard for analytics platforms made for the grown-up internet and enables those platforms to provide richer insights, such as geographical heat-mapping. However, this is something we take every precaution to avoid across the PopJam platform, be it web or mobile, as a stand-alone product or embedded in customers’ products.

As you consider your options in this space, allow me to reflect on the journey we went through on PopJam, to illustrate a couple of possible approaches.

Roll Your Own

We all know the cheap/fast/good triangle. In the PopJam team, we started with a solution that we rolled ourselves. We spun up an Amazon Redshift database, put the open source query runner Re:Dash over the top and got to work on defining and implementing our own analytics event infrastructure.

We created our own client-side SDKs to broadcast product analytics events to our own analytics API, making certain not to pass any data that could be used to identify or fingerprint the user. Our events service then carefully discarded any remaining PII, such as the IP address and user agent in the request, and wrote the event into a raw database for overnight processing.
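To make the scrubbing step concrete, here is a minimal sketch in Python of what an allowlist-based event scrubber could look like. It is not our production code: the field names (`event_type`, `anonymous_id`, `properties`), the shape of the `request` dictionary and the `scrub_event` function are all assumptions made for illustration.

```python
from datetime import datetime, timezone

# Hypothetical allowlist: only these body fields are ever stored.
ALLOWED_FIELDS = ("event_type", "anonymous_id", "properties")


def scrub_event(request: dict) -> dict:
    """Build a storable event from an incoming request.

    The request's IP address and headers (including the user agent)
    are never read, so they cannot leak into the stored event.
    Any body field that is not on the allowlist is dropped.
    """
    body = request.get("json", {})
    event = {key: body[key] for key in ALLOWED_FIELDS if key in body}
    # Server-side receipt time, so we do not rely on the client clock.
    event["received_at"] = datetime.now(timezone.utc).isoformat()
    return event
```

An allowlist (keep only named fields) is a safer default than a blocklist (strip known-bad fields), because a field nobody anticipated is dropped rather than stored.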

We then painstakingly constructed our ETLs, analyst schemas and metric definitions by hand in raw SQL, using Re:Dash to schedule and run those queries.

Initially, it seemed like a great solution. Cheap (built on open source and internally built tools), fast (we controlled the scope and the roadmap) and good (we knew it was compliant, and we had full visibility and control over our data). However, it didn’t take long for the cracks to appear.

Not So Cheap

While relatively cheap to run, from an infrastructure point of view, it was far from cheap to operate and maintain. Our overnight batch jobs, Redshift, Re:Dash, all proved to be somewhat unstable, and days a month of our tech lead’s time were lost to restarting, debugging, recovering lost data, responding to disgruntled business users and generally keeping the product analytics alive. When it failed, all kinds of business functions were affected, as we had built things like campaign reporting, community management dashboards and marketing attribution on top of the data in it.

It also took a ton of time from our product managers, who had to build every metric, graph, dashboard, reporting tool and query manually, learning as they went. Mistakes were made, which in turn took more time to undo.

Opportunity Cost

All this effort was not just time-consuming and frustrating for the team; it also represented a huge opportunity cost. With our product manager sometimes spending two whole days a week arms-deep in SQL, they were less able to focus on the job of discovering an awesome product. Questions raised during discovery took much longer to answer, further slowing the iteration cycle, and often we did not have the skills to fully answer our own questions, having to fall back on simpler analysis instead.

In addition, any improvements we wanted to make to our analytics, analysis or visualisation toolkit would require prioritisation within our backlog. We were getting nothing over time unless we put the grunt in. No-one was moving us forward if we weren’t.

Not Good Enough

While Re:Dash worked tolerably well, there were a few downsides, even beyond the reliability (which very likely could have been resolved if we had prioritised the time to invest in it). Our key problem was our reliance on our own statistical analysis and query crafting skills within the team. The product manager (me!) was not a data scientist. We didn’t have a dedicated data analyst in the business.

While we were able to self-serve on the basics, and do a reasonable amount of discovery within the data, there was a universe of more advanced analytics techniques and approaches that were closed to us. We needed an analytics platform that could actually lift us up, above our own skills, and supercharge our ability to truly understand what our data was telling us about our users’ and customers’ use of the PopJam platform.

Leveling Up

As we started to push more frequently against the limits of our skills, we went in search of some experts who might have the solution to our problem. We knew that more advanced solutions existed, as many of us had used them in previous roles. We needed richer insights to continue to improve our product development process.

We tried a few different analytics platforms before selecting Amplitude. We were blown away by their user interface, which was totally dedicated to product development. The Amplitude platform contained a host of incredibly powerful and easy to use analytics tools that we just had no practical way of making ourselves in our previous solution.

One of the most impactful improvements was the ability to define cohorts from any data point, using Microscope. As well as giving the product team the ability to easily dig down to understand the behavior of those users that (for example) comment most regularly and see what else they do, the marketing team were also able to make immediate use of this feature to understand how engaged a cohort of users that joined as a result of a particular marketing campaign were, and assess whether that tactic brought in the “right” kind of kid.

Another tool that we could not have dedicated the time to create ourselves is Impact Analysis. Using this analysis tool we were able to reveal and explore the hypothesis that encountering and enjoying a personality quiz early on in your PopJam journey has a big impact on how you perceive the product and how much you go on to engage and retain.

Migration

Migration to Amplitude was simple. We kept our existing kid-safe events pipeline and client code, but piped all the events from our events service into Amplitude via their HTTP API. We continued to use our bespoke client SDKs, rather than Amplitude’s client SDKs, as this kept us in full control of what data leaves (or more importantly doesn’t leave) the child’s device. We also maintained our events service, which ensured that (a) we remained de-coupled from any specific analytics platform, and (b) we were in full control of the privacy of data prior to piping it to Amplitude.
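As an illustration of the forwarding step, the sketch below maps an already-scrubbed event onto a request for Amplitude's HTTP V2 API. The endpoint and the `api_key`, `events`, `user_id`, `event_type` and `event_properties` fields reflect my understanding of that API and should be checked against Amplitude's current documentation. The internal event shape (`anonymous_id`, `properties`) and the function names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint for Amplitude's HTTP V2 API; verify before use.
AMPLITUDE_URL = "https://api2.amplitude.com/2/httpapi"


def to_amplitude_event(event: dict) -> dict:
    """Map a scrubbed internal event onto Amplitude's event fields.

    Only the anonymous identifier, the event name and its properties
    are forwarded; nothing else from the internal event is copied.
    """
    return {
        "user_id": event["anonymous_id"],
        "event_type": event["event_type"],
        "event_properties": event.get("properties", {}),
    }


def build_request(events: list[dict], api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a batch of events."""
    payload = {
        "api_key": api_key,
        "events": [to_amplitude_event(e) for e in events],
    }
    return urllib.request.Request(
        AMPLITUDE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a single call such as `urllib.request.urlopen(build_request(batch, api_key))` from the events service. Because the request originates server side, the analytics vendor only ever sees the events service, never the child's device.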

The result was that we could get all the benefits of Amplitude’s powerful front-end tools with full confidence that Amplitude contained absolutely zero personally identifying data from our users.

While more expensive than our previous solution, we have reclaimed focus on our ability to innovate and iterate our products and platforms, which is invaluable.

Kid-safe Analytics

The toolset we have now is light years ahead of what we were able to achieve internally in terms of analytical sophistication, providing a depth of insight and guidance for product iteration that we would never have got anywhere near on our own. The team and the business trust the numbers now. We have rolled Amplitude out across multiple non-product teams in SuperAwesome because the UI is intuitive and non-threatening (goodbye SQL!).

Scarlett Cayford, Head of PopJam, heads up a team of strategists, designers and ad operations executives, all of whom regularly use Amplitude to analyse data in different areas of PopJam.

“While our own set of tools were workable, it meant we were restricted in what we could measure and were completely reliant on the product managers to build out new queries. Amplitude is simple enough that we can construct our own queries, and breaking out that data into different time frames and geographical regions is extremely simple. The adoption of Amplitude gave us autonomy as well as authority and has enabled us to react much faster.”

Moving from an internally-developed, open source based analytics solution to Amplitude was a great choice for us. We were able to find a setup that allowed us to use Amplitude in a way that continued to protect the data privacy of our under-13 users whilst giving us a sophisticated toolset for understanding how our product is used.

We don’t worry about analytics anymore. We get constant improvements to our tools and new capabilities because there is a whole other business thinking about that problem-space. We no longer have to be experts in a domain that has nothing to do with making the internet safer for kids.



Mike Hutchinson is Chief Product Officer at SuperAwesome