Patterns in Natural Language Data | A Paw Patrol Analysis

After watching too much Paw Patrol with my two kids, I noticed that every episode follows a rigid structure, and that this structure, with some work, can answer deeper questions about the show’s team dynamics. The questions themselves are, at heart, nonsensical; it is a silly show. However, the same exercise of finding structure in natural language and using it to extract meaning can yield insight about your business as well. In this blog post, we will review the Paw Patrol formula and apply it to unstructured episode transcripts to identify richer team dynamics and demographics within the show.

Background

We must start with some Paw Patrol lore. The show is about a team of search and rescue dogs that work together to solve problems in their town. These dogs, referred to as “pups” for the remainder of this article, each have a specialty loosely tied to a popular first responder profession (police, firefighter, Coast Guard, etc.). Each pup has a pup pack with a collection of tools associated with their profession that they can use in specific scenarios. For example, Marshall, the firefighter pup, has a water cannon to put out fires. The pups use these tools to complete missions around the town.

Secondly, the show takes place in 3 acts:

Act 1: Some shenanigans cause some problems in the town.
Act 2: Ryder, the human leader of the Paw Patrol, assigns tasks to individual pups with instructions on which tool they should use to solve a problem. For example: “Marshall, use your water cannon to put out the fire.”
Act 3: The pups use their tools and sometimes improvise to solve the mission.

Purpose

With a better understanding of the show, let’s introduce some questions that this format can help us answer:

Who is the most and least useful pup?
Who goes outside the plan and uses their tools without being assigned a task?
Are there any pups who are assigned a task but do not use their tools during Act 3?
Who is the best at sticking with the plan and regularly does their assigned task?

These are questions nobody should ask, but I felt the need to answer them as someone who enjoys team dynamics and cares about what my children watch. With some additional metadata around the transcripts, we can also answer a few other important questions:

What is the gender representation across the various seasons of the show?
What is the species (dog, human, chicken, etc.) representation within the show?
What are the most popular tools, and how frequently are they used?

Solution

The first step to solving this problem was to find a data source with the following minimum criteria:

Must list the quote and character that said the line.
Must maintain the correct order of the quotes as they are spoken.
Must separate the data into episodes.

To my surprise, the Paw Patrol fan wiki provides such a data source. Since this humble site does not have an API that I’m aware of, I resorted to scraping its web pages, which provided me with the following data points: season, episode name, actor, and quote. From there, we can cleanse the unstructured data and parse it into a format that allows for richer analysis. This involves loading it into PySpark data frames, casting data types, and removing unnecessary characters.
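The cleansing step might look roughly like the sketch below. It uses plain Python rather than PySpark so that it runs on its own, and it assumes each scraped line has the form `Speaker: text` with stage directions in square brackets. That layout, the `Quote` record, and the `parse_line` helper are illustrative assumptions, not the wiki’s actual format or the code used for this analysis.

```python
import re
from typing import NamedTuple, Optional


class Quote(NamedTuple):
    """One cleaned transcript line (hypothetical record layout)."""
    season: int
    episode: str
    position: int  # order of the line within the episode
    actor: str
    quote: str


def parse_line(season: int, episode: str, position: int, raw: str) -> Optional[Quote]:
    """Split a raw 'Speaker: text' line into a Quote, or return None if it is not dialogue."""
    actor, sep, text = raw.partition(":")
    if not sep or not actor.strip():
        return None  # no speaker prefix, so not a spoken line
    # Assumed convention: stage directions appear in square brackets, e.g. "[points]".
    text = re.sub(r"\[[^\]]*\]", " ", text)
    # Collapse runs of whitespace left over from scraping.
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return None
    return Quote(season, episode, position, actor.strip(), text)
```

Keeping `position` on every record preserves the spoken order, which the later steps rely on to tell the acts apart.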

With a working data structure, the most manual step began. We needed a business user with expert knowledge of the pups, their tools, and the universe in general. Since, to my knowledge, this person doesn’t exist, I took on that challenge. This process required manually reviewing the tools in each pup’s pup pack, identifying the main and supporting characters in the show, assigning each character a perceived gender and species, and cataloging all the characters’ catchphrases. The catchphrases are critical since they signify quite a bit of the formula:

Task assignment from Ryder
A pup’s acceptance of the task from Ryder
Tool usage
The boundaries between the three acts of the episode
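As a rough illustration of how a catchphrase can be turned into structured data, the sketch below pulls the pup, tool, and task out of a task assignment line. It assumes Ryder’s assignments follow the pattern from the Act 2 example (“<pup>, use your <tool> to <task>”). The `PUPS` tuple holds only the pups named in this post, and `ASSIGNMENT` and `find_assignment` are hypothetical names; the real catalog of catchphrases was built by hand and is broader than this single pattern.

```python
import re
from typing import Optional, Tuple

# Only the pups named in this post; a full catalog would list every pup.
PUPS = ("Marshall", "Chase", "Skye", "Everest", "Tracker")

# Assumed phrasing: "<pup>, use your <tool> to <task>"
ASSIGNMENT = re.compile(
    r"\b(?P<pup>" + "|".join(PUPS) + r"),\s+use your\s+(?P<tool>.+?)\s+to\s+(?P<task>[^.!?]+)",
    re.IGNORECASE,
)


def find_assignment(actor: str, quote: str) -> Optional[Tuple[str, str, str]]:
    """Return (pup, tool, task) if this is a task assignment spoken by Ryder, else None."""
    if actor != "Ryder":
        return None  # only Ryder hands out tasks
    match = ASSIGNMENT.search(quote)
    if match is None:
        return None
    return match["pup"], match["tool"].lower(), match["task"].strip()
```

Acceptance and tool usage catchphrases could be matched the same way, each with its own pattern.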

Findings

General

After aggregating all the data, we came up with the following:

One caveat: this analysis is based on the early seasons of Paw Patrol, because we only have a valid dataset for those seasons. Our findings would likely change with more data.

Gender

Based on the perceived gender of the main characters, we can quickly identify the number of quotes for each gender:

Tool Usage

From our manual catalog of the tools available in each pup pack, we can count how often each tool is used and identify the most popular tools in the data we have:

One unexpected finding is that some tools are available to multiple pups; for example, both Chase and Skye have goggles in their pup packs. Also, the diagram shows only about half of the tools available to the pups; another 39 less frequently used tools are not shown.
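The counting itself is simple once the tool catalog exists. The sketch below is a minimal version in plain Python, assuming tool usage is detected when a pup’s line mentions a tool from their own pup pack. `PUP_PACKS` contains only the tools named in this post, and `count_tool_usage` is a hypothetical helper; the real catalog is much larger.

```python
from collections import Counter
from typing import Iterable, Tuple

# Tiny excerpt of a hand-built catalog; only tools named in this post are listed.
PUP_PACKS = {
    "Marshall": {"water cannon"},
    "Chase": {"goggles"},
    "Skye": {"goggles"},
}


def count_tool_usage(quotes: Iterable[Tuple[str, str]]) -> Counter:
    """Count lines in which a pup mentions a tool from their own pup pack."""
    usage = Counter()
    for actor, quote in quotes:
        text = quote.lower()
        for tool in PUP_PACKS.get(actor, ()):
            if tool in text:
                usage[tool] += 1  # shared tools (e.g. goggles) add up across pups
    return usage
```

Because the counter is keyed by tool rather than by pup, a tool shared by several pups is counted once per mention regardless of who uses it. Calling `most_common()` on the result gives the ranking.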

Role Assignments

During the show’s second act, Ryder assigns each pup a task to perform during the third act. The following highlights how frequently Ryder assigns a task to each pup:

In defense of Everest and Tracker, they are “guest stars” in the show and only appear in a limited number of episodes. The count of lines for each character below highlights that.

Team Contribution

Lastly, we wanted to identify how each pup contributed to the overall team. This was the most important and most challenging thing to identify. We grouped contributions into four categories:

Based on this, we found the following:

Note on Skye: Her pup pack contains her tools for flying, and anecdotally, those are the tools whose use is reported the least in the dialogue. However, given the current data set, I cannot objectively corroborate that without a lot of manual effort.
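One way to sketch this kind of categorization is to cross two facts per pup per episode: whether Ryder assigned them a task, and whether they used a tool. The category labels below are illustrative assumptions drawn from the questions in the Purpose section, not necessarily the four categories used in the analysis, and `categorize` and `episode_contributions` are hypothetical helpers.

```python
from typing import Dict, Iterable, Set


def categorize(assigned: bool, used_tool: bool) -> str:
    """Map the two per-episode facts onto one of four illustrative labels."""
    if assigned and used_tool:
        return "followed the plan"
    if assigned:
        return "assigned but did not use a tool"
    if used_tool:
        return "improvised"
    return "not involved"


def episode_contributions(
    pups: Iterable[str], assigned: Set[str], used_tool: Set[str]
) -> Dict[str, str]:
    """Label every pup for a single episode."""
    return {pup: categorize(pup in assigned, pup in used_tool) for pup in pups}
```

Tallying these labels per pup across episodes would give each pup’s overall contribution profile.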

Data is everywhere and can help us make better decisions for our business. This nonsensical example is just one way a company can identify how a team is performing, based on how well the team reports on its own tool usage. Creative use of a business’s existing data can provide interesting insights, and I hope this silly example, at minimum, highlights that point.