Each podcast has at least one category, and most list several. Here we plot a bar chart showing the distributions of the first-listed through the fourth-listed categories. 'Comedy' is by far the most common podcast category. 'Society & Culture' and 'Arts' are the most popular second-listed categories, and both appear more often as a second-listed category than as a first-listed one.
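As a minimal sketch of how these per-position counts could be tallied, assuming each podcast's categories are available as an ordered list, one `Counter` per listing position is enough. The `podcast_categories` data and the `category_counts_by_position` helper are hypothetical illustrations, not the original analysis code.

```python
from collections import Counter

# Hypothetical input: podcast name -> ordered list of its listed categories.
podcast_categories = {
    "Podcast A": ["Comedy", "Society & Culture"],
    "Podcast B": ["Comedy", "Arts", "TV & Film"],
    "Podcast C": ["Education", "Business"],
    "Podcast D": ["Arts"],
}


def category_counts_by_position(podcast_categories, max_position=4):
    """Return a list of Counters: index 0 counts first-listed categories, etc."""
    counts = [Counter() for _ in range(max_position)]
    for categories in podcast_categories.values():
        # Podcasts with fewer categories simply contribute to fewer positions.
        for position, category in enumerate(categories[:max_position]):
            counts[position][category] += 1
    return counts


counts = category_counts_by_position(podcast_categories)
for position, counter in enumerate(counts, start=1):
    print(f"position {position}:", counter.most_common())
```

Each `Counter` maps directly onto one group of bars in the chart.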
Using the first-listed category of each podcast, we computed the average number of guests per podcast in each category; the error bars show the standard deviation. Even though there are far more 'Comedy' podcasts than any other kind, their average number of guests is unremarkable. 'Education' podcasts, on the other hand, have by far the highest average number of guests. This is because there are only three such podcasts in the network, and two of them, Entrepreneurial Thought Leaders and Econtalk, are long-running: they began in 2005 and 2007 and release episodes biweekly and weekly, respectively. With so few podcasts in the category, these two shows dominate its average.
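A minimal sketch of the per-category statistics follows. It assumes that a podcast's guest count means its number of distinct guests and that the error bars use the population standard deviation; the text specifies neither, so both are assumptions. The data and the `guest_stats_by_category` helper are hypothetical.

```python
import statistics
from collections import defaultdict

# Hypothetical inputs: first-listed category per podcast, and the set of
# distinct guests who have appeared on each podcast.
first_category = {"P1": "Comedy", "P2": "Comedy", "P3": "Education"}
guests = {"P1": {"a", "b"}, "P2": {"c", "d", "e", "f"}, "P3": {"g", "h", "i"}}


def guest_stats_by_category(first_category, guests):
    """Map each category to (mean, std) of distinct guests per podcast."""
    per_category = defaultdict(list)
    for podcast, category in first_category.items():
        per_category[category].append(len(guests.get(podcast, ())))
    # pstdev (population std) returns 0.0 for a single podcast,
    # whereas statistics.stdev would raise StatisticsError.
    return {
        category: (statistics.mean(sizes), statistics.pstdev(sizes))
        for category, sizes in per_category.items()
    }


stats = guest_stats_by_category(first_category, guests)
print(stats)
```

A category containing a single podcast gets a standard deviation of 0.0 here, a reminder that averages over very few podcasts, like the three 'Education' shows, say more about individual shows than about the category.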
Only the podcasts have explicit categories, so each person's category must be assigned. We do this as follows. If a person is a host, we find which of their podcasts they have spent the most total time hosting (the sum of the durations of guest appearances) and assign them that podcast's first-listed category. If a person is only a guest, we find which podcast they have spent the most time on (again, the sum of the durations of their guest appearances) and assign them that podcast's first-listed category. We plot the resulting number of people per category below as a bar chart, using a logarithmic y-axis because the number of people varies widely between categories.
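The assignment rule can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: appearances are hypothetical `(podcast, guest, duration)` tuples, hosts are listed per podcast, and every host of a podcast is credited with the full duration of each guest appearance on it. `assign_person_categories` is a hypothetical helper name.

```python
from collections import defaultdict


def assign_person_categories(appearances, hosts, first_category):
    """Give each person the first-listed category of their 'main' podcast.

    appearances: iterable of (podcast, guest, duration) guest appearances.
    hosts: dict of podcast -> set of host names.
    first_category: dict of podcast -> first-listed category.
    """
    host_time = defaultdict(lambda: defaultdict(float))   # host -> podcast -> time
    guest_time = defaultdict(lambda: defaultdict(float))  # guest -> podcast -> time
    for podcast, guest, duration in appearances:
        guest_time[guest][podcast] += duration
        # Assumption: every host of the podcast is credited with the appearance.
        for host in hosts.get(podcast, ()):
            host_time[host][podcast] += duration

    categories = {}
    for person in set(host_time) | set(guest_time):
        # Hosts are ranked only by hosting time; guest-only people by guest time.
        totals = host_time[person] if person in host_time else guest_time[person]
        main_podcast = max(totals, key=totals.get)
        categories[person] = first_category[main_podcast]
    return categories


# Hypothetical example data (durations in minutes).
hosts = {"P1": {"H1"}, "P2": {"H2"}}
first_category = {"P1": "Comedy", "P2": "Science & Medicine"}
appearances = [
    ("P1", "G1", 60),
    ("P2", "G1", 90),
    ("P1", "H2", 120),  # H2 guests on P1 but is ranked by hosting time on P2
    ("P2", "G2", 30),
]
categories = assign_person_categories(appearances, hosts, first_category)
print(categories)
```

In the toy data, host `H2` spends more time as a guest on `P1` than any single guest does elsewhere, yet still receives `P2`'s category, because hosts are ranked only by the podcasts they host.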
Constructing a network graph in which each guest is linked directly to every host of each podcast they appear on (so all nodes are people and there are no podcast nodes), we can study category mixing frequencies, that is, how often links connect people of the same or of different categories. Given how each person's category is assigned, as explained above, it is not surprising that most of the links in this network are between people of the same category: a guest who appears mostly on one podcast takes that podcast's category, which is usually the category of its host as well. We do, however, see that 'Comedy' has plenty of connections with 'Society & Culture', 'TV & Film', and 'Sports & Recreation'. We also see strong crossovers between 'Science & Medicine' and 'Society & Culture', 'Health' and 'Sports & Recreation', 'Business' and 'Education', and 'Arts' and 'TV & Film', all of which make intuitive sense given how much the content of those categories overlaps.
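A sketch of the person-to-person graph and its mixing counts, using only the standard library. It assumes each guest-host pair forms a single undirected link no matter how many times the guest appeared (the text does not say whether repeat appearances add weight). `category_mixing` and the example data are hypothetical.

```python
from collections import Counter


def category_mixing(hosts, appearances, person_category):
    """Count undirected guest-host links by unordered category pair."""
    edges = set()
    for podcast, guest in appearances:
        for host in hosts.get(podcast, ()):
            if host != guest:
                # A frozenset makes the link undirected, and the set of edges
                # collapses repeat appearances into a single link.
                edges.add(frozenset((guest, host)))
    mixing = Counter()
    for edge in edges:
        a, b = edge
        pair = tuple(sorted((person_category[a], person_category[b])))
        mixing[pair] += 1
    return mixing


hosts = {"P1": {"H1"}, "P2": {"H2"}}
appearances = [("P1", "G1"), ("P1", "G1"), ("P2", "G1"), ("P2", "G2")]
person_category = {
    "H1": "Comedy", "H2": "Health", "G1": "Comedy", "G2": "Sports & Recreation",
}
mixing = category_mixing(hosts, appearances, person_category)
total = sum(mixing.values())
frequencies = {pair: n / total for pair, n in mixing.items()}
print(frequencies)
```

If the graph is already built with NetworkX, its `attribute_mixing_matrix` function computes the same kind of category mixing matrix from a node attribute.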
Now that both people and podcasts have categories, we can look at the category bias of each podcast. The category bias compares the fraction of a podcast's guests whose assigned category matches the podcast's first-listed category with the fraction of all people in the network who have that same category. The formal mathematical definition is given here. If a podcast's guest category distribution matched that of the whole network, its category bias would be 1.0. We calculated the bias for each podcast, then averaged the biases of the podcasts in each category, and plot the result here. 'Comedy', 'Society & Culture', and 'Education' show consistently high category biases. 'Arts' has a very low category bias because of the podcast Off Camera with Sam Jones, which lists 'Arts' as its first category but mostly has guests who are celebrities and actors, whose assigned categories are primarily 'TV & Film' and 'Comedy'.
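The formal definition is not reproduced in this text, so the sketch below assumes the bias is the ratio of the two fractions, which is consistent with a bias of 1.0 when a podcast's guests mirror the network. It also assumes hosts count among "all people in the network". `average_bias_by_category` and the example data are hypothetical.

```python
import statistics
from collections import Counter, defaultdict


def average_bias_by_category(first_category, guests, person_category):
    """Average per-podcast category bias, grouped by first-listed category.

    Assumed definition: bias = (share of the podcast's guests in category c)
    / (share of all people in the network in category c), with c the
    podcast's first-listed category.
    """
    network_counts = Counter(person_category.values())
    n_people = len(person_category)
    biases = defaultdict(list)
    for podcast, c in first_category.items():
        podcast_guests = guests.get(podcast, set())
        if not podcast_guests or network_counts[c] == 0:
            continue  # the ratio is undefined here, so skip the podcast
        matching = sum(person_category[g] == c for g in podcast_guests)
        guest_share = matching / len(podcast_guests)
        network_share = network_counts[c] / n_people
        biases[c].append(guest_share / network_share)
    return {c: statistics.mean(values) for c, values in biases.items()}


person_category = {"a": "Comedy", "b": "Comedy", "c": "Arts", "d": "TV & Film"}
first_category = {"P1": "Comedy", "P2": "Arts"}
guests = {"P1": {"a", "c"}, "P2": {"b", "d"}}  # P2: an 'Arts' podcast without Arts guests
result = average_bias_by_category(first_category, guests, person_category)
print(result)
```

The toy podcast `P2` mimics the pattern described for 'Arts': a podcast whose guests belong to other categories gets a bias near zero and pulls its category's average down.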