This is the second of a two-part series, in which we focus on interesting queries and visualizations built on the data pipeline we created in part one.
What’s bigger than Wikipedia? Spoiler: Wikipedia page views. This is the first of a two-part series in which we’ll explore how to build a data engineering solution to process all 10 TB of published Wikipedia page view and entity data.