Processing 10TB of Wikipedia Page Views - Part 2

March 21, 2020 in Data

This is the second of a two-part series, in which we focus on interesting queries and visualizations using the data pipeline we created in part one.

Processing 10TB of Wikipedia Page Views - Part 1

March 11, 2020 in Data

What’s bigger than Wikipedia? Spoiler: Wikipedia page views. This is the first of a two-part series, in which we’ll explore how to build a data engineering solution to process all 10TB of published Wikipedia page views and entity data.