Check out this nice summary of common ways we go wrong interpreting data, including one of my favorites, Simpson’s Paradox.
mco.dev/datasette datasette home page video tutorial SQLite is not a toy database - “SQLite is the most common DBMS in the world” Datasets: today’s demo examples fivethirtyeight install: pip install csvs-to-sqlite datasette datasette-vega convert: csvs-to-sqlite games.csv games.db start: datasette games.db simple query: select winner, count(*) from games group by winner complex query: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 select rating, victory_status, a, b, cast(a as float)/cast(b as float)*100 from (select case when white_rating between 0 and 1200 then '0-1200' when white_rating between 1200 and 1500 then '1200-1500' when white_rating between 1500 and 1800 then '1500-1800' when white_rating between 1800 and 2100 then '1800-2100' when white_rating between 2100 and 2400 then '2100-2400' else '2400+' end as rating, victory_status, count(*) as a, sum(count(victory_status)) over (partition by case when white_rating between 0 and 1200 then '0-1200' when white_rating between 1200 and 1500 then '1200-1500' when white_rating between 1500 and 1800 then '1500-1800' when white_rating between 1800 and 2100 then '1800-2100' when white_rating between 2100 and 2400 then '2100-2400' else '2400+' end) as b from games group by rating, victory_status) publish: datasette publish cloudrun games.