Books
Foundational
- DAMA-DMBOK 2: Data Management Body of Knowledge — the canonical reference for data management practices. Heavy but authoritative.
- Python for Data Analysis (Wes McKinney) — by the creator of pandas. Free online.
- Storytelling with Data (Cole Knaflic) — the book on data communication.
Statistics
- OpenIntro Statistics — free, rigorous, accessible.
- Introduction to Statistical Learning (ISLR) — free; ML/stats with R or Python labs.
- The Elements of Statistical Learning — free, advanced; classic.
Visualization
- Edward Tufte — The Visual Display of Quantitative Information — design canon.
- Edward Tufte — Envisioning Information
- Information Dashboard Design (Stephen Few)
- The Functional Art (Alberto Cairo)
SQL & databases
- Designing Data-Intensive Applications (Martin Kleppmann) — modern data systems.
- SQL Antipatterns (Bill Karwin) — what not to do.
- The Data Warehouse Toolkit (Kimball) — dimensional modeling classic.
Business intelligence & analytics
- Lean Analytics (Croll & Yoskovitz) — startup metrics.
- Competing on Analytics (Davenport & Harris)
- How to Measure Anything (Douglas Hubbard)
Machine learning
- Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow (Aurélien Géron)
- Pattern Recognition and Machine Learning (Bishop)
- Probabilistic Machine Learning (Murphy) — free.
Communication & soft skills
R-specific
Reading order suggestion
For a brand-new analyst:
- Storytelling with Data (Knaflic)
- Python for Data Analysis (McKinney)
- OpenIntro Statistics
- Lean Analytics (or domain-specific equivalent)
- Designing Data-Intensive Applications (when you need scale)