The most recommended data mining books

Who picked these books? Meet our 11 experts.

11 authors created a book list connected to data mining, and here are their favorite data mining books.
Shepherd is reader supported. When you buy books, we may earn an affiliate commission.

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

Chris Conlan Author Of Algorithmic Trading with Python: Quantitative Methods and Strategy Development

From my list on mathematics for quant finance.

Why am I passionate about this?

I am a financial data scientist. I think it is important that data scientists are highly specialized if they want to be effective in their careers. I run a business called Conlan Scientific out of Charlotte, NC, where my team of financial data scientists and I tackle complicated machine learning problems for our clients. Quant trading is a gladiator’s arena of financial data science: anyone can try it, but few succeed at it. I am sharing my top five math books that are essential to success in this field. I hope you enjoy them.

Chris' book list on mathematics for quant finance

Why did Chris love this book?

This book might as well be called Introduction to Machine Learning, and it is probably one of the only books truly deserving of that title. Did you know neural networks have been used for decades to read checks at banks? Those check-reading systems were convolutional neural networks. Have you ever heard of how decision trees were used in old-school data mining? In the early 2000s, you could only get them through proprietary software packages.

In quant trading, you will constantly face compute constraints, so it is invaluable to understand the mathematical foundations of the most old-school machine learning methods. Researchers 20 years ago did a lot of impressive work with far less computing power.

By Trevor Hastie, Robert Tibshirani, Jerome Friedman

Why should I read it?

2 authors picked The Elements of Statistical Learning as one of their favorite books, and they share why you should read it.

What is this book about?

This book describes the important ideas in a variety of fields such as medicine, biology, finance, and marketing in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of colour graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book.

This major…
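Chris's point about compute-light classical methods can be made concrete with a decision stump: a one-split decision tree, and the weak learner that boosting (covered in the book) typically combines. The following is a minimal pure-Python sketch, not code from the book; the function names `fit_stump` and `predict_stump` and the toy data are hypothetical, and labels are assumed to be 0 or 1.

```python
from typing import List, Tuple

Stump = Tuple[float, int, int]  # (threshold, left_label, right_label)

def fit_stump(x: List[float], y: List[int]) -> Stump:
    """Fit a one-feature decision stump by exhaustive threshold search.

    Points with x <= threshold get left_label, the rest get right_label.
    The threshold minimizing training misclassifications is chosen.
    """
    best = None  # (errors, threshold, left_label, right_label)
    # Candidate thresholds: each distinct observed value.
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        # Majority label on each side (ties go to 1; empty side defaults to 0).
        left_label = int(sum(left) * 2 >= len(left)) if left else 0
        right_label = int(sum(right) * 2 >= len(right)) if right else 0
        errors = sum(yi != left_label for yi in left) + sum(
            yi != right_label for yi in right
        )
        # Strict < keeps the first (smallest) threshold among ties.
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return t, left_label, right_label

def predict_stump(stump: Stump, x_new: float) -> int:
    t, left_label, right_label = stump
    return left_label if x_new <= t else right_label

stump = fit_stump([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0, 0, 0, 1, 1, 1])
print(stump)  # (3.0, 0, 1)
```

As written, the search is O(n²); sorting once and sweeping the thresholds while updating running counts would bring it down to O(n log n).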


Fundamentals of Machine Learning for Predictive Data Analytics, Second Edition: Algorithms, Worked Examples, and Case Studies

Yuxi (Hayden) Liu Author Of Python Machine Learning By Example: Build intelligent systems using Python, TensorFlow 2, PyTorch, and scikit-learn

From my list on machine learning for beginners.

Why am I passionate about this?

I have been a machine learning engineer applying my ML expertise in computational advertising and search. I am the author of 8 machine learning books. My first book was ranked the #1 bestseller in its category on Amazon in 2017 and 2018 and was translated into many languages. I am also an ML education enthusiast and used to teach ML courses in Toronto, Canada.

Yuxi's book list on machine learning for beginners

Why did Yuxi love this book?

Another practical book that I highly recommend. Its intuitive structure is the first thing I like about it. It gives you a comprehensive walkthrough of the ML workflow, from data exploration to learning. It offers abundant practical guidance that prepares you for real-world challenges, such as how to handle outliers and impute missing data. As an ML practitioner, I appreciate the dedicated case studies throughout the book. They get learners genuinely excited about real-world applications.

By John D. Kelleher, Brian Mac Namee, Aoife D'Arcy

Why should I read it?

1 author picked Fundamentals of Machine Learning for Predictive Data Analytics, Second Edition as one of their favorite books, and they share why you should read it.

What is this book about?

The second edition of a comprehensive introduction to machine learning approaches used in predictive data analytics, covering both theory and practice.

Machine learning is often used to build predictive models by extracting patterns from large datasets. These models are used in predictive data analytics applications including price prediction, risk assessment, predicting customer behavior, and document classification. This introductory textbook offers a detailed and focused treatment of the most important machine learning approaches used in predictive data analytics, covering both theoretical concepts and practical applications. Technical and mathematical material is augmented with explanatory worked examples, and case studies illustrate the application…
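Yuxi singles out handling outliers and imputing missing data. As a hedged illustration (two common choices, not necessarily the ones the book recommends), the sketch below fills missing values with the median and then clips values to the interquartile-range fences, Q1 − k·IQR and Q3 + k·IQR. The function name `clean_column` and the default k = 1.5 are assumptions.

```python
import statistics
from typing import List, Optional

def clean_column(values: List[Optional[float]], k: float = 1.5) -> List[float]:
    """Impute missing values with the median, then clip outliers.

    Missing entries are None. Outliers are values outside
    [Q1 - k*IQR, Q3 + k*IQR]; they are clipped to the nearest fence
    rather than dropped, so the column keeps its length.
    """
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values")
    # Median imputation: robust to outliers we have not handled yet.
    median = statistics.median(observed)
    filled = [median if v is None else v for v in values]
    # Quartiles of the observed (non-imputed) values.
    q1, _, q3 = statistics.quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in filled]

cleaned = clean_column([10, 12, None, 11, 13, 12, 11, 200, 12, 10])
print(cleaned)  # [10, 12, 12, 11, 13, 12, 11, 15.5, 12, 10]
```

Here the missing value becomes the median (12) and the outlier 200 is clipped to the upper fence, 15.5. Clipping rather than deleting is a design choice: it keeps rows aligned with other columns, at the cost of altering the extreme value.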


Information Quality: The Potential of Data and Analytics to Generate Knowledge

Ron S. Kenett Author Of The Real Work of Data Science: Turning Data into Information, Better Decisions, and Stronger Organizations

From my list on how numbers turn into information.

Why am I passionate about this?

I was trained as a mathematician but have always been motivated by problem-solving challenges. Statistics and analytics combine mathematical models with statistical thinking. My career has always focused on this combination, and a statistician can apply it in a wide range of domains. The advent of big data and machine learning algorithms has opened up new opportunities for applied statisticians. This perspective complements computer science views on how to address data science. The Real Work of Data Science covers 18 areas (18 chapters) that need to be pushed forward in order to turn data into information, better decisions, and stronger organizations.

Ron's book list on how numbers turn into information

Why did Ron love this book?

A lightly technical introduction to a comprehensive framework for defining and evaluating the quality of information generated by statistical analysis. It expands the role of analytics by including dimensions that affect information quality, such as data resolution, data integration, operationalization, and generalizability of findings. This wide-angle perspective provides a practical checklist that has been found useful in applications. Multiple case studies enable readers to connect to their favorite topic, but also to learn from other areas.

By Ron S. Kenett, Galit Shmueli

Why should I read it?

1 author picked Information Quality as one of their favorite books, and they share why you should read it.

What is this book about?

Provides an important framework for data analysts in assessing the quality of data and its potential to provide meaningful insights through analysis. Analytics and statistical analysis have become pervasive topics, mainly due to the growing availability of data and analytic tools. Technology, however, fails to deliver insights with added value if the quality of the information it generates is not assured. Information Quality (InfoQ) is a tool developed by the authors to assess the potential of a dataset to achieve a goal of interest, using data analysis. Whether the information quality of a dataset is sufficient is of practical importance…


Programming Collective Intelligence: Building Smart Web 2.0 Applications

Yuxi (Hayden) Liu Author Of Python Machine Learning By Example: Build intelligent systems using Python, TensorFlow 2, PyTorch, and scikit-learn

From my list on machine learning for beginners.

Yuxi's book list on machine learning for beginners

Why did Yuxi love this book?

This was my favorite book when I started my career. It explains how information can be processed intelligently in the internet age. It acts as a tutorial that teaches developers how to code their own ML programs, from online dating services to document analyzers and search engines. The author did an excellent job of explaining abstract ML algorithms with clear examples. His Python code reads clearly, which makes the book more beginner-friendly.

Don’t be put off by the fact that this book is more than a decade old. It was a visionary book in its day, and it is still relevant today.

By Toby Segaran

Why should I read it?

1 author picked Programming Collective Intelligence as one of their favorite books, and they share why you should read it.

What is this book about?

Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it. Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing,…
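Product recommendations of the kind the blurb mentions often start from a similarity score between users. The sketch below computes the Pearson correlation between two users over the items both have rated and ranks other users by it; it is written in the spirit of the book's user-based approach but is not the book's own code, and the names `pearson_similarity`, `top_matches`, and the ratings data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}

def pearson_similarity(prefs: Ratings, a: str, b: str) -> float:
    """Pearson correlation between two users over the items both rated.

    Returns 0.0 when they share fewer than two items or when either
    user's shared ratings have zero variance.
    """
    shared = [item for item in prefs[a] if item in prefs[b]]
    n = len(shared)
    if n < 2:
        return 0.0
    xs = [prefs[a][i] for i in shared]
    ys = [prefs[b][i] for i in shared]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def top_matches(prefs: Ratings, user: str, k: int = 3) -> List[Tuple[float, str]]:
    """Rank the other users by similarity to `user`, most similar first."""
    scores = [(pearson_similarity(prefs, user, other), other)
              for other in prefs if other != user]
    scores.sort(reverse=True)
    return scores[:k]

prefs: Ratings = {
    "ana": {"A": 5, "B": 3, "C": 4},
    "ben": {"A": 4, "B": 2, "C": 3},  # same pattern as ana, one point lower
    "cy": {"A": 1, "B": 5, "C": 2},   # roughly the opposite taste
}
print(top_matches(prefs, "ana"))
```

Pearson correlation ignores a constant offset between users, so "ben" scores about 1.0 against "ana" even though he rates everything one point lower; a plain distance-based score would penalize that offset.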


Be Data Literate: The Data Literacy Skills Everyone Needs to Succeed

Jeremy Adamson Author Of Minding the Machines: Building and Leading Data Science and Analytics Teams

From my list for data science and analytics leaders.

Why am I passionate about this?

I am a leader in analytics and AI strategy, and have a broad range of experience in aviation, energy, financial services, and the public sector. I have worked with several major organizations to help them establish a leadership position in data science and to unlock real business value using advanced analytics.

Jeremy's book list for data science and analytics leaders

Why did Jeremy love this book?

Not everybody needs to be a data scientist, but everybody does need to be data literate. Without an intentional focus on evangelism and building a strong data culture in your organization, it will be an uphill battle to make meaningful change. This book helps individuals and leaders understand what data literacy is, and how we can build it like any other skill.

By Jordan Morrow

Why should I read it?

1 author picked Be Data Literate as one of their favorite books, and they share why you should read it.

What is this book about?

In the fast moving world of the fourth industrial revolution not everyone needs to be a data scientist but everyone should be data literate, with the ability to read, analyze and communicate with data. It is not enough for a business to have the best data if those using it don't understand the right questions to ask or how to use the information generated to make decisions. Be Data Literate is the essential guide to developing the curiosity, creativity and critical thinking necessary to make anyone data literate, without retraining as a data scientist or statistician. With learnings to show…


Advanced Analytics with Spark: Patterns for Learning from Data at Scale

Tomasz Lelek Author Of Software Mistakes and Tradeoffs: How to make good programming decisions

From my list on big data processing ecosystem.

Why am I passionate about this?

I am motivated by working on products that many people use. I've been part of companies that deliver products impacting millions of people. To that end, I work in the Big Data ecosystem and strive to simplify it by contributing to Dremio's Data LakeHouse solution. I have worked on projects using Spark, HDFS, Cassandra, and Kafka. I have been working in the software engineering industry for ten years now, and I've tried to share my experience and lessons learned in my book Software Mistakes and Tradeoffs, hoping it will help current and next-generation engineers create better software and, in turn, make more users happy.

Tomasz's book list on big data processing ecosystem

Why did Tomasz love this book?

Apache Spark has a very high barrier to entry for newcomers to the Big Data ecosystem.

However, it is a key tool that almost everyone uses for distributed processing. I recommend that everyone read this book before delving into production solutions based on Apache Spark.

This book will help you alleviate many Spark problems, such as those involving serialization, memory utilization, and parallelization of processing.

By Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills

Why should I read it?

1 author picked Advanced Analytics with Spark as one of their favorite books, and they share why you should read it.

What is this book about?

In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. You'll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques (classification, collaborative filtering, and anomaly detection, among others) to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you'll find these patterns useful for…


R for Data Science: Import, Tidy, Transform, Visualize, and Model Data

Tilman M. Davies Author Of The Book of R: A First Course in Programming and Statistics

From my list on intro to programming and data science with R.

Why am I passionate about this?

I’m an applied statistician and academic researcher/lecturer at New Zealand’s oldest university – the University of Otago. R facilitates everything I do – research, academic publication, and teaching. It’s the latter part of my job that motivated my own book on R. From first-year statistics students who have never seen R to my own Ph.D. students using R to implement novel and highly complex statistical methods and models, my experience is that all ultimately love the ease with which the R language permits exploration, visualisation, analysis, and inference of one’s data. The ever-growing need in today’s society for skilled statisticians and data scientists means there's never been a better time to learn this essential language.

Tilman's book list on intro to programming and data science with R

Why did Tilman love this book?

For those intending to use R with an eye on the popular 'Tidyverse' suite of packages (which facilitate the handling, manipulation, and visualisation of data sets), it's hard to go past this book. Written by founding contributors to the RStudio and Tidyverse worlds, it is a great way to learn this dialect of R against the overarching backdrop of statistical data analysis and data science.

By Hadley Wickham, Garrett Grolemund

Why should I read it?

1 author picked R for Data Science as one of their favorite books, and they share why you should read it.

What is this book about?

Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible. Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results. You'll get a complete, big-picture understanding of the data science cycle, along…


Introduction to Machine Learning with Python: A Guide for Data Scientists

Yuxi (Hayden) Liu Author Of Python Machine Learning By Example: Build intelligent systems using Python, TensorFlow 2, PyTorch, and scikit-learn

From my list on machine learning for beginners.

Yuxi's book list on machine learning for beginners

Why did Yuxi love this book?

This book is more advanced than the first book I recommended. It presents the theoretical and practical aspects of ML step by step, from the bottom up. Each chapter elaborates at length on a core building block of the ML life cycle. For example, feature engineering, supervised learning, and model evaluation each have their own chapter, with intuitive discussions of how they work. Most of the concepts are taught through the simple yet powerful Python module scikit-learn, so it won’t overburden you with heavy programming. This book is ideal for practitioners with some understanding of statistics and linear algebra.

By Andreas C. Müller, Sarah Guido

Why should I read it?

1 author picked Introduction to Machine Learning with Python as one of their favorite books, and they share why you should read it.

What is this book about?

Machine learning has become an integral part of many commercial applications and research projects, but this field is not exclusive to large companies with extensive research teams. If you use Python, even as a beginner, this book will teach you practical ways to build your own machine learning solutions. With all the data available today, machine learning applications are limited only by your imagination. You'll learn the steps necessary to create a successful machine-learning application with Python and the scikit-learn library. Authors Andreas Müller and Sarah Guido focus on the practical aspects of using machine learning algorithms, rather than the…
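Yuxi notes that the book walks through feature preprocessing, supervised learning, and model evaluation using scikit-learn. As an illustration of how those pieces fit together in that library (a sketch, not an example taken from the book), the code below scales the features, fits a classifier, and evaluates it on a held-out split and with cross-validation. The iris dataset, logistic regression, and the split parameters are assumptions chosen for brevity.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in dataset: 150 iris flowers, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a test set; stratify so class proportions match in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain preprocessing and model so the scaler is fit on training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Accuracy on data the model never saw during fitting.
test_accuracy = model.score(X_test, y_test)

# 5-fold cross-validation on the training set gives a less noisy estimate.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

print(f"test accuracy: {test_accuracy:.3f}")
print(f"cv accuracy:   {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

Wrapping the scaler and the classifier in one pipeline matters for evaluation: during cross-validation, the scaling statistics are recomputed on each training fold, so no information from the validation fold leaks into preprocessing.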


Interviewing Users: How to Uncover Compelling Insights

Gregg Bernstein Author Of Research Practice: Perspectives from UX researchers in a changing field

From my list on understanding user research.

Why am I passionate about this?

After a career that took me from designer to design professor, I’ve spent the past decade leading user research practices for growing product organizations. I’m excited about user research because it positions us closer to the people we design for, and challenges us to capture and explain complex scenarios in service to them. Though there are many books that teach user research, my list of recommendations is meant to demonstrate why we research, how we make sense of what we learn, and where research might take us.

Gregg's book list on understanding user research

Why did Gregg love this book?

Listening to users is essential to product design and development, full stop. Interviews allow us to understand who uses our products and the contexts our products fit into, and Steve Portigal demonstrates how to do it like a pro in Interviewing Users. Steve breaks down every angle of the interview process, from planning to conducting to documentation. (I particularly love Steve’s approach to the interview field guide in chapter 3!)

By Steve Portigal

Why should I read it?

1 author picked Interviewing Users as one of their favorite books, and they share why you should read it.

What is this book about?

Interviewing is a foundational user research tool that people assume they already possess. Everyone can ask questions, right? Unfortunately, that's not the case. Interviewing Users provides invaluable interviewing techniques and tools that enable you to conduct informative interviews with anyone. You'll move from simply gathering data to uncovering powerful insights about people.


Calling Bullshit: The Art of Skepticism in a Data-Driven World

Alex Edmans Author Of Grow the Pie: How Great Companies Deliver Both Purpose and Profit

From Alex's 3 favorite reads in 2023.

Why am I passionate about this?

Author. Spirited, unorthodox, provocative, evidence-based, trustworthy.

Alex's 3 favorite reads in 2023

Why did Alex love this book?

This book highlights the many ways in which we are frequently misled by data – falling for claims of causation when the data only supports correlation, not recognising that data may be a selected sample that only presents a small part of the picture, and ignoring researchers’ incentives to hand-pick the methodology that gives them the results they want.

This rigorous but highly readable book explains how to spot the common ways in which people mislead us, either deliberately or unintentionally, and call it out in a constructive and professional way.

By Carl T. Bergstrom, Jevin D. West

Why should I read it?

3 authors picked Calling Bullshit as one of their favorite books, and they share why you should read it.

What is this book about?

Bullshit isn’t what it used to be. Now, two science professors give us the tools to dismantle misinformation and think clearly in a world of fake news and bad data.
 
“A modern classic . . . a straight-talking survival guide to the mean streets of a dying democracy and a global pandemic.”—Wired

Misinformation, disinformation, and fake news abound and it’s increasingly difficult to know what’s true. Our media environment has become hyperpartisan. Science is conducted by press release. Startup culture elevates bullshit to high art. We are fairly well equipped to spot the sort of old-school bullshit that is based…