Scraping No Starch Press Books

Introduction to the O’Reilly website and description of No Starch Press books

Some context about the page source and how books are formatted on the O’Reilly website

Understanding how the data about each book is embedded in the HTML content of the page
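Here is a minimal sketch of this step that uses only Python's standard-library html.parser. For illustration only, it assumes that each book is an element with the class "book" and that the book's fields are stored in data-* attributes. The markup, the class name, the attribute names, and the placeholder titles are all hypothetical, not the actual structure of the O’Reilly page. Inspect the real page source and adapt the matching rule to what it contains.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real O'Reilly page structure may differ.
SAMPLE_HTML = """
<ul>
  <li class="book" data-title="Example Book A"
      data-publisher="No Starch Press" data-pages="352 pages"></li>
  <li class="book" data-title="Example Book B"
      data-publisher="No Starch Press" data-pages="512 pages"></li>
</ul>
"""


class BookParser(HTMLParser):
    """Collect the data-* attributes of every element whose class list contains 'book'."""

    def __init__(self):
        super().__init__()
        self.books = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Valueless attributes come through as None, so guard before splitting.
        classes = (attr_map.get("class") or "").split()
        if "book" in classes:
            # Strip the "data-" prefix so keys read as plain field names.
            record = {
                name[len("data-"):]: value
                for name, value in attr_map.items()
                if name.startswith("data-")
            }
            self.books.append(record)


def parse_books(html):
    """Return one dictionary per book element found in `html`."""
    parser = BookParser()
    parser.feed(html)
    parser.close()
    return parser.books
```

With the sample above, parse_books(SAMPLE_HTML) returns two dictionaries. The first is {'title': 'Example Book A', 'publisher': 'No Starch Press', 'pages': '352 pages'}.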

Testing the publisher entries for all 350 books
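One way to test the publisher entries is to count every distinct value and list the positions of any entry that does not match the expected publisher. The sketch below assumes the books are already a list of dictionaries with a hypothetical "publisher" key. If the mismatch list is empty, every entry passed.

```python
from collections import Counter


def publisher_report(books, expected="No Starch Press"):
    """Count publisher values and list the indices of entries that don't match `expected`.

    `books` is assumed to be a list of dicts with a (hypothetical) "publisher" key;
    a missing key is counted as None and reported as a mismatch.
    """
    counts = Counter(book.get("publisher") for book in books)
    mismatches = [
        index for index, book in enumerate(books)
        if book.get("publisher") != expected
    ]
    return counts, mismatches
```

Printing the returned counts shows at a glance whether any unexpected publisher names appear. The indices point straight to the records that need a closer look.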

Extracting the information about the first 100 books

Extracting the information from all 350 books
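The two extraction steps differ mainly in how many records are processed, so one helper with an optional limit can handle both. This sketch rests on two assumptions. First, the field names in FIELDS are hypothetical. Second, missing fields are filled with None instead of raising an error, so one incomplete record does not stop a run over the full set.

```python
FIELDS = ("title", "publisher", "pages")  # hypothetical field names


def extract_books(records, limit=None):
    """Return one dict per book containing only FIELDS.

    If `limit` is given, only the first `limit` records are processed.
    Missing fields become None instead of raising KeyError.
    """
    selected = records if limit is None else records[:limit]
    return [{field: record.get(field) for field in FIELDS} for record in selected]
```

For example, extract_books(records, limit=100) covers the first 100 books, and extract_books(records) covers all of them.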

Converting the page-count strings into integers
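Here is a minimal sketch of the conversion. It assumes the page counts arrive as strings such as "352 pages" or "1,024 pages", although the exact wording on the site may differ. A regular expression finds the first run of digits, any commas are removed as thousands separators, and the result goes to int(). A string with no number returns None, so those cases can be handled explicitly later.

```python
import re

# First run of digits, allowing commas as thousands separators.
_NUMBER = re.compile(r"\d[\d,]*")


def pages_to_int(text):
    """Turn a string such as '1,024 pages' into 1024; return None if there is no number."""
    if text is None:
        return None
    match = _NUMBER.search(text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))
```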

Building a data frame from the book data
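This sketch shows the step with pandas. It assumes pandas is installed and that each book is a dictionary with the hypothetical keys title, publisher, and pages, where pages has already been converted to an integer or None. Passing columns= fixes the column order. The nullable "Int64" dtype keeps missing page counts as <NA> instead of turning the whole column into floats.

```python
import pandas as pd


def build_frame(books):
    """Build a DataFrame with one row per book and a fixed column order.

    `books` is assumed to be a list of dicts with (hypothetical) keys
    "title", "publisher", and "pages"; missing keys become missing values.
    """
    frame = pd.DataFrame(books, columns=["title", "publisher", "pages"])
    # Nullable integer dtype: whole numbers stay integers, missing ones become <NA>.
    frame["pages"] = frame["pages"].astype("Int64")
    return frame
```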

Building the full data frame with all 350 books (all 350 remain even after dropping duplicates)
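This sketch also assumes pandas. It concatenates several book frames, for example one per scraping pass, and drops repeated books. Using the title as the duplicate key is only an illustrative assumption. A more stable identifier such as an ISBN would be a better key if the scraped data includes one. To confirm that no books were lost, compare len() of the result against the expected total.

```python
import pandas as pd


def combine_books(frames, key="title"):
    """Concatenate several book frames and drop repeated books.

    `key` is a hypothetical column name that decides which rows count as
    the same book; the first occurrence of each book is kept.
    """
    combined = pd.concat(frames, ignore_index=True)
    deduped = combined.drop_duplicates(subset=[key], keep="first")
    return deduped.reset_index(drop=True)
```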