Scraping No Starch Press Books
Introduction to the O’Reilly website and description of No Starch Press books
Some context about the page source and the formatting of books on the O’Reilly website
Understanding how the data about each book is embedded in the HTML content of the page
Testing the publisher entries for all 350 books
Extracting the information about the first 100 books
Extracting the information from all 350 books
Converting the page-count strings into integers
Building a data frame from the book data
Building the full data frame with all 350 books (even after dropping the duplicates)
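
The last three steps of the outline (converting page strings, collecting the book data, dropping duplicates) might look roughly like the sketch below. It is only an illustration under stated assumptions: the page strings are assumed to look like "320 pages", the field names `title`, `publisher`, and `pages` are hypothetical, and the sample records are made up rather than taken from the O’Reilly website.

```python
import re


def pages_to_int(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    # Assumption: page counts appear as strings such as "320 pages" or "1,024 pages".
    match = re.search(r"\d[\d,]*", text or "")
    if match is None:
        return None
    return int(match.group().replace(",", ""))


def drop_duplicate_books(books):
    """Keep the first record for each (title, publisher) pair, preserving order."""
    seen = set()
    unique = []
    for book in books:
        key = (book["title"], book["publisher"])
        if key not in seen:
            seen.add(key)
            unique.append(book)
    return unique


# Hypothetical sample records standing in for the scraped book data.
raw_books = [
    {"title": "Example Book A", "publisher": "No Starch Press", "pages": "320 pages"},
    {"title": "Example Book B", "publisher": "No Starch Press", "pages": "1,024 pages"},
    {"title": "Example Book A", "publisher": "No Starch Press", "pages": "320 pages"},
]

# Drop repeated entries first, then replace each page string with an integer.
books = [
    dict(book, pages=pages_to_int(book["pages"]))
    for book in drop_duplicate_books(raw_books)
]
```

A list of dictionaries like `books` can be passed directly to `pandas.DataFrame(books)` to obtain one row per book and one column per field.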