Baseball Examples Using the Lahman Database
These examples depend on this database:
/anvil/projects/tdm/data/lahman/lahman.db
Stealing Bases for Several Teams
For this question, first find the teamID, yearID, and playerID for players who stole 40 or more bases for an individual team in an individual season, during the years since 1901.
Then use this table as a subquery, to find the players who achieved this feat for at least 5 distinct teams during their career.
SELECT teamID, yearID, playerID FROM Batting WHERE (SB >= 40) AND (yearID >= 1901);
SELECT COUNT(DISTINCT teamID) as mycounts, playerID FROM (SELECT teamID, yearID, playerID FROM Batting WHERE (SB >= 40) AND (yearID >= 1901)) GROUP BY playerID HAVING mycounts >= 5;
More Walks than Strikeouts
There are some players who have more walks than strikeouts. Notice the use of HAVING
instead of WHERE
because we are comparing a function of the table values. When we SUM
values, SQL needs to calculate the results before the query can be displayed, so we need to use HAVING
instead of WHERE
.
SELECT playerID, SUM(BB), SUM(SO) FROM Batting GROUP BY playerID HAVING SUM(BB) > SUM(SO);
How many such players achieved this feat, in other words, had more walks than strikeouts? We can use our work above as a subquery, and COUNT
the number of rows from that subquery:
SELECT COUNT(*) FROM (SELECT playerID, SUM(BB), SUM(SO) FROM Batting GROUP BY playerID HAVING SUM(BB) > SUM(SO));
We can also find the total number of baseball players:
SELECT COUNT(DISTINCT playerID) FROM Batting;
Now that we know there are 2321 players who achieved more walks than strikeouts, and there are 20730 players altogether in the database, we conclude that only 11.1 percent of players (altogether) distinguished this feat in their careers!