Lahman Baseball Database

Source

An overview of the database is provided here:

We imported the 2022 csv version:

Description of the Data

The data set contains many tables about baseball performance statistics, back to 1871.

We build this data into a sqlite database, as described below.

Transformations to the original data source

First we unzip the downloaded file and extract the csv files that we need. We remove the head from each file because we already loaded the headers of each table into our sql script, given below.

unzip v2023.1.zip
mv baseballdatabank-2023.1/core/*.csv ./
mv baseballdatabank-2023.1/contrib/*.csv ./
rm v2023.1.zip
rm -rf baseballdatabank-2023.1
sed -i '1d' *.csv

Then we create a new sqlite database:

sqlite3 lahman.db

Inside sqlite3, we do the following sql import statements, with many thanks to WebucatorTraining for this script, which we modified:

.read mdw-updated-lahman-mysql.sql
.separator ,
.import AwardsManagers.csv AwardsManagers
.import AwardsPlayers.csv AwardsPlayers
.import AwardsShareManagers.csv AwardsShareManagers
.import AwardsSharePlayers.csv AwardsSharePlayers
.import CollegePlaying.csv CollegePlaying
.import HallOfFame.csv HallOfFame
.import Salaries.csv Salaries
.import Schools.csv Schools
.import AllstarFull.csv AllstarFull
.import Appearances.csv Appearances
.import Batting.csv Batting
.import BattingPost.csv BattingPost
.import Fielding.csv Fielding
.import FieldingOF.csv FieldingOF
.import FieldingOFsplit.csv FieldingOFsplit
.import FieldingPost.csv FieldingPost
.import HomeGames.csv HomeGames
.import Managers.csv Managers
.import ManagersHalf.csv ManagersHalf
.import Parks.csv Parks
.import People.csv People
.import Pitching.csv Pitching
.import PitchingPost.csv PitchingPost
.import SeriesPost.csv SeriesPost
.import Teams.csv Teams
.import TeamsFranchises.csv TeamsFranchises
.import TeamsHalf.csv TeamsHalf

then type Control-D to quit sqlite3

back in the bash shell, type:

rm *.csv
rm mdw-updated-lahman-mysql.sql