Working with Major League Baseball Batting Data

We’ll illustrate how to import a data file as a Pandas DataFrame into the IPython namespace using the Data Import Tool.

The Data Set

Here’s the data used in this example: mlb_batting_2008.csv.

Step 1: Opening the Data Set

The Data Import Tool can be used to open a data file in the following ways:

  • From the Canopy Editor, choose File –> Import Data and select the data source you’d like to use: a file, a URL, or the clipboard
  • From the File Browser, right-click the data file and select Import Data

After opening the file, you will be able to use the Data Import Tool to browse through the data:

Figure: Data Import Tool Window

Step 2: Deleting Rows

To delete the rows that contain records for rookie players, we can use the Delete Rows Where command on the ROOKIE column. Select the command from the Transforms menu:

Figure: Transforms Menu

Then enter a boolean expression to filter the data:

Figure: Delete Rows Where ROOKIE == True
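For readers who want to see what this transform amounts to in plain Pandas, here is a minimal sketch. It is not the code the Data Import Tool generates. The sample rows and the PLAYER column are hypothetical, and we assume that ROOKIE is read as a boolean column; only the ROOKIE column name comes from the data set.

```python
import pandas as pd

# Hypothetical sample rows; only the ROOKIE column name comes from the data set.
batting = pd.DataFrame(
    {
        "PLAYER": ["Player A", "Player B", "Player C", "Player D"],
        "ROOKIE": [True, False, False, True],
    }
)

# Deleting rows where ROOKIE == True is the same as keeping rows where ROOKIE is False.
# This assumes ROOKIE has a boolean dtype.
veterans = batting.loc[~batting["ROOKIE"]].reset_index(drop=True)

print(veterans)
```

The original DataFrame is left untouched here; the filtered rows are bound to a new name so that the deletion can be checked before the original is discarded.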

Step 3: Loading the DataFrame into Canopy

Clicking the Use DataFrame button at the bottom of the Data Import Tool loads the data set as a Pandas DataFrame object into Canopy’s IPython namespace. By default, the Data Import Tool names the resulting DataFrame after the data file, minus the file extension. This name can be changed in the Configuration Pane. The console displays messages with information about the import:

Figure: DataFrame Loaded into Canopy
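The naming convention described above can be imitated by hand with Pandas. The sketch below is only an approximation of what the tool does: the helper load_named_frame is a hypothetical name introduced for illustration, and we assume the file is a plain comma-separated file that pd.read_csv can parse with its default settings.

```python
from pathlib import Path

import pandas as pd


def load_named_frame(csv_path):
    """Read a CSV file and return (name, DataFrame).

    The name is the file name without its extension, mirroring the
    default naming described above. Hypothetical helper, not part of Canopy.
    """
    path = Path(csv_path)
    return path.stem, pd.read_csv(path)


# Usage (assumes the file is in the current working directory):
# name, frame = load_named_frame("mlb_batting_2008.csv")
# globals()[name] = frame  # binds the DataFrame to the name mlb_batting_2008
```

Returning the name alongside the DataFrame keeps the helper free of side effects; the caller decides whether to bind it in the namespace.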

We can now view the contents of the Pandas DataFrame object from the IPython terminal and use it for further numerical analysis. A simple exercise, as shown below, is to plot a histogram of the salaries of the players in the data set:

Figure: Histogram Plot
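As a rough sketch of the computation behind such a plot, the example below bins a handful of salaries with NumPy. The column name SALARY and all of the values are assumptions made for illustration; the actual column name in mlb_batting_2008.csv may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical salary values and column name, for illustration only.
players = pd.DataFrame(
    {"SALARY": [400_000, 450_000, 1_200_000, 5_500_000, 12_000_000, 400_000]}
)

# Bin the salaries into five equal-width bins, which is what a histogram plot displays.
counts, bin_edges = np.histogram(players["SALARY"].dropna(), bins=5)

for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:>12,.0f} to {right:>12,.0f}: {count}")

# With matplotlib installed, the same bins can be drawn directly:
# players["SALARY"].plot.hist(bins=5)
```

Computing the counts separately from drawing them makes it easy to inspect the bins in the IPython terminal before plotting.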