Frequently Asked Questions

How to load data using Canopy’s Data Import Tool?

There are multiple ways to load a data set using Enthought Canopy’s Data Import Tool. See Launching the Data Import Tool for an overview, and Working with Major League Baseball Batting Data and Working with Wind Speed Data for specific use cases.

Which file types can be loaded by the Data Import Tool?

Generally, the tool will attempt to load any text-based file format (.txt, .csv, etc.), and the more structure a file has, the better the tool can perform. However, certain file types are explicitly not accepted due to their format:

  • .py
  • .xml
  • .pdf
  • .json
  • .h5
  • .gif
  • .jpg
  • .png

Why is the Data Import Tool not displaying the dataset properly?

After reading your file, the Data Import Tool tries to infer certain characteristics of the supplied data, such as which comment character (if any) is present, what the column separator is, whether there is a row containing column headers, and so on. Sometimes our algorithms might not detect one or more of these parameters correctly, in which case your dataset might not be displayed properly. You can set these parameters yourself (see Read Data Command to learn how). Clicking Refresh Data should then display the dataset as you expect.

Hovering over these parameters in the Data Import Tool should also display a tooltip with more information on how the Tool uses each parameter.
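The same parameters (separator, comment character, header row) also exist in pandas’ own read_csv function, which can help when you want to reproduce a read outside the Tool. The sketch below is not the Tool’s detection algorithm: it uses Python’s standard csv.Sniffer as a simple stand-in to guess the separator and header row of a small made-up sample, then passes those values, plus a comment character, to pandas.read_csv explicitly.

```python
import csv
import io

import pandas as pd

# Made-up sample: one comment line, a header row, and ';' as the separator.
raw = """# station readings
date;speed;direction
2015-01-01;12.5;N
2015-01-02;9.0;NE
"""

# Drop comment lines before sniffing (a simple heuristic, not the Tool's logic).
body = "\n".join(line for line in raw.splitlines() if not line.startswith("#"))

# csv.Sniffer is used here only as a stand-in for parameter inference.
sniffer = csv.Sniffer()
dialect = sniffer.sniff(body, delimiters=",;\t|")
has_header = sniffer.has_header(body)

# Pass the detected (or manually corrected) parameters to pandas explicitly.
df = pd.read_csv(
    io.StringIO(raw),
    sep=dialect.delimiter,
    comment="#",                      # fully commented lines are skipped
    header=0 if has_header else None,
)
print(dialect.delimiter, has_header)
print(df)
```

If a guess is wrong, overriding it is just a matter of passing a different sep, comment, or header value, much like correcting a parameter in the Tool and clicking Refresh Data.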

How do I determine whether my data contains Null/NaN values?

When the Data Import Tool reads the file, it may detect missing values. Additionally, the tool may decide to convert a column to a particular data type. If a value cannot be coerced into the new data type, the tool sets it to Null in the data but keeps a note of the original value and displays that content via a mask. The user can easily display the positions of the Null/NaN values in the DataFrame by ticking the Highlight Null Values checkbox at the bottom-left corner of the DataFrame display.

Note that the Tool treats certain values as Null/NaN during the initial read-in. Refer to Missing Values for more.
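In pandas terms, the positions highlighted by Highlight Null Values correspond roughly to the mask returned by DataFrame.isna(). The sketch below, using a made-up sample, shows missing fields being detected on read, a failed numeric conversion producing NaN, and the original values being kept aside, loosely mirroring the behaviour described above. pandas’ read_csv treats strings such as "NA" and empty fields as missing by default; this sketch does not claim that the Tool’s list of Null/NaN markers is the same.

```python
import io

import pandas as pd

# Made-up sample with an "NA" marker, an empty field, and a non-numeric entry.
raw = """city,population,area
Springfield,30720,NA
Shelbyville,unknown,12.4
Ogdenville,,8.1
"""

# pandas treats "NA" and empty fields as missing by default.
df = pd.read_csv(io.StringIO(raw))

# Coercing to numbers turns unparseable entries ("unknown") into NaN.
original = df["population"].copy()
df["population"] = pd.to_numeric(df["population"], errors="coerce")

# Keep the values that were lost in the conversion (present before, NaN after).
lost = original[df["population"].isna() & original.notna()]

print(df.isna())          # Boolean mask of missing positions
print(df.isna().sum())    # Number of missing values per column
print(lost)
```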

How to sort column data?

See the Commands section for a description of how to sort a column.
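Outside the Tool, sorting by a column is a single pandas call. A minimal sketch with a small illustrative DataFrame (career home run totals):

```python
import pandas as pd

# Small illustrative DataFrame.
df = pd.DataFrame({"player": ["Ruth", "Aaron", "Bonds"], "HR": [714, 755, 762]})

# Sort rows by one column; ascending=False puts the largest values first.
by_hr = df.sort_values("HR", ascending=False)

# sort_values returns a new DataFrame; drop the old index if it is not needed.
by_hr = by_hr.reset_index(drop=True)
print(by_hr)
```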

How to insert a new column?

The Insert Column command can be accessed by right-clicking on a column header or from the Transform menu of the Data Import Tool. See Insert Column for an overview of how to use the command, and Step 2: Convert Year by Inserting Column and Step 5: Insert Column with Converted Wind Speed for specific use cases.
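The pandas counterpart of inserting a column at a chosen position is DataFrame.insert. In this sketch the speed_knots column and the knots-to-m/s conversion are illustrative assumptions, not the conversion used in the wind speed walkthrough.

```python
import pandas as pd

# Hypothetical wind speed data recorded in knots.
df = pd.DataFrame({"station": ["A", "B"], "speed_knots": [10.0, 20.0]})

# Insert a derived column directly after "speed_knots" (1 knot = 0.514444 m/s).
position = df.columns.get_loc("speed_knots") + 1
df.insert(position, "speed_ms", df["speed_knots"] * 0.514444)
print(df)
```

Unlike assigning df["speed_ms"] = ..., which always appends at the end, insert lets you control where the new column appears.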

How to join multiple columns to create a new one?

The Join Columns command can be accessed by selecting multiple columns and right-clicking on a column header, or from the Transform menu of the Data Import Tool. See Join Columns for an overview of how to use the command and Working with Wind Speed Data for a specific use case.
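Joining columns in plain pandas usually means converting each one to strings and concatenating them. The sketch below, with made-up year/month/day columns and a "-" separator chosen for illustration, builds a single date string column.

```python
import pandas as pd

# Made-up date parts stored in separate columns.
df = pd.DataFrame({"year": [2015, 2015], "month": [1, 2], "day": [31, 28]})

# Convert each part to a string, zero-pad month and day, and join with "-".
df["date"] = (
    df["year"].astype(str)
    + "-"
    + df["month"].astype(str).str.zfill(2)
    + "-"
    + df["day"].astype(str).str.zfill(2)
)
print(df)
```

The resulting string column could then be converted to a datetime type as a separate step.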

How to convert a column type?

The Data Import Tool attempts to auto-detect each column’s type when the dataset is loaded or when new columns are created. However, the user might understand the data set better than the Tool, or want a specific conversion. In that case, the user can change the conversion by editing the appropriate Type conversion command in the history, or by right-clicking on the column header and choosing the Convert... command. The available options are float, int, string, datetime, and bool.
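The Convert... options map naturally onto pandas conversions: astype for float, int, string, and bool, and pandas.to_datetime for datetimes. The sketch below applies each to made-up columns; it illustrates the pandas calls and is not necessarily the code the Tool generates.

```python
import pandas as pd

# Made-up columns read in as text (or 0/1 integers for the flag).
df = pd.DataFrame({
    "count": ["1", "2", "3"],
    "ratio": ["0.5", "1.25", "2"],
    "date": ["2015-01-01", "2015-01-02", "2015-01-03"],
    "flag": [1, 0, 1],
})

df["count"] = df["count"].astype(int)      # int
df["ratio"] = df["ratio"].astype(float)    # float
df["date"] = pd.to_datetime(df["date"])    # datetime
df["flag"] = df["flag"].astype(bool)       # bool: 0 -> False, nonzero -> True
df["label"] = df["count"].astype(str)      # string
print(df.dtypes)
```

Note that astype(bool) treats any non-empty string, including "False", as True, so text columns usually need mapping before a bool conversion.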

How to remove specific rows/columns?

The user can delete individual entries in a row/column by pressing the delete shortcut. To remove entire rows/columns, select them, right-click on any of the selected rows/columns, and choose the Delete option.

The user might also need to delete entries in the data set based on a user-set criterion. This can be done using the Delete Rows Where command. See Working with Major League Baseball Batting Data and Working with Wind Speed Data for specific use cases.
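For comparison, the pandas equivalents of deleting a column, deleting a row by its label, and deleting rows that meet a condition are sketched below. The columns and the zero at-bats criterion are illustrative assumptions.

```python
import pandas as pd

# Made-up batting-style data.
df = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "AB": [500, 0, 320, 12],
    "H": [150, 0, 90, 3],
    "notes": ["", "", "", ""],
})

# Remove a whole column, then a specific row by its index label.
df = df.drop(columns=["notes"])
df = df.drop(index=[3])

# Delete rows where a criterion holds (here: zero at-bats)
# by keeping only the rows that do NOT match it.
df = df[~(df["AB"] == 0)]
print(df)
```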

What is the Python/Pandas Code View?

Cleaning, manipulating, and operating on a data set can involve many operations, and it would be time-consuming for the user to repeat them every time they import the file.

The Python/Pandas Code View displays the auto-generated Python/Pandas code for each command that the user has performed on the data set. This Python/Pandas code can then be exported and saved as a script for reuse later.
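The exact code produced by the Code View is not reproduced here. As a rough sketch of the idea only, a sequence of import steps can be wrapped in one function and re-run on any file with the same layout; the function name load_batting and its steps are hypothetical.

```python
import io

import pandas as pd


def load_batting(source):
    """Hypothetical reusable import: re-applies the same steps to any file."""
    df = pd.read_csv(source, sep=",", comment="#")   # read step
    df["AB"] = df["AB"].astype(int)                  # type conversion step
    df = df[df["AB"] > 0]                            # delete-rows-where step
    return df.sort_values("AB", ascending=False).reset_index(drop=True)


# Usage with an in-memory stand-in for a file path.
sample = io.StringIO("player,AB\nA,10\nB,0\nC,25\n")
print(load_batting(sample))
```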

How to view the data after loading the dataset into the IPython console?

When you’ve finished importing and click “Use DataFrame”, the data set is loaded as a Pandas DataFrame in the Canopy IPython namespace. At this point, standard Pandas commands can be used to inspect the data, such as df_name.head() and df_name.info(). For further information on working with DataFrames, refer to the Pandas Documentation.

The user can also use the view(df_name) command, which opens a GUI to allow the user to view the entire data set with easy scrolling.
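Besides head() and info(), a few other standard pandas calls give a quick overview of a loaded DataFrame. In this sketch df stands in for df_name and the data is made up.

```python
import pandas as pd

# df stands in for the DataFrame loaded by "Use DataFrame".
df = pd.DataFrame({"player": ["A", "B", "C"], "AB": [500, 320, 12], "H": [150, 90, 3]})

print(df.head())      # first rows (5 by default)
df.info()             # column names, dtypes, and non-null counts
print(df.describe())  # summary statistics for numeric columns
print(df.shape)       # (number of rows, number of columns)
```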

Why is my file being pre-loaded with commands?

After successfully importing a DataFrame into the IPython console, the Data Import Tool autosaves the operations performed on the file. When the file is reloaded using the Tool, the Tool automatically loads the operations previously performed.