Frequently Asked Questions
How to load data using Canopy’s Data Import Tool?
A data set can be loaded with Enthought Canopy’s Data Import Tool in several ways. See Launching the Data Import Tool for an overview, and Working with Major League Baseball Batting Data and Working with Wind Speed Data for specific use cases.
Which file types can be loaded by the Data Import Tool?
Generally, the tool will attempt to load any text-based file format
(.csv, etc.); the more structure the file has, the better the tool can perform.
However, certain file types are explicitly not accepted due to their format:
Why is the Data Import Tool not displaying the dataset properly?
After reading your file, the Data Import Tool tries to infer certain
characteristics of the supplied data, such as what the comment character
is, what the column separator is, whether or not there is a row
containing column headers, and so on. Sometimes, the tool's algorithms might not
detect one or more of these parameters correctly, in which case your dataset
might not be displayed properly. You can set these parameters yourself (see
Read Data Command to know how). Clicking
Data should now display the dataset as you would expect it to be.
Hovering over these parameters in the Data Import Tool should also provide a tooltip with additional information on what the parameter is used for by the Tool.
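For readers who prefer to see these parameters as code, the sketch below shows how a comment character, a column separator, and a header row are expressed in a plain pandas.read_csv call. The file contents and column names are hypothetical, and the assumption that the Tool's settings correspond one-to-one to these pandas arguments is ours, not something stated by the Tool.

```python
import io

import pandas as pd

# Hypothetical file contents: '#' marks a comment line, ';' separates the
# columns, and the first non-comment row holds the column headers.
raw_text = """# station log, exported 2015
station;speed;direction
A;12.5;NE
B;7.1;SW
"""

df = pd.read_csv(
    io.StringIO(raw_text),  # stands in for the path to a file on disk
    sep=";",                # column separator
    comment="#",            # comment character; the rest of the line is ignored
    header=0,               # first non-comment row holds the column headers
)

print(df)
```

If the wrong separator were supplied here (for example the default comma), pandas would read each row as a single column, which is the kind of display problem described above.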
How do I determine whether my data contains Null/NaN values?
When the Data Import Tool reads the file, it may detect missing values.
Additionally, the tool may decide to convert a column to a particular data
type. If certain values cannot be coerced into the new data type, the tool will
set that value to Null in the data but keep a note of the original value and
display that content via a mask. The user can easily display the positions of
the Null/NaN values in the DataFrame by ticking the
Highlight Null Values
checkbox at the bottom left corner of the DataFrame display.
Note that the Tool treats certain values as Null/NaN during the initial read-in. Refer to Missing Values for more.
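The same check can be made in code once the data is in a DataFrame. The following is a minimal sketch using plain pandas; it illustrates the idea of coercing a column and locating the values that became Null/NaN, and it is not the Tool's own implementation. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: the "speed" column contains one non-numeric entry.
df = pd.DataFrame(
    {"station": ["A", "B", "C"], "speed": ["12.5", "n/a?", "7.1"]}
)

# Convert the column to numbers; values that cannot be coerced become NaN.
speed = pd.to_numeric(df["speed"], errors="coerce")

# Boolean mask marking the positions of the Null/NaN values.
null_mask = speed.isnull()

print(null_mask.sum())             # how many values are Null/NaN
print(df.loc[null_mask, "speed"])  # original text of the values that failed
```

Keeping the original column alongside the coerced one, as done here, makes it possible to inspect what the failed values looked like before conversion.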
How to insert a new column?
The Insert Column command can be accessed by right-clicking on a column header or from the Transform menu of the Data Import Tool. See Insert Column for an overview of how the command can be used, and Step 2: Convert Year by Inserting Column and Step 5: Insert Column with Converted Wind Speed for specific use cases.
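As a rough illustration of what inserting a derived column looks like in pandas, the sketch below adds a converted wind speed column next to the original one. The column names, the sample values, and the knots-to-km/h conversion are assumptions chosen for the example; the code the Tool generates may look different.

```python
import pandas as pd

# Hypothetical data: wind speed recorded in knots.
df = pd.DataFrame({"station": ["A", "B"], "speed_knots": [10.0, 20.0]})

# Place the new column immediately after the column it is derived from.
position = df.columns.get_loc("speed_knots") + 1

# 1 knot is 1.852 km/h.
df.insert(position, "speed_kmh", df["speed_knots"] * 1.852)

print(df)
```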
How to join multiple columns to create a new one?
The Join Columns command can be accessed by right-clicking on a column header after selecting multiple columns or from the Transform menu option of the Data Import Tool. See Join Columns for an overview of how the command can be used and Working with Wind Speed Data for a specific use case.
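In pandas terms, joining columns amounts to concatenating their values as text. The sketch below builds a single date-like column from separate year, month, and day columns; the column names and the "-" delimiter are hypothetical, and this is only one way the operation could be written.

```python
import pandas as pd

# Hypothetical data: a date split across three columns.
df = pd.DataFrame({"year": [1961, 1961], "month": [1, 2], "day": [15, 3]})

# Cast each column to text, then concatenate them with a delimiter.
df["date"] = df["year"].astype(str).str.cat(
    [df["month"].astype(str), df["day"].astype(str)],
    sep="-",
)

print(df["date"].tolist())
```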
How to convert a column type?
The Data Import Tool attempts to auto-detect the column type when the dataset
is being loaded, or when new columns are created. However, the user might
understand the data set better or desire a specific conversion. In that case,
the user can choose to change the conversion by editing the appropriate Type
conversion command in the history, or by right-clicking on the column header
and choosing the Convert... command. The available
How to remove specific rows/columns?
The user can choose to delete individual entries in a row/column by pressing
the delete shortcut. To remove full rows/columns, the user can select and
right-click on any of the specific rows/columns and choose the
The user might also need to delete entries in the data set based on a user-set criterion. This can be done using the Delete Rows Where command. See Working with Major League Baseball Batting Data and Working with Wind Speed Data for specific use cases.
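Deleting rows by a criterion corresponds to boolean filtering in pandas. The sketch below is a minimal example under assumed column names and an assumed threshold; it shows the general technique rather than the exact code produced by the Delete Rows Where command.

```python
import pandas as pd

# Hypothetical batting data.
df = pd.DataFrame({"player": ["a", "b", "c"], "at_bats": [0, 350, 12]})

# Criterion for the rows to delete: fewer than 50 at-bats.
criterion = df["at_bats"] < 50

# Keep only the rows that do NOT match, then renumber the index.
df = df[~criterion].reset_index(drop=True)

print(df)
```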
What is the Python/Pandas Code View?
Cleaning, manipulating, and operating on a data set can involve many operations, and repeating them by hand every time the file is imported would be time-consuming.
The Python/Pandas Code View displays the auto-generated Python/Pandas code for each command that the user has performed on the data set. This Python/Pandas code can then be exported and saved as a script for reuse later.
How to view the data after loading the dataset to the IPython console?
When you’ve finished importing and click “Use DataFrame”, the data set is
loaded as a Pandas DataFrame in the Canopy IPython namespace. At this point,
standard Pandas commands can be used to inspect the data, such as
df_name.info(). For further information on working
with DataFrames, refer to the Pandas Documentation.
The user can also use the
view(df_name) command, which opens a GUI to allow
the user to view the entire data set with easy scrolling.
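A few standard pandas calls are usually enough for a first look at the imported data. In the sketch below, the DataFrame is built by hand only so that the example is self-contained; in the Canopy IPython console the variable would already exist under the name chosen during import (written as df_name above).

```python
import pandas as pd

# Stand-in for the DataFrame created by the Data Import Tool.
df = pd.DataFrame({"station": ["A", "B", "C"], "speed": [12.5, None, 7.1]})

df.info()                # column names, dtypes, and non-null counts

first_rows = df.head(2)  # the first two rows
summary = df.describe()  # count, mean, std, etc. for the numeric columns

print(first_rows)
print(summary)
```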
Why is my file being pre-loaded with commands?
After successfully importing a DataFrame into the IPython console, the Data Import Tool autosaves the operations performed on the file. When the file is reloaded using the Tool, the Tool automatically loads the operations previously performed.