Data Import Tool Release Notes

This section lists major new features, fixes, and improvements. For more information about key known issues with using the tool, please see our Known Issues article in the Enthought Knowledge Base.

Release Summary

Release notes for v1.0.9 (18 January 2017)

New : Support pandas 0.19.2 (PR #1191)

With this release, the Data Import Tool supports the recently released v0.19.2 of the pandas library. Overall, the Tool has been tested to work with v0.16.2, v0.17.1, v0.18.0 and v0.19.2 of pandas, so you need not worry about changes to the pandas API.

Fix : Remove sort indicator if value in a sorted column is changed (#1163)

With this release, the sort indicator on a sorted column is removed if the value of any cell in that column is changed.

Fix : Fix interaction between promoting a row to header and sorting a column (#1165)

Previously, sorting a column and promoting a row to header might cause the sort indicator to move to the wrong column. This bug has been fixed with this release.

Fix : Fix behavior of Highlight NaNs checkbox (#727)

Previously, on Windows and Linux, checking the Highlight NaNs checkbox didn’t highlight the NaN values immediately. This has been fixed in this release, and checking the checkbox immediately highlights the NaN values in the DataFrame.

Release notes for v1.0.8

Improve : Prevent setting a column with non-unique values as index (#929)

The Tool now prevents users from setting the DataFrame index to a column that has non-unique values. A non-unique index makes commands that reference rows by index value (such as DeleteRows) ambiguous, because one value can match more than one row.
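
The ambiguity is easy to reproduce in plain pandas. The sketch below uses a made-up DataFrame and made-up column names; it illustrates the pandas behavior and is not the Tool's internal check.

```python
import pandas as pd

# Hypothetical data: the "city" column contains a duplicate value.
df = pd.DataFrame({"city": ["Austin", "Boston", "Austin"],
                   "sales": [10, 20, 30]})

indexed = df.set_index("city")

# A duplicated label selects several rows, so a command that refers to
# a row by its index value cannot tell which row was meant.
ambiguous = indexed.loc["Austin"]
n_matches = len(ambiguous)

# A uniqueness check like this one can reject such a column up front.
is_safe_index = df["city"].is_unique
```

Here `n_matches` is 2 and `is_safe_index` is False, which is the situation the Tool now refuses to create.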

Improve : Switch off column infer and convert on large files (#1010)

The Tool automatically infers the type of a column and converts the column. This feature can be time consuming on very large files and has been switched off for the moment.

Improve : Support multi-column sorting (#1151)

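With this release, sorting using the Tool is stable, i.e., sorting by a column preserves the existing order of rows that have equal values in that column. Sorting by one column and then by another therefore gives a multi-column sort.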
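
The effect of a stable sort can be sketched in plain pandas. The data and column names below are hypothetical, and `kind="mergesort"` is simply the stable algorithm that pandas offers; the notes do not say which algorithm the Tool uses internally.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"team": ["b", "a", "b", "a"],
                   "score": [2, 2, 1, 1]})

# Sort by the secondary key first, then by the primary key with a
# stable algorithm. Rows with the same "team" keep their "score" order.
by_score = df.sort_values("score", kind="mergesort")
by_team_then_score = by_score.sort_values("team", kind="mergesort")

# The result matches a direct two-column sort.
direct = df.sort_values(["team", "score"])
```

Two successive single-column sorts reproduce the two-column sort only because the second sort is stable.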

Improve : Add sorting markers on all columns (#649)

With this release, sort indicators are displayed on all columns, making it obvious that columns can be sorted using the Tool.

Fix : Make sort indicator stick to the column (#907)

Previously, the sort indicator didn’t stick to the sorted column if columns were added or removed from the left of the sorted column. This has now been corrected.

Fix : Follow convention for the sort icon (#1143)

Previously, the sort icon convention used by the Tool was the opposite of what is used in general. This has now been corrected.

Improve : Improve performance of copying cell data to clipboard (PR #1172)

Copying data from the cells of a data frame to the clipboard has been sped up significantly with this release.

Improve : Last updated template is used if multiple templates match (#1142)

If two or more template files are found by the Tool to be relevant, the Tool chooses the template file that was most recently created/updated.

Improve : Explicitly convey the columns that have been renamed (#1115)

Previously, the Tool renamed duplicate column names, but it did so without informing the user of the change. With this release, we explicitly convey which columns were renamed.

Fix : Saving dataframe to csv file is broken when unicode data is involved (#1162)

Previously, saving a DataFrame that contains unicode characters raised an error. This has now been fixed.

Fix : Raise an appropriate error when an Excel file is passed to the DIT (#1161)

Previously, the Tool reported a unicode error when it was passed an Excel file. The Tool currently does not support loading Excel files, and it now raises a custom error that says so explicitly.

Fix : Fix problems with generating Template files on Windows (#665)

Previously, a corrupt template file was created when the command history contained a DeleteRows command. This has now been fixed.

Release notes for v1.0.7

New : Save DataFrames as csv/xlsx file (#872)

With this release, the Data Import Tool allows you to save the DataFrame as a csv file or as an xlsx file.

Improve : Don’t infer column types when a template is used (#1093)

With this release, we have modified the Tool to prevent inferring the column types of the DataFrame if a Template was used. Instead, we depend on any column conversion commands present in the Template file to do the job.

Improve : Templates are not loaded when import uses clipboard data (#1064)

With this release, we don’t look for Templates if Clipboard data is being loaded into the Tool.

Improve : Use pattern matching to discover relevant Templates (#1065)

In the previous version of the Tool, a Template was loaded only if its name matched that of the DataFrame. With this update, if a Template’s name is similar to the DataFrame’s name, then it is loaded. This way, a single template file can be used to clean and transform numerous similarly-named files.

Fix : Add missing kwargs in exported pandas code (#1117)

Previously, the exported script was missing a few keyword arguments. These are now passed in the respective pandas calls.

Fix : Fix conversion of columns containing NaNs (#876, #1131)

Previously, converting a Float column containing NaNs to an Integer column resulted in no change. Similarly, converting an Int column containing NaNs to a Boolean column also gave erroneous results. This was happening because of the way column conversion is handled by pandas.

With this release, float values are converted to ints and int values are converted to bools irrespective of whether or not the columns contain NaNs.
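
The underlying difficulty is that NaN is itself a float, so an integer or boolean NumPy column cannot hold it. One way to convert the valid values while keeping the NaNs is to fall back to an object column, as in the sketch below. The helper names are hypothetical and this is not necessarily how the Tool performs the conversion.

```python
import numpy as np
import pandas as pd

def float_to_int_keep_nan(column):
    """Return an object column: ints where a value exists, NaN elsewhere."""
    values = [v if pd.isnull(v) else int(v) for v in column]
    return pd.Series(values, index=column.index, dtype=object)

def int_to_bool_keep_nan(column):
    """Return an object column: bools where a value exists, NaN elsewhere."""
    values = [v if pd.isnull(v) else bool(v) for v in column]
    return pd.Series(values, index=column.index, dtype=object)

# A column that holds whole numbers but is stored as float because of the NaN.
floats = pd.Series([1.0, np.nan, 3.0])
ints = float_to_int_keep_nan(floats)

# The same situation for a 0/1 column that should become boolean.
bools = int_to_bool_keep_nan(pd.Series([1, np.nan, 0]))
```

The trade-off of this approach is that the resulting columns have object dtype rather than a native numeric dtype.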

Fix : Reproduce column stripping using exported script (#1085)

With this release, the exported script generates DataFrames which are stripped of whitespace.

Fix : Detect tab delimiter correctly (#1135)

A Tab delimiter is now automatically detected by the Tool with this update.

Fix : Prevent Floats from being inferred as Ints (#1129)

Previously, a column containing Float values was being inferred as an Int column. This behavior has been fixed with this release.

Fix : Correct for sampling problems while inferring column type (#1033)

With this release, we take the underlying column’s dtype into account while inferring the column type.

Release notes for v1.0.6 (28 October 2016)

New: Support pandas 0.19.0 (#1046)

With this release, the Data Import Tool supports the recently released v0.19.0 of the pandas library. Overall, the Tool now works with v0.16.2, v0.17.1, v0.18.0 and v0.19.0 of pandas, so you need not worry about changes to the pandas API.

New: DeleteEmptyColumns command (#828)

We added a new command DeleteEmptyColumns to make it easier to remove columns containing NaNs in your data.

New: Extending the ability to delete rows/columns (#210)

Using pandas, the user can remove rows/columns in two ways: only if all values in the row/column are NaNs, or if any value in the row/column is a NaN. With this release, we’ve incorporated this ability into our DeleteEmptyRows and DeleteEmptyColumns commands.
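
In pandas itself, the two modes correspond to the `how` argument of `DataFrame.dropna`. The sketch below uses made-up data to show all four combinations; how the DeleteEmptyRows and DeleteEmptyColumns commands map onto these calls is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 and column "b" are entirely NaN.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                   "b": [np.nan, np.nan, np.nan],
                   "c": [7.0, np.nan, 9.0]})

# Drop a row only when every value in it is NaN (row 1 is removed).
rows_all = df.dropna(axis=0, how="all")

# Drop a row when any value in it is NaN (every row has a NaN in "b").
rows_any = df.dropna(axis=0, how="any")

# Drop a column only when every value in it is NaN ("b" is removed).
cols_all = df.dropna(axis=1, how="all")

# Drop a column when any value in it is NaN (row 1 puts a NaN in each).
cols_any = df.dropna(axis=1, how="any")
```

With this data, the "any" mode removes everything, which is why the choice between the two modes matters.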

New: Demography of the UK use case (#1011)

We have added a new use case outlining how the Data Import Tool can be used to extract and load tables from any Wikipedia page, choosing from among multiple tables on a page and performing operations on the DataFrame.

Improve: Python code is generated even for commands with error status (#1069)

Previously, the Tool didn’t generate Python Code if any of the commands executed were erroneous. We have changed this behavior with this release.

Fix: Python code when user reads data from Clipboard (#1066)

Previously, the code generated when the user loaded data from the Clipboard was missing a few additional arguments to the relevant pandas commands. This has been updated with this release, and the generated Python Code helps users reproduce their actions better.

Release notes for v1.0.5 (22 September 2016)

New: Autoload previously applied commands to a file (#1022, #582)

The Data Import Tool now saves the set of operations a user performs on a file into a handy template file. When the user loads the same file into the Tool a second time, the Tool automatically loads the template and applies the set of commands. The user can disable the auto-load by manually disabling the individual commands.

New: Autosave generated python scripts and export location to Canopy File Browser (#965)

The Data Import Tool now autosaves the generated Python code into your home directory: ~ on OS X and Linux, and \Users\User\ on Windows. Canopy is also informed of the autosave location after every successful import, so the user can browse to it in the Canopy File Browser.

New: Explicitly set Missing Values attribute of the initial read-in (#198, #635)

Previously, the Data Import Tool implicitly assumed that the string NA represented None. This has now been made explicit, and all values that the Data Import Tool assumes as None are displayed in the Missing Values field of the initial read-in command. For a full list of all values that the Tool assumes to mean None, refer to Missing Values.
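
For readers working directly in pandas, the equivalent explicit setting is the `na_values` argument of `read_csv`. The sketch below uses made-up CSV text; whether the Missing Values field is passed to pandas in exactly this way is an assumption.

```python
import io
import pandas as pd

# Hypothetical CSV content in which one grade is recorded as "NA".
csv_text = "name,grade\nAda,NA\nBob,B\n"

# Treat only the string "NA" as missing. keep_default_na=False turns off
# pandas' built-in list of markers, so the choice is fully explicit.
df = pd.read_csv(io.StringIO(csv_text), na_values=["NA"],
                 keep_default_na=False)

# With no missing-value markers at all, "NA" stays an ordinary string.
raw = pd.read_csv(io.StringIO(csv_text), na_values=[],
                  keep_default_na=False)
```

In `df` the first grade is read as a missing value, while in `raw` it is the two-character string "NA".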

Fix: Support copying Unicode cell data (#953)

Previously, the Data Import Tool didn’t support copying cells containing Unicode data. This has been fixed with this release.

Fix: Support splitting columns using unicode delimiters (#1045)

Previously, the Data Import Tool didn’t support splitting a column using a unicode delimiter. This has been fixed with this release.
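
As an illustration of what such a split looks like in pandas, the sketch below splits a made-up column on an arrow character. The data and the column names are hypothetical, and the code the Tool actually exports may differ.

```python
import pandas as pd

# Hypothetical column whose values are separated by a non-ASCII delimiter.
routes = pd.Series([u"Paris→Rome", u"Oslo→Bern"])

# expand=True returns one column per piece instead of a list per row.
parts = routes.str.split(u"→", expand=True)
parts.columns = ["origin", "destination"]
```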

Improve: DIT no longer raises an error when a file is passed with < 3 lines (#982)

Previously, the Data Import Tool raised an error when passed a file with fewer than 3 lines in it. This restriction has now been lifted.

Release notes for v1.0.4 (19 August 2016)

Fix: Loading the Data Import Tool from Clipboard on Linux now works (#977)

Previously, trying to load data from the clipboard into the Data Import Tool didn’t work as expected on Linux. This has now been fixed in v1.0.4.

New: Allow user to choose a specific table on an HTML page via an index (#999)

Previously, if the user was interested in choosing a specific table among multiple tables on an HTML page, the user had to enter a string unique to the table of interest. Now, we provide an easier way, using an index.

Improve: Support datetime values with year < 1900 (#892)

Previously, datetime values with year < 1900 were not detected as valid datetime objects. This has now been corrected and the Tool supports datetime values without any condition on the year.

Improve: Load the first table on a wikipage (#869)

Previously, a user loading a table from a wikipage using the Data Import Tool might have been surprised to find garbage instead of the expected table. This behavior has now been fixed and the Tool opens the first meaningful table on the wikipage.

New: Add new ZipFileURLHandler (#955)

With this release, the Data Import Tool can open .zip files from a URL.

Fix: Prevent infer and convert of columns if DataFrame has >250 columns (#883)

The Data Import Tool now skips both column type inference and column conversion if the file contains more than 250 columns of data.

Improve: Track dirty state of exported code (#780)

Now, the Data Import Tool tracks whether or not the user exported the code earlier and if the code state has changed since the last save.

Improve: Raise Error when the Tool is passed a .tar.gz archive (#950)

While .gz files can be opened using the Data Import Tool, files of type .tar.gz cannot. The Tool now raises an error when it is passed a .tar.gz archive.

Fix: Delete button in convert column pane now deletes the correct item (#942)

Previously, clicking on the Delete button in the Convert Column pane to remove the specified auto conversion deleted the wrong item. This has been corrected.

Improve: Concatenate Command History item text for Delete Columns and Delete Rows (#930, #888)

When a large number of rows/columns were deleted using the Delete Rows/Delete Columns command, the command history item tried to display all of the row/column names and the text overflowed beyond the view. Now, the display concatenates the text containing the names.

New: From Examples menu option to load data (#231)

Apart from From File ..., From URL ..., From Clipboard, the user is also presented with a From Examples ... option to load the example files that we provide with the Tool.

New: Support for opening large files (#845, #14)

Previously, the Tool would prevent the user from opening any file larger than 70MB in size (or 10MB for compressed files). With this release, the Data Import Tool supports any file size but we give a warning about performance before proceeding.

Fix: Auto-detection of encoding switches to UTF-8 after detecting ASCII (#904)

To reduce (but not eliminate) errors in reading, we automatically upgrade a detected ASCII encoding to UTF-8, as the latter is the most common encoding on the internet.
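
The upgrade is safe for pure-ASCII data because ASCII is a subset of UTF-8: any ASCII byte sequence decodes to the same text under either encoding. A minimal sketch of the rule follows; the function name is hypothetical and the Tool's actual detection code is not shown in these notes.

```python
def effective_encoding(detected):
    """Promote a detected ASCII encoding to UTF-8; leave others unchanged."""
    if detected and detected.lower() in ("ascii", "us-ascii"):
        return "utf-8"
    return detected

# Pure-ASCII bytes decode identically once the encoding is promoted.
sample = b"id,name\n1,Ada\n"
text = sample.decode(effective_encoding("ascii"))
```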

Fix: Removed split_x attribute that created odd partition (#803)

Cells containing invalid data only took up two-thirds of the space inside the cell. This has now been corrected.

Fix: Join columns dialog now has an option to represent null values (#759)

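Previously, joining columns that contain null values created a null-valued result. The Join Columns dialog now has an option that sets how null values are represented in the joined column.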
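
The same behavior can be seen in pandas with `Series.str.cat`, whose `na_rep` argument plays a comparable role. The data below is made up, and the mapping between the dialog option and `na_rep` is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has no last name.
df = pd.DataFrame({"first": ["Ada", "Bob"],
                   "last": ["Lovelace", np.nan]})

# Without na_rep, a null in either column makes the joined value null.
joined_default = df["first"].str.cat(df["last"], sep=" ")

# With na_rep, nulls are replaced by the given text before joining.
joined_filled = df["first"].str.cat(df["last"], sep=" ", na_rep="?")
```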

Fix: On Linux, fix missing .py extension in exported Python script (#858)

The Tool now checks for the presence of the .py extension in the filename and adds it if it is absent.
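
A check of this kind takes only a few lines. The sketch below is an illustration with a hypothetical function name, not the Tool's own code.

```python
import os

def ensure_py_extension(filename):
    """Return filename unchanged if it ends in .py, else append .py."""
    _, ext = os.path.splitext(filename)
    return filename if ext.lower() == ".py" else filename + ".py"
```

For example, `ensure_py_extension("import_script")` gives `"import_script.py"`, while a name that already ends in `.py` is returned as is.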

Fix: Removed empty line in generated Python code for Delete Columns (#889)

The exported code for Delete Columns now follows the same pattern as the other commands.

Fix: On OS X, hotkey for “redo” uses CMD+Shift+Z, not CMD+Y (#289)

The Tool now uses the common OS X standard keyboard shortcut.

Fix: On Linux, Raw Data view now uses fixed-width font (#855)

The Tool was previously using a variable-width font when displaying the raw data.

Release notes for v1.0.2 (5 May 2016)

Improve: Boolean column detection is less eager with numeric data (#873)

The tool now attempts to verify that a column’s numeric data is in the set of values {0, 1} before converting the column’s type during auto-detection.
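
A check along these lines might look as follows in pandas. The function name is hypothetical, and ignoring missing values before testing is an assumption rather than documented behavior.

```python
import numpy as np
import pandas as pd

def looks_boolean(column):
    """Return True if every non-missing value in the column is 0 or 1."""
    values = column.dropna()
    return len(values) > 0 and bool(values.isin([0, 1]).all())

# Only 0s and 1s (plus a missing value): a candidate for boolean.
flags = pd.Series([0, 1, 1, np.nan])

# Contains a 2, so it should stay numeric.
counts = pd.Series([0, 1, 2])
```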

Fix: Boolean columns now properly display data with missing values (#873)

Converting a column that contained null entries to boolean resulted in the column not displaying all non-null values as True or False.

Fix: Window activation on Linux was not working (#859)

Due to a difference in how windows are activated on Linux, the tool wasn’t displaying error dialogs in cases where the file could not be loaded.

Release notes for v1.0.1 (18 April 2016)

New: Online Documentation accessible from help menu (#761)

Previously, the documentation was accessible via the Canopy Documentation Browser or by navigating to the online documentation. We now provide a link to the latter from within the Tool’s Help menu.

New: Add tool tip to the View on Close checkbox (#831)

This provides more context as to the purpose of this option, which is to launch the viewer application after clicking “Use DataFrame” and returning to Canopy’s IPython prompt.