Data Import Tool Release Notes¶
This section lists major new features, fixes, and improvements. For more information about key known issues with using the tool, please see our Known Issues article in the Enthought Knowledge Base.
Release Summary
- Data Import Tool Release Notes
- Release notes for v1.1.6 (05 Feb 2019)
- Release notes for v1.1.5 (23 August 2018)
- Release notes for v1.1.4 (29 June 2018)
- Release notes for v1.1.3 (6 April 2018)
- Release notes for v1.1.2 (5 July 2017)
- Release notes for v1.1.1 (5 June 2017)
- Release notes for v1.1.0
- Improve : Add support for Pandas 0.20.1 (#1262)
- Improve : Add support for Python 3 (#1267)
- Improve : Add support for PyQt4 (#1253)
- Fix : Fix broken autosave preferences path on Windows (#1268)
- Fix : Add missing dependencies html5lib and beautifulsoup4 (#1250)
- Fix : Fix overflowing command description of JoinColumns command (#1075)
- Fix : Fix broken Save Code menu item (#1202)
- Release notes for v1.0.9 (18 January 2017)
- Release notes for v1.0.8
- Improve : Prevent setting a column with non-unique values as index (#929)
- Improve : Switch off column infer and convert on large files (#1010)
- Improve : Support multi-column sorting (#1151)
- Improve : Add sorting markers on all columns (#649)
- Fix : Make sort indicator stick to the column (#907)
- Fix : Follow convention for the sort icon (#1143)
- Improve : Improve performance of copying cell data to clipboard (PR #1172)
- Improve : Last updated template is used if multiple templates match (#1142)
- Improve : Explicitly convey the columns that have been renamed (#1115)
- Fix : Saving dataframe to csv file is broken when unicode data is involved (#1162)
- Fix : Raise an appropriate error when an Excel file is passed to the DIT (#1161)
- Fix : Fix problems with generating Template files on Windows (#665)
- Release notes for v1.0.7
- New : Save DataFrames as csv/xlsx file (#872)
- Improve : Don’t infer column types when a template is used (#1093)
- Improve : Templates are not loaded when import uses clipboard data (#1064)
- Improve : Use pattern matching to discover relevant Templates (#1065)
- Fix : Add missing kwargs in exported pandas code (#1117)
- Fix : Fix conversion of columns containing NaNs (#876, #1131)
- Fix : Reproduce column stripping using exported script (#1085)
- Fix : Detect tab delimiter correctly (#1135)
- Fix : Prevent Floats from being inferred as Ints (#1129)
- Fix : Correct for sampling problems while inferring column type (#1033)
- Release notes for v1.0.6 (28 October 2016)
- New: Support pandas 0.19.0 (#1046)
- New: DeleteEmptyColumns command (#828)
- New: Extending the ability to delete rows/columns (#210)
- New: Demography of the UK use case (#1011)
- Improve: Python code is generated even for commands with error status (#1069)
- Fix: Python code when user reads data from Clipboard (#1066)
- Release notes for v1.0.5 (22 September 2016)
- New: Autoload previously applied commands to a file (#1022, #582)
- New: Autosave generated python scripts and export location to Canopy File Browser (#965)
- New: Explicitly set Missing Values attribute of the initial read-in (#198, #635)
- Fix: Support copying Unicode cell data (#953)
- Fix: Support splitting columns using unicode delimiters (#1045)
- Improve: DIT no longer raises an error when a file is passed with < 3 lines (#982)
- Release notes for v1.0.4 (19 August 2016)
- Fix: Loading the Data Import Tool from Clipboard on Linux now works (#977)
- New: Allow user to choose a specific table on a HTML page via an index (#999)
- Improve: Support datetime values with year < 1900 (#892)
- Improve: Load the first table on a wikipage (#869)
- New: Add new ZipFileURLHandler (#955)
- Fix: Prevent infer and convert of columns if DataFrame has >250 columns (#883)
- Improve: Track dirty state of exported code (#780)
- Improve: Raise Error when the Tool is passed a .tar.gz archive (#950)
- Fix: Delete button in convert column pane now deletes the correct item (#942)
- Improve: Concatenate Command History item text for Delete Columns and Delete Rows (#930, #888)
- New: From Examples menu option to load data (#231)
- New: Support for opening large files (#845, #14)
- New: Link to Known Issues page on Enthought’s Knowledge Base (#841)
- Fix: Auto-detection of encoding switches to UTF-8 after detecting ASCII (#904)
- Fix: Removed split_x attribute that created odd partition (#803)
- Fix: Join columns dialog now has an option to represent null values (#759)
- Fix: On Linux, fix missing .py extension in exported Python script (#858)
- Fix: Removed empty line in generated Python code for Delete Columns (#889)
- Fix: On OS X, hotkey for “redo” uses CMD+Shift+Z, not CMD+Y (#289)
- Fix: On Linux, Raw Data view now uses fixed-width font (#855)
- Release notes for v1.0.2 (5 May 2016)
- Release notes for v1.0.1 (18 April 2016)
Release notes for v1.1.6 (05 Feb 2019)¶
Fix : Remove use of old traits adapter machinery (#1337)¶
With these changes, the Data Import Tool supports the latest release of traits.
Release notes for v1.1.5 (23 August 2018)¶
Fix : Intermittent CI error (#1309)¶
This release of the Data Import Tool includes a fix for a possible exception when creating a QReceiverView.
Release notes for v1.1.4 (29 June 2018)¶
Improve : Add support for Pandas 0.23.1 (#1316)¶
With this release of the Data Import Tool, we add support for the recently released v0.23.1 of the Pandas library.
Release notes for v1.1.3 (6 April 2018)¶
Change : Window pop-up when license is not found (#1302)¶
Update wording in pop-up window when license is not found to reflect changes with Canopy product licensing.
Release notes for v1.1.2 (5 July 2017)¶
Fix : Fix Python 3 support for Catalyst (#1302)¶
We discovered an issue with the Python 3 support in the v1.1.1 release of the Catalyst. Trying to use the Import Data menu items to access Pandas DataFrames from the IPython console raised an error. This issue has now been fixed.
Change : Check the row selection before showing actions available (#562)¶
Previously, the right-click menu on a selection of rows showed actions which were invalid. This has been fixed with the v1.1.2 release.
Release notes for v1.1.1 (5 June 2017)¶
Change : Drop support for Pandas below 0.19.2 (PR #1282)¶
With v1.1.1 of the Data Import Tool, only Pandas v0.19.2 and v0.20.1 are supported.
Fix : Fix seg fault (#1279, #1280)¶
We discovered an issue on Python 3 where the Data Import Tool could crash with a Segmentation Fault. With this release, we fixed this issue.
Release notes for v1.1.0¶
Improve : Add support for Pandas 0.20.1 (#1262)¶
With this release of the Data Import Tool, we add support for the recently released v0.20.1 of the Pandas library.
Improve : Add support for Python 3 (#1267)¶
The Data Import Tool is now fully compatible with Python 2 and 3.
Improve : Add support for PyQt4 (#1253)¶
Previously, the Data Import Tool only worked with a PySide backend. With this release, the Tool works with both the PySide and PyQt backends.
Fix : Fix broken autosave preferences path on Windows (#1268)¶
With every successful import of a DataFrame, the Tool autosaves the relevant generated Python/Pandas code. The code is autosaved into the data_import_tool/autosaved_scripts directory in your home folder. This path was broken on Windows, which we fixed with this release.
Fix : Add missing dependencies html5lib and beautifulsoup4 (#1250)¶
Previously, the html5lib and beautifulsoup4 libraries weren’t explicitly listed as runtime dependencies of the Data Import Tool, which we fixed with this release.
Fix : Fix overflowing command description of JoinColumns command (#1075)¶
The description of the JoinColumns command overflowed the enclosing textbox, making it hard to read the command description. This has been corrected in this release.
Release notes for v1.0.9 (18 January 2017)¶
New : Support pandas 0.19.2 (PR #1191)¶
With this release, the Data Import Tool supports the recently released v0.19.2 of the pandas library. Overall, the Tool has been tested to work with v0.16.2, v0.17.1, v0.18.0 and v0.19.2 of pandas, so you need not worry about changes to the pandas API.
Fix : Remove sort indicator if value in a sorted column is changed (#1163)¶
With this release, the sort indicator on a sorted column is removed if the value of any cell in that column is changed.
Fix : Fix interaction between promoting a row to header and sorting a column (#1165)¶
Previously, sorting a column and promoting a row to header might cause the sort indicator to move to an unexpected and wrong column. This bug has been fixed with this release.
Fix : Fix behavior of Highlight NaNs checkbox (#727)¶
Previously, on Windows and Linux, checking the Highlight NaNs checkbox didn’t highlight the NaN values immediately. This has been fixed in this release, and checking the checkbox immediately highlights the NaN values in the DataFrame.
Release notes for v1.0.8¶
Improve : Prevent setting a column with non-unique values as index (#929)¶
The Tool now prevents users from setting the data frame index to a column which has non-unique values. This is because it can lead to unintended consequences when using commands (such as DeleteRows) which use the index value to reference rows, as they would face ambiguity in referring to the correct row.
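The ambiguity is easy to reproduce in plain pandas. The sketch below is illustrative only and is not the Tool's own code; the column names and data are made up.

```python
import pandas as pd

# Hypothetical data: the "city" column has a repeated value.
df = pd.DataFrame({"city": ["Austin", "Austin", "Boston"],
                   "sales": [10, 20, 30]})

# A check along these lines can guard against a non-unique index.
print(df["city"].is_unique)

# If the column is used as the index anyway, dropping by label
# removes every row that shares the label, not just one of them.
indexed = df.set_index("city")
remaining = indexed.drop("Austin")
print(len(remaining))
```

Here both "Austin" rows are removed by a single drop, which is the kind of unintended result the restriction avoids.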
Improve : Switch off column infer and convert on large files (#1010)¶
The Tool automatically infers the type of a column and converts the column. This feature can be time consuming on very large files and has been switched off for the moment.
Improve : Support multi-column sorting (#1151)¶
With this release, sorting using the Tool is stable, i.e. sorting preserves any existing order in the DataFrame.
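As an illustration of why stability enables multi-column sorting, the following pandas sketch sorts by a secondary key and then by a primary key with a stable algorithm. The data and column names are hypothetical, and this is not necessarily how the Tool implements sorting.

```python
import pandas as pd

df = pd.DataFrame({"team": ["b", "a", "b", "a"],
                   "score": [2, 2, 1, 1]})

# Sort by the secondary key first, then by the primary key using a
# stable algorithm (mergesort), so ties keep the earlier ordering.
by_score = df.sort_values("score", kind="mergesort")
by_team_then_score = by_score.sort_values("team", kind="mergesort")

print(by_team_then_score)
```

The result matches sorting by both columns at once, because rows with the same team keep the score order established by the first sort.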
Improve : Add sorting markers on all columns (#649)¶
With this release, sort indicators are displayed on all columns, making it obvious that columns can be sorted using the Tool.
Fix : Make sort indicator stick to the column (#907)¶
Previously, the sort indicator didn’t stick to the sorted column if columns were added or removed from the left of the sorted column. This has now been corrected.
Fix : Follow convention for the sort icon (#1143)¶
Previously, the sort icon convention used by the Tool was the opposite of what is used in general. This has now been corrected.
Improve : Improve performance of copying cell data to clipboard (PR #1172)¶
Copying data from the cells of a data frame to the clipboard has been sped up significantly with this release.
Improve : Last updated template is used if multiple templates match (#1142)¶
If two or more template files are found by the Tool to be relevant, the Tool chooses the template file that was most recently created/updated.
Improve : Explicitly convey the columns that have been renamed (#1115)¶
Previously, the Tool renamed duplicate column names, but it did so without informing the user of the change. With this release, we explicitly convey which columns were renamed.
Fix : Saving dataframe to csv file is broken when unicode data is involved (#1162)¶
Previously, saving a DataFrame that contains unicode characters raised an error. This has now been fixed.
Fix : Raise an appropriate error when an Excel file is passed to the DIT (#1161)¶
Previously, the Tool reported a unicode error when it was passed an Excel file. The Tool currently does not support loading Excel files, and it now raises a custom error that says so explicitly.
Fix : Fix problems with generating Template files on Windows (#665)¶
Previously, a corrupt template file was created when the command history contained a DeleteRows command. This has now been fixed.
Release notes for v1.0.7¶
New : Save DataFrames as csv/xlsx file (#872)¶
With this release, the Data Import Tool allows you to save the DataFrame as a csv file or as an xlsx file.
Improve : Don’t infer column types when a template is used (#1093)¶
With this release, we have modified the Tool to prevent inferring the column types of the DataFrame if a Template was used. Instead, we depend on any column conversion commands present in the Template file to do the job.
Improve : Templates are not loaded when import uses clipboard data (#1064)¶
With this release, we don’t look for Templates if Clipboard data is being loaded into the Tool.
Improve : Use pattern matching to discover relevant Templates (#1065)¶
In the previous version of the Tool, a Template was loaded only if its name matched that of the DataFrame. With this update, if a Template’s name is similar to the DataFrame’s name, then it is loaded. This way, a single template file can be used to clean and transform numerous similarly-named files.
Fix : Add missing kwargs in exported pandas code (#1117)¶
Previously, the exported script was missing a few keyword arguments. These have now been added while calling the respective pandas code.
Fix : Fix conversion of columns containing NaNs (#876, #1131)¶
Previously, converting a Float column containing NaNs to an Integer column resulted in no change. Similarly, converting an Int column containing NaNs to a Boolean column also gave erroneous results. This was happening because of the way column conversion is handled by pandas.
With this release, float values are converted to ints and int values are converted to bools irrespective of whether or not the columns contain NaNs.
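The pandas behaviour behind this fix can be seen in a short sketch. It is illustrative only: the Tool's actual conversion logic is not shown in these notes, and converting only the non-null values, as done here, is just one possible approach.

```python
import numpy as np
import pandas as pd

floats = pd.Series([1.0, 2.0, np.nan])

# NaN has no integer representation, so a direct cast fails.
try:
    floats.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True
print(cast_failed)

# One possible workaround: convert the non-null values only and keep
# the missing entries, storing the result with object dtype.
converted = pd.Series(
    [int(v) if pd.notnull(v) else v for v in floats],
    dtype=object,
)
print(converted.tolist())
```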
Fix : Reproduce column stripping using exported script (#1085)¶
With this release, the exported script generates DataFrames which are stripped of whitespace.
Fix : Detect tab delimiter correctly (#1135)¶
A Tab delimiter is now automatically detected by the Tool with this update.
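For comparison, Python's standard library offers a delimiter sniffer that handles tabs. The sketch below uses csv.Sniffer on made-up sample text; whether the Tool relies on the same mechanism is not stated in these notes.

```python
import csv

sample = "name\tage\tcity\nAnn\t31\tLeeds\nBob\t45\tYork\n"

# Restrict the candidates to a few common delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters="\t,;")
print(repr(dialect.delimiter))
```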
Fix : Prevent Floats from being inferred as Ints (#1129)¶
Previously, a column containing Float values was being inferred as an Int column. This behavior has been fixed with this release.
Fix : Correct for sampling problems while inferring column type (#1033)¶
With this release, we take the underlying column’s dtype into account while inferring the column type.
Release notes for v1.0.6 (28 October 2016)¶
New: Support pandas 0.19.0 (#1046)¶
With this release, the Data Import Tool supports the recently released v0.19.0 of the pandas library. Overall, the Tool now works with v0.16.2, v0.17.1, v0.18.0 and v0.19.0 of pandas, so you need not worry about changes to the pandas API.
New: DeleteEmptyColumns command (#828)¶
We added a new command DeleteEmptyColumns to make it easier to remove columns containing NaNs in your data.
New: Extending the ability to delete rows/columns (#210)¶
Using pandas, the user can remove rows/columns in two ways: only if all values in the row/column are NaNs, or if any value in the row/column is a NaN. With this release, we’ve incorporated this ability into our DeleteEmptyRows and DeleteEmptyColumns commands.
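In pandas, the two behaviours correspond to the how argument of DataFrame.dropna. A minimal sketch with made-up data (the Tool's generated code may differ):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0],
                   "b": [np.nan, 5.0, 6.0],
                   "c": [np.nan, np.nan, np.nan]})

# Drop columns only when every value is NaN.
all_nan_dropped = df.dropna(axis=1, how="all")
print(list(all_nan_dropped.columns))  # ['a', 'b']

# Drop columns when at least one value is NaN.
any_nan_dropped = df.dropna(axis=1, how="any")
print(list(any_nan_dropped.columns))  # ['a']

# The same argument applies to rows with axis=0.
rows_kept = df.dropna(axis=0, how="all")
print(len(rows_kept))  # 3
```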
New: Demography of the UK use case (#1011)¶
We have added a new use case outlining how the Data Import Tool can be used to extract and load tables from any Wikipedia page, choosing from among multiple tables on a page and performing operations on the DataFrame.
Improve: Python code is generated even for commands with error status (#1069)¶
Previously, the Tool didn’t generate Python code if any of the executed commands were erroneous. We have changed this behavior with this release.
Fix: Python code when user reads data from Clipboard (#1066)¶
Previously, the code generated when the user loaded data from the Clipboard was missing a few additional arguments to the relevant pandas commands. This has been updated with this release, and the generated Python code helps users reproduce their actions better.
Release notes for v1.0.5 (22 September 2016)¶
New: Autoload previously applied commands to a file (#1022, #582)¶
The Data Import Tool now saves the set of operations a user performs on a file into a handy template file. When the user loads the same file into the Tool a second time, the Tool automatically loads the template and applies the set of commands. The user can disable the auto-load by manually disabling the individual commands.
New: Autosave generated python scripts and export location to Canopy File Browser (#965)¶
The Data Import Tool now autosaves the generated Python code into your home directory: ~ on OS X and Linux, and \Users\User\ on Windows. Canopy is informed of the autosave location with every successful import, and the user can browse it in the Canopy File Browser.
New: Explicitly set Missing Values attribute of the initial read-in (#198, #635)¶
Previously, the Data Import Tool implicitly assumed that the string NA represented None. This has now been made explicit, and all values that the Data Import Tool assumes as None are displayed in the Missing Values field of the initial read-in command. For a full list of all values that the Tool assumes to mean None, refer to Missing Values.
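The equivalent control in pandas is the na_values argument of read_csv, optionally combined with keep_default_na. The sketch below uses made-up CSV text and only illustrates the idea; it is not the code the Tool generates.

```python
import io
import pandas as pd

text = "id,region\n1,NA\n2,EU\n3,missing\n"

# By default pandas treats "NA" (among others) as a missing value.
default = pd.read_csv(io.StringIO(text))
print(default["region"].isnull().tolist())  # [True, False, False]

# Listing the markers explicitly makes the choice visible and lets
# "NA" survive as ordinary text, e.g. as a region code.
explicit = pd.read_csv(io.StringIO(text),
                       na_values=["missing"],
                       keep_default_na=False)
print(explicit["region"].tolist())
```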
Fix: Support copying Unicode cell data (#953)¶
Previously, the Data Import Tool didn’t support copying cells containing Unicode data. This has been fixed with this release.
Fix: Support splitting columns using unicode delimiters (#1045)¶
Previously, the Data Import Tool didn’t support splitting a column using a unicode delimiter. This has been fixed with this release.
Improve: DIT no longer raises an error when a file is passed with < 3 lines (#982)¶
Previously, the Data Import Tool raised an error when passed a file with fewer than 3 lines in it. This restriction has now been lifted.
Release notes for v1.0.4 (19 August 2016)¶
Fix: Loading the Data Import Tool from Clipboard on Linux now works (#977)¶
Previously, trying to load data from the clipboard into the Data Import Tool didn’t work as expected. This has now been fixed in v1.0.4.
New: Allow user to choose a specific table on a HTML page via an index (#999)¶
Previously, to choose a specific table among multiple tables on an HTML page, the user had to enter a string unique to the table of interest. Now, we provide an easier way, using an index.
Improve: Support datetime values with year < 1900 (#892)¶
Previously, datetime values with year < 1900 were not detected as valid datetime objects. This has now been corrected and the Tool supports datetime values without any condition on the year.
Improve: Load the first table on a wikipage (#869)¶
Previously, a user loading a table from a wikipage using the Data Import Tool might be surprised to find garbage instead of the expected table. This behavior has now been fixed, and the Tool opens the first meaningful table on the wikipage.
New: Add new ZipFileURLHandler (#955)¶
With this release, the Data Import Tool can open .zip files from a URL.
Fix: Prevent infer and convert of columns if DataFrame has >250 columns (#883)¶
The Data Import Tool now skips both inferring column types from and conversion of columns if the file contains more than 250 columns of data.
Improve: Track dirty state of exported code (#780)¶
Now, the Data Import Tool tracks whether or not the user exported the code earlier and if the code state has changed since the last save.
Improve: Raise Error when the Tool is passed a .tar.gz archive (#950)¶
While .gz files can be opened using the Data Import Tool, files of type .tar.gz cannot be opened. The Tool now raises an error when it is passed such an archive.
Fix: Delete button in convert column pane now deletes the correct item (#942)¶
Previously, clicking on the Delete button in the Convert Column pane to remove the specified auto conversion deleted the wrong item. This has been corrected.
Improve: Concatenate Command History item text for Delete Columns and Delete Rows (#930, #888)¶
When a large number of rows/columns were deleted using the Delete Rows / Delete Columns command, the command history item tried to display all of the column names and the text overflowed beyond the view. Now, the display concatenates the text containing row names.
New: Support for opening large files (#845, #14)¶
Previously, the Tool would prevent the user from opening any file larger than 70MB in size (or 10MB for compressed files). With this release, the Data Import Tool supports any file size but we give a warning about performance before proceeding.
New: Link to Known Issues page on Enthought’s Knowledge Base (#841)¶
For information on known issues with the Data Import Tool, a link to the relevant Enthought Knowledge Base article has been added at the top of the release notes.
Fix: Auto-detection of encoding switches to UTF-8 after detecting ASCII (#904)¶
To reduce (but not eliminate) reading errors, we automatically upgrade a detected ASCII encoding to UTF-8, as the latter is the most common encoding on the internet.
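The upgrade is safe because ASCII is a strict subset of UTF-8: any bytes that decode as ASCII decode to the same text as UTF-8, while the reverse is not true. A small Python illustration with made-up byte strings:

```python
ascii_bytes = b"plain text"
utf8_bytes = "café".encode("utf-8")

# ASCII-only bytes decode identically under both codecs.
same = ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")
print(same)  # True

# Non-ASCII bytes fail under the ASCII codec but decode
# correctly under UTF-8.
try:
    utf8_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)  # False
print(utf8_bytes.decode("utf-8"))  # café
```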
Fix: Removed split_x attribute that created odd partition (#803)¶
Cells containing invalid data only took up two-thirds of the space inside the cell. This has now been corrected.
Fix: Join columns dialog now has an option to represent null values (#759)¶
Previously, joining columns that contain null values created a null-valued result.
Fix: On Linux, fix missing .py extension in exported Python script (#858)¶
The Tool now checks for the presence of the .py extension in the filename and adds it if it is absent.
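A check of this kind can be sketched in a few lines of Python. The helper name ensure_py_extension is hypothetical and the Tool's own implementation may differ.

```python
import os


def ensure_py_extension(filename):
    """Return filename with a .py extension appended if it is missing.

    Hypothetical helper, written only to illustrate the check.
    """
    _, ext = os.path.splitext(filename)
    if ext.lower() != ".py":
        return filename + ".py"
    return filename


print(ensure_py_extension("import_script"))     # import_script.py
print(ensure_py_extension("import_script.py"))  # import_script.py
```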
Fix: Removed empty line in generated Python code for Delete Columns (#889)¶
The exported code for Delete Columns now follows the same pattern as the other commands.
Fix: On OS X, hotkey for “redo” uses CMD+Shift+Z, not CMD+Y (#289)¶
The Tool now uses the common OS X standard keyboard shortcut.
Fix: On Linux, Raw Data view now uses fixed-width font (#855)¶
The Tool was previously using a variable-width font when displaying the raw data.
Release notes for v1.0.2 (5 May 2016)¶
Improve: Boolean column detection is less eager with numeric data (#873)¶
The tool now attempts to verify if column numeric data is in the set of values {0, 1} before converting the column’s type during auto-detection.
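The check can be sketched in pandas as follows. The helper looks_boolean is hypothetical and only illustrates the {0, 1} test described above; it is not the Tool's detection code.

```python
import numpy as np
import pandas as pd


def looks_boolean(series):
    """Return True if all non-null values are 0 or 1 (hypothetical helper)."""
    values = set(series.dropna().unique())
    return len(values) > 0 and values <= {0, 1}


print(looks_boolean(pd.Series([0, 1, 1, 0])))        # True
print(looks_boolean(pd.Series([0, 1, 2])))           # False
print(looks_boolean(pd.Series([1.0, np.nan, 0.0])))  # True
```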
Fix: Boolean columns now properly display data with missing values (#873)¶
Previously, converting a column that contained null entries to boolean resulted in the column not displaying all non-null values as True or False.
Fix: Window activation on Linux was not working (#859)¶
Due to a difference in how windows are activated on Linux, the tool wasn’t displaying error dialogs in cases where the file could not be loaded.
Release notes for v1.0.1 (18 April 2016)¶
New: Add tool tip to the View on Close checkbox (#831)¶
Provides more context as to the purpose of this option, which is to launch the viewer application after clicking “Use DataFrame” and returning to Canopy’s IPython prompt.