Data Sources¶
A data source is a wrapper object for the actual data that the plot will be handling. For the most part, a data source looks like an array of values, with an optional mask and metadata.
The data source interface provides methods for retrieving data and estimating the size of the dataset, attributes describing the dimensionality of the data, a place for metadata (such as selections and annotations), and events that fire when the data changes.
There are two primary reasons for a data source class:
It provides a way for different plotting objects to reference the same data.
It defines the interface to expose data from existing applications to Chaco.
In most cases, the standard ArrayDataSource will suffice.
Interface¶
The basic interface for data sources is defined in AbstractDataSource. Here is a summary of the most important attributes and methods (see the docstrings of this class for more details):
value_dimension
The dimensionality of the data value at each point. It is defined as a DimensionTrait, i.e., one of “scalar”, “point”, “image”, or “cube”. For example, a GridDataSource represents data in a 2D array and thus its value_dimension is “scalar”.
index_dimension
The dimensionality of the indices into the data source. It is defined as a DimensionTrait, i.e., one of “scalar”, “point”, “image”, or “cube”. For example, a GridDataSource represents data in a 2D array and thus its index_dimension is “image”.
metadata
A dictionary that maps strings to arbitrary data. Usually, the mapped data is a set of indices, as in the case of selections and annotations. By default, metadata contains the keys “selections” (representing indices that are currently selected by some tool) and “annotations”, both initialized to an empty list.
persist_data
If True (default), the data that this data source refers to is serialized when the data source is.
get_data()
Returns a data array containing the data referred to by the data source. Treat the returned array as read-only.
is_masked()
Returns True if this data source’s data uses a mask. In this case, to retrieve the data, call get_data_mask() instead of get_data().
get_data_mask()
Returns the full, raw, source data array and a corresponding binary mask array. Treat both arrays as read-only.
get_size()
Returns the size of the data: either the exact number of items that get_data() would return, or an integer estimate of it.
get_bounds()
Returns a tuple (min, max) of the bounding values for the data source. In the case of 2-D data, min and max are 2-D points that represent the bounding corners of a rectangle enclosing the data set. If the data set is empty, then min and max are both 0.0.
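The contract described by these methods can be illustrated with a plain-Python stand-in. The sketch below is not Chaco code: ToyDataSource is a hypothetical class that does not derive from AbstractDataSource and stores a list instead of an array. It only mirrors the documented behavior of the methods for 1-D data, and returning an all-True mask when none was given is an assumption of the sketch.

```python
class ToyDataSource:
    """Hypothetical stand-in mirroring the documented method contract (1-D only)."""

    def __init__(self, data, mask=None):
        self._data = list(data)
        self._mask = None if mask is None else list(mask)

    def get_data(self):
        # Callers should treat the returned sequence as read-only.
        return self._data

    def is_masked(self):
        return self._mask is not None

    def get_data_mask(self):
        # Full raw data plus a mask. Assumption of this sketch: an all-True
        # mask is returned when no mask was given.
        if self._mask is None:
            return self._data, [True] * len(self._data)
        return self._data, self._mask

    def get_size(self):
        return len(self._data)

    def get_bounds(self):
        # Empty data yields (0.0, 0.0), as described above.
        if not self._data:
            return (0.0, 0.0)
        return (min(self._data), max(self._data))


source = ToyDataSource([3.0, -1.5, 8.0], mask=[True, False, True])

# Pick the retrieval method according to is_masked().
if source.is_masked():
    data, mask = source.get_data_mask()
else:
    data, mask = source.get_data(), None

print(source.get_size(), source.get_bounds(), mask)
```

A real data source would hold a NumPy array and, for 2-D data, return corner points from get_bounds(); the branching on is_masked() is the part that carries over.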
Events¶
AbstractDataSource defines three events that can be used in Traits applications to react to changes in the data source:
data_changed
Fired when the data values change.
Note
The majority of concrete data sources do not fire this event when the data values change. Rather, the event is usually fired when new data or a new mask is assigned through setter methods (see notes below).
bounds_changed
Fired when the data bounds change.
metadata_changed
Fired when the content of metadata changes (either the metadata dictionary object itself or any of its items).
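The note on data_changed is easy to overlook, so here is a plain-Python illustration of the behavior it describes. ToySource and on_data_changed are hypothetical names invented for this sketch. In Chaco the events are Traits events and listeners are attached through the Traits notification machinery rather than through a callback list.

```python
class ToySource:
    """Hypothetical stand-in: listeners are notified only from set_data()."""

    def __init__(self, data):
        self._data = data
        self._listeners = []

    def on_data_changed(self, callback):
        # Register a callable to run when new data is assigned.
        self._listeners.append(callback)

    def get_data(self):
        return self._data

    def set_data(self, new_data):
        self._data = new_data
        for callback in self._listeners:
            callback()


fired = []
source = ToySource([1, 2, 3])
source.on_data_changed(lambda: fired.append("data_changed"))

# In-place edit of the underlying values: the source cannot see it,
# so no listener runs.
source.get_data()[0] = 99
count_after_in_place_edit = len(fired)

# Assignment through the setter: listeners run.
source.set_data([4, 5, 6])
count_after_setter = len(fired)

print(count_after_in_place_edit, count_after_setter)
```

The practical consequence is that code which mutates an array in place should reassign it through the setter afterwards if dependent plots are expected to update.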
List of Chaco data sources¶
This is a list of all concrete implementations of data sources in Chaco:
ArrayDataSource
A data source representing a single, continuous array of numerical data. This is the most common data source for Chaco plots.
This subclass adds the following attributes and methods to the basic interface:
sort_order
The sort order of the data, one of ‘ascending’, ‘descending’, or ‘none’. If the underlying data is sorted, and this attribute is set appropriately, Chaco is able to use shortcuts and optimizations in many places.
reverse_map(pt)
Returns the index of pt in the data source (optimized if sort_order is set).
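To see why knowing the sort order allows a shortcut, compare a binary search with a linear scan. This is only an illustration of the idea, not Chaco's reverse_map(): find_index is a hypothetical helper, it looks for exact matches only, and Chaco's handling of unsorted data or of points that fall between samples may differ.

```python
from bisect import bisect_left


def find_index(data, pt, sort_order="none"):
    """Return the index of pt in data, or None if it is not present."""
    if sort_order == "ascending":
        # Sorted data allows a binary search: O(log n) comparisons.
        i = bisect_left(data, pt)
        return i if i < len(data) and data[i] == pt else None
    # No usable ordering information: fall back to a linear scan, O(n).
    for i, value in enumerate(data):
        if value == pt:
            return i
    return None


values = [0.5, 1.0, 2.5, 4.0, 7.5]
fast = find_index(values, 4.0, sort_order="ascending")
slow = find_index(values, 4.0)
print(fast, slow)
```

Both calls return the same index; only the amount of work differs, which matters for large data sets that are queried on every mouse move.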
Note
This class does not listen to the array for changes in the data values. The data_changed event is fired only when the data or the mask are set with the methods set_data(), set_mask(), or remove_mask().
ImageData
Represents a 2D grid of image data.
The underlying data array is 3D, where the third dimension is either 1 (one scalar value at each point of the grid), 3 (one RGB vector at each point), or 4 (one RGBA vector at each point). The depth of the array is defined in the attribute value_depth.
Access to the image data is controlled by three properties: the boolean attribute transposed defines whether the data array stored by this class is to be interpreted as transposed; raw_value returns the underlying data array as-is, ignoring transposed; value returns the data array or its transpose, depending on the value of transposed.
The correct usage pattern of these attributes is to give the class contiguous image data, and to set transposed if the two axes should be swapped. Functions that would benefit from working on contiguous data can then use raw_value directly. (See the class docstrings for more details, and some caveats.)
Noteworthy methods of this class are:
fromfile(filename)
Factory method that creates an ImageData instance from an image file. filename can be either a file path or a file object.
get_width(), get_height()
Return the width or the height of the image (takes the value of transposed into account).
get_array_bounds()
Return ((0, width), (0, height)).
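The relationship between transposed, raw_value, and value described above can be sketched with nested lists. ToyImage is a hypothetical stand-in written for this illustration; it is not ImageData, it ignores the third (depth) dimension, and it builds a new list where an array library would typically return a transposed view.

```python
class ToyImage:
    """Hypothetical stand-in for the transposed / raw_value / value trio."""

    def __init__(self, rows, transposed=False):
        self._rows = rows              # list of rows, stored exactly as given
        self.transposed = transposed

    @property
    def raw_value(self):
        # Stored data as-is; `transposed` is ignored.
        return self._rows

    @property
    def value(self):
        # Stored data, with the two grid axes swapped when `transposed` is set.
        if self.transposed:
            return [list(column) for column in zip(*self._rows)]
        return self._rows


image = ToyImage([[1, 2, 3], [4, 5, 6]], transposed=True)
print(image.raw_value)  # [[1, 2, 3], [4, 5, 6]]
print(image.value)      # [[1, 4], [2, 5], [3, 6]]
```

Code that only needs to iterate over every pixel can read raw_value and skip the axis swap entirely, which is the usage pattern recommended above.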
Note
This class does not implement the methods related to masking, and it does not fire bounds_changed events.
Note
This class does not listen to the array for changes in the data values. The data_changed event is fired only when the data are set with the method set_data().
GridDataSource
Data source representing the coordinates of a 2D grid. It is used, for example, as a source for the index data in an ImagePlot.
It defines these attributes:
sort_order
Similar to the sort_order attribute for the ArrayDataSource class above, but this is a tuple with two elements, one per dimension.
Note
This class does not implement the methods related to masking, and it does not fire bounds_changed events.
Note
This class does not listen to the array for changes in the data values. The data_changed event is fired only when the data is set with the method set_data().
MultiArrayDataSource
A data source representing a single, continuous array of multidimensional numerical data.
It is useful, for example, to define 2D vector data at each point of a scatter plot (as in QuiverPlot), or to represent multiple values for each index (as in MultiLinePlot).
Like ArrayDataSource, this data source defines a sort_order attribute for its index dimension.
Warning
In MultiArrayDataSource, the index_dimension and value_dimension attributes are integers that define which dimension of the data array corresponds to indices and which to values (default is 0 and 1, respectively). This is different from the same attributes in the interface, which are strings describing the dimensionality of index and value.
Note
This class does not listen to the array for changes in the data values. The data_changed event is fired only when the data or the mask are set with the method set_data().
PointDataSource
A data source representing a set of (X,Y) points.
This is a subclass of ArrayDataSource, and inherits its methods and attributes. The attribute sort_index defines whether the data is sorted along the X’s or the Y’s (as specified in sort_order).
Note
This class does not listen to the array for changes in the data values. The data_changed event is fired only when the data or the mask are set with the method set_data().
FunctionDataSource
A subclass of ArrayDataSource that sets the values of the underlying data array based on a function (defined in the callable attribute func) evaluated on a 1D data range (defined in data_range).
FunctionImageData
A subclass of ImageData that sets the values of the underlying data array based on a 2D function (defined in the callable attribute func) evaluated on a 2D data range (defined in data_range).