Data Sources¶
A data source is a wrapper object for the actual data that the plot will be handling. For the most part, a data source looks like an array of values, with an optional mask and metadata.
The data source interface provides methods for retrieving data and estimating the size of the dataset, attributes that indicate the dimensionality of the data, a place for metadata (such as selections and annotations), and events that fire when the data changes.
There are two primary reasons for a data source class:
It provides a way for different plotting objects to reference the same data.
It defines the interface to expose data from existing applications to Chaco.
In most cases, the standard ArrayDataSource will suffice.
Interface¶
The basic interface for data sources is defined in AbstractDataSource. Here is a summary of the most important attributes and methods (see the docstrings of this class for more details):
value_dimension
    The dimensionality of the data value at each point. It is defined as a DimensionTrait, i.e., one of “scalar”, “point”, “image”, or “cube”. For example, a GridDataSource represents data in a 2D array and thus its value_dimension is “scalar”.
index_dimension
    The dimensionality of the index at each point. It is defined as a DimensionTrait, i.e., one of “scalar”, “point”, “image”, or “cube”. For example, a GridDataSource represents data in a 2D array and thus its index_dimension is “image”.
metadata
    A dictionary that maps strings to arbitrary data. Usually, the mapped data is a set of indices, as in the case of selections and annotations. By default, metadata contains the keys “selections” (representing indices that are currently selected by some tool) and “annotations”, both initialized to an empty list.
persist_data
    If True (default), the data that this data source refers to is serialized when the data source is.
get_data()
    Returns a data array containing the data referred to by the data source. Treat the returned array as read-only.
is_masked()
    Returns True if this data source’s data uses a mask. In this case, to retrieve the data, call get_data_mask() instead of get_data().
get_data_mask()
    Returns the full, raw, source data array and a corresponding binary mask array. Treat both arrays as read-only.
get_size()
    Returns the size of the data.
get_bounds()
    Returns a tuple (min, max) of the bounding values for the data source. In the case of 2-D data, min and max are 2-D points that represent the bounding corners of a rectangle enclosing the data set. If the data is the empty set, then the min and max values are 0.0.
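As an illustration of the methods summarized above, here is a minimal pure-Python sketch. ListDataSource is a hypothetical stand-in written for this example, not a Chaco class: it only mirrors the method names for 1D scalar data, stores values in a list rather than an array, and assumes that True in the mask marks a valid point.

```python
class ListDataSource:
    """Hypothetical stand-in that mirrors the method names listed above."""

    def __init__(self, data, mask=None):
        self._data = list(data)
        # Assumption: True marks a valid point, False a masked-out one.
        self._mask = list(mask) if mask is not None else None

    def get_data(self):
        # Callers should treat the returned sequence as read-only.
        return self._data

    def is_masked(self):
        return self._mask is not None

    def get_data_mask(self):
        # Full raw data plus a mask of the same length.
        if self._mask is None:
            return self._data, [True] * len(self._data)
        return self._data, self._mask

    def get_size(self):
        return len(self._data)

    def get_bounds(self):
        # An empty data set reports 0.0 for both bounds.
        if not self._data:
            return (0.0, 0.0)
        return (min(self._data), max(self._data))


source = ListDataSource([3.0, 1.0, 4.0, 1.5], mask=[True, True, False, True])

# Pick the retrieval method according to is_masked().
if source.is_masked():
    values, mask = source.get_data_mask()
else:
    values, mask = source.get_data(), None

low, high = source.get_bounds()
```

In this sketch the bounds are computed over all values and ignore the mask; whether a given Chaco data source does the same is not stated here.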
Events¶
AbstractDataSource defines three events that can be used in Traits applications to react to changes in the data source:
data_changed
    Fired when the data values change.
    Note
    The majority of concrete data sources do not fire this event when values in the underlying array change. Rather, the event is usually fired when new data or a new mask is assigned through setter methods (see notes below).
bounds_changed
    Fired when the data bounds change.
metadata_changed
    Fired when the content of metadata changes (either the metadata dictionary object itself or any of its items).
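The note on data_changed can be illustrated without Traits. The sketch below uses a hypothetical NotifyingSource class in which a plain list of callbacks plays the role of the event; the class and its on_data_changed() method are assumptions made for this example and are not part of Chaco.

```python
class NotifyingSource:
    """Hypothetical stand-in: callbacks play the role of data_changed."""

    def __init__(self, data):
        self._data = data
        self._listeners = []

    def on_data_changed(self, callback):
        self._listeners.append(callback)

    def get_data(self):
        return self._data

    def set_data(self, new_data):
        # Only assignment through the setter notifies the listeners.
        self._data = new_data
        for callback in self._listeners:
            callback()


events = []
values = [1.0, 2.0, 3.0]
source = NotifyingSource(values)
source.on_data_changed(lambda: events.append("data_changed"))

values[0] = 10.0  # in-place edit: the source has no way to notice it
fired_after_mutation = len(events)

source.set_data([10.0, 2.0, 3.0])  # setter call: listeners run
fired_after_setter = len(events)
```

The practical consequence is that code which modifies an array in place should hand the data back through the setter so that listeners are notified.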
List of Chaco data sources¶
This is a list of all concrete implementations of data sources in Chaco:
ArrayDataSource
    A data source representing a single, continuous array of numerical data. This is the most common data source for Chaco plots.
    This subclass adds the following attributes and methods to the basic interface:
    sort_order
        The sort order of the data, one of ‘ascending’, ‘descending’, or ‘none’. If the underlying data is sorted, and this attribute is set appropriately, Chaco is able to use shortcuts and optimizations in many places.
    reverse_map(pt)
        Returns the index of pt in the data source (optimized if sort_order is set).
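To show why a known sort order allows shortcuts, the sketch below looks up a value with a binary search when the data is declared sorted and falls back to a linear scan otherwise. find_index is a hypothetical helper written for this example; it matches values exactly and is not Chaco's reverse_map implementation, whose lookup rules may differ.

```python
def find_index(data, pt, sort_order="none"):
    """Return the index of pt in data, raising ValueError if it is absent."""
    if sort_order == "none":
        # No ordering information: linear scan, O(n).
        return data.index(pt)

    # Sorted data: binary search, O(log n).
    ascending = sort_order == "ascending"
    lo, hi = 0, len(data)
    while lo < hi:
        mid = (lo + hi) // 2
        # True while data[mid] still lies strictly before pt in the ordering.
        before = data[mid] < pt if ascending else data[mid] > pt
        if before:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(data) and data[lo] == pt:
        return lo
    raise ValueError(f"{pt!r} is not in the data")


idx_unsorted = find_index([5, 1, 9, 3], 9)
idx_ascending = find_index([1, 3, 5, 9], 9, sort_order="ascending")
idx_descending = find_index([9, 5, 3, 1], 5, sort_order="descending")
```

The sorted branches are only correct if the declared order matches the data, which is why the attribute must be set appropriately.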
    Note
    This class does not listen to the array for changes in the data values. The data_changed event is fired only when the data or the mask is set with the methods set_data(), set_mask(), or remove_mask().
ImageData
    Represents a 2D grid of image data.
    The underlying data array is 3D, where the third dimension is either 1 (one scalar value at each point of the grid), 3 (one RGB vector at each point), or 4 (one RGBA vector at each point). The depth of the array is defined in the attribute value_depth.
    Access to the image data is controlled by three properties: the boolean attribute transposed defines whether the data array stored by this class is to be interpreted as transposed; raw_value returns the underlying data array as-is, ignoring transposed; value returns the data array or its transpose, depending on the value of transposed.
    The correct usage pattern of these attributes is to give the class contiguous image data, and to set transposed if the two axes should be swapped. Functions that would benefit from working on contiguous data can then use raw_value directly. (See the class docstrings for more details, and some caveats.)
    Noteworthy methods of this class are:
    fromfile(filename)
        Factory method that creates an ImageData instance from an image file. filename can be either a file path or a file object.
    get_width(), get_height()
        Return the width or the height of the image (taking the value of transposed into account).
    get_array_bounds()
        Return ((0, width), (0, height)).
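The relationship between transposed, raw_value, and value can be sketched with nested lists. GridImage below is a hypothetical stand-in, not the ImageData class: it ignores value_depth and assumes that width is the number of columns and height the number of rows of the array as seen through value.

```python
class GridImage:
    """Hypothetical stand-in for the transposed / raw_value / value pattern."""

    def __init__(self, rows, transposed=False):
        self.raw_value = rows  # stored exactly as given: a list of rows
        self.transposed = transposed

    @property
    def value(self):
        # Swap the two axes only when the data is flagged as transposed.
        if self.transposed:
            return [list(column) for column in zip(*self.raw_value)]
        return self.raw_value

    def get_width(self):
        return len(self.value[0])

    def get_height(self):
        return len(self.value)

    def get_array_bounds(self):
        return ((0, self.get_width()), (0, self.get_height()))


rows = [[1, 2, 3],
        [4, 5, 6]]

plain = GridImage(rows)
swapped = GridImage(rows, transposed=True)

plain_bounds = plain.get_array_bounds()
swapped_bounds = swapped.get_array_bounds()
```

Both objects share the same raw_value, so code that benefits from the stored layout can read it directly while everything else goes through value.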
    Note
    This class does not implement the methods related to masking, and it does not fire bounds_changed events.
    Note
    This class does not listen to the array for changes in the data values. The data_changed event is fired only when the data is set with the method set_data().
GridDataSource
    Data source representing the coordinates of a 2D grid. It is used, for example, as a source for the index data in an ImagePlot.
    It defines these attributes:
    sort_order
        Similar to the sort_order attribute for the ArrayDataSource class above, but this is a tuple with two elements, one per dimension.
    Note
    This class does not implement the methods related to masking, and it does not fire bounds_changed events.
    Note
    This class does not listen to the array for changes in the data values. The data_changed event is fired only when the data is set with the method set_data().
MultiArrayDataSource
    A data source representing a single, continuous array of multidimensional numerical data.
    It is useful, for example, to define 2D vector data at each point of a scatter plot (as in QuiverPlot), or to represent multiple values for each index (as in MultiLinePlot).
    Like ArrayDataSource, this data source defines a sort_order attribute for its index dimension.
    Warning
    In MultiArrayDataSource, the index_dimension and value_dimension attributes are integers that define which dimension of the data array corresponds to indices and which to values (the defaults are 0 and 1, respectively). This is different from the same attributes in the interface, which are strings describing the dimensionality of index and value.
    Note
    This class does not listen to the array for changes in the data values. The data_changed event is fired only when the data or the mask is set with the method set_data().
PointDataSource
    A data source representing a set of (X,Y) points.
    This is a subclass of ArrayDataSource, and inherits its methods and attributes. The attribute sort_index defines whether the data is sorted along the X’s or the Y’s (as specified in sort_order).
    Note
    This class does not listen to the array for changes in the data values. The data_changed event is fired only when the data or the mask is set with the method set_data().
FunctionDataSource
    A subclass of ArrayDataSource that sets the values of the underlying data array based on a function (defined in the callable attribute func) evaluated on a 1D data range (defined in data_range).
FunctionImageData
    A subclass of ImageData that sets the values of the underlying data array based on a 2D function (defined in the callable attribute func) evaluated on a 2D data range (defined in data_range).
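The idea of a function-backed data source can be sketched in plain Python. The descriptions above do not say how func is called, so this sketch assumes a callable that receives the low and high ends of a 1D range and returns the values. FunctionBackedSource, set_range(), and sample_squares are hypothetical names introduced for this example, and the range is a plain (low, high) tuple rather than a Chaco range object.

```python
class FunctionBackedSource:
    """Hypothetical stand-in: values are recomputed from func and a range."""

    def __init__(self, func, data_range):
        self.func = func
        self._data_range = data_range
        self._data = []
        self.recalculate()

    def recalculate(self):
        # Assumption: func takes the range limits and returns the values.
        low, high = self._data_range
        self._data = self.func(low, high)

    def set_range(self, low, high):
        # Changing the range triggers a fresh evaluation of func.
        self._data_range = (low, high)
        self.recalculate()

    def get_data(self):
        return self._data


def sample_squares(low, high, count=5):
    """Evaluate x**2 at `count` evenly spaced points in [low, high]."""
    step = (high - low) / (count - 1)
    return [(low + i * step) ** 2 for i in range(count)]


source = FunctionBackedSource(sample_squares, (0.0, 4.0))
first = source.get_data()

source.set_range(0.0, 2.0)
second = source.get_data()
```

The point of the pattern is that the data is never assigned directly: it is always derived from the current function and range.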