Data sources

A data source is a wrapper object for the actual data that the plot will be handling. For the most part, a data source looks like an array of values, with an optional mask and metadata.

The data source interface provides methods for retrieving data, estimating a size of the dataset, indications about the dimensionality of the data, a place for metadata (such as selections and annotations), and events that fire when the data gets changed.

There are two primary reasons for a data source class:

  • It provides a way for different plotting objects to reference the same data.

  • It defines the interface to expose data from existing applications to Chaco.

In most cases, the standard ArrayDataSource will suffice.

Interface

The basic interface for data sources is defined in AbstractDataSource. Here is a summary of the most important attributes and methods (see the docstrings of this class for more details):

value_dimension

The dimensionality of the data value at each point. It is defined as a DimensionTrait, i.e., one of “scalar”, “point”, “image”, or “cube”. For example, a GridDataSource represents data in a 2D array and thus its value_dimension is “scalar”.

index_dimension

The dimensionality of the data value at each point. It is defined as a DimensionTrait, i.e., one of “scalar”, “point”, “image”, or “cube”. For example, a GridDataSource represents data in a 2D array and thus its index_dimension is “image”.

metadata

A dictionary that maps strings to arbitrary data. Usually, the mapped data is a set of indices, as in the case of selections and annotations. By default, metadata contains the keys “selections” (representing indices that are currently selected by some tool) and “annotations”, both initialized to an empty list.

persist_data

If True (default), the data that this data source refers to is serialized when the data source is.

get_data()

Returns a data array containing the data referred to by the data source. Treat the returned array as read-only.

is_masked()

Returns True if this data source’s data uses a mask. In this case, to retrieve the data, call get_data_mask() instead of get_data().

get_data_mask()

Returns the full, raw, source data array and a corresponding binary mask array. Treat both arrays as read-only.

get_size()

Returns the size of the data.

get_bounds()

Returns a tuple (min, max) of the bounding values for the data source. In the case of 2-D data, min and max are 2-D points that represent the bounding corners of a rectangle enclosing the data set. If data is the empty set, then the min and max vals are 0.0.

Events

AbstractDataSource defines three events that can be used in Traits applications to react to changes in the data source:

data_changed

Fired when the data values change.

Note

The majority of concrete data sources do not fire this event when the data values change. Rather, the event is usually fired when new data or a new mask is assigned through setter methods (see notes below).

bounds_changed

Fired when the data bounds change.

metadata_changed

Fired when the content of metadata changes (both the metadata dictionary object or any of its items).

List of Chaco data sources

This is a list of all concrete implementations of data sources in Chaco:

ArrayDataSource

A data source representing a single, continuous array of numerical data. This is the most common data source for Chaco plots.

This subclass adds the following attributes and methods to the basic interface:

sort_order

The sort order of the data, one of ‘ascending’, ‘descending’, or ‘none’. If the underlying data is sorted, and this attribute is set appropriately, Chaco is able to use shortcuts and optimizations in many places.

reverse_map(pt)

Returns the index of pt in the data source (optimized if sort_order is set).

Note

This class does not listen to the array for changes in the data values. The data_changed event is fired only when the data or the mask are set with the methods set_data(), set_mask(), or remove_mask().

ImageData

Represents a 2D grid of image data.

The underlying data array is 3D, where the third dimension is either 1 (one scalar value at each point of the grid), 3 (one RGB vector at each point), or 4 (one RGBa vector at each point). The depth of the array is defined in the attribute value_depth.

Access to the image data is controlled by three properties: The boolean attribute transposed defines whether the data array stored by this class is to be interpreted as transposed; raw_value returns the underlying data array as-is, ignoring transposed; value returns the data array or its transposed depending on the value of transposed.

The correct usage pattern of these attributes is to give to the class contiguous image data, and assign transposed if the two axis should be swapped. Functions that would benefit from working on contiguous data can then use raw_value directly. (See the class docstrings for more details, and some caveats.)

Noteworthy methods of this class are:

fromfile(filename)

Factory method that creates an ImageData instance from an image file. filename can be either a file path or a file object.

get_width(), get_height()

Return the width or the height of the image (takes the value of transposed into account).

get_array_bounds()

Return ((0, width), (0, height)).

Note

This class does not implement the methods related to masking, and it does not fire bounds_changed events.

Note

This class does not listen to the array for changes in the data values. The data_changed event is fired only when the data are set with the method set_data().

GridDataSource

Data source representing the coordinates of a 2D grid. It is used, for example, as a source for the index data in an ImagePlot.

It defines these attributes:

sort_order

Similar to the sort_order attribute for the ArrayDataSource class above, but this is a tuple with two elements, one per dimension.

Note

This class does not implement the methods related to masking, and it does not fire bounds_changed events.

Note

This class does not listen to the array for changes in the data values. The data_changed event is fired only when the data is set with the method set_data().

MultiArrayDataSource

A data source representing a single, continuous array of multidimensional numerical data.

It is useful, for example, to define 2D vector data at each point of a scatter plot (as in QuiverPlot), or to represent multiple values for each index (as in MultiLinePlot).

As ArrayDataSource, this data source defines a sort_order attribute for its index dimension.

Warning

In MultiArrayDataSource, the index_dimension and value_dimension attributes are integers that define which dimension of the data array correspond to indices and which to values (default is 0 and 1, respectively). This is different from the same attributes in the interface, which are strings describing the dimensionality of index and value.

Note

This class does not listen to the array for changes in the data values. The data_changed event is fired only when the data or the mask are set with the method set_data().

PointDataSource

A data source representing a set of (X,Y) points.

This is a subclass of ArrayDataSource, and inherits its methods and attributes. The attribute sort_index defines whether the data is sorted along the X’s or the Y’s (as specified in sort_order).

Note

This class does not listen to the array for changes in the data values. The data_changed event is fired only when the data or the mask are set with the method set_data().

FunctionDataSource

A subclass of ArrayDataSource that sets the values of the underlying data array based on a function (defined in the callable attribute func) evaluated on a 1D data range (defined in data_range).

FunctionImageData

A subclass of ImageData that sets the values of the underlying data array based on a 2D function (defined in the callable attribute func) evaluated on a 2D data range (defined in data_range).