Post-Processing Data

The following sections describe how the historical data, obtained in the form of an xarray.DataArray as described here, can be post-processed.

An example of post-processing can be found in examples/coin_history_post_process.py.

The currently offered post-processing capabilities are:

Type Conversion

The data obtained from the exchange is not always stored with the correct type. For example, Binance provides the open (the opening value of the ticker) as a string; the type converter stores the same value as a float.

The type of each field is stored in OHLCVFields.
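A minimal, library-independent sketch of the idea, assuming a hypothetical field-to-type map that loosely follows the column order of a Binance kline row (the real field names and types are whatever OHLCVFields defines). Each raw value is cast with the Python type declared for its field, so the string prices become floats:

```python
# Hypothetical field -> type map that loosely follows the column order of a
# Binance kline row; the real definitions live in OHLCVFields.
KLINE_FIELD_TYPES = {
    "open_ts": int,
    "open": float,
    "high": float,
    "low": float,
    "close": float,
    "volume": float,
    "close_ts": int,
}


def convert_row(raw_row):
    """Cast each raw value to the type declared for its field (by position)."""
    return {
        field: field_type(value)
        for (field, field_type), value in zip(KLINE_FIELD_TYPES.items(), raw_row)
    }


# Made-up values: prices and volume arrive as strings, timestamps as integers.
raw = [1609459200000, "100.5", "101.0", "99.5", "100.0", "12.25", 1609459259999]
converted = convert_row(raw)
print(converted["open"], type(converted["open"]).__name__)  # 100.5 float
```

A plain Python int cannot hold a missing value, which is the motivation for the nullable pandas types in type_mapping.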

class crypto_history.data_container.data_container_post.TypeConvertedData

Converts the data in an xr.DataArray or xr.Dataset to the types of the OHLCV fields.

get_ohlcv_field_type_dict() → Dict[KT, VT]

Gets the types of the OHLCV fields converted to their numpy/pandas equivalents. This is done so that NaN values can be handled in integer and string fields.

Returns (dict): Dictionary mapping each OHLCV field to its pandas/numpy type

set_type_on_dataarray(dataarray: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Sets the type on the xr.DataArray according to the OHLCV field types.

Parameters:dataarray (xr.DataArray) – The DataArray on which the type has to be set
Returns:xr.DataArray which has the type set on it

set_type_on_dataset(dataset: xarray.core.dataset.Dataset) → xarray.core.dataset.Dataset

Sets the type on the xr.Dataset according to the OHLCV field types.

Parameters:dataset (xr.Dataset) – The dataset on which the type has to be set
Returns:xr.Dataset which has the type set on it

type_mapping = {int: pandas.Int64Dtype, str: pandas.StringDtype, float: numpy.float64}

Maps each Python type to the pandas/numpy type used for the conversion.
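The following sketch shows why type_mapping targets nullable pandas types. It assumes a hypothetical field-to-type map and a hypothetical helper to_dtype (neither is part of crypto_history). Extension dtypes such as pandas.Int64Dtype are classes that have to be instantiated, while numpy.float64 can be passed to numpy.dtype directly. An integer column built this way keeps a missing value as <NA> instead of being upcast to float:

```python
import numpy as np
import pandas as pd

# Same values as TypeConvertedData.type_mapping.
TYPE_MAPPING = {int: pd.Int64Dtype, str: pd.StringDtype, float: np.float64}

# Hypothetical OHLCV field -> Python type map (the real one is OHLCVFields).
FIELD_TYPES = {"open_ts": int, "open": float, "trades": int, "symbol": str}


def to_dtype(dtype_like):
    """Turn an entry of TYPE_MAPPING into a dtype instance (hypothetical helper)."""
    # Extension dtypes are classes and must be instantiated.
    if isinstance(dtype_like, type) and issubclass(
        dtype_like, pd.api.extensions.ExtensionDtype
    ):
        return dtype_like()
    # numpy scalar types such as np.float64 convert via np.dtype.
    return np.dtype(dtype_like)


dtypes = {field: to_dtype(TYPE_MAPPING[py_type]) for field, py_type in FIELD_TYPES.items()}

# A missing trade count stays <NA>; a numpy int64 column could not hold it.
trades = pd.Series([12, None, 7], dtype=dtypes["trades"])
print(trades.dtype, trades.isna().tolist())  # Int64 [False, True, False]
```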

Incomplete Data Deletion

Incomplete data may have to be removed from an xarray.DataArray or xarray.Dataset to avoid unexpected behaviour and to save memory. HandleIncompleteData removes incomplete data in two ways:

1. If no data is available for a particular base or reference asset (all of its values are NaN), that coin is removed from the xarray item.
2. If any of the values of a particular ticker is NaN, all of that ticker's values are set to NaN.

class crypto_history.data_container.data_container_post.HandleIncompleteData

Responsible for handling missing data:
1. Dropping a coin if all of its values are null
2. Nullifying a ticker if it has incomplete data

drop_xarray_coins_with_entire_na(data_item: Union[xarray.core.dataarray.DataArray, xarray.core.dataset.Dataset]) → Union[xarray.core.dataarray.DataArray, xarray.core.dataset.Dataset]

Drops the coins of the base/reference assets whose corresponding values are all NaN.

Parameters:data_item (xr.DataArray/xr.Dataset) – The data item containing the coin histories
Returns:xr.DataArray/xr.Dataset where the coins have been dropped

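The drop rule can be sketched with plain NumPy, assuming the data is laid out as a 4-D array with axes (base asset, reference asset, timestamp, OHLCV field) and that the coin labels live in a separate array; both the layout and the helper drop_all_nan_labels are assumptions for illustration. On an xr.DataArray, dataarray.dropna(dim, how="all") expresses the same rule for a single dimension.

```python
import numpy as np


def drop_all_nan_labels(values, labels, axis):
    """Drop each label along `axis` whose entire slice is NaN (hypothetical helper)."""
    other_axes = tuple(i for i in range(values.ndim) if i != axis)
    keep = ~np.isnan(values).all(axis=other_axes)
    return np.compress(keep, values, axis=axis), labels[keep]


# Hypothetical layout: (base asset, reference asset, timestamp, OHLCV field).
base_assets = np.array(["BTC", "ETH", "XYZ"])
data = np.arange(3 * 2 * 4 * 5, dtype=float).reshape(3, 2, 4, 5)
data[2] = np.nan  # no history at all for the made-up base asset "XYZ"
data[1, 0, 0, 0] = np.nan  # a single gap does not drop "ETH"

trimmed, kept = drop_all_nan_labels(data, base_assets, axis=0)
print(kept.tolist(), trimmed.shape)  # ['BTC', 'ETH'] (2, 2, 4, 5)
```

Calling the helper again with axis=1 and the reference-asset labels applies the same rule to reference assets.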
get_all_coord_combinations(data_item: Union[xarray.core.dataarray.DataArray, xarray.core.dataset.Dataset])
Gets all the combinations of coordinates to iterate over, according to the coordinates to drop.

Parameters:data_item (xr.DataArray/xr.Dataset) – The data item whose coordinate combinations need to be iterated over
Yields:A dict for each combination

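One way to produce such combinations is itertools.product over the coordinate labels. In this sketch the coordinate names, labels and the helper coord_combinations are hypothetical, and the OHLCV field dimension is excluded so that each yielded dict identifies one ticker. With xarray, the labels could come from data_item.coords[dim].values and each dict could be used with .loc[...].

```python
from itertools import product


def coord_combinations(coords, dims):
    """Yield one {dim: label} dict per combination of labels on `dims`."""
    for labels in product(*(coords[dim] for dim in dims)):
        yield dict(zip(dims, labels))


# Hypothetical coordinates of a coin-history item.
coords = {
    "base_assets": ["BTC", "ETH"],
    "reference_assets": ["USDT"],
    "timestamp": [0, 60_000],
    "ohlcv_fields": ["open", "close"],
}
combos = list(coord_combinations(coords, ["base_assets", "reference_assets", "timestamp"]))
print(len(combos))  # 4
print(combos[0])  # {'base_assets': 'BTC', 'reference_assets': 'USDT', 'timestamp': 0}
```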
nullify_incomplete_data_from_dataarray(dataarray: xarray.core.dataarray.DataArray) → xarray.core.dataarray.DataArray

Nullifies incomplete data in the xr.DataArray.

Parameters:dataarray (xr.DataArray) – The DataArray whose incomplete entries are to be nullified
Returns:xr.DataArray whose data has been nullified if incomplete

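The nullification rule can be sketched in NumPy under the same assumed 4-D layout, with the OHLCV fields on the last axis: a ticker is incomplete if any of its fields is NaN, and that per-ticker mask is broadcast back across the field axis. The helper name is hypothetical. On an xr.DataArray, dataarray.where(dataarray.notnull().all(dim=...)) would express the same idea.

```python
import numpy as np

# Hypothetical layout: (base asset, reference asset, timestamp, OHLCV field).
FIELD_AXIS = -1


def nullify_incomplete_tickers(values):
    """Set every field of a ticker to NaN if any of its fields is NaN."""
    incomplete = np.isnan(values).any(axis=FIELD_AXIS, keepdims=True)
    # keepdims leaves a length-1 field axis, so the mask broadcasts over all fields.
    return np.where(incomplete, np.nan, values)


data = np.arange(2 * 1 * 3 * 4, dtype=float).reshape(2, 1, 3, 4)
data[0, 0, 1, 2] = np.nan  # one missing field in one ticker
cleaned = nullify_incomplete_tickers(data)
print(np.isnan(cleaned[0, 0, 1]).all(), np.isnan(cleaned).sum())  # True 4
```

The input must be a float array, since NaN cannot be stored in an integer NumPy array.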
nullify_incomplete_data_from_dataset(dataset: xarray.core.dataset.Dataset) → xarray.core.dataset.Dataset

Nullifies the incomplete data of the xr.Dataset.

Notes

Using indexing to assign values to a subset of a dataset (e.g., ds[dict(space=0)] = 1) is not yet supported by xarray; see http://xarray.pydata.org/en/stable/indexing.html

Parameters:dataset (xr.Dataset) – The dataset whose data is to be nullified
Returns:xr.Dataset whose incomplete items are nullified
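A sketch of the dataset case, assuming (hypothetically) that each data variable holds one OHLCV field and that all variables share the same shape. A combined mask is built across the variables and applied to each one separately, producing new arrays rather than assigning into a subset of the whole container, in the spirit of the note above. The helper name and the dict-of-arrays stand-in for a dataset are illustrative only; with xarray, dataset.where(mask) applies one mask to every data variable.

```python
import numpy as np


def nullify_incomplete_dataset(variables):
    """Nullify a ticker in every variable if it is NaN in any variable.

    `variables` maps a variable name to an array; all arrays share one shape.
    """
    incomplete = np.zeros(next(iter(variables.values())).shape, dtype=bool)
    for values in variables.values():
        incomplete |= np.isnan(values)
    # Build a new array per variable instead of assigning into a subset
    # of the whole container.
    return {name: np.where(incomplete, np.nan, values) for name, values in variables.items()}


# Hypothetical dataset: one variable per OHLCV field, shape (reference, timestamp).
dataset = {
    "open": np.array([[1.0, 2.0, 3.0]]),
    "close": np.array([[1.5, np.nan, 3.5]]),
}
cleaned = nullify_incomplete_dataset(dataset)
print(cleaned["open"].tolist())  # [[1.0, nan, 3.0]]
```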