Class `strymread`¶

Import strymread as:

from strym import strymread

for reading and analysing CAN bus csv files.

class strym.strymread(csvfile, dbcfile='', **kwargs)¶

strymread reads the logged CAN data from the given CSV file. This class provides several utility functions.

Parameters

csvfile : str, pandas.DataFrame, default = None: The CSV file to be read. If a pandas.DataFrame is supplied, then csvfile is set to None. The DataFrame, if provided, must have the columns [“Time”, “Message”, “MessageID”, “Bus”].
dbcfile : str, default = “”: The DBC file that provides the codec for decoding CAN messages.
kwargs : variable list of arguments in dictionary format
bus : list | default = None: A list of integers corresponding to bus IDs.
dbcfolder : str | default = None: Specifies a folder path in which to look for an appropriate DBC file if dbcfile=’’ or dbcfile = None. The appropriate DBC file can be inferred from <brand>_<model>_<year>.dbc. If dbcfolder is None or an empty string, then by default strymread looks for the DBC file in the dbc folder of the package, which ships sample DBC files to work with.
verbose : bool: Option for verbosity; prints some information when True.
createdb : bool: If True, creates a sqlite3 database for raw CAN data if the database doesn’t exist
dbdir : str: Optional argument that specifies where sqlite3 database will be stored. The default location is ~/.strym/

dbcfile¶

The filepath of DBC file

Type: str, default = “”

csvfile¶

The filepath of CSV Data file, or, raw CAN Message DataFrame

Type: str | pandas.DataFrame

dataframe¶

pandas DataFrame that stores the content of csvfile

Type: pandas.DataFrame

dataframe_raw¶

The original pandas DataFrame with all bus IDs. When bus= is passed to the constructor to filter dataframe by bus ID, the original DataFrame is saved in dataframe_raw.

Type: pandas.DataFrame

candb¶

CAN database fetched from DBC file

Type: cantools.db

burst¶

A boolean flag that checks if CAN data came in burst. If True, then CAN Data was captured in burst, else False. If CAN Data came in burst (as in say 64 messages at a time or so) then any further analysis might not be reliable. Always check that.

Type: bool

success¶

A boolean flag, if True, tells that reading of CSV file was successful.

Type: bool

bus¶

A list of integers corresponding to bus IDs.

Type: list | default = None

dbcfolder¶

Specifies a folder path in which to look for an appropriate DBC file if dbcfile=”” or dbcfile = None. The appropriate DBC file can be inferred from <brand>_<model>_<year>.dbc. If dbcfolder is None or an empty string, then by default strymread looks for the DBC file in the package’s dbc folder, which ships sample DBC files to work with.

Type: str | default = None

dbdir¶

Location where the sqlite3 database for the CAN DataFrame will be stored. Default location: ~/.strym/

Type: str

database¶

The name of the database corresponding to the model/make of the vehicle from which the CAN data was captured

Type: str

inferred_dbc¶

DBC file inferred from the name of the csvfile passed.

Type: str

Returns: strymread – Returns an object of type strymread upon successful reading, or None otherwise

Example

>>> import strym
>>> from strym import strymread
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> dbcfile = 'newToyotacode.dbc'
>>> csvdata = '2020-03-20.csv'
>>> r0 = strymread(csvfile=csvdata, dbcfile=dbcfile)

acc_state(plot=False)¶

Get the cruise control state of the vehicle

Returns

pandas.DataFrame – Timeseries data with different levels corresponding to different cruise control state

“disabled”: 2, “hold”: 11, “hold_waiting_user_cmd”: 10, “enabled”: 6, “faulted”: 5

accelx()¶

Returns: pandas.DataFrame – Timeseries data for acceleration in x-direction (i.e. longitudinal acceleration) from the CSV file

accely()¶

Returns: pandas.DataFrame – Timeseries data for acceleration in y-direction from the CSV file

accelz()¶

Returns: pandas.DataFrame – Timeseries data for acceleration in z-direction from the CSV file

count(plot=False)¶

A utility function to return, and optionally plot as a bar graph, the counts for each Message ID

Returns: pandas.DataFrame – A pandas DataFrame with total message counts per Message ID and total count by Bus

Example

>>> import strym
>>> from strym import strymread
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> dbcfile = 'newToyotacode.dbc'
>>> csvdata = '2020-03-20.csv'
>>> r0 = strymread(csvfile=csvdata, dbcfile=dbcfile)
>>> r0.count()

static create_chunks(df, continuous_threshold=3.0, column_of_interest='Message', plot=False)¶

create_chunks computes separate chunks from a timeseries data.

Parameters

df : pandas.DataFrame: DataFrame that needs to divided into chunks
continuous_threshold : float, Default = 3.0: Continuity threshold above which a change point is detected, signalling the start of a new chunk.
column_of_interest : str , Default = “Message”: Column of interest in DataFrame on which continuous_threshold should act to detect change point for creation of chunks
plot : bool, Default = False: If True, a scatter plot of Full timeseries of df overlaid with separate continuous chunks of df will be created.

Returns

list of pandas.DataFrame – Returns a list of DataFrame with same columns as df
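
The chunking idea can be sketched without pandas. The snippet below is only an illustration, not strym's implementation: the helper name split_into_chunks is hypothetical, rows are plain (time, value) tuples, and it assumes a change point is a jump between consecutive values that is larger than continuous_threshold.

```python
def split_into_chunks(rows, continuous_threshold=3.0):
    """Split (time, value) rows wherever consecutive values jump by more than the threshold.

    Assumption: a change point is a jump in the value column, not in time.
    """
    chunks = []
    current = []
    for row in rows:
        # Start a new chunk when the value jumps too far from the previous row.
        if current and abs(row[1] - current[-1][1]) > continuous_threshold:
            chunks.append(current)
            current = []
        current.append(row)
    if current:
        chunks.append(current)
    return chunks


rows = [(0.0, 1.0), (0.1, 1.5), (0.2, 9.0), (0.3, 9.2), (0.4, 20.0)]
chunks = split_into_chunks(rows)
print(len(chunks))  # three chunks for this sample data
```

Each returned chunk keeps the same row layout as the input, mirroring how create_chunks returns DataFrames with the same columns as df.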

static dateparse(ts)¶

Converts a POSIX timestamp to a human-readable date format in GMT

Parameters
ts : float: POSIX formatted timestamp

Returns

str – Human-readable timestamp as per GMT
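
A minimal sketch of such a conversion using only the standard library is shown below. The helper name posix_to_gmt and the output format string are assumptions for illustration; the exact format returned by dateparse is not specified here.

```python
from datetime import datetime, timezone


def posix_to_gmt(ts):
    """Format a POSIX timestamp as a GMT (UTC) string.

    Assumption: the "%Y-%m-%d %H:%M:%S" layout is illustrative only.
    """
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


print(posix_to_gmt(0.0))      # 1970-01-01 00:00:00
print(posix_to_gmt(86400.0))  # 1970-01-02 00:00:00
```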

dbconnect(db_location)¶

Creates dbconnection and returns db connection object

Parameters
db_location : str: sqlite db url

static denoise(df, method='MA', **kwargs)¶

Denoise the time-series dataframe df using method. By default moving-average is used.

Parameters

df : pandas.DataFrame

Original Dataframe to denoise

method : string, “MA”

Specifies method used for denoising

MA: moving average (default)

window_size : int

window size used in moving-average based denoising method

Default value: 10

Returns

pandas.DataFrame – Denoised Timeseries Data
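
For intuition, a moving average over a window of window_size samples can be sketched as follows. This is not strym's code: the function name moving_average is hypothetical, and the choice of a trailing window whose first samples average over the values seen so far is an assumption.

```python
def moving_average(values, window_size=10):
    """Trailing moving average over a list of numbers.

    Assumption: until window_size samples are available, the average is
    taken over the samples seen so far.
    """
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window_size:
            # Drop the sample that just left the window.
            running -= values[i - window_size]
        smoothed.append(running / min(i + 1, window_size))
    return smoothed


print(moving_average([1, 2, 3, 4, 5], window_size=2))  # [1.0, 1.5, 2.5, 3.5, 4.5]
```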

static differentiate(df, method='S', **kwargs)¶

Differentiate the given timeseries DataFrame using a spline derivative

Parameters

df : pandas.DataFrame

Original Dataframe to be differentiated

method : str

Specifies method used for differentiation

S: spline, spline based differentiation

AE: autoencoder based denoising-followed by discrete differentiation

kwargs

variable keyword arguments

epochs : int

Number of training epochs in case of AE method

verbose : bool

If True, print logs

dense_time_points : bool

Used in autoencoder (AE) based differentiation. If True, then differentiation is computed on 50 times denser time points.

Returns

pandas.DataFrame – Differentiated Timeseries Data

driving_characteristics()¶

driving_characteristics provides driving characteristics for the given driving data in the form of python dictionary.

Currently, the dictionary contains following metadata from the driving data

File name of the CSV-formatted CAN data file
Associated DBC file used
Start time of the trip in human-readable date format
End time of the trip in human-readable date format
Total duration of the trip
Total distance traveled in meters
Total distance traveled in kilometers
Total distance traveled in miles

Returns: dictionary – A python dictionary containing driving metadata

end_time()¶

end_time retrieves the human-readable time when logging of the data was stopped.

Returns: str – Human-readable string-formatted time.

export2mat(force_rewrite=False)¶

Extract the known messages in MAT file for further downstream analysis

Parameters
force_rewrite : bool, default: False: If the mat file exists then force_rewrite=True regenerates the file and overwrite the existing one. If the mat file doesn’t exist, then this parameter will be ignored.

Returns

list – A list of strings that is file names of extracted data as .mat files

frequency()¶

Retrieves the frequency of each message in a pandas.DataFrame

Returns: pandas.DataFrame – Returns a DataFrame containing mean rate, std rate, max rate, min rate, rate iqr
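
As a rough illustration of what these statistics mean, the sketch below derives them from the arrival times of a single message. It is an assumption that the rate is taken as the reciprocal of the gap between consecutive timestamps; the helper name rate_statistics and the use of the population standard deviation are choices made for this example only.

```python
import statistics


def rate_statistics(timestamps):
    """Summarise the instantaneous rate (Hz) of one message from its arrival times.

    Assumption: instantaneous rate = 1 / (gap between consecutive timestamps).
    """
    rates = [1.0 / (t1 - t0) for t0, t1 in zip(timestamps, timestamps[1:]) if t1 > t0]
    quartiles = statistics.quantiles(rates, n=4)  # needs at least two rates
    return {
        "mean rate": statistics.mean(rates),
        "std rate": statistics.pstdev(rates),
        "max rate": max(rates),
        "min rate": min(rates),
        "rate iqr": quartiles[2] - quartiles[0],
    }


stats = rate_statistics([0.0, 0.5, 1.0, 1.5])
print(stats["mean rate"])  # 2.0 for messages arriving every 0.5 s
```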

get_ts(msg, signal, verbose=False)¶

get_ts returns Timeseries data by given msg_name and signal_name

Parameters

msg : string | int: A valid message that can be found in the given DBC file. Can be specified as message name or message ID
signal : string | int: A valid signal in string format corresponding to msg_name that can be found in the given DBC file. Can be specified as signal name or signal ID
verbose : bool, default = False: If True, print some information

static integrate(df, init=0.0, msg_axis='Message', integrator=<function cumtrapz>)¶

Integrate a timeseries data using scipy.integrate.cumtrapz

Parameters

df : pandas.DataFrame

A two column Pandas data frame. First Column should have name ‘Time’ and Second Column Should be named ‘Message’

init : double

Initial conditions for integration. Default Value: 0.0.

msg_axis : str

The name of the column in df that needs to be integrated with respect to time.

Default is ‘Message’

integrator : function

Integrator method. By default, it is scipy.integrate.cumtrapz

Returns

df (pandas.DataFrame) – A two column Pandas data frame with first column named ‘Time’ and second column named ‘Message’
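
The cumulative trapezoidal rule behind the default integrator can be sketched in plain Python. The helper below is hypothetical and treats init as the starting value of the integral, which is an assumption about how the initial condition is applied. Note that recent SciPy releases expose this routine as scipy.integrate.cumulative_trapezoid.

```python
def cumulative_trapezoid(times, values, init=0.0):
    """Cumulative trapezoidal integral of values with respect to times.

    Assumption: init is the value of the integral at the first time point.
    """
    integral = [init]
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        # Area of the trapezoid between two consecutive samples.
        integral.append(integral[-1] + 0.5 * (values[k] + values[k - 1]) * dt)
    return integral


# A constant speed of 2 m/s integrates to a distance growing by 2 m each second.
print(cumulative_trapezoid([0.0, 1.0, 2.0, 3.0], [2.0, 2.0, 2.0, 2.0]))  # [0.0, 2.0, 4.0, 6.0]
```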

lat_dist(track_id)¶

utility function to return timeseries lateral distance from radar traces of particular track id

Parameters
track_id : int | numpy array | list

Returns

pandas.DataFrame | list<pandas.DataFrame> – Timeseries lateral distance data from the CSV file

lead_distance()¶

Get the distance information of lead vehicle

Returns: pandas.DataFrame – Timeseries data for lead distance from the CSV file

long_dist(track_id)¶

utility function to return timeseries longitudinal distance from radar traces of particular track id

Parameters
track_id : int | numpy array | list

Returns

pandas.DataFrame | list<pandas.DataFrame> – Timeseries longitudinal distance data from the CSV file

messageIDs()¶

Retrieves a list of all message IDs available in the given CSV-formatted CAN data file.

Returns: list – A python list of all available message IDs in the given CSV-formatted CAN data file.

msg_subset(**kwargs)¶

Get the subset of message dataframe based on a condition.

Parameters

conditions : str | list<str>

Human readable condition for subsetting of message dataframe. Following conditions are available:

lead vehicle present: Extracts only those messages for which there was lead vehicle present.
cruise control on: Extracts only those messages for which cruise control is on.
operand op x: Extracts those messages for which operator op is operated on operand to fulfil x.

Available operators op are [>,<,==, !=, >=,<=]

Available operand operand are [speed, acceleration, lead_distance, steering_angle, steering_rate, yaw_rate ]. Details of operands are as follows:

speed: timeseries longitudinal speed of the vehicle
acceleration: timeseries longitudinal acceleration of the vehicle
lead_distance: timeseries distance of lead vehicle from the vehicle
steering_angle: timeseries steering angle of the vehicle
steering_rate: timeseries steering rate of the vehicle
yaw_rate: timeseries yaw rate of the vehicle

For example, “speed < 2.3”

time : (t0, t1)

t0 start elapsed-time t1 end elapsed-time

Extracts messages from time t0 to t1. t0 and t1 denotes elapsed-time and not the actual time.

ids : list

Get the message dataframe containing only the messages whose IDs are in the given list

Returns

strymread – Returns strymread object with a modified dataframe attribute
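
To make the “operand op x” condition format concrete, the sketch below parses such a string and applies it to sample data. It does not call strym: parse_condition and matching_times are hypothetical helpers, the samples dictionary is made-up data, and the parser assumes the three parts are separated by whitespace.

```python
import operator

# The six comparison operators listed above, mapped to Python functions.
OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
}


def parse_condition(condition):
    """Split a condition such as "speed < 2.3" into (operand, comparison, x)."""
    operand, op, x = condition.split()
    return operand, OPERATORS[op], float(x)


def matching_times(samples, condition):
    """Return the times at which the operand's value satisfies the condition."""
    operand, compare, x = parse_condition(condition)
    return [t for t, value in samples[operand] if compare(value, x)]


samples = {"speed": [(0.0, 1.0), (1.0, 2.0), (2.0, 3.5)]}  # made-up (time, value) pairs
print(matching_times(samples, "speed < 2.3"))  # [0.0, 1.0]
```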

plt_speed()¶: Utility function to plot speed data

static plt_ts(df, title='', msg_axis='Message', **kwargs)¶: A utility function to plot a timeseries

static ranalyze(df, title='Timeseries', savefig=False)¶

A utility function to analyse rate of a timeseries data

Parameters
title : str: A descriptive string for this particular analysis

rel_accel(track_id)¶

utility function to return timeseries relative acceleration of detected object from radar traces of particular track id

Parameters
track_id : int | numpy array | list

Returns

pandas.DataFrame | list<pandas.DataFrame> – Timeseries relative acceleration data from the CSV file

rel_velocity(track_id)¶

utility function to return timeseries relative velocity of detected object from radar traces of particular track id

Parameters
track_id : int | numpy array | list

Returns

pandas.DataFrame | list<pandas.DataFrame> – Timeseries relative velocity data from the CSV file

relative_leadervel()¶

Utility function to return timeseries relative velocity of the leader obtained through all RADAR traces

Returns: pandas.DataFrame – Timeseries relative velocity of the leader

static remove_duplicates(df)¶

Remove rows with duplicate time index from the timeseries data

Parameters
df : pandas.DataFrame: A pandas dataframe with at least one column Time or DateTimeIndex type Index

static resample(df, rate=50, categorical=False, **kwargs)¶

Resample the time-series dataframe df of varying, non-uniform sampling.

Resampling is done using cubic interpolation and spline method.

Parameters

df : pandas.DataFrame

Original Dataframe to be resampled

rate : double

Desired sampling rate in Hz

cont_method : str

Resampling method for continuous dataset. Available methods: “cubic”, “nearest”, “linear”, “exact”

cat_method : str

Resampling method for categorical dataset. Available method: “nearest”

categorical : bool

Boolean flag specifying if dataframe being passed represents a categorical data

time_col: str: Name of time column in df. Default value is “Time”

msg_col : str

Name of message column in df. Default value is “Message”

Returns

dfnew1 (pandas.DataFrame) – New resampled timeseries DataFrame

static scatterts(ts, marker_size=10, stacked=True, taxis='elapsed', labels=None, return_fig=False, **kwargs)¶

Parameters

ts : list | pandas.DataFrame: A timeseries dataframe or a list of timeseries dataframes for creating a scatter plot
marker_size : int: Markersize for scatter plot
stacked : bool: If stacked is true, then only one plot will be created and all subplots will be overlaid.
taxis : ["elapsed", "clock"]: How the time axis should be displayed is defined by taxis: If taxis = “elapsed”, then time axis starts with 0. If taxis = “clock”, then time axis will show human readable datetime
labels : list: Labels to be used for legends
return_fig : bool

speed()¶

Returns: pandas.DataFrame – Timeseries speed data from the CSV file

Example

>>> import strym
>>> from strym import strymread
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> dbcfile = 'newToyotacode.dbc'
>>> csvdata = '2020-03-20.csv'
>>> r0 = strymread(csvfile=csvdata, dbcfile=dbcfile)
>>> speed = r0.speed()

speed_limit()¶

Returns: pandas.DataFrame – Timeseries data for speed limit from the CSV file

speed_raw(bus)¶: Get speed on all buses

static split_ts(df, by=30.0)¶

Split the timeseries data into intervals of by seconds

Parameters

df : pandas.DataFrame: dataframe to split
by : double: Specify the interval in seconds by which the timeseries dataframe needs to be split

Returns

pandas.DataFrame – dataframe with an extra column Second denoting splits specified by interval
pandas.DataFrame Array – An array of pandas DataFrames split by Second

start_time()¶

start_time retrieves the human-readable time when logging of the data started

Returns: str – Human-readable string-formatted time.

state_space(rate=20, cont_method='nearest', cat_method='nearest', todb=False)¶: state_space generates a DataFrame with a Time column and several other signals, uniformly sampled with common start and end points, for further downstream analysis

steer_angle()¶

Returns: pandas.DataFrame – Timeseries data for steering angle from the CSV file

steer_fraction()¶

Returns: pandas.DataFrame – Timeseries data for steering fraction from the CSV file

steer_rate()¶

Returns: pandas.DataFrame – Timeseries data for steering rate from the CSV file

steer_torque()¶

Returns: pandas.DataFrame – Timeseries data for steering torque from the CSV file

static temporalviolinplot(dataframe, by=30, title='Timeseries')¶: A temporal plot showing the evolution of the distribution as a function of time

static time_shift(df1, df2, time_col1='Time', time_col2='Time', msg_col1='Message', msg_col2='Message', **kwargs)¶

Compute the time shift of df2, whose time column is time_col2, with respect to the time of df1, whose time column is time_col1. Once you have the time shift, add it to the time axis of the second dataframe.

Caveat: Units of time in time columns of both timeseries dataframe must be same.

Parameters

df1 : pandas.DataFrame: First timeseries dataframe.
df2 : pandas.DataFrame: Second timeseries dataframe.
time_col1 : str: Name of time column in df1. Default value is “Time”
time_col2 : str: Name of time column in df2. Default value is “Time”
msg_col1 : str: Name of message column in df1. Default value is “Message”
msg_col2 : str: Name of message column in df2. Default value is “Message”
correlation_threshold : double: Correlation coefficient threshold in [0,1] at which to stop looking for better time-shift and return the result.

Returns

double, double – Time shift in the unit of time as used in time columns of both timeseries dataframe.

Maximum correlation with the given time shift.

time_subset(**kwargs)¶

Get the time slices satisfying a particular condition for the dataframe.

Parameters

conditions : str | list<str>: Human readable condition for subsetting of message dataframe. Following conditions are available:
"lead vehicle present" : -

Returns

list – A list of tuples with start and end time of slices. E.g. [(t0, t1), (t2, t3), …] satisfying the given conditions

static timeindex(df, inplace=False)¶

Convert a multi-column DataFrame, of which one column must be ‘Time’, to a pandas-compatible timeseries in which the timestamps replace the indices. The conversion happens with no time zone information, i.e. all clock times are in GMT.

Parameters

df : pandas.DataFrame: A pandas dataframe with two columns with the column names “Time” and “Message”
inplace : bool: Modifies the actual dataframe, if true, otherwise doesn’t.

Returns

pandas.DataFrame – Pandas-compatible timeseries with a single column named “Message”, where the indices are timestamps in human-readable format.

static timeslices(ts)¶

timeslices returns a set of time slices in the form of [(t0, t1), (t2, t3), …] from ts, where ts is a square pulse (or a timeseries) with two levels, 0 and 1 or True and False: True when a certain condition was satisfied and False when it was not. For example, ts should be a pandas Series (indexed by timestamp) with values [True, True, True, …, False, False, …, True, True, True], which represents square pulses. In that case, t0, t2, … are the times of the rising edges, and t1, t3, … are the times of the falling edges.

Parameters
ts : pandas.core.series.Series: A valid pandas time series with timestamp as index for the series

Returns

list – A list of tuples with start and end time of slices. E.g. [(t0, t1), (t2, t3), …]
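
The edge-detection logic described above can be sketched on plain lists. This is not strym's implementation: pulse_slices is a hypothetical helper, and it assumes each slice ends at the last sample that was still True (rather than at the first False sample).

```python
def pulse_slices(times, flags):
    """Return [(start, end), ...] for every run of True values in flags.

    Assumption: a slice ends at the last True sample before a falling edge.
    """
    slices = []
    start = None
    previous = None
    for t, flag in zip(times, flags):
        if flag and start is None:
            start = t  # rising edge
        elif not flag and start is not None:
            slices.append((start, previous))  # falling edge
            start = None
        previous = t
    if start is not None:
        # The pulse was still high when the data ended.
        slices.append((start, previous))
    return slices


times = [0, 1, 2, 3, 4, 5, 6]
flags = [True, True, True, False, False, True, True]
print(pulse_slices(times, flags))  # [(0, 2), (5, 6)]
```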

topic2msgs(topic)¶

Return a dictionary value with the message ID and signal name for this particular DBC file, based on the passed in topic name. This is needed because various DBC files have different default names and signal structures depending on manufacturer. This redirection provides robustness to strym when the dbc files are not standardized—as they will never be so.

Parameters
topic : string: The string name of the topic in question. Only limited topics are supported by default

Returns

d (dictionary) – Dictionary with the key/value pairs for message and signal that should be passed to the corresponding strym function. To access the message signal, use d[‘message’] and d[‘signal’]

trajectory(x_init=0.0, y_init=0.0, data_rate=50.0)¶

A simple trajectory tracing function based on CAN data

Parameters

x_init : double: Initial X-coordinate of the vehicle
y_init : double: Initial Y-coordinate of the vehicle
data_rate : double: Rate at which messages are sampled.

Returns

pandas.DataFrame – A pandas DataFrame with five columns: Time, X, Y, Vx, Vy

triplength(time=-1)¶

triplength returns total distance travelled while logging CAN data.

Alternatively, one can provide the argument time to query how much distance was traveled in, say, 50 seconds from the start.

Parameters
time : double: Provide a valid elapsed time in seconds to query how much distance was traveled time seconds since the logging of data was started.

triptime()¶

triptime retrieves total duration of the recording for given CSV-formatted log file in seconds.

Returns: double – Duration in seconds.

static ts_sync(df1, df2, rate=50, msg_col1='Message', msg_col2='Message', **kwargs)¶

Time-synchronize and resample two time-series dataframes of varying, non-uniform sampling.

In a non-ideal condition, the first time of df1 timeseries dataframe will not be same as the first time of df2 dataframe.

In that case, we will calculate the value of the message at the later of the two first times of df1 and df2 using linear interpolation. Call the later of the two first times latest_first_time.

Similarly, we will calculate the value of the message at the earlier of the two end times of df1 and df2 using linear interpolation. Call the earlier of the two end times earliest_last_time.

Linear interpolation formula is

\[X_i = \cfrac{X_A - X_B}{a-b}(i-b) + X_B\]

Next, we will truncate anything beyond [latest_first_time, earliest_last_time]

Once we have common first and last time in both timeseries dataframes, we will use cubic interpolation to do uniform sampling and interpolation of both time-series dataframe.

Parameters

df1 : pandas.DataFrame

First timeseries dataframe. The first column must be named ‘Time’ and the second column must be named ‘Message’

df2 : pandas.DataFrame

Second timeseries dataframe. The first column must be named ‘Time’ and the second column must be named ‘Message’

rate : double | str

double: New uniform sampling rate

str: Which dataframe to inherit the sampling from. If rate=”first”, then df2 will be sampled by inheriting time points from df1. If rate=”second”, then df1 will be sampled by inheriting time points from df2

method : str

Resampling method for dataset. Available methods: “cubic”, “nearest”, “linear”, “exact”

Returns

dfnew1 (pandas.DataFrame) – First new resampled timeseries DataFrame
dfnew2 (pandas.DataFrame) – Second new resampled timeseries DataFrame
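
The first two steps of ts_sync (finding the common time window and linearly interpolating at its boundaries) can be sketched as below. The helper names are hypothetical, and the reading of the formula's symbols is an assumption: X_A and X_B are taken to be the message values at times a and b, and i is the time at which a value is wanted.

```python
def interpolate(a, x_a, b, x_b, i):
    """Linear interpolation X_i = (X_A - X_B) / (a - b) * (i - b) + X_B."""
    return (x_a - x_b) / (a - b) * (i - b) + x_b


def common_window(times1, times2):
    """Return (latest_first_time, earliest_last_time) for two sorted time lists."""
    return max(times1[0], times2[0]), min(times1[-1], times2[-1])


times1, values1 = [0.0, 1.0, 2.0], [10.0, 20.0, 30.0]  # made-up df1 data
times2 = [0.5, 1.5, 2.5]                               # made-up df2 times

latest_first_time, earliest_last_time = common_window(times1, times2)
# Value of the first series at latest_first_time, between its samples at t=0.0 and t=1.0.
value_at_start = interpolate(times1[0], values1[0], times1[1], values1[1], latest_first_time)
print(latest_first_time, earliest_last_time, value_at_start)  # 0.5 2.0 15.0
```

Everything outside [latest_first_time, earliest_last_time] would then be truncated before the uniform resampling step.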

static violinplot(df, title='Violin Plot')¶: A violin plot to show the data distribution

wheel_speed_fl()¶

Returns: pandas.DataFrame – Timeseries data for wheel speed of front left tire from the CSV file

wheel_speed_fr()¶

Returns: pandas.DataFrame – Timeseries data for wheel speed of front right tire from the CSV file

wheel_speed_rl()¶

Returns: pandas.DataFrame – Timeseries data for wheel speed of rear left tire from the CSV file

wheel_speed_rr()¶

Returns: pandas.DataFrame – Timeseries data for wheel speed of rear right tire from the CSV file

yaw_rate()¶

Returns: pandas.DataFrame – Timeseries data for yaw rate from the CSV file
