Class strymread

Import strymread as:

from strym import strymread

for reading and analysing CAN bus csv files.

class strym.strymread(csvfile, dbcfile='', **kwargs)

strymread reads the logged CAN data from the given CSV file. This class provides several utility functions.

Parameters
csvfile : str, pandas.DataFrame, default = None

The CSV file to be read. If a pandas.DataFrame is supplied instead, then csvfile is set to None. The DataFrame, if provided, must have the columns [“Time”, “Message”, “MessageID”, “Bus”].

dbcfile : str, default = “”

The DBC file which will provide codec for decoding CAN messages

kwargs : variable list of arguments in dictionary format

bus : list | default = None

A list of integers corresponding to bus IDs.

dbcfolder : str | default = None

Specifies the folder in which to look for an appropriate DBC file if dbcfile=’’ or dbcfile = None. The appropriate DBC file can be inferred from <brand>_<model>_<year>.dbc. If dbcfolder is None or an empty string, then by default strymread looks for the DBC file in the dbc folder of the package, where sample DBC files are shipped.

verbose : bool

Option for verbosity, prints some information when True

createdb : bool

If True, creates a sqlite3 database for raw CAN data if the database doesn’t exist

dbdir : str

Optional argument that specifies where sqlite3 database will be stored. The default location is ~/.strym/

dbcfile

The filepath of DBC file

Type

str, default = “”

csvfile

The filepath of CSV Data file, or, raw CAN Message DataFrame

Type

str | pandas.DataFrame

dataframe

Pandas DataFrame that stores the content of csvfile

Type

pandas.DataFrame

dataframe_raw

Original pandas DataFrame with all bus IDs. When bus= is passed to the constructor to filter the dataframe by bus ID, the original dataframe is saved in dataframe_raw.

Type

pandas.DataFrame

candb

CAN database fetched from DBC file

Type

cantools.db

burst

A boolean flag indicating whether the CAN data arrived in bursts. If True, the CAN data was captured in bursts; otherwise False. If the data came in bursts (say, 64 messages at a time), any further analysis might not be reliable, so always check this flag.

Type

bool

success

A boolean flag, if True, tells that reading of CSV file was successful.

Type

bool

bus

A list of integers corresponding to bus IDs.

Type

list | default = None

dbcfolder

Specifies the folder in which to look for an appropriate DBC file if dbcfile=”” or dbcfile = None. The appropriate DBC file can be inferred from <brand>_<model>_<year>.dbc. If dbcfolder is None or an empty string, then by default strymread looks for the DBC file in the package’s dbc folder, where sample DBC files are shipped.

Type

str | default = None

dbdir

Location where the sqlite3 database for the CAN dataframe will be stored. Default location: ~/.strym/

Type

str

database

The name of the database corresponding to the model/make of the vehicle from which the CAN data was captured

Type

str

inferred_dbc

DBC file inferred from the name of the csvfile passed.

Type

str

Returns

strymread – Returns an object of type strymread on successful reading; otherwise returns None

Example

>>> import strym
>>> from strym import strymread
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> dbcfile = 'newToyotacode.dbc'
>>> csvdata = '2020-03-20.csv'
>>> r0 = strymread(csvfile=csvdata, dbcfile=dbcfile)
acc_state(plot=False)

Get the cruise control state of the vehicle

Returns

pandas.DataFrame – Timeseries data with different levels corresponding to different cruise control state

“disabled”: 2, “hold”: 11, “hold_waiting_user_cmd”: 10, “enabled”: 6, “faulted”: 5

accelx()
Returns

pandas.DataFrame – Timeseries data for acceleration in x-direction (i.e. longitudinal acceleration) from the CSV file

accely()
Returns

pandas.DataFrame – Timeseries data for acceleration in y-direction from the CSV file

accelz()
Returns

pandas.DataFrame – Timeseries data for acceleration in z-direction from the CSV file

count(plot=False)

A utility function to return, and optionally plot as a bar graph, the counts for each Message ID

Returns

pandas.DataFrame – A pandas DataFrame with total message counts per Message ID and total count by Bus

Example

>>> import strym
>>> from strym import strymread
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> dbcfile = 'newToyotacode.dbc'
>>> csvdata = '2020-03-20.csv'
>>> r0 = strymread(csvfile=csvdata, dbcfile=dbcfile)
>>> r0.count()
static create_chunks(df, continuous_threshold=3.0, column_of_interest='Message', plot=False)

create_chunks computes separate chunks from a timeseries data.

Parameters
df : pandas.DataFrame

DataFrame that needs to be divided into chunks

continuous_threshold : float, Default = 3.0

Continuity threshold above which a change point is detected, signalling the start of a new chunk.

column_of_interest : str , Default = “Message”

Column of interest in DataFrame on which continuous_threshold should act to detect change point for creation of chunks

plot : bool, Default = False

If True, a scatter plot of Full timeseries of df overlaid with separate continuous chunks of df will be created.

Returns

list of pandas.DataFrame – Returns a list of DataFrame with same columns as df
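
The chunking idea can be sketched without strym. The snippet below is only an illustration, not strym's implementation: it assumes that a new chunk starts whenever two consecutive values of the column of interest differ by more than the threshold, it works on plain (Time, Message) tuples instead of a DataFrame, and the helper name split_on_jumps is hypothetical.

```python
def split_on_jumps(rows, threshold=3.0, key=1):
    """Split rows into chunks wherever consecutive values of rows[i][key]
    differ by more than threshold (an assumed change-point rule)."""
    chunks = []
    current = []
    for row in rows:
        # A large jump relative to the previous row closes the current chunk.
        if current and abs(row[key] - current[-1][key]) > threshold:
            chunks.append(current)
            current = []
        current.append(row)
    if current:
        chunks.append(current)
    return chunks


# (Time, Message) pairs with a jump of 10.0 between the third and fourth rows
rows = [(0.0, 1.0), (0.1, 1.5), (0.2, 2.0), (0.3, 12.0), (0.4, 12.5)]
chunks = split_on_jumps(rows, threshold=3.0)
print(len(chunks))
```

Passing key=0 would apply the same rule to the time column, which splits the data at gaps in logging rather than at jumps in value.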

static dateparse(ts)

Converts a POSIX timestamp to a human-readable date format as per GMT

Parameters
ts : float

POSIX formatted timestamp

Returns

str – Human-readable timestamp as per GMT
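
A comparable conversion can be written with the standard library alone. This is a sketch rather than the code dateparse runs: the helper name posix_to_gmt and the output format string are assumptions, and strym's actual string format may differ.

```python
from datetime import datetime, timezone


def posix_to_gmt(ts):
    """Format a POSIX timestamp as a GMT (UTC) date-time string."""
    # The format string is an illustrative choice, not strym's.
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


epoch = posix_to_gmt(0.0)
stamp = posix_to_gmt(1584662400.0)
print(epoch)
print(stamp)
```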

dbconnect(db_location)

Creates dbconnection and returns db connection object

Parameters
db_location : str

sqlite db url

static denoise(df, method='MA', **kwargs)

Denoise the time-series dataframe df using method. By default moving-average is used.

Parameters
df : pandas.DataFrame

Original Dataframe to denoise

method : string, “MA”

Specifies method used for denoising

MA: moving average (default)

window_size : int

window size used in moving-average based denoising method

Default value: 10

Returns

pandas.DataFrame – Denoised Timeseries Data
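
To make the moving-average idea concrete, here is a small sketch on a plain list of values. It assumes a trailing window (each output is the mean of the current sample and up to window_size - 1 preceding ones); whether denoise uses a trailing or a centred window is not stated here, and the name moving_average is hypothetical.

```python
def moving_average(values, window_size=10):
    """Trailing moving average; the first few outputs use a shorter window."""
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window_size:
            # Drop the sample that just left the window.
            running -= values[i - window_size]
        count = min(i + 1, window_size)
        smoothed.append(running / count)
    return smoothed


noisy = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0]
smooth = moving_average(noisy, window_size=2)
print(smooth)
```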

static differentiate(df, method='S', **kwargs)

Differentiate the given timeseries dataframe. By default, a spline derivative is used.

Parameters
df : pandas.DataFrame

Original Dataframe to be differentiated

method : str

Specifies method used for differentiation

S: spline, spline based differentiation

AE: autoencoder-based denoising followed by discrete differentiation

kwargs

variable keyword arguments

epochs : int

Number of training epochs in case of AE method

verbose : bool

If True, print logs

dense_time_points : bool

Used in autoencoder (AE) based differentiation. If True, then differentiation is computed on 50 times denser time points.

Returns

pandas.DataFrame – Differentiated Timeseries Data

driving_characteristics()

driving_characteristics provides driving characteristics for the given driving data in the form of python dictionary.

Currently, the dictionary contains following metadata from the driving data

  • File name of CSV-formatted CAN data file

  • Associated DBC file used

  • Start time of the trip in human-readable date format

  • End time of the trip in human-readable date format

  • Total duration of the trip

  • Total distance traveled in meters

  • Total distance traveled in kilometers

  • Total distance traveled in miles

Returns

dictionary – A python dictionary containing driving metadata

end_time()

end_time retrieves the human-readable time when logging of the data was stopped.

Returns

str – Human-readable string-formatted time.

export2mat(force_rewrite=False)

Extract the known messages in MAT file for further downstream analysis

Parameters
force_rewrite : bool, default: False

If the mat file exists, then force_rewrite=True regenerates the file and overwrites the existing one. If the mat file doesn’t exist, then this parameter is ignored.

Returns

list – A list of strings giving the file names of the extracted .mat files

frequency()

Retrieves the frequency of each message in a pandas.Dataframe()

MessageID

MeanRate

MedianRate

RateStd

MaxRate

MinRate

RateIQR

Returns

pandas.DataFrame – Returns a data frame containing the mean rate, median rate, rate standard deviation, max rate, min rate and rate IQR for each message ID
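
The rate statistics can be illustrated for a single message ID from its timestamps. The sketch below assumes that the instantaneous rate is the reciprocal of the gap between consecutive timestamps and that the standard deviation is the population one; both are assumptions about, not facts from, the strym implementation, and rate_stats is a hypothetical helper. The IQR column is omitted for brevity.

```python
import statistics


def rate_stats(timestamps):
    """Summarise message rates (Hz) from a sorted list of timestamps."""
    # Skip zero-length gaps to avoid dividing by zero on duplicate timestamps.
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:]) if b > a]
    rates = [1.0 / g for g in gaps]
    return {
        "MeanRate": statistics.mean(rates),
        "MedianRate": statistics.median(rates),
        "RateStd": statistics.pstdev(rates),
        "MaxRate": max(rates),
        "MinRate": min(rates),
    }


times = [0.0, 0.5, 1.0, 1.25, 1.5]
stats = rate_stats(times)
print(stats)
```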

get_ts(msg, signal, verbose=False)

get_ts returns timeseries data for the given msg and signal

Parameters
msg : string | int

A valid message that can be found in the given DBC file. Can be specified as message name or message ID

signal : string | int

A valid signal corresponding to msg that can be found in the given DBC file. Can be specified as signal name or signal ID

verbose : bool, default = False

If True, print some information

static integrate(df, init=0.0, msg_axis='Message', integrator=<function cumtrapz>)

Integrate a timeseries data using scipy.integrate.cumtrapz

Parameters
df : pandas.DataFrame

A two column Pandas data frame. First Column should have name ‘Time’ and Second Column Should be named ‘Message’

init : double

Initial conditions for integration. Default Value: 0.0.

msg_axis : str

The name of the column in df that needs to be integrated with respect to time.

Default is ‘Message’

integrator : function

Integrator method. By default, it is scipy.integrate.cumtrapz

Returns

df (pandas.DataFrame) – A two column Pandas data frame with first column named ‘Time’ and second column named ‘Message’
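
For readers unfamiliar with cumulative trapezoidal integration, the sketch below reproduces the rule in plain Python. It is not the code integrate runs (which delegates to the integrator argument); the helper name cumulative_trapezoid is hypothetical, and treating init as the value of the integral at the first time point is an assumption.

```python
def cumulative_trapezoid(times, values, init=0.0):
    """Cumulative integral of values over times using the trapezoidal rule."""
    if not times:
        return []
    total = init
    integral = [init]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (values[i] + values[i - 1]) * dt
        integral.append(total)
    return integral


times = [0.0, 1.0, 2.0, 3.0]
speed = [0.0, 2.0, 4.0, 4.0]  # m/s
distance = cumulative_trapezoid(times, speed)
print(distance)
```

Integrating a speed timeseries this way yields the distance travelled at each time point.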

lat_dist(track_id)

utility function to return timeseries lateral distance from radar traces of particular track id

Parameters
track_id : int | numpy array | list

Returns

pandas.DataFrame | list<pandas.DataFrame> – Timeseries lateral distance data from the CSV file

lead_distance()

Get the distance information of lead vehicle

Returns

pandas.DataFrame – Timeseries data for lead distance from the CSV file

long_dist(track_id)

utility function to return timeseries longitudinal distance from radar traces of particular track id

Parameters
track_id : int | numpy array | list

Returns

pandas.DataFrame | list<pandas.DataFrame> – Timeseries longitudinal distance data from the CSV file

messageIDs()

Retrieves the list of all message IDs available in the given CSV-formatted CAN data file.

Returns

list – A python list of all available message IDs in the given CSV-formatted CAN data file.

msg_subset(**kwargs)

Get the subset of message dataframe based on a condition.

Parameters
conditions : str | list<str>

Human readable condition for subsetting of message dataframe. Following conditions are available:

  • lead vehicle present: Extracts only those messages for which there was lead vehicle present.

  • cruise control on: Extracts only those messages for which cruise control is on.

  • operand op x: Extracts those messages for which the comparison operand op x holds.

Available operators op are [>,<,==, !=, >=,<=]

Available operands are [speed, acceleration, lead_distance, steering_angle, steering_rate, yaw_rate]. Details of the operands are as follows:

  • speed: timeseries longitudinal speed of the vehicle

  • acceleration: timeseries longitudinal acceleration of the vehicle

  • lead_distance: timeseries distance of lead vehicle from the vehicle

  • steering_angle: timeseries steering angle of the vehicle

  • steering_rate: timeseries steering rate of the vehicle

  • yaw_rate: timeseries yaw rate of the vehicle

For example, “speed < 2.3”

time : (t0, t1)

t0 start elapsed-time t1 end elapsed-time

Extracts messages from time t0 to t1. t0 and t1 denotes elapsed-time and not the actual time.

ids : list

Get the message dataframe containing only the messages whose IDs are in the given list

Returns

strymread – Returns strymread object with a modified dataframe attribute
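
How an “operand op x” condition such as “speed < 2.3” might be evaluated can be sketched as follows. This is an illustration only: it assumes the three parts are separated by whitespace, it works on plain (Time, Message) tuples, and the helpers parse_condition and select_times are hypothetical names, not part of strym.

```python
import operator

# Comparison operators listed for msg_subset conditions
OPS = {
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
}


def parse_condition(condition):
    """Split 'operand op x' into (operand, comparison function, float x)."""
    operand, op, value = condition.split()
    return operand, OPS[op], float(value)


def select_times(series, condition):
    """Return the times at which the series value satisfies the condition."""
    _, compare, threshold = parse_condition(condition)
    return [t for t, v in series if compare(v, threshold)]


speed = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.5), (3.0, 2.2)]
slow_times = select_times(speed, "speed < 2.3")
print(slow_times)
```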

plt_speed()

Utility function to plot speed data

static plt_ts(df, title='', msg_axis='Message', **kwargs)

A utility function to plot a timeseries

static ranalyze(df, title='Timeseries', savefig=False)

A utility function to analyse rate of a timeseries data

Parameters
title : str

A descriptive string for this particular analysis

rel_accel(track_id)

utility function to return timeseries relative acceleration of detected object from radar traces of particular track id

Parameters
track_id : int | numpy array | list

Returns

pandas.DataFrame | list<pandas.DataFrame> – Timeseries relative acceleration data from the CSV file

rel_velocity(track_id)

utility function to return timeseries relative velocity of detected object from radar traces of particular track id

Parameters
track_id : int | numpy array | list

Returns

pandas.DataFrame | list<pandas.DataFrame> – Timeseries relative velocity data from the CSV file

relative_leadervel()

Utility function to return timeseries relative velocity of the leader obtained through all RADAR traces

Returns

pandas.DataFrame – Timeseries relative velocity of the leader

static remove_duplicates(df)

Remove rows with duplicate time index from the timeseries data

Parameters
df : pandas.DataFrame

A pandas dataframe with at least one column Time or DateTimeIndex type Index

static resample(df, rate=50, categorical=False, **kwargs)

Resample the time-series dataframe df of varying, non-uniform sampling.

Resampling is done using cubic interpolation and spline method.

Parameters
df : pandas.DataFrame

Original Dataframe to be resampled

rate : double

Desired sampling rate in Hz

cont_method : str

Resampling method for continuous dataset. Available methods: “cubic”, “nearest”, “linear”, “exact”

cat_method : str

Resampling method for categorical dataset. Available method: “nearest”

categorical : bool

Boolean flag specifying if dataframe being passed represents a categorical data

time_col: str

Name of time column in df. Default value is “Time”

msg_col : str

Name of message column in df. Default value is “Message”

Returns

dfnew1 (pandas.DataFrame) – New resampled timeseries DataFrame

static scatterts(ts, marker_size=10, stacked=True, taxis='elapsed', labels=None, return_fig=False, **kwargs)
Parameters
ts : list | pd.DataFrame

A timeseries or a list of timeseries dataframe for creating a scatter plot

marker_size : int

Markersize for scatter plot

stacked : bool

If stacked is true, then only one plot will be created and all subplots will be overlaid.

taxis : ["elapsed", "clock"]

How the time axis should be displayed is defined by taxis: If taxis = “elapsed”, then time axis starts with 0. If taxis = “clock”, then time axis will show human readable datetime

labels : list

Labels to be used for legends

return_fig : bool

speed()
Returns

pandas.DataFrame – Timeseries speed data from the CSV file

Example

>>> import strym
>>> from strym import strymread
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> dbcfile = 'newToyotacode.dbc'
>>> csvdata = '2020-03-20.csv'
>>> r0 = strymread(csvfile=csvdata, dbcfile=dbcfile)
>>> speed = r0.speed()
speed_limit()
Returns

pandas.DataFrame – Timeseries data for speed limit from the CSV file

speed_raw(bus)

Get speed on all buses

static split_ts(df, by=30.0)

Split the timeseries data into intervals of by seconds

Parameters
df : pandas.DataFrame

dataframe to split

by : double

Specify the interval in seconds by which the timeseries dataframe needs to be split

Returns

  • pandas.DataFrame – dataframe with an extra column Second denoting splits specified by interval

  • pandas.DataFrame Array – An array of pandas DataFrames split by seconds
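
The splitting can be pictured as assigning each sample to a bucket of by seconds measured from the first timestamp. The sketch below makes that assumption explicit; it uses plain (Time, Message) tuples, and split_by_seconds is a hypothetical helper rather than the strym implementation.

```python
def split_by_seconds(rows, by=30.0):
    """Group (time, value) rows into consecutive intervals of `by` seconds."""
    if not rows:
        return []
    start = rows[0][0]
    groups = {}
    for t, v in rows:
        # Index of the interval this sample falls into, counted from the start.
        bucket = int((t - start) // by)
        groups.setdefault(bucket, []).append((t, v))
    return [groups[k] for k in sorted(groups)]


rows = [(100.0, 1), (110.0, 2), (135.0, 3), (161.0, 4)]
parts = split_by_seconds(rows, by=30.0)
print(len(parts))
```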

start_time()

start_time retrieves the human-readable time when logging of the data started

Returns

str – Human-readable string-formatted time.

state_space(rate=20, cont_method='nearest', cat_method='nearest', todb=False)

state_space generates a DataFrame with a Time column and several other signals, uniformly sampled with common start and end points, for further downstream analysis

steer_angle()
Returns

pandas.DataFrame – Timeseries data for steering angle from the CSV file

steer_fraction()
Returns

pandas.DataFrame – Timeseries data for steering fraction from the CSV file

steer_rate()
Returns

pandas.DataFrame – Timeseries data for steering rate from the CSV file

steer_torque()
Returns

pandas.DataFrame – Timeseries data for steering torque from the CSV file

static temporalviolinplot(dataframe, by=30, title='Timeseries')

A temporal plot showing the evolution of the distribution as a function of time

static time_shift(df1, df2, time_col1='Time', time_col2='Time', msg_col1='Message', msg_col2='Message', **kwargs)

Compute the time shift of df2 (time column time_col2) with respect to df1 (time column time_col1). The returned time shift should then be added to the time axis of the second dataframe.

Caveat: The units of time in the time columns of both timeseries dataframes must be the same.

Parameters
df1 : pandas.DataFrame

First timeseries dataframe.

df2 : pandas.DataFrame

Second timeseries dataframe.

time_col1 : str

Name of time column in df1. Default value is “Time”

time_col2 : str

Name of time column in df2. Default value is “Time”

msg_col1 : str

Name of message column in df1. Default value is “Message”

msg_col2 : str

Name of message column in df2. Default value is “Message”

correlation_threshold : double

Correlation coefficient threshold in [0,1] at which to stop looking for better time-shift and return the result.

Returns

double, double – Time shift in the unit of time used in the time columns of both timeseries dataframes, and the maximum correlation obtained with that time shift.

time_subset(**kwargs)

Get the time slices satisfying a particular condition for the dataframe.

Parameters
conditions : str | list<str>

Human-readable condition for subsetting of message dataframe. The following condition is available:

“lead vehicle present”: time slices during which a lead vehicle was present

Returns

list – A list of tuples with start and end time of slices. E.g. [(t0, t1), (t2, t3), …] satisfying the given conditions

static timeindex(df, inplace=False)

Convert a multi-column DataFrame, of which one column must be ‘Time’, to a pandas-compatible timeseries in which the timestamp replaces the index. The conversion happens with no time zone information, i.e. all clock times are in GMT.

Parameters
df : pandas.DataFrame

A pandas dataframe with two columns with the column names “Time” and “Message”

inplace : bool

Modifies the actual dataframe, if true, otherwise doesn’t.

Returns

pandas.DataFrame – Pandas-compatible timeseries with a single column named “Message”, where the indices are timestamps in human-readable format.

static timeslices(ts)

timeslices returns a set of time slices in the form [(t0, t1), (t2, t3), …] from ts, where ts is a square pulse (or a timeseries) with two levels, 0 and 1 or True and False: True when a certain condition was satisfied and False when it was not. For example, ts should be a pandas Series (indexed by timestamp) with values [True, True, True, …, False, False, …, True, True, True], which represents square pulses. In that case, t0, t2, … are the times of rising edges, and t1, t3, … the times of falling edges.

Parameters
ts : pandas.core.series.Series

A valid pandas time series with timestamp as index for the series

Returns

list – A list of tuples with start and end time of slices. E.g. [(t0, t1), (t2, t3), …]
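
The edge-detection idea can be sketched on a plain list of (time, flag) pairs. This is an illustration, not the strym code: the helper name time_slices is hypothetical, and taking the end of a slice to be the last sample that was still True (rather than the first False sample) is an assumption.

```python
def time_slices(samples):
    """Return [(start, end), ...] for each run of True flags in samples."""
    slices = []
    start = None
    last_true = None
    for t, flag in samples:
        if flag:
            if start is None:
                start = t  # rising edge
            last_true = t
        elif start is not None:
            slices.append((start, last_true))  # falling edge
            start = None
    if start is not None:
        # The pulse was still high when the data ended.
        slices.append((start, last_true))
    return slices


pulse = [(0, True), (1, True), (2, False), (3, False),
         (4, True), (5, True), (6, True)]
slices = time_slices(pulse)
print(slices)
```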

topic2msgs(topic)

Return a dictionary value with the message ID and signal name for this particular DBC file, based on the passed in topic name. This is needed because various DBC files have different default names and signal structures depending on manufacturer. This redirection provides robustness to strym when the dbc files are not standardized—as they will never be so.

Parameters
topic : string

The string name of the topic in question. Only limited topics are supported by default

Returns

d (dictionary) – Dictionary with the key/value pairs for message and signal that should be passed to the corresponding strym function. To access the message signal, use d[‘message’] and d[‘signal’]

trajectory(x_init=0.0, y_init=0.0, data_rate=50.0)

A simple trajectory tracing function based on CAN data

Parameters
x_init : double

Initial X-coordinate of the vehicle

y_init : double

Initial Y-coordinate of the vehicle

data_rate : double

Rate at which message are sampled.

Returns

pandas.DataFrame – A pandas DataFrame with five columns: Time, X, Y, Vx, Vy

triplength(time=-1)

triplength returns total distance travelled while logging CAN data.

Alternatively, one can provide the optional argument time to query how much distance was traveled in, say, 50 seconds from the start.

Parameters
time : double

Provide a valid elapsed time in seconds to query how much distance was traveled time seconds since the logging of data was started.

triptime()

triptime retrieves total duration of the recording for given CSV-formatted log file in seconds.

Returns

double – Duration in seconds.

static ts_sync(df1, df2, rate=50, msg_col1='Message', msg_col2='Message', **kwargs)

Time-synchronize and resample two time-series dataframes of varying, non-uniform sampling.

In a non-ideal condition, the first time of df1 timeseries dataframe will not be same as the first time of df2 dataframe.

In that case, we will calculate the value of message at the latest of the two first times of df1 and df2 using linear interpolation. Call the latest of the two first times latest_first_time.

Similarly, we will calculate the value of message at the earliest of the two end times of df1 and df2 using linear interpolation. Call the earliest of the two end times earliest_last_time.

Linear interpolation formula is

\[X_i = \cfrac{X_A - X_B}{a-b}(i-b) + X_B\]

Next, we will truncate anything beyond [latest_first_time, earliest_last_time]

Once we have common first and last time in both timeseries dataframes, we will use cubic interpolation to do uniform sampling and interpolation of both time-series dataframe.
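
The linear interpolation step can be checked by hand with a small worked example. The sketch below applies the formula above to two toy timeseries held as (Time, Message) tuples; the function name interpolate and the sample values are illustrative assumptions, and the subsequent cubic resampling is not shown.

```python
def interpolate(i, a, x_a, b, x_b):
    """Linear interpolation: X_i = (X_A - X_B) / (a - b) * (i - b) + X_B."""
    return (x_a - x_b) / (a - b) * (i - b) + x_b


df1 = [(0.0, 10.0), (2.0, 14.0), (4.0, 18.0)]
df2 = [(1.0, 5.0), (3.0, 9.0), (5.0, 13.0)]

# Common time window shared by both timeseries
latest_first_time = max(df1[0][0], df2[0][0])
earliest_last_time = min(df1[-1][0], df2[-1][0])

# df1 has no sample at latest_first_time, so interpolate between its
# neighbours at t=0.0 and t=2.0.
start_value_df1 = interpolate(latest_first_time, 0.0, 10.0, 2.0, 14.0)

# df2 has no sample at earliest_last_time, so interpolate between its
# neighbours at t=3.0 and t=5.0.
end_value_df2 = interpolate(earliest_last_time, 3.0, 9.0, 5.0, 13.0)

print(latest_first_time, earliest_last_time, start_value_df1, end_value_df2)
```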

Parameters
df1 : pandas.DataFrame

First timeseries dataframe. First column must be named ‘Time’ and second column must be named ‘Message’

df2 : pandas.DataFrame

Second timeseries dataframe. First column must be named ‘Time’ and second column must be named ‘Message’

rate : double | str

double: New uniform sampling rate

str: Inherit the sampling time points from one of the dataframes. If rate=”first”, then df2 will be sampled by inheriting time points from df1. If rate=”second”, then df1 will be sampled by inheriting time points from df2

method : str

Resampling method for dataset. Available methods: “cubic”, “nearest”, “linear”, “exact”

Returns

  • dfnew1 (pandas.DataFrame) – First new resampled timeseries DataFrame

  • dfnew2 (pandas.DataFrame) – Second new resampled timeseries DataFrame

static violinplot(df, title='Violin Plot')

A violin plot to show the data distribution

wheel_speed_fl()
Returns

pandas.DataFrame – Timeseries data for wheel speed of front left tire from the CSV file

wheel_speed_fr()
Returns

pandas.DataFrame – Timeseries data for wheel speed of front right tire from the CSV file

wheel_speed_rl()
Returns

pandas.DataFrame – Timeseries data for wheel speed of rear left tire from the CSV file

wheel_speed_rr()
Returns

pandas.DataFrame – Timeseries data for wheel speed of rear right tire from the CSV file

yaw_rate()
Returns

pandas.DataFrame – Timeseries data for yaw rate from the CSV file