Class strymread
Import strymread, the class for reading and analysing CAN bus CSV files, as:
from strym import strymread
-
class
strym.
strymread
(csvfile, dbcfile='', **kwargs)¶ strymread reads the logged CAN data from the given CSV file. This class provides several utility functions.
- Parameters
- csvfile : str, pandas.DataFrame, default = None
The CSV file to be read. If a pandas.DataFrame is supplied instead, csvfile is set to None. The DataFrame, if provided, must have the columns [“Time”, “Message”, “MessageID”, “Bus”]
- dbcfile : str, default = “”
The DBC file which will provide codec for decoding CAN messages
- kwargs : variable list of arguments in dictionary format
- bus : list | default = None
A list of integers corresponding to Bus IDs.
- dbcfolder : str | default = None
Specifies a folder path in which to look for an appropriate DBC file if dbcfile=’’ or dbcfile = None. The appropriate DBC file can be inferred from <brand>_<model>_<year>.dbc. If dbcfolder is None or an empty string, strymread looks by default in the dbc folder of the package, where sample DBC files are shipped.
- verbose : bool
Option for verbosity, prints some information when True
- createdb : bool
If True, creates a sqlite3 database for raw CAN data if the database doesn’t exist
- dbdir : str
Optional argument that specifies where sqlite3 database will be stored. The default location is ~/.strym/
-
dbcfile
¶ The filepath of DBC file
- Type
str, default = “”
-
csvfile
¶ The filepath of CSV Data file, or, raw CAN Message DataFrame
- Type
str | pandas.DataFrame
-
dataframe
¶ Pandas dataframe that stores content of csvfile as dataframe
- Type
pandas.DataFrame
-
dataframe_raw
¶ The original pandas dataframe with all bus IDs. When bus= is passed to the constructor to filter the dataframe by bus ID, the original dataframe is saved in dataframe_raw
- Type
pandas.DataFrame
-
candb
¶ CAN database fetched from DBC file
- Type
cantools.db
-
burst
¶ A boolean flag that checks if CAN data came in burst. If True, then CAN Data was captured in burst, else False. If CAN Data came in burst (as in say 64 messages at a time or so) then any further analysis might not be reliable. Always check that.
- Type
bool
-
success
¶ A boolean flag, if True, tells that reading of CSV file was successful.
- Type
bool
-
bus
¶ A list of integers corresponding to Bus IDs.
- Type
list | default = None
-
dbcfolder
¶ Specifies a folder path in which to look for an appropriate DBC file if dbcfile=”” or dbcfile = None. The appropriate DBC file can be inferred from <brand>_<model>_<year>.dbc. If dbcfolder is None or an empty string, strymread looks by default in the package’s dbc folder, where sample DBC files are shipped.
- Type
str | default = None
-
dbdir
¶ Location where the sqlite3 database for the CAN DataFrame will be stored. Default location: ~/.strym/
- Type
str
-
database
¶ The name of the database corresponding to the model/make of the vehicle from which the CAN data was captured
- Type
str
-
inferred_dbc
¶ DBC file inferred from the name of the csvfile passed.
- Type
str
- Returns
strymread – Returns an object of type strymread upon successful reading, or else returns None
Example
>>> import strym
>>> from strym import strymread
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> dbcfile = 'newToyotacode.dbc'
>>> csvdata = '2020-03-20.csv'
>>> r0 = strymread(csvfile=csvdata, dbcfile=dbcfile)
-
acc_state
(plot=False)¶ Get the cruise control state of the vehicle
- Returns
pandas.DataFrame – Timeseries data with different levels corresponding to different cruise control state
“disabled”: 2, “hold”: 11, “hold_waiting_user_cmd”: 10, “enabled”: 6, “faulted”: 5
-
accelx
()¶ - Returns
pandas.DataFrame – Timeseries data for acceleration in x-direction (i.e. longitudinal acceleration) from the CSV file
-
accely
()¶ - Returns
pandas.DataFrame – Timeseries data for acceleration in y-direction from the CSV file
-
accelz
()¶ - Returns
pandas.DataFrame – Timeseries data for acceleration in z-direction from the CSV file
-
count
(plot=False)¶ A utility function to return and optionally plot the counts for each Message ID as bar graph
- Returns
pandas.DataFrame – A pandas DataFrame with total message counts per Message ID and total count by Bus
Example
>>> import strym
>>> from strym import strymread
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> dbcfile = 'newToyotacode.dbc'
>>> csvdata = '2020-03-20.csv'
>>> r0 = strymread(csvfile=csvdata, dbcfile=dbcfile)
>>> r0.count()
-
static
create_chunks
(df, continuous_threshold=3.0, column_of_interest='Message', plot=False)¶ create_chunks computes separate chunks from timeseries data.
- Parameters
- df : pandas.DataFrame
DataFrame that needs to be divided into chunks
- continuous_threshold : float, Default = 3.0
Continuity threshold above which a change point is detected, signalling the start of a new chunk.
- column_of_interest : str , Default = “Message”
Column of interest in DataFrame on which continuous_threshold should act to detect change point for creation of chunks
- plot : bool, Default = False
If True, a scatter plot of Full timeseries of df overlaid with separate continuous chunks of df will be created.
- Returns
list of pandas.DataFrame – Returns a list of DataFrame with same columns as df
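The chunking idea can be sketched in plain Python, independent of strym. This is an illustration under stated assumptions, not the library's implementation: the helper name `split_on_gaps` is hypothetical, and it assumes a new chunk starts whenever the jump between consecutive values of the column of interest exceeds continuous_threshold.

```python
def split_on_gaps(values, continuous_threshold=3.0):
    """Split a sequence into runs; a jump larger than the threshold starts a new run.

    Hypothetical helper for illustration only (not part of strym).
    """
    chunks = []
    current = []
    for v in values:
        # Close the current chunk when the jump from the previous value is too large.
        if current and abs(v - current[-1]) > continuous_threshold:
            chunks.append(current)
            current = []
        current.append(v)
    if current:
        chunks.append(current)
    return chunks


timestamps = [0.0, 1.0, 2.0, 10.0, 11.0, 30.0]
chunks = split_on_gaps(timestamps)
```

Here the gaps 2.0 to 10.0 and 11.0 to 30.0 both exceed the threshold, so the sequence is cut into three chunks.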
-
static
dateparse
(ts)¶ Converts a POSIX timestamp to a human-readable date format as per GMT
- Parameters
- ts : float
POSIX formatted timestamp
- Returns
str – Human-readable timestamp as per GMT
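A comparable conversion can be done with the standard library alone. The sketch below is not strym's code: `posix_to_gmt` is a hypothetical name, and the output format string is an assumption, since the exact format returned by dateparse is not stated here.

```python
from datetime import datetime, timezone


def posix_to_gmt(ts):
    """Format a POSIX timestamp (seconds since the epoch) as a GMT/UTC string.

    Hypothetical helper; the format string is an assumption.
    """
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


stamp = posix_to_gmt(1584662400.0)
```

Passing an explicit UTC time zone avoids depending on the local time zone of the machine running the code.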
-
dbconnect
(db_location)¶ Creates a database connection and returns the connection object
- Parameters
- db_location : str
SQLite database URL
-
static
denoise
(df, method='MA', **kwargs)¶ Denoise the time-series dataframe df using method. By default moving-average is used.
- Parameters
- df : pandas.DataFrame
Original Dataframe to denoise
- method : string, “MA”
Specifies method used for denoising
MA: moving average (default)
- window_size : int
window size used in moving-average based denoising method
Default value: 10
- Returns
pandas.DataFrame – Denoised Timeseries Data
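For intuition, moving-average denoising can be sketched without any dependencies. This is a simplified illustration, not strym's implementation: `moving_average` is a hypothetical helper, and it assumes a trailing window in which the first few samples are averaged over however many values are available.

```python
def moving_average(values, window_size=10):
    """Trailing moving average; early samples average over what is available.

    Hypothetical helper; window alignment is an assumption.
    """
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        # Drop the sample that has just left the window.
        if i >= window_size:
            running -= values[i - window_size]
        n = min(i + 1, window_size)
        smoothed.append(running / n)
    return smoothed


noisy = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0]
smooth = moving_average(noisy, window_size=2)
```

A larger window_size smooths more aggressively but also delays and flattens genuine changes in the signal.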
-
static
differentiate
(df, method='S', **kwargs)¶ Differentiate the given timeseries dataframe, by default using a spline derivative
- Parameters
- df : pandas.DataFrame
Original Dataframe to be differentiated
- method : str
Specifies method used for differentiation
S: spline, spline based differentiation
AE: autoencoder-based denoising followed by discrete differentiation
- kwargs
variable keyword arguments
- epochs : int
Number of training epochs in case of AE method
- verbose : bool
If True, print logs
- dense_time_points : bool
Used in autoencoder (AE) based differentiation. If True, then differentiation is computed on 50 times denser time points.
- Returns
pandas.DataFrame – Differentiated Timeseries Data
-
driving_characteristics
()¶ driving_characteristics provides driving characteristics for the given driving data in the form of python dictionary.
Currently, the dictionary contains the following metadata from the driving data
File name of the CSV-formatted CAN data file
Associated DBC file used
Start time of the trip in human-readable date format
End time of the trip in human-readable date format
Total duration of the trip
Total distance traveled in meters
Total distance traveled in kilometers
Total distance traveled in miles
- Returns
dictionary – A python dictionary containing driving metadata
-
end_time
()¶ end_time retrieves the human-readable time when logging of the data was stopped.
- Returns
str – Human-readable string-formatted time.
-
export2mat
(force_rewrite=False)¶ Extract the known messages into MAT files for further downstream analysis
- Parameters
- force_rewrite : bool, default: False
If the mat file exists, then force_rewrite=True regenerates the file and overwrites the existing one. If the mat file doesn’t exist, then this parameter is ignored.
- Returns
list – A list of strings giving the file names of the extracted .mat files
-
frequency
()¶ Retrieves the frequency of each message as a pandas.DataFrame with the following columns:
MessageID
MeanRate
MedianRate
RateStd
MaxRate
MinRate
RateIQR
- Returns
pandas.DataFrame – Returns a data frame containing the mean rate, median rate, rate standard deviation, max rate, min rate and rate IQR
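One plausible way to obtain such statistics for a single message ID is sketched below with the standard library. It is not strym's implementation: `rate_summary` is a hypothetical helper, and taking the rate between two consecutive messages as the reciprocal of their time gap is an assumption. The quartile method of `statistics.quantiles` may also differ from the one strym uses.

```python
import statistics


def rate_summary(times):
    """Summarise message rate in Hz from timestamps given in seconds.

    Hypothetical helper: instantaneous rate = 1 / time gap (an assumption).
    """
    ordered = sorted(times)
    # Skip zero-length gaps to avoid division by zero.
    rates = [1.0 / (b - a) for a, b in zip(ordered, ordered[1:]) if b > a]
    q1, _, q3 = statistics.quantiles(rates, n=4)  # quartiles of the rates
    return {
        "MeanRate": statistics.mean(rates),
        "MedianRate": statistics.median(rates),
        "RateStd": statistics.stdev(rates),
        "MaxRate": max(rates),
        "MinRate": min(rates),
        "RateIQR": q3 - q1,
    }


summary = rate_summary([0.0, 0.5, 1.0, 1.25, 1.5])
```

The keys mirror the column names listed above; at least three distinct timestamps are needed so that two rates exist for the standard deviation and quartiles.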
-
get_ts
(msg, signal, verbose=False)¶ get_ts returns timeseries data for the given message msg and signal signal
- Parameters
- msg : string | int
A valid message that can be found in the given DBC file. Can be specified as message name or message ID
- signal : string | int
A valid signal corresponding to msg that can be found in the given DBC file. Can be specified as signal name or signal ID
- verbose : bool, default = False
If True, print some information
-
static
integrate
(df, init=0.0, msg_axis='Message', integrator=<function cumtrapz>)¶ Integrate timeseries data using scipy.integrate.cumtrapz
- Parameters
- df : pandas.DataFrame
A two-column pandas DataFrame. The first column should be named ‘Time’ and the second column ‘Message’
- init : double
Initial condition for integration. Default value: 0.0.
- msg_axis : str
The name of the column in df that needs to be integrated with respect to time.
Default is ‘Message’
- integrator : function
Integrator method. By default, it is scipy.integrate.cumtrapz
- Returns
df (pandas.DataFrame) – A two-column pandas DataFrame with the first column named ‘Time’ and the second column named ‘Message’
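The cumulative trapezoidal rule behind this method can be written out in a few lines of plain Python. The sketch below is illustrative only: `cumulative_trapezoid_sketch` is a hypothetical helper, and treating init as the integral's value at the first sample is an assumption about how the initial condition is applied.

```python
def cumulative_trapezoid_sketch(times, values, init=0.0):
    """Cumulative trapezoidal integral of values with respect to times.

    Hypothetical helper; `init` is taken as the value at the first sample.
    """
    if not times:
        return []
    total = init
    result = [total]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (values[i] + values[i - 1]) * dt
        result.append(total)
    return result


times = [0.0, 1.0, 2.0, 3.0]
speed = [0.0, 2.0, 4.0, 6.0]  # m/s, linearly increasing
distance = cumulative_trapezoid_sketch(times, speed)
```

For a linearly increasing speed of 2t the exact distance is t squared, which the trapezoidal rule reproduces exactly at the sample points.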
-
lat_dist
(track_id)¶ utility function to return timeseries lateral distance from radar traces of particular track id
- Parameters
- track_id : int | numpy array | list
- Returns
pandas.DataFrame | list<pandas.DataFrame> – Timeseries lateral distance data from the CSV file
-
lead_distance
()¶ Get the distance information of lead vehicle
- Returns
pandas.DataFrame – Timeseries data for lead distance from the CSV file
-
long_dist
(track_id)¶ utility function to return timeseries longitudinal distance from radar traces of particular track id
- Parameters
- track_id : int | numpy array | list
- Returns
pandas.DataFrame | list<pandas.DataFrame> – Timeseries longitudinal distance data from the CSV file
-
messageIDs
()¶ Retrieves the list of all message IDs available in the given CSV-formatted CAN data file.
- Returns
list – A python list of all available message IDs in the given CSV-formatted CAN data file.
-
msg_subset
(**kwargs)¶ Get the subset of message dataframe based on a condition.
- Parameters
- conditions : str | list<str>
Human readable condition for subsetting of message dataframe. Following conditions are available:
lead vehicle present: Extracts only those messages for which there was lead vehicle present.
cruise control on: Extracts only those messages for which cruise control is on.
operand op x: Extracts those messages for which the comparison of operand with the value x using operator op holds.
Available operators op are [>,<,==, !=, >=,<=]
Available operands are [speed, acceleration, lead_distance, steering_angle, steering_rate, yaw_rate ]. Details of operands are as follows:
speed: timeseries longitudinal speed of the vehicle
acceleration: timeseries longitudinal acceleration of the vehicle
lead_distance: timeseries distance of lead vehicle from the vehicle
steering_angle: timeseries steering angle of the vehicle
steering_rate: timeseries steering rate of the vehicle
yaw_rate: timeseries yaw rate of the vehicle
For example, “speed < 2.3”
- time : (t0, t1)
t0 start elapsed-time t1 end elapsed-time
Extracts messages from time t0 to t1. t0 and t1 denotes elapsed-time and not the actual time.
- ids : list
Get the message dataframe containing only the messages whose IDs are in the given list
- Returns
strymread – Returns strymread object with a modified dataframe attribute
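To make the "operand op x" condition format concrete, the sketch below parses such a string and applies it to a list of values. It is a hypothetical illustration, not strym's parser: `parse_condition` and `OPERATORS` are made-up names, and the code assumes the three parts are separated by whitespace, as in “speed < 2.3”.

```python
import operator

# Map each supported operator symbol to its comparison function.
OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
}


def parse_condition(condition):
    """Split 'operand op x' into (operand name, comparison function, float value).

    Hypothetical helper; assumes whitespace-separated parts.
    """
    operand, op, value = condition.split()
    return operand, OPERATORS[op], float(value)


operand, compare, threshold = parse_condition("speed < 2.3")
speeds = [0.5, 2.3, 4.0, 1.9]
mask = [compare(s, threshold) for s in speeds]
```

The resulting boolean mask marks the samples that satisfy the condition, which is the kind of selection a subsetting step would then apply to the message dataframe.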
-
plt_speed
()¶ Utility function to plot speed data
-
static
plt_ts
(df, title='', msg_axis='Message', **kwargs)¶ A utility function to plot a timeseries
-
static
ranalyze
(df, title='Timeseries', savefig=False)¶ A utility function to analyse rate of a timeseries data
- Parameters
- title : str
A descriptive string for this particular analysis
-
rel_accel
(track_id)¶ utility function to return timeseries relative acceleration of detected object from radar traces of particular track id
- Parameters
- track_id : int | numpy array | list
- Returns
pandas.DataFrame | list<pandas.DataFrame> – Timeseries relative acceleration data from the CSV file
-
rel_velocity
(track_id)¶ utility function to return timeseries relative velocity of detected object from radar traces of particular track id
- Parameters
- track_id : int | numpy array | list
- Returns
pandas.DataFrame | list<pandas.DataFrame> – Timeseries relative velocity data from the CSV file
-
relative_leadervel
()¶ Utility function to return timeseries relative velocity of the leader obtained through all RADAR traces
- Returns
pandas.DataFrame – Timeseries relative velocity of the leader
-
static
remove_duplicates
(df)¶ Remove rows with duplicate time index from the timeseries data
- Parameters
- df : pandas.DataFrame
A pandas dataframe with at least one column Time or DateTimeIndex type Index
-
static
resample
(df, rate=50, categorical=False, **kwargs)¶ Resample the time-series dataframe df of varying, non-uniform sampling.
Resampling is done using cubic interpolation and spline method.
- Parameters
- df : pandas.DataFrame
Original Dataframe to be resampled
- rate : double
Desired sampling rate in Hz
- cont_method : str
Resampling method for continuous dataset. Available methods: “cubic”, “nearest”, “linear”, “exact”
- cat_method : str
Resampling method for categorical dataset. Available method: “nearest”
- categorical : bool
Boolean flag specifying if dataframe being passed represents a categorical data
- time_col: str
Name of time column in df. Default value is “Time”
- msg_col : str
Name of message column in df. Default value is “Message”
- Returns
dfnew1 (pandas.DataFrame) – New resampled timeseries DataFrame
-
static
scatterts
(ts, marker_size=10, stacked=True, taxis='elapsed', labels=None, return_fig=False, **kwargs)¶ - Parameters
- ts : list | pd.DataFrame
A timeseries or a list of timeseries dataframe for creating a scatter plot
- marker_size : int
Markersize for scatter plot
- stacked : bool
If stacked is true, then only one plot will be created and all subplots will be overlaid.
- taxis : ["elapsed", "clock"]
How the time axis should be displayed is defined by taxis: If taxis = “elapsed”, then time axis starts with 0. If taxis = “clock”, then time axis will show human readable datetime
- labels : list
Labels to be used for legends
- return_fig : bool
-
speed
()¶ - Returns
pandas.DataFrame – Timeseries speed data from the CSV file
Example
>>> import strym
>>> from strym import strymread
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> dbcfile = 'newToyotacode.dbc'
>>> csvdata = '2020-03-20.csv'
>>> r0 = strymread(csvfile=csvdata, dbcfile=dbcfile)
>>> speed = r0.speed()
-
speed_limit
()¶ - Returns
pandas.DataFrame – Timeseries data for speed limit from the CSV file
-
speed_raw
(bus)¶ Get speed on all buses
-
static
split_ts
(df, by=30.0)¶ Split the timeseries data into intervals of by seconds
- Parameters
- df : pandas.DataFrame
dataframe to split
- by : double
Specify the interval in seconds by which the timeseries dataframe needs to be split
- Returns
pandas.DataFrame – dataframe with an extra column Second denoting splits specified by interval
pandas.DataFrame Array – An array of pandas DataFrames split by Seconds
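The splitting step can be pictured as labelling every sample with the index of the interval it falls in and then grouping by that label. The sketch below is a dependency-free illustration, not strym's implementation: `label_intervals` and `split_by_label` are hypothetical helpers, and measuring intervals from the first timestamp is an assumption.

```python
def label_intervals(times, by=30.0):
    """Label each timestamp with the index of the `by`-second interval it falls in.

    Hypothetical helper; intervals are measured from the first timestamp.
    """
    if not times:
        return []
    t0 = times[0]
    return [int((t - t0) // by) for t in times]


def split_by_label(times, labels):
    """Group timestamps that share an interval label, in label order."""
    groups = {}
    for t, label in zip(times, labels):
        groups.setdefault(label, []).append(t)
    return [groups[k] for k in sorted(groups)]


times = [100.0, 110.0, 131.0, 155.0, 160.0]
labels = label_intervals(times)
pieces = split_by_label(times, labels)
```

The labels play the role of the extra column in the first return value, and the grouped pieces correspond to the array of split dataframes.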
-
start_time
()¶ start_time retrieves the human-readable time when logging of the data started
- Returns
str – Human-readable string-formatted time.
-
state_space
(rate=20, cont_method='nearest', cat_method='nearest', todb=False)¶ state_space generates a DataFrame with a Time column and several other signals, uniformly sampled with common start and end points, for further downstream analysis
-
steer_angle
()¶ - Returns
pandas.DataFrame – Timeseries data for steering angle from the CSV file
-
steer_fraction
()¶ - Returns
pandas.DataFrame – Timeseries data for steering fraction from the CSV file
-
steer_rate
()¶ - Returns
pandas.DataFrame – Timeseries data for steering rate from the CSV file
-
steer_torque
()¶ - Returns
pandas.DataFrame – Timeseries data for steering torque from the CSV file
-
static
temporalviolinplot
(dataframe, by=30, title='Timeseries')¶ A temporal plot showing the evolution of the distribution as a function of time
-
static
time_shift
(df1, df2, time_col1='Time', time_col2='Time', msg_col1='Message', msg_col2='Message', **kwargs)¶ Compute the time shift of df2 (time column time_col2) with respect to df1 (time column time_col1). The resulting time shift is meant to be added to the time axis of the second dataframe.
Caveat: The units of time in the time columns of both timeseries dataframes must be the same.
- Parameters
- df1 : pandas.DataFrame
First timeseries dataframe.
- df2 : pandas.DataFrame
Second timeseries dataframe.
- time_col1 : str
Name of time column in df1. Default value is “Time”
- time_col2 : str
Name of time column in df2. Default value is “Time”
- msg_col1 : str
Name of message column in df1. Default value is “Message”
- msg_col2 : str
Name of message column in df2. Default value is “Message”
- correlation_threshold : double
Correlation coefficient threshold in [0,1] at which to stop looking for better time-shift and return the result.
- Returns
double, double – Time shift in the unit of time used in the time columns of both timeseries dataframes.
Maximum correlation with the given time shift.
-
time_subset
(**kwargs)¶ Get the time slices satisfying a particular condition for the dataframe.
- Parameters
- conditions : str | list<str>
Human readable condition for subsetting of message dataframe. Following conditions are available:
- "lead vehicle present" : -
- Returns
list – A list of tuples with start and end time of slices. E.g. [(t0, t1), (t2, t3), …] satisfying the given conditions
-
static
timeindex
(df, inplace=False)¶ Convert a multi-column DataFrame, of which one column must be ‘Time’, to a pandas-compatible timeseries in which timestamps replace the indices. The conversion happens with no time zone information, i.e. all clock times are in GMT
- Parameters
- df : pandas.DataFrame
A pandas dataframe with two columns with the column names “Time” and “Message”
- inplace : bool
Modifies the actual dataframe, if true, otherwise doesn’t.
- Returns
pandas.DataFrame – Pandas-compatible timeseries with a single column named “Message” where the indices are timestamps in human-readable format.
-
static
timeslices
(ts)¶ timeslices returns a set of time slices in the form [(t0, t1), (t2, t3), …] from ts, where ts is a square pulse (a timeseries) with two levels, 0 and 1 or True and False: True when a certain condition was satisfied and False when it was not. For example, ts should be a pandas Series (indexed by timestamp) with values [True, True, True, …, False, False, …, True, True, True], which represents square pulses. In that case, t0, t2, … are the times of rising edges and t1, t3, … the times of falling edges.
- Parameters
- ts : pandas.core.series.Series
A valid pandas time series with timestamp as index for the series
- Returns
list – A list of tuples with start and end time of slices. E.g. [(t0, t1), (t2, t3), …]
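The edge-detection idea can be sketched on plain lists instead of a pandas Series. This is an illustration, not strym's implementation: `find_true_slices` is a hypothetical helper, and reporting the last True sample as the end of a slice (rather than the first False sample) is an assumption.

```python
def find_true_slices(times, flags):
    """Return (start, end) pairs for each run of True values.

    Hypothetical helper; `end` is the time of the last True sample in the run.
    """
    slices = []
    start = None
    prev_t = None
    for t, flag in zip(times, flags):
        if flag and start is None:
            start = t  # rising edge
        elif not flag and start is not None:
            slices.append((start, prev_t))  # falling edge
            start = None
        prev_t = t
    # Close a run that is still open at the end of the data.
    if start is not None:
        slices.append((start, prev_t))
    return slices


times = [0, 1, 2, 3, 4, 5, 6]
flags = [True, True, False, False, True, True, True]
slices = find_true_slices(times, flags)
```

A run that is still True at the final sample has no falling edge, so the sketch closes it at the last timestamp.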
-
topic2msgs
(topic)¶ Return a dictionary value with the message ID and signal name for this particular DBC file, based on the passed in topic name. This is needed because various DBC files have different default names and signal structures depending on manufacturer. This redirection provides robustness to strym when the dbc files are not standardized—as they will never be so.
- Parameters
- topic : string
The string name of the topic in question. Only limited topics are supported by default
- Returns
d (dictionary) – Dictionary with the key/value pairs for message and signal that should be passed to the corresponding strym function. To access the message signal, use d[‘message’] and d[‘signal’]
-
trajectory
(x_init=0.0, y_init=0.0, data_rate=50.0)¶ A simple trajectory tracing function based on CAN data
- Parameters
- x_init : double
Initial X-coordinate of the vehicle
- y_init : double
Initial Y-coordinate of the vehicle
- data_rate : double
Rate at which messages are sampled.
- Returns
pandas.DataFrame – A pandas DataFrame with five columns: Time, X, Y, Vx, Vy
-
triplength
(time=-1)¶ triplength returns the total distance travelled while logging CAN data.
Alternatively, one can provide the argument time to query how much distance was travelled in, say, the first 50 seconds from the start.
- Parameters
- time : double
Provide a valid elapsed time in seconds to query how much distance was travelled in the first time seconds after the logging of data started.
-
triptime
()¶ triptime retrieves total duration of the recording for given CSV-formatted log file in seconds.
- Returns
double – Duration in seconds.
-
static
ts_sync
(df1, df2, rate=50, msg_col1='Message', msg_col2='Message', **kwargs)¶ Time-synchronize and resample two time-series dataframes of varying, non-uniform sampling.
In a non-ideal condition, the first time of the df1 timeseries dataframe will not be the same as the first time of the df2 dataframe.
In that case, we will calculate the value of the message at the latest of the two first times of df1 and df2 using linear interpolation. Call the latest of the two first times latest_first_time.
Similarly, we will calculate the value of the message at the earliest of the two end times of df1 and df2 using linear interpolation. Call the earliest of the two end times earliest_last_time.
Linear interpolation formula is
\[X_i = \cfrac{X_A - X_B}{a-b}(i-b) + X_B\]
Next, we will truncate anything beyond [latest_first_time, earliest_last_time]
Once we have common first and last time in both timeseries dataframes, we will use cubic interpolation to do uniform sampling and interpolation of both time-series dataframe.
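The alignment step described above can be sketched in plain Python. This is an illustration rather than strym's code: `interpolate_at` is a hypothetical helper, and it assumes that X_A and X_B in the formula are the message values at times a and b, with i the time at which a value is wanted.

```python
def interpolate_at(i, a, x_a, b, x_b):
    """Value at time i on the straight line through (a, x_a) and (b, x_b).

    Hypothetical helper implementing X_i = (X_A - X_B) / (a - b) * (i - b) + X_B.
    """
    return (x_a - x_b) / (a - b) * (i - b) + x_b


# Time columns of two timeseries with different start and end times.
t1 = [0.0, 1.0, 2.0, 3.0]
t2 = [0.5, 1.5, 2.5, 3.5]

# Common window shared by both timeseries.
latest_first_time = max(t1[0], t2[0])
earliest_last_time = min(t1[-1], t2[-1])

# Value halfway between samples (1.0, 10.0) and (2.0, 20.0).
value = interpolate_at(1.5, a=1.0, x_a=10.0, b=2.0, x_b=20.0)
```

Evaluating the formula at i = a returns X_A and at i = b returns X_B, which is a quick check that the signs are consistent.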
- Parameters
- df1 : pandas.DataFrame
First timeseries dataframe. The first column must be named ‘Time’ and the second column ‘Message’
- df2 : pandas.DataFrame
Second timeseries dataframe. The first column must be named ‘Time’ and the second column ‘Message’
- rate : double | str
double: New uniform sampling rate
str: Source from which to inherit the sampling rate. If rate=”first”, then df2 will be sampled by inheriting time points from df1. If rate=”second”, then df1 will be sampled by inheriting time points from df2
- method : str
Resampling method for dataset. Available methods: “cubic”, “nearest”, “linear”, “exact”
- Returns
dfnew1 (pandas.DataFrame) – First new resampled timeseries DataFrame
dfnew2 (pandas.DataFrame) – Second new resampled timeseries DataFrame
-
static
violinplot
(df, title='Violin Plot')¶ A violin plot to show the data distribution
-
wheel_speed_fl
()¶ - Returns
pandas.DataFrame – Timeseries data for wheel speed of front left tire from the CSV file
-
wheel_speed_fr
()¶ - Returns
pandas.DataFrame – Timeseries data for wheel speed of front right tire from the CSV file
-
wheel_speed_rl
()¶ - Returns
pandas.DataFrame – Timeseries data for wheel speed of rear left tire from the CSV file
-
wheel_speed_rr
()¶ - Returns
pandas.DataFrame – Timeseries data for wheel speed of rear right tire from the CSV file
-
yaw_rate
()¶ - Returns
pandas.DataFrame – Timeseries data for yaw rate from the CSV file