CAN data analysis using strym

In this notebook, we will analyze data rates throughput and the timeseries characteristics of certain CAN message collected from Toyota RAV4 using Giraffee connector and Panda. At the same time, out objective is to look for fuel data about which don’t have much information through DBC file.

Importing packages

Import required packages

[1]:
from strym import strymread
import strym
import matplotlib.pyplot as plt
import numpy as np
/home/ivory/anaconda3/envs/dbn/lib/python3.7/site-packages/statsmodels/tools/_testing.py:19: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
  import pandas.util.testing as tm
Loading BokehJS ...

Specify Data Location

[2]:
datafolder = "../../PandaData/2020_08_17/"
import glob
csvlist = glob.glob(datafolder+"*.csv")
[3]:
num_of_files = len(csvlist)
print("Total number of datafiles in {} is {}.".format(datafolder, num_of_files))
Total number of datafiles in ../../PandaData/2020_08_17/ is 18.

Analysis

1. CSV file containing all messages

In this section, we will analyze CSV-formatted CAN Data for data throughput, rates and data distribution.

[4]:
dbcfile = '../examples/newToyotacode.dbc'
drive1filename=datafolder + "2020-08-17-15-43-47_2T3Y1RFV8KC014025_CAN_Messages.csv"
r0 = strymread(csvfile=drive1filename, dbcfile=dbcfile)

First plot the count statistics of CAN messages

[5]:
r0.count(plot=True)
_images/CANDataAnalysis_Fuel_Edition_10_0.png
[5]:
MessageID Counts_Bus_0 Counts_Bus_1 Counts_Bus_2 TotalCount
36 36 8863 0 8863 17726
37 37 17726 0 17726 35452
170 170 17726 0 17726 35452
180 180 8863 0 8863 17726
186 186 5909 0 5909 11818
... ... ... ... ... ...
1787 1787 59 0 59 118
1788 1788 59 0 59 118
1789 1789 59 0 59 118
1792 1792 3049 84 3049 6182
1800 1800 4670 122 4670 9462

186 rows × 5 columns

As you can see this particular csv file recorded all messages.

Plot the speed as timeseries data

[7]:
speed = r0.speed()
strymread.plt_ts(speed, title="Speed (Km/h)")
_images/CANDataAnalysis_Fuel_Edition_13_0.png

Create Violin plot and box plot to see distribution of speed data

[8]:
# violin plot of speed data
strymread.violinplot(speed["Message"], title="Speed (Km/h)")
_images/CANDataAnalysis_Fuel_Edition_15_0.png

From the violin plot and box plot, we see that data is bimodal with majority of values around 0 km/h or above 40 km/h. Mean is around 20 km/h. It will be interesting to check the characteristics of violin plot for stop-and-go traffic.

Rate analysis of speed data

We can analyse data throughput of speed data by measuring some statistical characterisitcs of time differences and instantaneous frequency.

[11]:
import binascii
import bitstring
import time
import datetime
import serial
import csv
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd # Note that this is not commai Panda, but Database Pandas
import cantools
import matplotlib.animation as animation
from matplotlib import style
import uuid

from strym import strymread
from strym import strymmap
import pandas as pd # Note that this is not commai Panda, but Database Pandas

import binascii
import bitstring

import cantools
import strym.DBC_Read_Tools as DBC
from datetime import datetime

can_data_all = pd.read_csv(drive1filename)# read in the data
dbcfile = '../examples/newToyotacode_experiment.dbc'
db_file = cantools.db.load_file(dbcfile,strict=False)
# Specify your dbc file# let's get an example where the dat are mauybe not what we expect
_705 = DBC.convertData('GAS_PEDAL',1,can_data_all,db_file)

strymread.ranalyze(_705[0:-1],title="705")
# strym.ranalyze(_705[0:-1],title='Gas Pedal Data')
/home/ivory/anaconda3/envs/dbn/lib/python3.7/site-packages/strym-0.2.2-py3.7.egg/strym/strymread.py:2788: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  newdf['Time'] = df['Time']
/home/ivory/anaconda3/envs/dbn/lib/python3.7/site-packages/strym-0.2.2-py3.7.egg/strym/strymread.py:2790: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  newdf['Clock'] = pd.DatetimeIndex(Time)
Analyzing Timestamp and Data Rate of 705
Interquartile Range of Rate for 705 is 0.6081276952690864
_images/CANDataAnalysis_Fuel_Edition_18_2.png
[12]:
strymread.ranalyze(speed, title='Speed Data')
Analyzing Timestamp and Data Rate of Speed Data
Interquartile Range of Rate for Speed Data is 1.60617979189022
_images/CANDataAnalysis_Fuel_Edition_19_1.png

from above 2x2 plot, we see that speed data came at 50 Hz a little more than half of instances and at 25Hz for little less than half of instances. From box plot, we see that mean data rate is 34.67 Hz and inter-quartile range is 25.05 Hz. 3rd plot is timeseries of time-diffs. Arrival of most of the data has time-difference below 0.05 for most part and some datapoints have arrival interval of more than 0.15 seconds.

Rate analysis of RADAR traces: TRACK A 0

[13]:
long_dist = r0.long_dist(track_id = 0) # I want to analyze rate for TRACK_A_0 only

strymread.ranalyze(long_dist, title='Longitudinal Distance Data: TRACK A 0')
Analyzing Timestamp and Data Rate of Longitudinal Distance Data: TRACK A 0
Interquartile Range of Rate for Longitudinal Distance Data: TRACK A 0 is 0.1851292761100467
_images/CANDataAnalysis_Fuel_Edition_22_1.png

From above plot, we see that most of the RADAR traces arrive at 20 Hz.