Processed LIDAR data contained the positions and dimensions of all objects (cars, bikes, pedestrians) crossing an intersection. Each tracked object had a unique ID number that stayed with the object until tracking was lost. However, the LIDAR frequently lost pedestrians during tracking, so a single pedestrian could end up with multiple IDs (sometimes up to 25 different IDs). The LIDAR was also unable to keep track of each pedestrian, especially when pedestrians were walking in a group or passing each other in the crosswalk. One of the tasks was to cluster these fragments and recreate complete individual trajectories.
We found it challenging to apply off-the-shelf clustering algorithms (DBSCAN, Spectral Clustering, etc.) because of the high level of noise in the LIDAR data. Two major challenges identified in the initial stages were:
- To identify any pattern in the noise in the data
- To determine the optimal hyperparameters of the clustering algorithms
I decided to visualize the data instead of, or at least before, applying the brute force of machine learning algorithms, for two reasons:
- Visualization can help us identify the cause of noise in the data, which can then be removed using simpler logic. Such a simple process is also transparent.
- It can also help us determine the optimal set of hyperparameters.
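One common way to let a plot guide a clustering hyperparameter is the sorted k-distance curve often used to choose the DBSCAN neighbourhood radius. The sketch below is only an illustration of that general heuristic, not necessarily the procedure used in this project. The function name `k_distances` and the sample coordinates are hypothetical.

```python
import math

def k_distances(points, k=4):
    """Return each point's distance to its k-th nearest neighbour, sorted ascending."""
    if k < 1 or k >= len(points):
        raise ValueError("k must satisfy 1 <= k < len(points)")
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Hypothetical (x, y) positions in metres: a tight group plus one stray return.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
curve = k_distances(points, k=2)
```

Plotting `curve` against its index typically shows a flat region followed by a sharp rise. The distance at that bend is a reasonable first guess for the radius, and the points beyond it are candidates for noise. The brute-force loop is quadratic, so on a full dataset a spatial index would be the more practical choice.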
Figures
Here are two of the many figures created to identify any patterns that may exist:


This series of plots helped our research team understand the various patterns of systematic noise, which were otherwise difficult to identify. For example, in figure 2, the same ID was labeled as both pedestrian and cyclist. Such issues in the data were unknown until visualized.
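Once a visual check has revealed an issue like this, a few lines of simple logic can flag every affected ID. The following is a minimal sketch. It assumes each row is a dictionary with hypothetical `object_id` and `label` keys, which may not match the actual column names in the LIDAR export.

```python
from collections import defaultdict

def ids_with_conflicting_labels(rows):
    """Map each object ID that carries more than one class label to its sorted labels."""
    labels_by_id = defaultdict(set)
    for row in rows:
        labels_by_id[row["object_id"]].add(row["label"])
    return {oid: sorted(labels)
            for oid, labels in labels_by_id.items()
            if len(labels) > 1}

# Hypothetical rows: ID 7 switches class, ID 8 stays consistent.
rows = [
    {"object_id": 7, "label": "pedestrian"},
    {"object_id": 7, "label": "cyclist"},
    {"object_id": 8, "label": "car"},
    {"object_id": 8, "label": "car"},
]
conflicts = ids_with_conflicting_labels(rows)
```

Here `conflicts` contains only ID 7, with both of its labels, so the flagged IDs can be reviewed or relabeled before any clustering is attempted.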
Animated video
The tabular format of the LIDAR data is well suited to data analysis, especially with machine learning algorithms. However, initial rounds of clustering suggested that the algorithms were not able to cluster the trajectories in a meaningful way, so I decided to dig into the data to understand the level and kind of noise. Tabular data make it hard for a human to spot patterns. Exploratory data analysis can provide a few insights, but aggregation might still miss most of the details. Therefore, I decided to convert the tabular data into a more human-readable format: videos. Below is a snapshot and YouTube link for the short video snippet.
The animated video was used to visualize the LIDAR data and was highly appreciated by other research colleagues.
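The conversion from table to video can be sketched roughly as follows: group the rows by timestamp so that each group becomes one frame, then draw the frames in order. This is only an illustrative outline, not the code behind the video above. The keys `timestamp`, `x` and `y`, the helper names, and the output file name are all assumptions. The `render` step needs matplotlib and Pillow installed.

```python
from collections import defaultdict

def build_frames(rows):
    """Group rows by timestamp; return [(timestamp, [(x, y), ...]), ...] in time order."""
    by_time = defaultdict(list)
    for row in rows:
        by_time[row["timestamp"]].append((row["x"], row["y"]))
    return sorted(by_time.items())

def render(frames, path="tracks.gif"):
    """Draw one scatter plot per frame and save the result as an animated GIF."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, no display needed
    import matplotlib.pyplot as plt
    from matplotlib.animation import FuncAnimation

    xs = [x for _, pts in frames for x, _ in pts]
    ys = [y for _, pts in frames for _, y in pts]
    fig, ax = plt.subplots()
    ax.set_xlim(min(xs) - 1, max(xs) + 1)
    ax.set_ylim(min(ys) - 1, max(ys) + 1)
    scatter = ax.scatter([], [])

    def update(i):
        timestamp, pts = frames[i]
        scatter.set_offsets(pts)  # move the markers to this frame's positions
        ax.set_title(f"t = {timestamp}")
        return (scatter,)

    anim = FuncAnimation(fig, update, frames=len(frames), interval=100)
    anim.save(path, writer="pillow")
    plt.close(fig)

# Hypothetical rows, deliberately out of time order.
rows = [
    {"timestamp": 0.1, "x": 1.1, "y": 2.0},
    {"timestamp": 0.0, "x": 1.0, "y": 2.0},
    {"timestamp": 0.0, "x": 5.0, "y": 5.0},
]
frames = build_frames(rows)
```

Calling `render(frames)` would then write the animation to disk. Coloring the markers by object ID or class label is a natural extension, and it makes ID switches and label flips easier to see.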