While new scholarly publications continually introduce innovative methods that rely on complex and costly data collection, a considerable volume of data gathered by more economical means remains underutilized. This oversight often stems from the assumption that such data are not accurate enough for research purposes, an assumption usually made without fully exploring the potential of data fusion methods.

Data is generated rapidly every day across science and engineering, yet much of its potential is left unexplored. My research in data science stems from this recognition. In road traffic, driving involves both longitudinal and lateral (lane-changing) maneuvers. While the former has been analyzed and modeled extensively, a significant knowledge gap remains for the latter. This gap is mainly due to the scarcity of suitable data, specifically 1 Hz trajectories. Traditional data collection campaigns are expensive and temporary, so their spatio-temporal coverage is limited (e.g., NGSIM and HighD). However, cost-effective data sources, initially perceived as too inaccurate for studying vehicle lane changes, can be leveraged. In this research and its related papers, I introduce two methods based on data fusion and big data mining that approximate trajectory data at nearly zero cost. The cost is low because road authorities already collect the raw data continuously across their transportation networks for management and maintenance purposes.

The graph below compares the output dataset of this research with many well-known trajectory datasets available in the literature.

When a smartphone uses GPS while running a navigation app, the app may store a version of the travel path as a trajectory. These trajectories are entirely anonymous. In this project, the trajectory data was collected through passive observation of existing users of the Touring Mobilis, RTL Traffic, and Flitsmeister smartphone applications of Be-Mobile, a commercial traffic service provider. The applications are available for both iOS and Android, and the data was obtained under an agreement within the EU-CEF project CONCORDA, in which I was involved. However, these trajectories do not have lane-level accuracy.

The problem is shown in the figure below. We equipped a probe vehicle with a differential GPS (d-GPS) receiver with centimeter accuracy and nine cellphones, all running Be-Mobile apps. The green line is the trajectory collected by the d-GPS, and the other lines are four samples of cellphone trajectories.

This research introduces a four-step approach to reconstructing trajectories and correcting their lateral bias. The resulting lateral position is accurate enough to identify the driving lane and detect lane changes. The algorithm relies on a data fusion method that combines trajectory and loop detector data. Evaluation and validation using drone and CCTV data show that the algorithm correctly matches over 94% of trajectory and loop detector data. The lateral position error between successive detector stations is reduced to less than half the width of a standard motorway lane, so the processed trajectory sample points align with the correct lane. With just two calibration parameters, the algorithm is relatively simple and applicable to other test networks.

(Paper: Trans. Res. Part C-2022)
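
To illustrate why a lateral error below half a lane width is enough for lane identification, the following is a minimal sketch and not the four-step algorithm from the paper. It assumes a lane width of 3.5 m and lateral offsets measured in metres from the left edge of the carriageway; the function names `lane_index` and `lane_changes` are hypothetical.

```python
import math

LANE_WIDTH_M = 3.5  # assumed lane width; not a value taken from the paper


def lane_index(lateral_offset_m, n_lanes, lane_width_m=LANE_WIDTH_M):
    """Map a lateral offset (metres from the left carriageway edge) to a 0-based lane index."""
    idx = math.floor(lateral_offset_m / lane_width_m)
    return min(max(idx, 0), n_lanes - 1)  # clamp points that fall just off the carriageway


def lane_changes(lateral_offsets_m, n_lanes, lane_width_m=LANE_WIDTH_M):
    """Return (sample_position, from_lane, to_lane) for each change of lane index."""
    lanes = [lane_index(x, n_lanes, lane_width_m) for x in lateral_offsets_m]
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(lanes, lanes[1:]), start=1)
        if a != b
    ]


# Illustrative offsets for one vehicle drifting from the leftmost lane to the next one.
offsets = [1.6, 1.9, 2.4, 3.1, 4.0, 5.2, 5.3]
print(lane_changes(offsets, n_lanes=3))  # [(4, 0, 1)]
```

A vehicle driving at the lane centre stays inside its lane boundaries as long as the lateral error is smaller than half the lane width, which is why that threshold matters for lane assignment.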

The resulting trajectories cover only the subset of vehicles whose drivers use a Be-Mobile (or any other navigation) app. The next phase of my research aims to approximate trajectories and infer lane changes for the whole traffic flow.

This research introduces a novel method called Traffic Flow Crystallization (TFC). It reaches success rates of 99.15% in non-congested situations and 96.97% in congested traffic, outperforming existing approaches that also rely solely on Individual Vehicle Data (IVD). TFC is based on a vehicle re-identification algorithm. Vehicle re-identification is the problem of accurately and consistently identifying the same vehicle at different locations and times within a traffic monitoring system. It arises wherever surveillance cameras or sensors are deployed at several points and a specific vehicle's movements must be linked and tracked throughout the monitored area. Solving it requires robust algorithms that match vehicles using similarity measures or other identifiable characteristics, often in the absence of unique identifiers such as license plate numbers. In this project, IVD is the source for vehicle re-identification. The output is lane-level trajectories for the entire traffic flow; provided the raw data is available, the trajectories have no spatio-temporal limitation.

(Papers: TRB-2023, TFTC-2023, IEEE-ITSC-2023, IEEE Trans Intell Transp Syst-2024)
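
The idea of re-identifying vehicles by similarity, without unique identifiers, can be sketched as follows. This is a simple greedy matcher written for illustration, not the TFC algorithm. The record layout (`Passage`), the tolerances, the constant-speed travel-time estimate, and the cost function are all assumptions, and speeds are assumed to be strictly positive.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    """One vehicle passage at a detector station (hypothetical record layout)."""
    time_s: float    # passage time in seconds
    length_m: float  # measured vehicle length in metres
    speed_ms: float  # measured speed in metres per second (assumed > 0)


def reidentify(upstream, downstream, spacing_m, time_tol_s=5.0, length_tol_m=0.5):
    """Greedily pair each upstream passage with its most similar downstream passage.

    Returns a list of (upstream_index, downstream_index) pairs.
    """
    pairs, used = [], set()
    for i, up in enumerate(upstream):
        # Constant-speed estimate of the arrival time at the downstream station.
        expected = up.time_s + spacing_m / up.speed_ms
        best, best_cost = None, float("inf")
        for j, down in enumerate(downstream):
            if j in used:
                continue
            dt = abs(down.time_s - expected)
            dl = abs(down.length_m - up.length_m)
            if dt > time_tol_s or dl > length_tol_m:
                continue  # outside the plausible window, not a candidate
            cost = dt / time_tol_s + dl / length_tol_m  # normalised dissimilarity
            if cost < best_cost:
                best, best_cost = j, cost
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs


# Illustrative records: a car and a truck passing two stations 500 m apart.
up = [Passage(0.0, 4.5, 25.0), Passage(1.0, 12.0, 20.0)]
down = [Passage(25.4, 12.1, 20.5), Passage(20.2, 4.4, 24.0)]
print(reidentify(up, down, spacing_m=500.0))  # [(0, 1), (1, 0)]
```

If each passage also carried the lane in which it was detected (an assumption about the data), a matched pair with different lanes at the two stations would indicate a lane change between them.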