Problem
Road accidents are often tied to risky driving behavior, but raw accelerometer and gyroscope readings are difficult for humans to interpret directly. The project explores whether machine learning can classify normal, slow, and aggressive driving behavior from mobile sensor data.
The dataset also includes timestamp values, which created a key risk: models could appear highly accurate by learning when data was collected instead of learning the actual motion patterns of the vehicle.
Approach
I used a Kaggle driving behavior dataset collected from smartphone accelerometer and gyroscope sensors. The data included three behavior classes: NORMAL, SLOW, and AGGRESSIVE.
The experiment was built in Orange Data Mining and compared Neural Network, Random Forest, and Logistic Regression models across two setups: one with timestamp removed and one with timestamp included.
I used cross-validation, test scores, confusion matrices, and scatter plots to compare model performance and investigate whether timestamp data was distorting the results.
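The comparison between the two setups can be sketched in plain Python. This is an illustration under stated assumptions, not the Orange workflow itself: the data is synthetic (not the Kaggle dataset), a simple 1-nearest-neighbor classifier stands in for the three models, and the names make_rows, predict_1nn, and cv_accuracy are hypothetical helpers invented for this example.

```python
import random

def make_rows(seed=0, per_class=60):
    """Build synthetic rows (NOT the Kaggle data): ((timestamp, acc, gyro), label).

    Assumption for illustration: each class is recorded in its own time block,
    while the sensor features overlap heavily between classes.
    """
    rng = random.Random(seed)
    rows = []
    for block, label in enumerate(["NORMAL", "SLOW", "AGGRESSIVE"]):
        for i in range(per_class):
            timestamp = block * 1000.0 + i      # separate recording sessions
            acc = rng.gauss(0.2 * block, 1.0)   # weak, noisy sensor signal
            gyro = rng.gauss(0.0, 1.0)          # carries no class information
            rows.append(((timestamp, acc, gyro), label))
    rng.shuffle(rows)
    return rows

def predict_1nn(train, x):
    """Return the label of the closest training row (squared Euclidean distance)."""
    nearest = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return nearest[1]

def cv_accuracy(rows, keep, k=5):
    """k-fold cross-validated accuracy using only the feature indices in `keep`."""
    data = [(tuple(x[i] for i in keep), y) for x, y in rows]
    correct = 0
    for fold in range(k):
        test = data[fold::k]
        train = [r for j, r in enumerate(data) if j % k != fold]
        correct += sum(predict_1nn(train, x) == y for x, y in test)
    return correct / len(data)

rows = make_rows()
acc_with_timestamp = cv_accuracy(rows, keep=(0, 1, 2))   # timestamp included
acc_without_timestamp = cv_accuracy(rows, keep=(1, 2))   # timestamp removed
print(f"with timestamp:    {acc_with_timestamp:.2f}")
print(f"without timestamp: {acc_without_timestamp:.2f}")
```

On this synthetic data the timestamp column dominates the distance calculation, so the setup with timestamp included scores far higher than the timestamp-free one even though the sensor features are the same in both runs. The exact numbers depend on the made-up data and say nothing about the real dataset.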
Outcome
Models trained with timestamp included reached nearly 100% accuracy, but the visual analysis showed that the models were separating behavior classes by timestamp gaps rather than sensor behavior.
With the timestamp removed, average accuracy dropped to about 56.46%, a less impressive but more trustworthy result. The experiment showed that the timestamp was acting as a source of data leakage.
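The timestamp gaps seen in the scatter plots can also be checked numerically. The sketch below is one possible way to do that, not the method used in the project: it computes the timestamp range of each class and reports whether the ranges are fully disjoint. The function names and the sample values are hypothetical and invented for illustration.

```python
def timestamp_ranges(rows):
    """Map each class label to the (min, max) timestamp observed for it."""
    ranges = {}
    for timestamp, label in rows:
        lo, hi = ranges.get(label, (timestamp, timestamp))
        ranges[label] = (min(lo, timestamp), max(hi, timestamp))
    return ranges

def ranges_are_disjoint(ranges):
    """True if no two classes share any part of the timestamp axis."""
    spans = sorted(ranges.values())
    return all(prev_hi < next_lo for (_, prev_hi), (next_lo, _) in zip(spans, spans[1:]))

# Hypothetical (timestamp, label) pairs, invented for illustration only.
sample = [
    (10, "NORMAL"), (25, "NORMAL"), (40, "NORMAL"),
    (500, "SLOW"), (530, "SLOW"),
    (900, "AGGRESSIVE"), (960, "AGGRESSIVE"),
]
ranges = timestamp_ranges(sample)
leaky = ranges_are_disjoint(ranges)
print(ranges)
print("timestamp alone separates the classes:", leaky)
```

If the ranges are disjoint, a model can label every row from the timestamp alone, which is a strong hint that the feature should be dropped before training.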
Key learning
A high accuracy score is not automatically a good result. The most important finding was that the near-100% models were less reliable because they learned from timestamp artifacts, while the lower-scoring timestamp-free models better represented the real classification problem.