Driving Behavior Recognition

Classifying driving behavior from phone sensor data.

A machine learning study that classifies driving behavior from accelerometer and gyroscope sensor data. Using Orange Data Mining, the project compares models trained with and without timestamp data to show how misleading perfect accuracy can be when a model learns from data leakage instead of actual driving patterns.

ROLE
University Research Project
TEAM
Solo Project
TIMELINE
Course project
YEAR
2024
Machine Learning, Orange, Classification

Problem

Road accidents are often tied to risky driving behavior, but raw accelerometer and gyroscope readings are difficult for humans to interpret directly. The project explores whether machine learning can classify normal, slow, and aggressive driving behavior from mobile sensor data.

The dataset also includes timestamp values, which created a key risk: models could appear highly accurate by learning when data was collected instead of learning the actual motion patterns of the vehicle.

Approach

I used a Kaggle driving behavior dataset collected from smartphone accelerometer and gyroscope sensors. The data included three behavior classes: NORMAL, SLOW, and AGGRESSIVE.

The experiment was built in Orange Data Mining and compared Neural Network, Random Forest, and Logistic Regression models across two setups: one with timestamp removed and one with timestamp included.
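The two setups can be illustrated outside Orange with a minimal Python sketch. It is not the Orange workflow used in the project. It assumes synthetic data in which each class was recorded in its own session, and it swaps in a simple nearest-centroid classifier for the Neural Network, Random Forest, and Logistic Regression models. All function names, column positions, and numeric values are hypothetical.

```python
import random

LABELS = ["SLOW", "NORMAL", "AGGRESSIVE"]


def make_synthetic_rows(n_per_class=100, seed=0):
    """Build hypothetical rows of ((timestamp, acc, gyro), label).

    Assumption: each class is recorded in a separate session, so the
    timestamp ranges of the classes never overlap.
    """
    rng = random.Random(seed)
    rows = []
    for session, label in enumerate(LABELS):
        for i in range(n_per_class):
            timestamp = session * 1000 + i        # session gap acts as a hidden label
            acc = rng.gauss(0.3 * session, 1.0)   # sensor values overlap heavily
            gyro = rng.gauss(0.2 * session, 1.0)
            rows.append(((timestamp, acc, gyro), label))
    return rows


def predict_nearest_centroid(train, features):
    """Assign each feature vector to the class with the closest mean vector."""
    centroids = {}
    for label in LABELS:
        members = [f for f, y in train if y == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*members)]

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(LABELS, key=lambda lab: squared_distance(f, centroids[lab]))
        for f in features
    ]


def cross_val_accuracy(rows, keep_columns, k=5, seed=0):
    """Mean accuracy over k folds, using only the selected feature columns."""
    data = [(tuple(f[c] for c in keep_columns), y) for f, y in rows]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        predictions = predict_nearest_centroid(train, [f for f, _ in test])
        correct = sum(p == y for p, (_, y) in zip(predictions, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


rows = make_synthetic_rows()
accuracy_with_timestamp = cross_val_accuracy(rows, keep_columns=(0, 1, 2))
accuracy_without_timestamp = cross_val_accuracy(rows, keep_columns=(1, 2))
print("with timestamp:   ", accuracy_with_timestamp)
print("without timestamp:", accuracy_without_timestamp)
```

In this toy setup the timestamp column dominates the distance calculation, so the classifier can separate the sessions without using the sensor values at all.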

I used cross-validation, test scores, confusion matrices, and scatter plots to compare model performance and investigate whether the timestamp data was distorting the results.
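For readers unfamiliar with confusion matrices, the following is a small sketch of how one is tallied from actual and predicted labels. The label lists are made-up illustration values, not results from the project, and the helper names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def per_class_recall(matrix, labels):
    """Share of each actual class that was predicted correctly."""
    recall = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        recall[label] = matrix[i][i] / row_total if row_total else 0.0
    return recall


labels = ["NORMAL", "SLOW", "AGGRESSIVE"]
# Made-up labels for illustration only
actual = ["NORMAL", "NORMAL", "SLOW", "SLOW", "AGGRESSIVE", "AGGRESSIVE"]
predicted = ["NORMAL", "SLOW", "SLOW", "SLOW", "NORMAL", "AGGRESSIVE"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
print(per_class_recall(matrix, labels))
```

Reading the matrix row by row shows which classes are confused with each other, which a single accuracy figure hides.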

Outcome

Models trained with timestamp included reached nearly 100% accuracy, but the visual analysis showed that the models were separating behavior classes by timestamp gaps rather than sensor behavior.
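The same kind of timestamp separation can also be checked numerically. The sketch below is an assumed complement to the scatter plots, not part of the original analysis: it computes the timestamp span of each class and reports whether the spans are disjoint. The function names and sample rows are hypothetical.

```python
def timestamp_span_by_class(rows):
    """Return {label: (min_timestamp, max_timestamp)} for (timestamp, label) rows."""
    spans = {}
    for timestamp, label in rows:
        low, high = spans.get(label, (timestamp, timestamp))
        spans[label] = (min(low, timestamp), max(high, timestamp))
    return spans


def spans_are_disjoint(spans):
    """True when no two classes share any part of the timestamp range."""
    ordered = sorted(spans.values())
    return all(
        prev_high < next_low
        for (_, prev_high), (next_low, _) in zip(ordered, ordered[1:])
    )


# Hypothetical rows: each class recorded in its own block of time
separated = [(0, "SLOW"), (50, "SLOW"), (1000, "NORMAL"), (1040, "NORMAL"),
             (2000, "AGGRESSIVE"), (2030, "AGGRESSIVE")]
# Hypothetical rows: classes interleaved in time
interleaved = [(0, "SLOW"), (10, "NORMAL"), (20, "SLOW"), (30, "AGGRESSIVE"),
               (40, "NORMAL"), (50, "AGGRESSIVE")]

print(spans_are_disjoint(timestamp_span_by_class(separated)))
print(spans_are_disjoint(timestamp_span_by_class(interleaved)))
```

When the spans are disjoint, a model can recover the class from the timestamp alone, which is a warning sign that the feature leaks the label.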

After removing the timestamp, average accuracy dropped to 56.46%, a less impressive but more trustworthy result. The experiment showed that the timestamp acted as a source of data leakage.

Key learning

A high accuracy score is not automatically a good result. The most important finding was that the near-100% model was the less reliable one because it learned from timestamp artifacts, while the lower-scoring timestamp-free model better represented the real classification problem.

BUILT WITH
Orange Data Mining, Machine Learning, Accelerometer Data, Gyroscope Data