We trained and validated a crash-prediction model on 40,229 community-submitted PX4 ULog files — an 8× scale-up from our initial dataset of 4,800 samples. Using a 56-feature extraction pipeline and gradient-boosted classification, we identify the dominant crash predictors and the sensor signals that most reliably separate failed flights from healthy ones. Results are consistent across both dataset versions: maximum roll angle, impact G-force, and IMU accelerometer clipping collectively explain the large majority of crash variance.
Logs were sourced from the PX4 flight.review public database — a community-driven repository of real-world UAV telemetry. Each log is a binary ULog file containing synchronized timeseries from all onboard sensors, flight controller state, and autopilot estimates. We stream logs continuously, parsing each with our open-source forensic engine and storing 190 features per flight in a structured SQLite database for analysis and ML training.
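As a rough illustration of the storage step, the sketch below writes one row of extracted features per flight into SQLite using only the Python standard library. The table name, the column names, and the tiny six-column schema are hypothetical; the actual pipeline stores 190 features per flight and its schema is not described here. Only `crash_confidence` is a name taken from the text.

```python
import sqlite3

# Hypothetical subset of columns; the real feature table is much wider.
COLUMNS = (
    "log_id",
    "platform",
    "duration_s",
    "max_roll_deg",
    "imu_accel_clip_events",
    "crash_confidence",
)


def open_feature_store(path=":memory:"):
    """Open the database and create the one-row-per-flight table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS flight_features (
            log_id TEXT PRIMARY KEY,
            platform TEXT,
            duration_s REAL,
            max_roll_deg REAL,
            imu_accel_clip_events INTEGER,
            crash_confidence REAL
        )
        """
    )
    return conn


def store_flight(conn, features):
    """Insert or overwrite one flight; keys missing from `features` become NULL."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(
        f"INSERT OR REPLACE INTO flight_features ({', '.join(COLUMNS)}) "
        f"VALUES ({placeholders})",
        [features.get(name) for name in COLUMNS],
    )
    conn.commit()
```

Keying the table on a log identifier makes re-processing idempotent: parsing the same log twice overwrites the row instead of duplicating it. Storing absent sensors as NULL keeps "not measured" distinct from a measured zero until the imputation step.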
Crash labels were assigned using a multi-signal telemetry heuristic: flights with crash_confidence ≥ 0.80 (derived from altitude drop rate, G-force signature, attitude divergence, and motor cutoff patterns) were labeled crash-positive. Flights with zero confidence and duration ≥ 30 s were labeled clean.
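The labeling rule above can be written as a small function. This is a minimal sketch: the two thresholds come from the text, while the function name and the decision to return `None` for flights that match neither rule are assumptions, since the text does not say how in-between flights are handled.

```python
from typing import Optional

CRASH_CONFIDENCE_THRESHOLD = 0.80  # from the text
MIN_CLEAN_DURATION_S = 30.0        # from the text


def label_flight(crash_confidence: float, duration_s: float) -> Optional[int]:
    """Return 1 for crash-positive, 0 for clean, None for neither.

    Treating the remaining flights as unlabeled is an assumption of this
    sketch, not something the text states.
    """
    if crash_confidence >= CRASH_CONFIDENCE_THRESHOLD:
        return 1
    if crash_confidence == 0.0 and duration_s >= MIN_CLEAN_DURATION_S:
        return 0
    return None
```

For example, `label_flight(0.92, 12.0)` is crash-positive regardless of duration, while a 10-second flight with zero confidence matches neither rule.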
| Platform | Logs | Crash Rate |
|---|---|---|
| PX4 SITL (simulator; test/dev scenarios) | 819 | 62.9% |
| MICOAIR H743 | 87 | 48.3% |
| HKUST NXT DUAL | 511 | 40.3% |
| MICOAIR H743 V2 | 573 | 37.7% |
| CUAV X7 Nano | 127 | 36.2% |
| PX4 FMU V6C (flagship; most common real HW) | 1,278 | 31.7% |
The overall crash rate is 30.7% — nearly 1 in 3 logged flights ends in a crash or anomaly event. Rates vary substantially by vehicle configuration and autonomy level.
Not all sensors are present in every flight log. GPS is absent in 35.8% of flights, indicating widespread GPS-denied or GPS-degraded operations in the community fleet.
| Event | Share of flights |
|---|---|
| RC signal lost (41.7% crash rate when triggered) | 12.1% |
| Battery warning triggered | 6.3% |
| Critical system failure | 6.3% |
| Motor failure detected | 0.85% |
| Imbalanced prop detected | 0.43% |
Comparing telemetry means between crashed and normal flights reveals systematic, statistically large differences across attitude, vibration, power, and control loop channels.
| Feature | Crashed (mean) | Normal (mean) | Ratio |
|---|---|---|---|
| Max roll angle | 68.0° | 13.6° | 5.0× |
| Max pitch angle | 38.3° | 13.8° | 2.8× |
| IMU accel clip events | 3,999 | 206 | 19.4× |
| Min battery voltage | 16.3 V | 20.6 V | — |
| Rate pitch error RMS | 121 °/s | 15.3 °/s | 7.9× |
| Rate oscillation amp (pitch) | 29.6 | 6.4 | 4.6× |
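The Ratio column in the table above is the crashed-group mean divided by the normal-group mean. A minimal sketch of that computation follows, assuming each flight is a dict with a boolean `crashed` key plus one key per feature; that row layout and the function name are illustrative, not the project's actual data model.

```python
from statistics import fmean


def crashed_to_normal_ratio(rows, feature):
    """Return (crashed_mean, normal_mean, ratio) for one feature.

    Flights where the feature is None (sensor absent) are skipped.
    Raises statistics.StatisticsError if either group has no values.
    """
    crashed = [r[feature] for r in rows if r["crashed"] and r[feature] is not None]
    normal = [r[feature] for r in rows if not r["crashed"] and r[feature] is not None]
    crashed_mean = fmean(crashed)
    normal_mean = fmean(normal)
    return crashed_mean, normal_mean, crashed_mean / normal_mean
```

Applied to the published means, 68.0 / 13.6 gives the 5.0× shown for maximum roll angle and 3,999 / 206 gives the 19.4× shown for clip events.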
We trained an XGBoost gradient-boosted classifier on 40,229 labeled samples (the v2 dataset, roughly 8× larger than the initial 4,800-sample v1 run) using 5-fold stratified cross-validation. Each feature was clipped to its 0.1th and 99.9th percentiles to limit the influence of outliers, and missing values were then imputed with −1. The model reaches the same near-perfect AUC on both dataset sizes, which suggests that the top predictive features are stable rather than artefacts of small sample size.
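The preprocessing order described above (clip first, then impute) and the stratified fold assignment can be sketched in plain Python. This is an illustration under stated assumptions, not the project's code: the helper names are made up, the percentile uses linear interpolation, and no model is trained. In practice, NumPy and scikit-learn's `StratifiedKFold` would typically do this work.

```python
import math
import random


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted, non-empty list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def clip_then_impute(column, lo_q=0.1, hi_q=99.9, fill=-1.0):
    """Clip observed values to the [lo_q, hi_q] percentiles, then replace None with `fill`."""
    observed = sorted(v for v in column if v is not None)
    if not observed:
        return [fill for _ in column]
    lo, hi = percentile(observed, lo_q), percentile(observed, hi_q)
    # The sentinel is written after clipping, so it is never clipped itself.
    return [fill if v is None else min(max(v, lo), hi) for v in column]


def stratified_folds(labels, k=5, seed=0):
    """Give each sample a fold index in [0, k) with a similar class balance per fold."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for rank, i in enumerate(idx):
            folds[i] = rank % k  # deal each class round-robin across folds
    return folds
```

Computing the percentiles from observed values only keeps the −1 sentinel from shifting the clip bounds. Stratification matters here because roughly 31% of flights are crash-positive, and each fold should reflect that proportion.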
A single feature, maximum roll angle, accounts for roughly 40% of the model's feature importance. This is consistent with the comparison of raw means: roll divergence is the clearest indicator of loss of control.
Maximum roll angle is the single most predictive crash signal, capturing ~40% of XGBoost model importance. Crashed flights exhibit 5× higher maximum roll than normal flights (68.0° vs 13.6° mean).
Autonomous mission mode carries a 62% higher crash rate than manual flight (42.9% vs 26.5%), implicating autopilot navigation failure modes as a disproportionate source of real-world incidents.
IMU accelerometer clipping is 19× more common in crashed flights. Flights with more than 100 clipping events crash 65% of the time, making heavy clipping one of the strongest single-feature predictors available.
GPS is absent in 35.8% of flights. 46.3% of all flights enter EKF dead-reckoning at some point. GPS dependency without adequate fallback is a systemic vulnerability across the community fleet.
VTOL vehicles crash most frequently (37.4%), likely due to transition-phase complexity. Hexacopters crash least (21.1%), consistent with motor redundancy providing a meaningful safety margin.
RC signal loss precedes crash in 41.7% of flights where it occurs. 58.3% survive RC loss via failsafe — effective failsafe configuration is measurably life-saving at scale.
Battery minimum voltage is 4.3 V lower on average in crashed flights (16.3 V vs 20.6 V). Deep discharge and potential brownout conditions are a significant and underappreciated crash contributor.
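Several of the findings above are conditional crash rates (the crash rate among flights that meet some condition) or relative increases between two rates. The sketch below shows both calculations, assuming each flight is a dict with a boolean `crashed` key; the function names and row layout are illustrative.

```python
def crash_rate(flights, predicate=lambda f: True):
    """Fraction of flights matching `predicate` that crashed (None if none match)."""
    selected = [f for f in flights if predicate(f)]
    if not selected:
        return None
    return sum(1 for f in selected if f["crashed"]) / len(selected)


def relative_increase(rate_a, rate_b):
    """How much higher rate_a is than rate_b, as a fraction of rate_b."""
    return rate_a / rate_b - 1.0


# Figures quoted in the text: 42.9% (mission mode) vs 26.5% (manual).
print(f"{relative_increase(0.429, 0.265):.0%}")  # prints 62%
```

A call such as `crash_rate(flights, lambda f: f["clip_events"] > 100)` would reproduce the heavy-clipping figure, where `clip_events` is a hypothetical key for the IMU clip count.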
Crash labels are derived from telemetry heuristics rather than human expert verification. Ground-truth labeling by certified UAV safety investigators is a planned future milestone that would enable true out-of-distribution AUC measurement.
The dataset is biased toward PX4 firmware and the subset of operators who voluntarily submit logs to flight.review. ArduPilot, DJI, and commercial fleet logs are not represented.
Feature extraction runs on the complete flight log rather than a sliding window. Pre-crash precursor detection — identifying degradation in the 5–30 seconds before failure — requires temporal modeling not yet implemented.
The model currently classifies at the flight level. Per-segment classification (was takeoff healthy? was the approach phase nominal?) is a planned extension that would substantially increase operational utility.
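To make the sliding-window limitation concrete, here is one possible shape for windowed feature extraction over a single telemetry channel. It is purely a sketch: the function name, the default window and step sizes, and the two summary statistics are assumptions, and the source states that nothing of this kind is implemented yet.

```python
from statistics import fmean


def window_features(samples, window_s=5.0, step_s=1.0):
    """Summarise trailing windows over time-sorted (t_seconds, value) samples.

    Each window covers (end - window_s, end], so the final window ends on
    the last sample. Returns one dict per non-empty full-length window.
    """
    if not samples:
        return []
    t0, t_end = samples[0][0], samples[-1][0]
    n_steps = int((t_end - t0 - window_s) // step_s)
    out = []
    for i in range(n_steps + 1):  # empty range if the log is shorter than one window
        end = t0 + window_s + i * step_s
        vals = [v for t, v in samples if end - window_s < t <= end]
        if vals:
            out.append({
                "t_end": end,
                "max_abs": max(abs(v) for v in vals),
                "mean": fmean(vals),
            })
    return out
```

Anchoring each window on its end time means the last seconds before a failure are always covered, which is the period that precursor detection would care about. Rescanning the samples for every window is quadratic and is acceptable only for a sketch.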
The Goose forensic engine is open-source and runs locally on your own hardware; no cloud upload is required. Given a PX4 ULog, it produces a full forensic report with findings, confidence scores, and timeseries visualization in seconds.