Goose Research · April 2026 · Preprint

Understanding UAV Crash Patterns: An Empirical Analysis of 8,668 Real-World PX4 Flights

We trained and validated a crash-prediction model on 40,229 community-submitted PX4 ULog files, a roughly 8× scale-up from our initial dataset of 4,800 samples. Using a 56-feature extraction pipeline and gradient-boosted classification, we identify the dominant crash predictors and the sensor signals that most reliably separate failed flights from healthy ones. Results are consistent across both dataset versions: maximum roll angle and impact G-force dominate the model's feature importance, and IMU accelerometer clipping is among the strongest single-signal separators between crashed and normal flights.

Training samples (v2): 40,229
Crash rate (v2 dataset): 17.6%
Model features: 56
CV AUC (XGBoost): 1.000
Dataset growing · ~480 logs/hour streamed from PX4 flight.review public database
Section 1

Dataset & Methodology

Logs were sourced from the PX4 flight.review public database — a community-driven repository of real-world UAV telemetry. Each log is a binary ULog file containing synchronized timeseries from all onboard sensors, flight controller state, and autopilot estimates. We stream logs continuously, parsing each with our open-source forensic engine and storing 190 features per flight in a structured SQLite database for analysis and ML training.
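The storage step described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table name `flight_features`, the long (log, feature name, value) layout, and the helper names are assumptions, not the actual Goose schema, and ULog parsing itself is out of scope here.

```python
import sqlite3

def store_features(conn, log_id, features):
    """Store one flight's features as (log_id, name, value) rows.

    The table layout is a hypothetical example, not the Goose schema.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flight_features ("
        "log_id TEXT NOT NULL, name TEXT NOT NULL, value REAL, "
        "PRIMARY KEY (log_id, name))"
    )
    rows = [(log_id, name, value) for name, value in features.items()]
    # INSERT OR REPLACE makes re-processing the same log idempotent.
    conn.executemany(
        "INSERT OR REPLACE INTO flight_features (log_id, name, value) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()

def load_features(conn, log_id):
    """Return the stored features of one flight as a dict."""
    cur = conn.execute(
        "SELECT name, value FROM flight_features WHERE log_id = ?", (log_id,)
    )
    return dict(cur.fetchall())

# Usage with an in-memory database; a missing sensor value is stored as NULL.
conn = sqlite3.connect(":memory:")
store_features(conn, "example-log", {"max_roll_deg": 68.0, "peak_g_overall": None})
print(load_features(conn, "example-log"))
```

A long layout avoids schema changes when the feature set grows; a wide table with one column per feature would be the more convenient choice for direct ML export.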

Crash labels were assigned using a multi-signal telemetry heuristic: flights with crash_confidence ≥ 0.80 (derived from altitude drop rate, G-force signature, attitude divergence, and motor cutoff patterns) were labeled crash-positive. Flights with zero confidence and duration ≥ 30 s were labeled clean.
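The labeling rule above can be written as a small function. The two thresholds come from the text; the function name and the decision to return `None` (that is, to exclude the flight) for everything in between are assumptions, since the text does not say how intermediate-confidence or very short flights are handled.

```python
from typing import Optional

CRASH_THRESHOLD = 0.80        # crash_confidence cutoff stated in the text
MIN_CLEAN_DURATION_S = 30.0   # minimum duration for a clean label, from the text

def assign_label(crash_confidence: float, duration_s: float) -> Optional[int]:
    """Return 1 (crash), 0 (clean), or None (ambiguous).

    Treating ambiguous flights as excluded is an assumption of this sketch.
    """
    if crash_confidence >= CRASH_THRESHOLD:
        return 1
    if crash_confidence == 0.0 and duration_s >= MIN_CLEAN_DURATION_S:
        return 0
    return None

print(assign_label(0.92, 140.0))  # crash-positive
print(assign_label(0.0, 45.0))    # clean
print(assign_label(0.40, 300.0))  # neither rule applies
```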

Vehicle Type Distribution

Quadcopter: 74.5%
Fixed-wing: 8.2%
VTOL: 7.6%
Hexacopter: 5%
Octocopter: 3.3%

Hardware Platforms (top 6, >50 logs each)

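| Platform | Note | Logs | Crash Rate |
|---|---|---|---|
| PX4 SITL (simulator) | test/dev scenarios | 819 | 62.9% |
| MICOAIR H743 | | 387 | 48.3% |
| HKUST NXT DUAL | | 511 | 40.3% |
| MICOAIR H743 V2 | | 573 | 37.7% |
| CUAV X7 Nano | | 127 | 36.2% |
| PX4 FMU V6C (flagship) | most common real HW | 1,278 | 31.7% |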
Section 2

Crash Rate Analysis

The overall crash rate is 30.7% — nearly 1 in 3 logged flights ends in a crash or anomaly event. Rates vary substantially by vehicle configuration and autonomy level.

Crash Rate by Vehicle Type

VTOL (highest): 37.4%
Quadcopter: 31.5%
Fixed-wing: 28.9%
Octocopter: 28.7%
Hexacopter (lowest; motor redundancy): 21.1%

Crash Rate by Primary Flight Mode

Mission (autonomous; fully autonomous nav): 42.9%
Position hold: 34.1%
Altitude hold: 26.6%
Manual (lowest; pilot in loop): 26.5%
Finding 1: Mission mode (fully autonomous flight) carries a 62% higher crash rate than manual flight in relative terms (42.9% vs 26.5%, a gap of 16.4 percentage points). This points to GPS dependency, path planning edge cases, and failsafe handling as likely disproportionate contributors to real-world UAV incidents.
Finding 2: Hexacopters crash at the lowest rate of any vehicle class (21.1%), consistent with motor redundancy: a hex configuration can tolerate a single motor failure without loss of control.
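The "62% higher" figure in Finding 1 is a relative increase, not a difference in percentage points. A short worked calculation using the two rates reported above (the function name is illustrative):

```python
def relative_increase(rate, baseline):
    """Relative increase of `rate` over `baseline`, as a fraction."""
    return (rate - baseline) / baseline

mission, manual = 42.9, 26.5               # crash rates in percent, from the text
rel = relative_increase(mission, manual)   # (42.9 - 26.5) / 26.5
abs_gap = mission - manual                 # difference in percentage points

print(f"{rel:.0%} higher (relative), {abs_gap:.1f} points higher (absolute)")
```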
Section 3

Sensor Coverage & System Faults

Not all sensors are present in every flight log. GPS is absent in 35.8% of flights, indicating widespread GPS-denied or GPS-degraded operations in the community fleet.

Sensor Presence Across Fleet

Vibration (IMU): 98.6%
Attitude (IMU): 98.4%
CPU Load: 98.2%
EKF Estimator: 97.6%
Barometer: 95%
Battery: 91.2%
RC Link: 82.2%
Magnetometer: 70.5%
GPS (absent in 35.8% of flights): 64.2%
Finding 3: 46.3% of all flights entered EKF dead-reckoning mode — operating without GPS confirmation for at least part of the flight. This is the single most common fault-adjacent state in the dataset, and represents a critical vulnerability: position estimates degrade silently until GPS re-acquisition.

Failsafe Events (% of all flights)

| Event | % of all flights | Note |
|---|---|---|
| RC signal lost | 12.1% | 41.7% crash rate when triggered |
| Battery warning triggered | 6.3% | |
| Critical system failure | 6.3% | |
| Motor failure detected | 0.85% | |
| Imbalanced prop detected | 0.43% | |
EKF Fault Rates: Yaw rejection 4.6% · Velocity rejection 1.3% · Horizontal position rejection 1.1% · Magnetometer fault 1.0% · Dead reckoning 46.3%
Section 4

Pre-Crash Signal Analysis

Comparing telemetry means between crashed and normal flights reveals large, systematic differences across attitude, vibration, power, and control loop channels.

| Feature | Crashed (mean) | Normal (mean) | Ratio |
|---|---|---|---|
| Max roll angle | 68.0° | 13.6° | 5.0× |
| Max pitch angle | 38.3° | 13.8° | 2.8× |
| IMU accel clip events | 3,999 | 206 | 19.4× |
| Min battery voltage | 16.3 V | 20.6 V | |
| Rate pitch error RMS | 121 °/s | 15.3 °/s | 7.9× |
| Rate oscillation amp (pitch) | 29.6 | 6.4 | 4.6× |
Crash rate when IMU clipping > 100 events: 65% (n = 738 flights)
Crash rate when freefall detected: 64.9% (n = 222 flights)
Crash rate after RC signal loss: 41.7% (n = 1,044 flights)
Crashed flights show 5× higher maximum roll angle and 19× more IMU accelerometer clipping events than normal flights. These two signals alone achieve near-complete class separation in the dataset.
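The ratio column in the comparison above is the crashed-flight mean divided by the normal-flight mean. The sketch below recomputes it from the reported means; the dictionary keys are the table's row labels, and the variable names are illustrative.

```python
# (crashed mean, normal mean) per channel, copied from the table above.
channel_means = {
    "Max roll angle": (68.0, 13.6),
    "Max pitch angle": (38.3, 13.8),
    "IMU accel clip events": (3999, 206),
    "Rate pitch error RMS": (121, 15.3),
    "Rate oscillation amp (pitch)": (29.6, 6.4),
}

# Ratio of means: how many times larger the crashed-flight average is.
ratios = {
    name: crashed / normal
    for name, (crashed, normal) in channel_means.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

A ratio of means summarizes the size of the gap but says nothing about overlap between the two distributions, so it complements rather than replaces per-threshold crash rates such as the clipping figure above.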
Section 5

Machine Learning Results

We trained an XGBoost gradient-boosted classifier on 40,229 labeled samples (v2 dataset, roughly 8× larger than the initial 4,800-sample v1 run) using 5-fold stratified cross-validation. All features were clipped at the 0.1st / 99.9th percentile to remove outliers, and missing values were then imputed with −1. The model reaches the same near-perfect AUC on both dataset sizes, which suggests that the result is not an artefact of small sample size.
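The preprocessing order described above (percentile clipping first, then imputation with −1) can be sketched with NumPy. This is a minimal illustration rather than the pipeline's actual code: the function name is hypothetical, and computing percentiles per feature column while ignoring missing values is an assumption.

```python
import numpy as np

def clip_then_impute(X, lower_pct=0.1, upper_pct=99.9, fill_value=-1.0):
    """Clip each feature column to its [0.1, 99.9] percentile range,
    then replace missing values (NaN) with fill_value."""
    X = np.asarray(X, dtype=float)
    # Per-column percentiles, computed over the non-missing entries only.
    lo, hi = np.nanpercentile(X, [lower_pct, upper_pct], axis=0)
    clipped = np.clip(X, lo, hi)  # NaN entries stay NaN through the clip
    return np.where(np.isnan(clipped), fill_value, clipped)

# Toy data: one extreme outlier and one missing value.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
X[0, 0] = 1e6
X[1, 1] = np.nan

out = clip_then_impute(X)
print(out[0, 0], out[1, 1])
```

In a cross-validated setting the percentile bounds would normally be estimated on each training fold only and then applied to the held-out fold, to avoid leaking information from the validation data.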

Model: XGBoost 3.2
Training samples: 40,229
Crash / Normal: 7,083 / 33,146
CV folds: 5-fold stratified
CV AUC: 1.000

Feature Importance — Top 10 (XGBoost gain)

A single feature, maximum roll angle, accounts for about 40% of total XGBoost gain importance. This is consistent with the raw means analysis: roll divergence is the clearest precursor to loss of control.

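| Rank | Feature | Gain importance |
|---|---|---|
| 1 | max_roll_deg | 39.98% |
| 2 | peak_g_overall | 10.33% |
| 3 | peak_g_last20pct | 8.37% |
| 4 | att_roll_err_rms | 6.98% |
| 5 | max_pitch_deg | 6.84% |
| 6 | horiz_dist_m | 5.06% |
| 7 | motor_cutoff_tilt | 3.54% |
| 8 | att_roll_err_p95 | 2.91% |
| 9 | att_pitch_err_p95 | 2.04% |
| 10 | rate_roll_err_p95 | 1.38% |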
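To see how concentrated the importance is, the reported top-10 shares can be accumulated. The sketch below uses only the percentages listed above; the remainder attributed to the other 46 features assumes that the importances are normalized to sum to 100% across all 56 model features.

```python
from itertools import accumulate

# Top-10 gain importances in percent, as reported above.
top10 = [
    ("max_roll_deg", 39.98),
    ("peak_g_overall", 10.33),
    ("peak_g_last20pct", 8.37),
    ("att_roll_err_rms", 6.98),
    ("max_pitch_deg", 6.84),
    ("horiz_dist_m", 5.06),
    ("motor_cutoff_tilt", 3.54),
    ("att_roll_err_p95", 2.91),
    ("att_pitch_err_p95", 2.04),
    ("rate_roll_err_p95", 1.38),
]

# Running total of importance as features are added in rank order.
cumulative = list(accumulate(pct for _, pct in top10))
for (name, pct), cum in zip(top10, cumulative):
    print(f"{name:20s} {pct:6.2f}%  cumulative {cum:6.2f}%")

top10_total = cumulative[-1]
remaining = 100.0 - top10_total  # share left for the other features (assumption)
print(f"top 10: {top10_total:.2f}%, remaining features: {remaining:.2f}%")
```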
Note on label circularity: Training labels are derived from the same telemetry heuristics used in our crash detector (crash_confidence ≥ 0.80). The AUC of 1.000 therefore reflects the model replicating the heuristic rather than agreement with independent ground truth. Feature importances remain informative, since they show which signals carry the most discriminative information for these labels, but they may partly mirror the heuristic's own inputs (altitude drop rate, G-force signature, attitude divergence, and motor cutoff patterns). The AUC is consistent across v1 (4,800 samples) and v2 (40,229 samples), which indicates stability with respect to sample size. Human expert ground-truth labeling remains a planned future milestone.
Section 6

Key Findings

1. Maximum roll angle is the single most predictive crash signal, capturing ~40% of XGBoost model importance. Crashed flights exhibit 5× higher maximum roll than normal flights (68.0° vs 13.6° mean).

2. Autonomous mission mode carries a 62% higher crash rate than manual flight in relative terms (42.9% vs 26.5%), pointing to autopilot navigation failure modes as a disproportionate source of real-world incidents.

3. IMU accelerometer clipping is 19× more common in crashed flights. Flights with >100 clipping events crash at 65%, making heavy clipping one of the strongest single-feature predictors available.

4. GPS is absent in 35.8% of flights, and 46.3% of all flights enter EKF dead-reckoning at some point. GPS dependency without adequate fallback is a systemic vulnerability across the community fleet.

5. VTOL vehicles crash most frequently (37.4%), likely due to transition-phase complexity. Hexacopters crash least (21.1%), consistent with motor redundancy providing a meaningful safety margin.

6. Flights that lose RC signal crash 41.7% of the time (n = 1,044). The remaining 58.3% end without a crash, which suggests that effective failsafe configuration measurably reduces vehicle losses at scale.

7. Battery minimum voltage is 4.3 V lower on average in crashed flights (16.3 V vs 20.6 V). Deep discharge and potential brownout conditions may be a significant and underappreciated crash contributor.

Section 7

Limitations & Future Work

Crash labels are derived from telemetry heuristics rather than human expert verification. Ground-truth labeling by certified UAV safety investigators is a planned future milestone that would enable true out-of-distribution AUC measurement.

The dataset is biased toward PX4 firmware and the subset of operators who voluntarily submit logs to flight.review. ArduPilot, DJI, and commercial fleet logs are not represented.

Feature extraction runs on the complete flight log rather than a sliding window. Pre-crash precursor detection — identifying degradation in the 5–30 seconds before failure — requires temporal modeling not yet implemented.
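As an illustration of what windowed extraction could look like, the sketch below computes a trailing-window maximum of absolute roll angle over a time series. It is not part of the current pipeline: the function name, the (time, value) sample format, and the choice of a trailing maximum as the windowed feature are all assumptions made for the example.

```python
from collections import deque

def sliding_max_abs(samples, window_s):
    """For each sample (t, v), return (t, max |v| over [t - window_s, t]).

    `samples` must be sorted by time. A monotonic deque keeps the running
    maximum in O(1) amortized time per sample.
    """
    window = deque()  # (t, |v|) pairs with strictly decreasing |v|
    out = []
    for t, v in samples:
        a = abs(v)
        # Drop older entries that can never be the maximum again.
        while window and window[-1][1] <= a:
            window.pop()
        window.append((t, a))
        # Drop entries that have fallen out of the time window.
        while window[0][0] < t - window_s:
            window.popleft()
        out.append((t, window[0][1]))
    return out

# Toy roll-angle series in degrees, sampled once per second.
roll = [(0.0, 2.0), (1.0, -5.0), (2.0, 3.0), (3.0, 1.0), (4.0, 40.0)]
print(sliding_max_abs(roll, window_s=2.0))
```

Evaluated with a window in the 5–30 second range mentioned above, a feature like this would describe the run-up to a failure rather than the whole flight.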

The model currently classifies at the flight level. Per-segment classification (was takeoff healthy? was the approach phase nominal?) is a planned extension that would substantially increase operational utility.

Data Availability

Analyze Your Own Logs

The Goose forensic engine is open-source. Run it locally on your own hardware, with no cloud upload required. Provide a PX4 ULog and receive a full forensic report with findings, confidence scores, and timeseries visualization in seconds.

Data sourced from PX4 flight.review public database · Analysis engine: Goose-Core (open source)