Combining supervised classification and anomaly detection to defend against known and zero-day threats using the CIC-IDS-2018 benchmark.
Traditional intrusion detection matches known attack "fingerprints." When a new, never-before-seen threat appears — a zero-day attack — these systems stay silent. And the datasets used to train them are decades old.
Legacy datasets no longer reflect the traffic and attacks that a modern IDS must handle. CIC-IDS-2018 was designed with realistic traffic profiles and 14+ modern attack families, which makes it a stronger benchmark for this work.
| Criteria | NSL-KDD (2009, built from KDD Cup 1999 data) ✗ | CIC-IDS-2018 ✓ |
|---|---|---|
| Attack Types | 4 generic categories | 14+ realistic attack families |
| Traffic Realism | Simulated lab traffic | Real-world profiled flows |
| Scale | ~125K records | 8.2M+ flow records |
| Modern Attacks | No DDoS, no botnets | DDoS, Botnets, Brute-Force, Web Attacks |

Existing approaches and their limitations:

| Method | ML Model | Training Data | Results | Challenge |
|---|---|---|---|---|
| Outlier Detector | One-Class SVM | CIC-IDS2017, NSL-KDD | Recall: 27%–99% | Inconsistent across attacks |
| 6-Detector Comparison | RF, MLP, KNN | CSE-CIC-IDS2018 | TPR up to 96% | Varying by attack type |
| Hybrid Method | RF & SVM | Private dataset | 88.54% of zero-day attacks detected | Private data, hard to compare |
| Transfer Learning | SVM, KNN, DT | NSL-KDD | Accuracy: 70%, F1: 0.75 | Low accuracy, limited tests |
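
The results above are reported in different metrics (recall, TPR, accuracy, F1), which makes rows hard to compare directly. The sketch below shows how these metrics relate when computed from the same confusion-matrix counts. Note that recall and TPR are the same quantity. The function name `detection_metrics` and the counts in the usage example are illustrative assumptions, not figures from any of the studies listed.

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute common IDS metrics from confusion-matrix counts.

    tp: attacks flagged as attacks      fn: attacks missed
    fp: benign flows flagged as attacks tn: benign flows passed
    """
    total = tp + fp + fn + tn
    # Recall and true-positive rate (TPR) are two names for the same ratio.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"recall": recall, "precision": precision,
            "f1": f1, "accuracy": accuracy}


# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=90, fp=10, fn=10, tn=890)
print(metrics)
```

Because attack traffic is usually the minority class, accuracy can look high even when recall is poor, so recall and F1 are generally the more informative numbers for detection.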
The dataset contains 8.2 million network flows: 74% benign and 26% attacks across 14 categories. The raw data contains infinite values, highly correlated features, and identifier columns that leak network topology.
2. Replaced inf/-inf with NaN
3. Dropped columns with >50% missing values
4. Filled remaining NaN with 0
5. Normalized labels to lowercase
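
The cleaning steps listed above can be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual code: the function name `clean_flows`, the label column name `"Label"`, and the sample frame are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def clean_flows(df: pd.DataFrame, label_col: str = "Label",
                max_missing: float = 0.5) -> pd.DataFrame:
    """Apply the inf -> NaN, sparse-column drop, zero-fill, and label steps."""
    out = df.copy()
    # Replace +inf/-inf with NaN so they are counted as missing values.
    out = out.replace([np.inf, -np.inf], np.nan)
    # Drop columns where more than `max_missing` of the values are missing.
    missing_frac = out.isna().mean()
    out = out.loc[:, missing_frac <= max_missing].copy()
    # Fill whatever NaN values remain with 0.
    out = out.fillna(0)
    # Normalize labels to lowercase (assumes the label column is named "Label").
    if label_col in out.columns:
        out[label_col] = out[label_col].astype(str).str.lower()
    return out


# Tiny hypothetical sample standing in for the real flow records.
sample = pd.DataFrame({
    "Flow Bytes/s": [1.0, np.inf, -np.inf, 4.0],
    "Mostly Missing": [np.nan, np.nan, np.nan, 1.0],
    "Label": ["Benign", "DDoS", "BENIGN", "Bot"],
})
print(clean_flows(sample))
```

Filling with 0 is simple but treats a missing rate or duration as a true zero. Median imputation is a common alternative if that distinction matters.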
The approach pairs two models. A Random Forest (RF) classifier labels known attacks, and a One-Class SVM (OCSVM) flags anything abnormal. Together, they cover known threats and zero-day unknowns.
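
One way to wire a Random Forest and a One-Class SVM together is sketched below with scikit-learn. The fusion rule is an assumption chosen for illustration, since the summary does not specify how the two outputs are combined: the Random Forest label is kept unless it says "benign" while the One-Class SVM, trained on benign flows only, marks the flow as an outlier. The class name `HybridDetector`, the labels `"benign"` and `"anomaly"`, and the hyperparameters are likewise hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


class HybridDetector:
    """RF for known attack classes plus OCSVM as a benign-traffic novelty check."""

    def __init__(self, nu: float = 0.05, random_state: int = 0):
        self.rf = RandomForestClassifier(n_estimators=100,
                                         random_state=random_state)
        self.scaler = StandardScaler()
        self.ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)

    def fit(self, X, y, benign_label: str = "benign"):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.benign_label = benign_label
        # Supervised stage: learn every labeled class, benign included.
        self.rf.fit(X, y)
        # Anomaly stage: model only what benign traffic looks like.
        benign = X[y == benign_label]
        self.ocsvm.fit(self.scaler.fit_transform(benign))
        return self

    def predict(self, X, unknown_label: str = "anomaly"):
        X = np.asarray(X, dtype=float)
        # Object dtype so a longer label string is not truncated on assignment.
        pred = self.rf.predict(X).astype(object)
        is_outlier = self.ocsvm.predict(self.scaler.transform(X)) == -1
        # Assumed fusion rule: escalate flows RF calls benign but OCSVM rejects.
        pred[(pred == self.benign_label) & is_outlier] = unknown_label
        return pred


# Synthetic 2-D data standing in for flow features.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
                     rng.normal(6.0, 0.5, (100, 2))])
y_train = np.array(["benign"] * 200 + ["ddos"] * 100)
model = HybridDetector().fit(X_train, y_train)
print(model.predict([[0.0, 0.0], [6.0, 6.0], [-8.0, -8.0]]))
```

In this toy setup the third point lies far from both training clusters. The Random Forest alone would assign it to the nearest known class, so the One-Class SVM stage is what surfaces it as an unknown.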
RF performs best on known attacks. OCSVM flags anomalous traffic that RF has no label for. Our hybrid approach combines both for adaptive defense.