Static Artificial Neural Network Datasets
Our base dataset is EMBER, introduced in the paper EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models. You can find the dataset in the elastic/ember repository. We maintain our own version of that repository at scorpionantimalware/ember.
The paper EMBERSim: A Large-Scale Databank for Boosting Similarity Search in Malware Analysis (NeurIPS 2023) provides some enhancements to EMBER, available in the CrowdStrike/embersim-databank repository.
Dataset Name | Size | Benign Samples | Malware Samples | Related Papers | Links |
---|---|---|---|---|---|
ember-dataset (Public) | 1.6 GB JSONLs or 5 GB vectorized | 300,000 Train + 100,000 Test | 300,000 Train + 100,000 Test | EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models | |
Note
We train our base models on the EMBER dataset and then fine-tune them on our own datasets.
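
As a quick sanity check on a downloaded copy, the per-class sample counts can be tallied directly from the raw JSONL files. The following is a minimal sketch, not part of the ember tooling: it assumes each line is a JSON object with a `label` field (0 for benign, 1 for malware, -1 for unlabeled) and that the training files are named `train_features_*.jsonl`. The `count_labels` helper and the `data/ember` path are hypothetical names used only for illustration. The training files may also contain unlabeled samples, which the table above does not count.

```python
import json
from collections import Counter
from pathlib import Path

# Assumed label convention of the raw-feature JSONL files:
# 0 = benign, 1 = malware, -1 = unlabeled.
LABEL_NAMES = {0: "benign", 1: "malware", -1: "unlabeled"}


def count_labels(jsonl_paths):
    """Tally samples per label across one or more JSONL feature files."""
    counts = Counter()
    for path in jsonl_paths:
        with open(path, "r", encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines
                record = json.loads(line)
                # Any label outside the assumed convention is counted as "unknown".
                counts[LABEL_NAMES.get(record["label"], "unknown")] += 1
    return counts


# Hypothetical location of the extracted dataset; adjust to your setup.
data_dir = Path("data/ember")
if data_dir.is_dir():
    train_files = sorted(data_dir.glob("train_features_*.jsonl"))
    print(count_labels(train_files))
```

Reading line by line keeps memory use low, since each record carries a full feature set and the files total well over a gigabyte.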