Static Artificial Neural Network Datasets

Our base dataset is EMBER, introduced in the paper "EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models". The dataset is available through the elastic/ember repository. We maintain our own version of the ember repository at scorpionantimalware/ember.

The paper "EMBERSim: A Large-Scale Databank for Boosting Similarity Search in Malware Analysis" (NeurIPS 2023) provides some enhancements to EMBER, which are available in the CrowdStrike/embersim-databank repository.

| Dataset Name | Size | Benign Samples | Malware Samples | Related Papers | Links |
| --- | --- | --- | --- | --- | --- |
| ember-dataset (Public) | 1.6 GB as JSONL files or 5 GB vectorized | 300,000 train + 100,000 test | 300,000 train + 100,000 test | EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models | https://www.kaggle.com/datasets/pwn3xt/ember-dataset |
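
To check the benign and malware counts listed above against a downloaded copy, the raw JSONL files can be scanned one line at a time. The following is a minimal sketch, not part of our tooling. It assumes each JSONL line is a JSON object with an integer `label` field where 0 means benign, 1 means malware, and -1 means unlabeled, as in the upstream EMBER format. The helper name `count_labels` and the sample records are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable

# Assumed label convention (upstream EMBER format):
# 0 = benign, 1 = malware, -1 = unlabeled.
LABEL_NAMES = {0: "benign", 1: "malware", -1: "unlabeled"}


def count_labels(lines: Iterable[str]) -> Counter:
    """Count samples per class in an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Any missing or unexpected label value is counted as "unknown".
        counts[LABEL_NAMES.get(record.get("label"), "unknown")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical records standing in for real feature lines.
    sample = [
        '{"sha256": "a", "label": 0}',
        '{"sha256": "b", "label": 1}',
        '{"sha256": "c", "label": -1}',
    ]
    print(dict(count_labels(sample)))
```

Because an open file object yields its lines lazily, passing one directly to `count_labels` avoids loading a whole multi-hundred-megabyte JSONL file into memory. The file names inside the Kaggle archive are not listed here, so check them after downloading.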

Note

We train our base models on the EMBER dataset and then fine-tune them on our own datasets.