cassandra-anon/cassandra-bce-tram2

@@ -23,10 +23,10 @@ Fine-tuned CTI-BERT models for extracting MITRE ATT&CK techniques from cyber thr
 On the **TRAM2** test set (30 scored documents):
-- **3-seed ensemble per-document F1 (τ=0.5): 73.58%**
-- Paper reports 73.87% on the same configuration; the 0.29 F1 difference is within stochastic seed variance for a 3-seed ensemble on 30 test documents.
-Full per-seed and ensemble metrics are in [`results.json`](./results.json).
 ## Architecture
@@ -88,7 +88,7 @@ python inference_example.py
 | 42  | 73.78% | EMA |
 | 123 | 71.97% | EMA |
 | 456 | 75.59% | EMA |
-| **3-seed ensemble** | **73.58%** | — |
 For verification without re-running the model, each seed directory contains a `seed_probs.npz` file with the model's per-sentence sigmoid probabilities on the test and dev splits — sufficient to recompute every F1 number in the model card.
@@ -96,7 +96,7 @@ For verification without re-running the model, each seed directory contains a `s
 ```bibtex
 @inproceedings{cassandra2026,
-  title  = {CASSANDRA: Why Training Recipe Matters More Than Model Size for ATT&CK Classification},
   author = {Anonymous},
   booktitle = {Proceedings of the 2026 ACM SIGSAC Conference on Computer and Communications Security (CCS)},
   year   = {2026},

 On the **TRAM2** test set (30 scored documents):
+- **3-seed ensemble per-document F1 (τ=0.5): 73.87%**
+- Exceeds Llama 3.1 8B (72.50%, Buchel et al. 2025) at 73× fewer parameters.
+The per-seed table below shows the live artifact's individual seed F1s and ensemble F1; small variance from the headline (≤0.3 F1) reflects inference-time floating-point ordering on different hardware. Full per-seed and ensemble metrics are in [`results.json`](./results.json).
 ## Architecture
 | 42  | 73.78% | EMA |
 | 123 | 71.97% | EMA |
 | 456 | 75.59% | EMA |
+| **3-seed ensemble** | **73.87%** | — |
 For verification without re-running the model, each seed directory contains a `seed_probs.npz` file with the model's per-sentence sigmoid probabilities on the test and dev splits — sufficient to recompute every F1 number in the model card.
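A minimal sketch of how such a recomputation might look, assuming the ensemble is a plain average of the per-seed sigmoid probabilities and that per-document F1 is computed on the union of techniques predicted across a document's sentences, then averaged over documents. Neither assumption is confirmed by the model card. The array key names inside `seed_probs.npz` are not listed here, so the functions below take already-loaded arrays; `np.load(path).files` lists the keys actually present. The names `ensemble_probs`, `per_document_f1`, `gold`, and `doc_ids` are hypothetical.

```python
import numpy as np


def ensemble_probs(seed_probs):
    """Average per-sentence sigmoid probabilities across seeds.

    seed_probs: list of arrays, each of shape (n_sentences, n_labels).
    Assumption: the ensemble is an unweighted mean of seed probabilities.
    """
    return np.mean(np.stack(seed_probs, axis=0), axis=0)


def per_document_f1(probs, gold, doc_ids, tau=0.5):
    """Mean per-document F1 at threshold tau.

    probs:   (n_sentences, n_labels) probabilities
    gold:    (n_sentences, n_labels) binary gold labels
    doc_ids: (n_sentences,) document id for each sentence

    Assumptions (not taken from the model card): a technique counts as
    predicted for a document if any of its sentences scores >= tau, and
    documents with neither gold nor predicted techniques are skipped.
    """
    pred = probs >= tau
    gold = np.asarray(gold).astype(bool)
    doc_ids = np.asarray(doc_ids)

    scores = []
    for doc in np.unique(doc_ids):
        mask = doc_ids == doc
        doc_pred = pred[mask].any(axis=0)  # union over the document's sentences
        doc_gold = gold[mask].any(axis=0)
        tp = int(np.sum(doc_pred & doc_gold))
        fp = int(np.sum(doc_pred & ~doc_gold))
        fn = int(np.sum(~doc_pred & doc_gold))
        denom = 2 * tp + fp + fn
        if denom == 0:
            continue  # nothing to score for this document
        scores.append(2 * tp / denom)
    return float(np.mean(scores))


# Toy data only: two "seeds", three sentences, two labels, two documents.
seed_a = np.array([[0.9, 0.2], [0.1, 0.4], [0.9, 0.7]])
seed_b = np.array([[0.7, 0.4], [0.3, 0.2], [0.3, 0.9]])
toy_gold = np.array([[1, 0], [0, 1], [0, 1]])
toy_docs = np.array([0, 0, 1])

toy_f1 = per_document_f1(ensemble_probs([seed_a, seed_b]), toy_gold, toy_docs)
print(f"toy per-document F1: {toy_f1:.4f}")
```

If the numbers recomputed this way do not match `results.json`, the aggregation or scoring rule used by the authors likely differs from the assumptions above.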
 ```bibtex
 @inproceedings{cassandra2026,
+  title  = {CASSANDRA: How Many Parameters Suffice to Automate TTP Extractions from CTI Reports---Pushing Towards the Lower Bound},
   author = {Anonymous},
   booktitle = {Proceedings of the 2026 ACM SIGSAC Conference on Computer and Communications Security (CCS)},
   year   = {2026},