An open-source, locally deployable 8B model bridging the gap between general knowledge and domain-specific security operations via agentic data augmentation.
Cybersecurity operations demand assistant LLMs that support diverse workflows without exposing sensitive data. Existing solutions either rely on proprietary APIs with privacy risks or on open models lacking domain adaptation. To bridge this gap, we introduce RedSage.
We curate 11.8B tokens of cybersecurity-focused continual pretraining data via large-scale web filtering and manual collection of high-quality resources. Building on this, we design an agentic augmentation pipeline that simulates expert workflows to generate 266K multi-turn cybersecurity samples for supervised fine-tuning.
To rigorously evaluate the models, we introduce RedSage-Bench, a benchmark with 30K multiple-choice and 240 open-ended Q&A items. At the 8B scale, RedSage achieves consistently better results, surpassing baseline models by up to +5.59 points on cybersecurity benchmarks and +5.05 points on Open LLM Leaderboard tasks.
| Model Name | Type | Best For | Link |
|---|---|---|---|
| RedSage-8B-Base | Base | Domain adaptation, further fine-tuning | HuggingFace → |
| RedSage-8B-Ins | Instruct | Multi-turn chat, step-by-step explanations | HuggingFace → |
| RedSage-8B-DPO (recommended) | Chat (Aligned) | Production-ready assistants, aligned behavior | HuggingFace → |
    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    model_name = "RISys-Lab/RedSage-Qwen3-8B-Ins"
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,
        device_map="auto"
    )

    messages = [
        {"role": "system", "content": "You are RedSage."},
        {"role": "user", "content": "List three SSRF mitigations."}
    ]

    # add_generation_prompt=True appends the assistant turn header,
    # so the model answers instead of continuing the user message.
    text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tok(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=300)
    print(tok.decode(out[0]))
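
For multi-turn use, the conversation history has to be carried from one call to the next. The following is a minimal sketch, assuming a hypothetical `reply_fn` callable that wraps the chat-template and `generate` steps of the snippet above. The names `chat_turn`, `reply_fn`, and `stub_reply` are illustrative and not part of the RedSage release.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    history: List[Message],
    user_text: str,
    reply_fn: Callable[[List[Message]], str],
) -> List[Message]:
    """Return a new history with the user turn and the model reply appended."""
    # Build a new list so the caller's history is not mutated.
    with_user = history + [{"role": "user", "content": user_text}]
    reply = reply_fn(with_user)
    return with_user + [{"role": "assistant", "content": reply}]


def stub_reply(messages: List[Message]) -> str:
    """Stand-in for a real model call (assumption, for demonstration only)."""
    return f"reply to message {len(messages)}"


history: List[Message] = [{"role": "system", "content": "You are RedSage."}]
history = chat_turn(history, "List three SSRF mitigations.", stub_reply)
history = chat_turn(history, "Expand on the first one.", stub_reply)
print([m["role"] for m in history])
```

In practice `stub_reply` would be swapped for a function that applies the chat template to `messages`, calls `model.generate`, and decodes only the newly generated tokens.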
    # 1. Install vLLM
    uv pip install vllm --torch-backend=auto

    # 2. Start the OpenAI-compatible server
    vllm serve RISys-Lab/RedSage-Qwen3-8B-DPO \
        --port 8000 \
        --max-model-len 32768

    # 3. Query via curl
    curl http://localhost:8000/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{
            "model": "RISys-Lab/RedSage-Qwen3-8B-DPO",
            "messages": [
                {"role": "user", "content": "Explain CTI."}
            ]
        }'
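
The same request can be issued from Python. This is a minimal sketch using only the standard library, assuming the server is reachable at `http://localhost:8000` as in the commands above and that the response follows the OpenAI chat-completions schema. The helper names `build_chat_request` and `send_chat_request` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_chat_request(
    model: str,
    user_text: str,
    system_text: Optional[str] = None,
    max_tokens: int = 300,
) -> Dict[str, Any]:
    """Assemble the JSON body for a /v1/chat/completions call."""
    messages: List[Dict[str, str]] = []
    if system_text is not None:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return {"model": model, "messages": messages, "max_tokens": max_tokens}


def send_chat_request(
    payload: Dict[str, Any],
    base_url: str = "http://localhost:8000",  # assumed local vLLM endpoint
) -> str:
    """POST the payload and return the text of the first choice."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


payload = build_chat_request("RISys-Lab/RedSage-Qwen3-8B-DPO", "Explain CTI.")
# Needs the vLLM server to be running before it is uncommented:
# print(send_chat_request(payload))
```

Keeping payload construction separate from the network call makes the request easy to inspect or log before anything is sent.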
From raw web data to a specialized assistant via multi-stage training.
Continual pretraining: CyberFineWeb (11.8B tokens) + RedSage Seed (28K docs)
Supervised fine-tuning: 266K multi-turn conversations
Evaluation: RedSage-Bench + public benchmarks
*Evaluation on 30K domain-specific MCQs covering knowledge, skills, and tools.
*Mean accuracy across CTI-Bench (MCQ & RCM), CyberMetric (500), SecBench (En), SecEval, SECURE (CWET, KCV, MEAT), MMLU-CSec.
*Mean accuracy across MMLU, ARC-C, GSM8K, HellaSwag, TQA, WinoGrande, IFEval.
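
The notes above report accuracy per benchmark and a mean across benchmarks. As a rough sketch of that arithmetic, assuming exact-match scoring on option letters and an unweighted mean in which every benchmark counts equally (the evaluation code may differ on both points), the computation could look like the following. The function names and toy inputs are illustrative placeholders, not reported results.

```python
from typing import Dict, List


def mcq_accuracy(predictions: List[str], answers: List[str]) -> float:
    """Percentage of items whose predicted option letter matches the key."""
    if len(predictions) != len(answers) or not answers:
        raise ValueError("predictions and answers must be non-empty and equal in length")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


def macro_mean(scores: Dict[str, float]) -> float:
    """Unweighted mean over benchmarks, regardless of benchmark size."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(scores.values()) / len(scores)


# Placeholder inputs for illustration only.
toy_scores = {
    "bench_a": mcq_accuracy(["A", "c", "B", "D"], ["A", "C", "D", "D"]),  # 3 of 4 correct
    "bench_b": mcq_accuracy(["B", "B"], ["B", "A"]),  # 1 of 2 correct
}
print(macro_mean(toy_scores))
```

With an unweighted mean, a small benchmark influences the headline number as much as a large one, which is worth keeping in mind when comparing averages across different benchmark sets.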
@inproceedings{suryanto2026redsage,
title={RedSage: A Cybersecurity Generalist {LLM}},
author={Suryanto, Naufal and Naseer, Muzammal and Li, Pengfei and Wasim, Syed Talal and Yi, Jinhui and Gall, Juergen and Ceravolo, Paolo and Damiani, Ernesto},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=W4FAenIrQ2}
}