Changing Cybersecurity with Natural Language Processing

If you’ve used a chatbot, predictive text to finish a thought in an email, or pressed “0” to speak to an operator, you’ve come across natural language processing (NLP). As more enterprises adopt NLP, the sub-field is developing beyond those popular use cases of machine-human communication to machines interpreting both human and non-human language. This creates an exciting opportunity for organizations to stay ahead of evolving cybersecurity threats.

This post was originally published on

NLP combines linguistics, computer science, and AI to support machine learning of human language. Human language is astonishingly complex. Relying on structured rules leaves machines with an incomplete understanding of it.

NLP enables machines to contextualize and learn instead of relying on rigid encoding so that they can adapt to different dialects, new expressions, or questions that the programmers never anticipated.

NLP research has driven the evolution of AI technologies such as neural networks, which are instrumental to machine learning across various fields and use cases. To date, NLP has been applied primarily to machine-to-human communication, simplifying interactions for enterprises and consumers.

NLP for cybersecurity

NLP was designed to enable machines to learn to communicate like humans, with humans. Many services that we use today also rely on machine communications, either between machines or translated into a form that humans can understand. Cybersecurity is a perfect example of such a field, where IT analysts can feel like they speak to more machines than people.

NLP can be leveraged in cybersecurity workflows to assist in breach protection, identification, and scale and scope analysis.

Phishing protection

In the short term, NLP can be easily leveraged to enhance and simplify breach protection from phishing attempts.

In the context of phishing, NLP can be leveraged to understand bot or spam behavior in email text sent by a machine posing as a human. It can also be used to understand the internal structure of the email itself to identify patterns of spammers and the types of messages they send.

This is the first extension of NLP beyond its original scope: a technology designed to understand only human language is now being applied to human language mixed with machine-level headers.
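To make the mix of human language and machine-level headers concrete, here is a minimal sketch in Python using only the standard library. The sample message, the `URGENT_WORDS` cue list, and the feature names are all hypothetical, chosen for illustration. A real system would feed features like these (or the raw text) into a trained model rather than rely on hand-picked cues.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical sample message; addresses use reserved example domains.
RAW_EMAIL = """\
From: "IT Support" <support@example-corp.com>
Reply-To: attacker@example-mail.net
To: user@example.org
Subject: Urgent: verify your account

Your password expires today. Click http://203.0.113.7/login to verify your account.
"""

# Hypothetical cue words; a trained model would learn such signals instead.
URGENT_WORDS = {"urgent", "verify", "expires", "password"}


def extract_features(raw: str) -> dict:
    """Combine header-level and body-level signals from one email."""
    msg = message_from_string(raw)

    # Machine-level part: compare the From and Reply-To domains.
    _, from_addr = parseaddr(msg.get("From", ""))
    _, reply_addr = parseaddr(msg.get("Reply-To", ""))
    from_domain = from_addr.rpartition("@")[2].lower()
    reply_domain = reply_addr.rpartition("@")[2].lower()

    # Human-language part: tokenize the subject and body.
    body = msg.get_payload()
    text = (msg.get("Subject", "") + " " + body).lower()
    tokens = re.findall(r"[a-z']+", text)

    return {
        "reply_to_mismatch": bool(reply_domain) and reply_domain != from_domain,
        "has_ip_url": bool(re.search(r"https?://\d{1,3}(?:\.\d{1,3}){3}", body)),
        "urgent_word_count": sum(t in URGENT_WORDS for t in tokens),
    }


features = extract_features(RAW_EMAIL)
print(features)
```

The point of the sketch is that header fields and body text end up in one feature set, so a downstream model can learn from both at once.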

Log parsing

In the medium term, NLP can be leveraged to parse logs, which is the use case that cyBERT addresses.

In current rules-based systems, the mechanisms required to parse raw logs and prepare them for analysts are brittle and need significant development and maintenance resources.
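A small illustration of that brittleness, assuming a hypothetical SSH-style log line and a hand-written regular expression (neither comes from a specific product): the rule parses the format it was written for, but a minor wording change in the log generator makes it return nothing.

```python
import re

# Hand-written rule for one hypothetical log format.
PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<host>\S+) sshd\[(?P<pid>\d+)\]: "
    r"Failed password for (?P<user>\S+) from (?P<src_ip>\S+) port (?P<port>\d+)$"
)


def parse_with_rule(line: str):
    """Return the extracted fields, or None when the rule does not match."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None


# The format the rule was written for.
original = (
    "2021-06-01T12:00:00Z web01 sshd[4242]: "
    "Failed password for root from 198.51.100.23 port 50022"
)
# The same event after the generator adds two words to the message.
changed = (
    "2021-06-01T12:00:00Z web01 sshd[4242]: "
    "Failed password for invalid user root from 198.51.100.23 port 50022"
)

parsed_original = parse_with_rule(original)
parsed_changed = parse_with_rule(changed)

print(parsed_original)  # a dict of named fields
print(parsed_changed)   # None: the rule silently stops matching
```

Every such change needs a new or updated rule, which is where the maintenance cost comes from.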

Using NLP, parsing of raw logs becomes more flexible and less prone to breaking when changes occur to the log generators and sensors.

Going further, the neural networks used for parsing can generalize beyond the logs they were exposed to during training, creating methods to transform raw data into rich content ready for an analyst without the need to write explicit rules for these new or changed log types. 
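One common way to frame this is as sequence labeling: a model assigns a field label to each token in the log line, and the labeled tokens are then grouped into a structured record. The sketch below shows only that final grouping step. The `labels` list is a hand-written stand-in for model predictions, and the field names and the `group_labeled_tokens` helper are hypothetical. No model is trained or run here.

```python
from collections import defaultdict


def group_labeled_tokens(tokens, labels):
    """Collect tokens that share a field label; "O" marks tokens outside any field."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must align one-to-one")
    fields = defaultdict(list)
    for token, label in zip(tokens, labels):
        if label != "O":
            fields[label].append(token)
    return {name: " ".join(parts) for name, parts in fields.items()}


line = (
    "Jun 1 12:00:00 web01 sshd[4242]: "
    "Failed password for invalid user root from 198.51.100.23 port 50022"
)
tokens = line.split()

# Hand-written stand-in for per-token predictions from a trained model.
labels = [
    "timestamp", "timestamp", "timestamp",  # Jun 1 12:00:00
    "host",                                 # web01
    "process",                              # sshd[4242]:
    "O", "O", "O", "O", "O",                # Failed password for invalid user
    "user",                                 # root
    "O",                                    # from
    "src_ip",                               # 198.51.100.23
    "O",                                    # port
    "port",                                 # 50022
]

record = group_labeled_tokens(tokens, labels)
print(record)
```

Because the labels attach to individual tokens rather than to a fixed line template, extra words such as "invalid user" do not have to break the extraction, provided the model still labels the surrounding tokens correctly.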

As a result, NLP models are more accurate at parsing logs than traditional rules-based parsers while being more flexible and fault-tolerant.

Synthetic languages

In the longer term, entirely synthetic languages can be created that represent machine-to-machine and human-to-machine communications.

If two machines can create an entirely new language, that language can then be analyzed using NLP techniques to identify errors in grammar, syntax, and composition. All these can be interpreted as anomalies and contextualized for analysts.
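As a rough sketch of the idea, treat each machine-to-machine exchange as a sentence of event tokens, learn which token-to-token transitions are normal, and score new sequences by how surprising they are. The example below uses a simple bigram model with add-one smoothing, which is a deliberately basic stand-in for the richer NLP models the approach envisions. The event vocabulary (`SYN`, `ACK`, and so on) and the training sequences are hypothetical.

```python
import math
from collections import Counter


def train_bigram_model(sequences):
    """Count token transitions seen in normal traffic."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    # Reserve one extra vocabulary slot for tokens never seen in training.
    return unigrams, bigrams, len(vocab) + 1


def surprise(seq, model):
    """Average negative log-probability per transition (add-one smoothed)."""
    unigrams, bigrams, vocab_size = model
    padded = ["<s>"] + list(seq) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        total -= math.log(prob)
    return total / (len(padded) - 1)


# Hypothetical "sentences" of machine-to-machine events.
normal_traffic = [
    ["SYN", "SYN-ACK", "ACK", "DATA", "FIN"],
    ["SYN", "SYN-ACK", "ACK", "DATA", "DATA", "FIN"],
    ["SYN", "SYN-ACK", "ACK", "DATA", "FIN"],
]
model = train_bigram_model(normal_traffic)

typical_score = surprise(["SYN", "SYN-ACK", "ACK", "DATA", "FIN"], model)
odd_score = surprise(["SYN", "DATA", "DATA", "ACK"], model)

print(f"typical: {typical_score:.2f}  odd: {odd_score:.2f}")
```

A sequence that breaks the learned "grammar" scores higher than a well-formed one, and that score is the kind of signal that could be surfaced to an analyst as an anomaly.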

This new development can help identify known issues or attacks when they occur, and can also identify completely unknown misconfigurations and attacks, which helps analysts be more efficient and effective.


The phishing protection, log parsing, and synthetic language applications are just the beginning for NLP. To learn more about AI and cybersecurity, see Learn About the Latest Developments with AI-Powered Cybersecurity, one of many on-demand sessions from NVIDIA GTC.
