Changing Cybersecurity with Natural Language Processing

Oct 19, 2022

By Bartley Richardson

If you’ve used a chatbot, predictive text to finish a thought in an email, or pressed “0” to speak to an operator, you’ve come across natural language processing (NLP). As more enterprises adopt NLP, the sub-field is developing beyond those popular use cases of machine-human communication to machines interpreting both human and non-human language. This creates an exciting opportunity for organizations to stay ahead of evolving cybersecurity threats.

This post was originally published on CIO.com

NLP combines linguistics, computer science, and AI to support machine learning of human language. Human language is astonishingly complex. Relying on structured rules leaves machines with an incomplete understanding of it.

NLP enables machines to contextualize and learn instead of relying on rigid encoding so that they can adapt to different dialects, new expressions, or questions that the programmers never anticipated.

NLP research has driven the evolution of AI techniques, such as the neural networks that are now instrumental to machine learning across many fields and use cases. So far, NLP has been applied primarily to machine-to-human communication, simplifying interactions for enterprises and consumers.

NLP for cybersecurity

NLP was designed to enable machines to learn to communicate like humans, with humans. Many services in use today also rely on machine communication, either between machines or translated into a form that humans can understand. Cybersecurity is a prime example of such a field: IT analysts can feel like they speak to more machines than people.

NLP can be leveraged in cybersecurity workflows to assist in breach protection, identification, and scale and scope analysis.

Phishing

In the short term, NLP can be easily leveraged to enhance and simplify breach protection from phishing attempts.

In the context of phishing, NLP can be leveraged to understand bot or spam behavior in email text sent by a machine posing as a human. It can also be used to understand the internal structure of the email itself to identify patterns of spammers and the types of messages they send.

This is the first extension of NLP beyond its original purpose: a technique designed to understand only human language is now being applied to human language mixed with machine-level headers.
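As a rough illustration of combining body text with header structure, the sketch below pulls a few simple features from a raw email using Python's standard `email` module. The sample message, the `URGENCY_WORDS` list, and the `extract_features` helper are all hypothetical and chosen only for illustration. A real system would feed features like these (or the raw text) into a trained model rather than rely on hand-picked keywords.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical sample message, using reserved example domains.
RAW_EMAIL = """\
From: "IT Support" <support@example.com>
Reply-To: attacker@example.net
To: user@example.org
Subject: Urgent: verify your account

Dear user, your account will be suspended. Click http://example.net/login to verify your password now.
"""

# Hypothetical keyword list; a trained model would learn such cues instead.
URGENCY_WORDS = {"urgent", "verify", "suspended", "password", "immediately"}


def extract_features(raw: str) -> dict:
    """Extract a few header-level and text-level features from a raw email."""
    msg = message_from_string(raw)

    # Header structure: does Reply-To point to a different domain than From?
    from_addr = parseaddr(msg.get("From", ""))[1]
    reply_addr = parseaddr(msg.get("Reply-To", ""))[1]
    from_domain = from_addr.rpartition("@")[2].lower()
    reply_domain = reply_addr.rpartition("@")[2].lower()

    # Body text: crude whitespace tokenization with punctuation stripped.
    body = msg.get_payload()
    tokens = [t.strip(".,:;!?").lower() for t in body.split()]

    return {
        "reply_to_mismatch": bool(reply_domain) and reply_domain != from_domain,
        "urgency_terms": sum(t in URGENCY_WORDS for t in tokens),
        "link_count": sum(t.startswith(("http://", "https://")) for t in tokens),
    }


features = extract_features(RAW_EMAIL)
print(features)
```

The point of the sketch is that the header fields and the body text end up in one feature record, so a downstream model can weigh both together.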

Log parsing

In the medium term, NLP can be leveraged to parse logs, a cyBERT use case.

In current rules-based systems, the mechanisms required to parse raw logs and make them ready for analysts are brittle and need significant development and maintenance resources.
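To make the brittleness concrete, here is a minimal sketch of a rules-based parser: one regular expression written for one log layout. The `RULE` pattern, the `parse` helper, and the sample lines are illustrative assumptions, not taken from any particular product. A small change in the message wording is enough for the rule to stop matching.

```python
import re

# Hypothetical rule written for one specific SSH log layout.
RULE = re.compile(
    r"^(?P<timestamp>\S+) (?P<host>\S+) sshd\[(?P<pid>\d+)\]: "
    r"Failed password for (?P<user>\S+) from (?P<src_ip>\S+)"
)


def parse(line: str):
    """Return the extracted fields, or None when the rule does not match."""
    match = RULE.match(line)
    return match.groupdict() if match else None


original = (
    "2022-10-19T08:15:02Z web01 sshd[4121]: "
    "Failed password for root from 203.0.113.7 port 22 ssh2"
)
# Same event type, but the message now contains the extra words "invalid user".
changed = (
    "2022-10-19T08:15:02Z web01 sshd[4121]: "
    "Failed password for invalid user admin from 203.0.113.7 port 22 ssh2"
)

parsed_original = parse(original)
parsed_changed = parse(changed)

print(parsed_original)  # fields extracted as expected
print(parsed_changed)   # None: the rule silently fails on the new wording
```

Every such variation needs another rule or another special case, which is where the development and maintenance cost comes from.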

Using NLP, parsing of raw logs becomes more flexible and less prone to breaking when changes occur to the log generators and sensors.

Going further, the neural networks used for parsing can generalize beyond the logs they were exposed to during training, creating methods to transform raw data into rich content ready for an analyst without the need to write explicit rules for these new or changed log types.

As a result, NLP models can parse logs more accurately than traditional rules while being more flexible and fault-tolerant.
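One common way to frame NLP-based log parsing is as token classification: a model assigns a field label to each token in the raw line, and the labeled tokens are then grouped into structured fields. Whether this matches the exact approach behind the post is an assumption. The sketch below shows only the final grouping step, with hand-written labels standing in for model predictions. The `fields_from_labels` helper and the label names are hypothetical.

```python
from collections import defaultdict


def fields_from_labels(tokens, labels):
    """Group tokens by field label, skipping tokens labeled 'O' (other)."""
    fields = defaultdict(list)
    for token, label in zip(tokens, labels):
        if label != "O":
            fields[label].append(token)
    return {name: " ".join(parts) for name, parts in fields.items()}


raw_line = "Oct 19 08:15:02 web01 sshd[4121]: Failed password for root from 203.0.113.7"
tokens = raw_line.split()

# Hand-written stand-in for per-token model output (an assumption for this sketch).
labels = [
    "timestamp", "timestamp", "timestamp",  # Oct 19 08:15:02
    "host",                                 # web01
    "process",                              # sshd[4121]:
    "O", "O", "O",                          # Failed password for
    "user",                                 # root
    "O",                                    # from
    "src_ip",                               # 203.0.113.7
]

record = fields_from_labels(tokens, labels)
print(record)
```

Because the labels come from a model that looks at each token in context rather than from a fixed pattern, a reordered or reworded line can still yield usable fields, provided the model generalizes to it.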

Synthetic languages

In the longer term, entirely synthetic languages can be created that represent machine-to-machine and human-to-machine communications.

If two machines can create an entirely new language, that language can then be analyzed using NLP techniques to identify errors in grammar, syntax, and composition. All these can be interpreted as anomalies and contextualized for analysts.

This new development can help identify known issues or attacks when they occur, and can also identify completely unknown misconfigurations and attacks, which helps analysts be more efficient and effective.
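A very small sketch of the idea, assuming machine events have already been mapped to tokens of a synthetic language: learn which token-to-token transitions occur in normal traffic, then flag transitions never seen before as candidate "grammar errors." The bigram count model here is a deliberately simple stand-in for the neural language models the post has in mind. The event names and the helper functions are hypothetical.

```python
from collections import Counter


def train_bigrams(sequences):
    """Count adjacent token pairs across all training sequences."""
    counts = Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts


def unseen_transitions(seq, counts):
    """Return the adjacent token pairs in seq that never occurred in training."""
    padded = ["<s>"] + list(seq) + ["</s>"]
    return [pair for pair in zip(padded, padded[1:]) if counts[pair] == 0]


# Hypothetical "sentences" of machine-to-machine events seen during normal operation.
normal_traffic = [
    ["SYN", "SYN_ACK", "ACK", "DATA", "FIN"],
    ["SYN", "SYN_ACK", "ACK", "DATA", "DATA", "FIN"],
]
model = train_bigrams(normal_traffic)

ok = unseen_transitions(["SYN", "SYN_ACK", "ACK", "DATA", "FIN"], model)
odd = unseen_transitions(["SYN", "DATA", "FIN"], model)

print(ok)   # no unseen transitions
print(odd)  # the SYN -> DATA transition was never observed
```

An unseen transition is not proof of an attack. It is a signal that can be surfaced to an analyst with context, which covers both known problems and misconfigurations nobody wrote a rule for.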

Summary

The phishing protection, log parsing, and synthetic language applications are just the beginning for NLP. To learn more about AI and cybersecurity, see Learn About the Latest Developments with AI-Powered Cybersecurity, one of many on-demand sessions from NVIDIA GTC.

About the Authors

About Bartley Richardson
Bartley Richardson is the Engineering Manager of Morpheus at NVIDIA. He leads a cross-discipline team that researches GPU-accelerated ML and DL techniques and engineers new frameworks to address the cybersecurity challenges of today and tomorrow. Prior to joining NVIDIA, Bartley was a technical lead and performer on multiple DARPA research projects, where he applied data science and machine learning algorithms at scale to solve large cybersecurity problems. He was also the principal investigator of an Internet of Things research project that focused on applying machine and deep learning techniques to large amounts of IoT data to provide intelligence value relating to form, function, and pattern-of-life. His primary research areas involve NLP and sequence-based methods applied to cyber network datasets as well as cross-domain applications of machine and deep learning solutions to tackle the growing number of cybersecurity threats. Bartley holds a Ph.D. in Computer Science and Engineering with a focus on loosely- and unstructured logical query optimization. His BS is in Computer Engineering with a concentration in software design and AI.