What is Natural Language Processing or NLP?

Natural Language Processing (or NLP from now on) is concerned with how computers understand and interpret human language. It draws on computational linguistics, information engineering, and computer science. NLP is a field of AI (artificial intelligence), and today most NLP systems rely on machine learning.

So far, this might seem like it has nothing to do with you, reader. But it does!

Computers are programmed to identify words, both written and spoken. But what if a computer needs to communicate with people? Sure, it may be able to identify words, but to really communicate, it has to understand meaning and context.

NLP analyzes human language to make sense of it in a way that is useful. With NLP, computers can interpret words and understand the context in which they appear. Once they understand, they can communicate!

What you need to know is that when Alexa understands you, that’s NLP at work, with some help from speech-to-text conversion.

But it’s not as easy as that…

There are some challenges when it comes to NLP.

We humans can understand conversations because we have context. We know how grammatical rules work, we know the meanings of words, and we even understand the tone in which someone is saying something. Some people have a hard time with this (they may not pick up on sarcasm, for example), but that’s the exception, not the rule.

Human language is imprecise and sometimes ambiguous. Some words have different meanings depending on the context. But we’re smart, and we can tell these meanings apart. Gee, even if we don’t know the exact meaning of a word, we can often infer it from context.

For us, it’s easy. For computers, not so much. And that’s what makes NLP challenging.

How does NLP work?

Natural language is unstructured data. Unlike structured data, it isn’t organized in databases or spreadsheets, which is another challenge for NLP. This makes it much harder for a computer to analyze, but the ultimate goal of NLP is to find meaning in this unstructured data.

At a very basic level, NLP breaks natural language down into smaller pieces (such as sentences and words) and tries to understand the relationships between them. Then, it explores how these pieces work together to create meaning.
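To make the idea of "smaller pieces" concrete, here is a minimal Python sketch of one common first step: splitting text into sentences, then splitting each sentence into word tokens. It is a simplification, not how any particular product does it. The regular-expression rules and the function names (split_sentences, tokenize, break_down) are hypothetical, and real tokenizers handle abbreviations like "Dr.", URLs, emoji, and many other cases these rules ignore.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive rule (an assumption): a sentence ends at ".", "!" or "?"
    # followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    # A token is either a run of letters/digits/apostrophes (a word)
    # or a single punctuation character.
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", sentence)


def break_down(text: str) -> list[list[str]]:
    # Sentences first, then words inside each sentence.
    return [tokenize(s) for s in split_sentences(text)]


if __name__ == "__main__":
    for tokens in break_down("Alexa, what's the weather? I need an umbrella."):
        print(tokens)
```

Once the text is in pieces like these, later tasks (such as the ones described next) can work on individual words instead of one long string.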

Along the way, many things can go wrong; for example, the computer may misunderstand the meaning of a sentence (like when Siri doesn’t understand you). That’s why NLP relies on a range of tasks, each doing something different. In some cases, several tasks work together to produce a final result; in others, just one is used. There are A LOT of these, but some are more common than others.

Once you read them, you’ll also start remembering a lot of things from your English classes!

Let’s talk about some of them.

Syntax tasks

Lemmatization

Many words share an underlying or related meaning but appear in different forms when we use them. Think about book, booking, and books: they all share a base form, called a lemma. When we use them, though, they take different forms, called inflected forms. Lemmatization is the task of identifying the lemma of each word based on its meaning and context.
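Here is a minimal Python sketch of lemmatization as a dictionary lookup, to show why context matters. It assumes a tiny hand-written lexicon (the LEMMAS table and the lemmatize function are hypothetical) and that the word's part of speech is already known; in a real pipeline a part-of-speech tagger would supply that context, and the dictionary would be far larger and backed by morphological rules.

```python
# Hypothetical mini-lexicon: (word form, part of speech) -> lemma.
# Real lemmatizers use large dictionaries plus morphological rules.
LEMMAS = {
    ("books", "NOUN"): "book",
    ("books", "VERB"): "book",      # "she books a table"
    ("booked", "VERB"): "book",
    ("booking", "VERB"): "book",    # "I am booking a flight"
    ("booking", "NOUN"): "booking", # "a hotel booking" is its own noun
    ("better", "ADJ"): "good",      # irregular form
    ("was", "VERB"): "be",
    ("mice", "NOUN"): "mouse",
}


def lemmatize(word: str, pos: str) -> str:
    """Return the lemma of `word` used as `pos`, or the word itself if unknown."""
    w = word.lower()
    return LEMMAS.get((w, pos), w)


if __name__ == "__main__":
    print(lemmatize("booking", "VERB"))  # the verb sense
    print(lemmatize("booking", "NOUN"))  # the noun sense
    print(lemmatize("better", "ADJ"))
```

The same surface form, "booking", maps to different lemmas depending on how it is used in the sentence, and irregular forms like "better" can only be resolved because the lexicon knows about them.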

Read more about how lemmatization varies from English to German here.

Stemming

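Stemming also reduces inflected forms to a root, but WITHOUT taking context into account. It sounds similar to lemmatization, but the two differ in how they “cut” words: stemming chops off word endings by rule, so the result isn’t always a real word, while lemmatization returns the dictionary form that fits the context.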

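The difference can be seen in a toy suffix-stripping stemmer, sketched below in Python. This is an illustration under loud assumptions: the SUFFIXES list and the stem function are hypothetical, and the rules are much cruder than real stemmers such as the Porter stemmer. The point is only that the rules look at spelling, never at meaning or context.

```python
# Hypothetical suffix rules, checked in order; the first match wins.
SUFFIXES = ["ing", "ies", "ed", "es", "s"]


def stem(word: str) -> str:
    """Chop off the first matching suffix, ignoring meaning and context."""
    w = word.lower()
    for suffix in SUFFIXES:
        # Keep at least 3 characters so short words like "is" survive intact.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


if __name__ == "__main__":
    for word in ["booking", "books", "booked", "studies", "better"]:
        print(word, "->", stem(word))
```

Notice that "studies" becomes "stud", which is not the intended word, and "better" is left untouched, because the rules never consider meaning. A lemmatizer that knows the context would return "study" and "good" instead. In exchange, a stemmer like this needs no dictionary and is very fast.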

Here are some great examples of NLP.

Cool examples of NLP and the tasks you just read about

DeepNLP

As said before, NLP is used to analyze unstructured data. Unstructured data is underexploited in many companies because it’s difficult and expensive to analyze manually. Luckily, we have machines. A cool example of this is the DeepNLP service of SparkCognition.

DeepNLP analyzes unstructured data within organizations so humans can focus on how that information can be used to make decisions.

Grammarly

Grammarly is a life-saving tool that corrects your spelling and grammar mistakes in real time. When you’re writing an email or a Google document, it automatically shows you where your mistakes are and suggests changes.

Here’s a well-explained blog post in which Grammarly describes how they use AI, machine learning, and NLP to reach their goal of making it possible for everyone to be heard and understood.

Health Language from Wolters Kluwer

It might sound similar to DeepNLP, but it was built specifically for healthcare organizations. Just imagine that almost 80% of any patient’s information is unstructured data! How do you make sure that everybody involved, from the patient to the provider, has access to the same relevant information? Health Language does precisely that. The platform extracts patient data from free-text physician notes and from electronic health records.

You can read more about it here.

Bottom line

NLP plays a key role in supporting machine-human interactions. Remember this every time you ask Siri what the weather will be today!