Natural Language Processing (NLP from now on) is concerned with how computers understand and interpret human language. It draws on computational linguistics, information engineering, and computer science. NLP is a subfield of AI (artificial intelligence), and most modern NLP systems rely on machine learning.
Up until now, this might seem like it doesn’t have anything to do with you, reader. But it does!
Computers are programmed to identify words, both written and spoken. But what if a computer needs to communicate with people? Sure, maybe it can identify words, but to really communicate, it has to understand meaning and context.
What NLP does is analyze human language to make sense of it, so that it becomes valuable. Computers use NLP to make sense of words and understand the context in which they appear. Once they understand, they can communicate!
What you need to know is that when Alexa understands you, NLP is what makes it possible, with some help from speech-to-text conversion.
But it’s not as easy as that…
There are some challenges when it comes to NLP.
We humans can understand conversations because we have the context for them. We know how grammatical rules work, we know the meanings of words, and we even understand the tone in which someone is saying something. Some people have a hard time with this and don’t understand sarcasm, for example, but that’s the exception, not the rule.
Human language is imprecise and sometimes ambiguous. Some words have different meanings depending on the context. But we’re smart and we can understand these differences. Gee, even if we don’t know the exact meaning of a word, we can infer it from the context.
For us, it’s easy. For computers, not that much. And that’s what makes NLP challenging.
How does NLP work
Natural language is unstructured data. It’s not organized in databases or spreadsheets (like structured data), which is another challenge for NLP. This makes it much harder for the computer to analyze, but the ultimate goal of NLP is to find meaning in this unstructured data.
At a very basic level, what NLP does is break natural language down into smaller pieces to try to understand the relationships between them. Then, it explores how these pieces work together to create meaning.
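To make "breaking language into smaller pieces" a bit more concrete, here is a minimal Python sketch. It is only an illustration: the function names `tokenize` and `bigrams` are made up for this example, and real NLP libraries do this far more carefully. It splits a sentence into word tokens and then pairs each token with its neighbor, a very rough stand-in for "relationships between pieces".

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens, keeping simple contractions
    such as "doesn't" together. (Hypothetical helper for illustration.)"""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())

def bigrams(tokens):
    """Pair each token with the token that follows it."""
    return list(zip(tokens, tokens[1:]))

sentence = "What will the weather be today?"
tokens = tokenize(sentence)

print(tokens)           # the individual pieces
print(bigrams(tokens))  # each piece together with its right-hand neighbor
```

Looking at neighboring tokens is about the simplest way to capture some context; actual systems build on much richer representations than this.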
Along the way, many things can go wrong, like the computer not understanding the correct meaning of a sentence (like when Siri doesn’t understand you). That’s why NLP relies on several tasks to do different things. In some cases, many tasks work together to get a final result; in others, just one is used. There are A LOT of these, but some are more common than others.
Once you read them, you’ll also start remembering a lot of things from your English classes!
Let’s talk about some of them.
Many words have a shared or related underlying meaning, but appear in different forms when we use them. Think about book, booking, and books: they all share a base form, which is called a lemma. The different forms they take when we use them are called inflected forms. The task of lemmatization is about identifying the lemmas of words based on their meaning and context.
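Here is a toy Python sketch of the idea, assuming a tiny hand-written lookup table. The table `LEMMA_TABLE`, the part-of-speech labels, and the function `lemmatize` are all invented for this example; real lemmatizers use large dictionaries and work out the part of speech from the sentence. The point is only to show that the same word can map to different lemmas depending on how it is used.

```python
# Hypothetical mini-lexicon: (inflected form, part of speech) -> lemma.
# A real lemmatizer would rely on a full dictionary, not six entries.
LEMMA_TABLE = {
    ("books", "NOUN"): "book",
    ("books", "VERB"): "book",
    ("booked", "VERB"): "book",
    ("booking", "VERB"): "book",     # "I am booking a flight"
    ("booking", "NOUN"): "booking",  # "I have a booking for two"
    ("better", "ADJ"): "good",
}

def lemmatize(word, pos):
    """Look up the lemma for a word used as the given part of speech.
    Falls back to the lowercased word when the pair is not in the table."""
    return LEMMA_TABLE.get((word.lower(), pos), word.lower())

print(lemmatize("booking", "VERB"))
print(lemmatize("booking", "NOUN"))
print(lemmatize("better", "ADJ"))
```

Notice that the lookup needs the part of speech as well as the word itself; that extra piece of context is what sets lemmatization apart from simply trimming word endings.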
Read more about how lemmatization varies from English to German here.
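Stemming cuts inflected forms down to their root, but WITHOUT considering context. It sounds similar to lemmatization, but the two differ in how they “cut” words: a stemmer simply chops off word endings, so the result is not always a real word, while a lemmatizer returns the dictionary form.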
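A deliberately naive Python sketch of suffix stripping shows what "cutting without context" looks like. The suffix list and the `stem` function are assumptions made for this example and are much cruder than the stemming algorithms used in practice.

```python
# Suffixes to strip, checked in this order (longer ones first).
# This list is an illustrative assumption, not a real stemming rule set.
SUFFIXES = ("ing", "ies", "ed", "es", "s")

def stem(word):
    """Chop off the first matching suffix, as long as at least three
    letters remain. No dictionary and no context are consulted."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["booking", "books", "booked", "studies", "better"]:
    print(w, "->", stem(w))
```

With these rules, "booking", "books", and "booked" all end up as "book", but "studies" is cut to "stud", which is not a word, and "better" is left untouched because no rule applies. A lemmatizer with a dictionary could map those last two to "study" and "good" instead.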
Grammarly is a life-saving tool that corrects your spelling and grammar mistakes in real time. When you’re writing an email or a document on Google, it automatically tells you where your mistakes are and suggests changes.
Here’s a very well explained blog post where they describe how they use AI, machine learning, and NLP to reach their goal of making it possible for everyone to be heard and understood.
Health Language from Wolters Kluwer
It might sound similar to the case of DeepNLP but it was built just for healthcare organizations. Just imagine that almost 80% of any patient’s information is unstructured data! How do you make sure that everybody involved from the patient to the provider has access to the same relevant information? Health Language does precisely that. This platform extracts patient data found in free-text physician notes and from electronic health records.
You can read more about it here.
NLP has a key role in supporting machine-human interactions. Remember this every time you ask Siri what the weather will be today!