Natural language processing (NLP) is a branch of computer science and artificial intelligence that deals with the interaction between computers and human language. Translating unstructured language data into structured data is called Natural Language Understanding (NLU); the reverse, generating natural language from structured data, is called Natural Language Generation (NLG). NLU is usually harder than NLG because understanding a language is difficult, especially for a machine.
Text Mining
Another important term to know when learning about NLP is ‘text mining’. Text mining is the process of deriving meaningful information from natural language text, and it goes hand in hand with NLP. It involves structuring the input text, deriving patterns from it and, lastly, evaluating and interpreting the output. It focuses more on the structure of the data than on its meaning, and it is an automatic process that extracts valuable insights from unstructured data.
There are various components of NLP:
1. Tokenisation
Tokenisation is the process of breaking a string (a sequence of characters) into tokens (units). For example, the string ‘Add bread and eggs to shopping list’ has 7 tokens. These chunks help in understanding the context and meaning of the string: the AI uses the sequence of tokens to analyse the meaning of the words, and each token is then processed individually. There are various methods of tokenisation (see the short sketch after this list):
- Dictionary-based tokenisation – matches segments of the text against a predefined dictionary of known words to decide the token boundaries
- Rule-based tokenisation – divides written text into meaningful units using hand-written rules, for example for punctuation and abbreviations
- Whitespace tokenisation – splits the text on whitespace characters and discards them
- Subword tokenisation – breaks rare or unknown words into smaller, frequently occurring subword units, which helps with out-of-vocabulary words
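For illustration, here is a minimal Python sketch of whitespace and rule-based tokenisation using only the standard library (the second example sentence with punctuation is our own, added to show punctuation handling):

```python
import re

sentence = "Add bread and eggs to shopping list"

# Whitespace tokenisation: split on spaces and discard them.
print(sentence.split())   # ['Add', 'bread', 'and', 'eggs', 'to', 'shopping', 'list'] -> 7 tokens

# A tiny rule-based tokeniser: each word and each punctuation mark becomes its own token.
print(re.findall(r"\w+|[^\w\s]", "Add bread, eggs and milk to the shopping list!"))
```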
2. Stemming
Stemming derives the word stem for a given token by normalising the word into its base or root form. A stemming algorithm strips suffixes and prefixes and normalises inflections such as tense. For example, the word stem for ‘walking’, ‘walks’ and ‘walked’ is ‘walk’.
Stemming has its limitations, as it does not work well for every token. For example, the stem produced for ‘university’ and ‘universal’ is not the meaningful word ‘universe’ but a truncated form. For such situations we have another tool available, called lemmatisation.
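A minimal sketch of stemming using NLTK’s Porter stemmer (assuming the nltk package is installed):

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Inflected forms of 'walk' all reduce to the same stem.
print([stemmer.stem(w) for w in ["walking", "walks", "walked"]])   # ['walk', 'walk', 'walk']

# Limitation: 'university' and 'universal' are reduced to a truncated,
# non-dictionary stem rather than the meaningful word 'universe'.
print(stemmer.stem("university"), stemmer.stem("universal"))
```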
3. Lemmatisation
Lemmatisation takes a given token, looks up its meaning and part of speech in a dictionary, and derives its root word, or ‘lemma’. It is similar to stemming in that it maps several words to one common root, but the major difference is that the output of lemmatisation is always a proper dictionary word. For example, the root (or lemma) of the word ‘better’ is not ‘bet’ but ‘good’.
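A minimal sketch using NLTK’s WordNet lemmatiser (assuming the WordNet data has been downloaded, e.g. with nltk.download('wordnet')):

```python
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

# With the part of speech given as adjective ('a'), 'better' maps to its lemma 'good'.
print(lemmatizer.lemmatize("better", pos="a"))   # good

# Verbs are reduced to their base form.
print(lemmatizer.lemmatize("walking", pos="v"))  # walk
```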
4. Parts of Speech (POS)
The grammatical tag of a word, such as noun, verb, adjective or adverb, is called its POS tag. It indicates how the word functions, in meaning and grammar, within the sentence. A word can have more than one part of speech depending on its context. For example, in the sentence ‘Google something on the internet’, the word ‘google’ is used as a verb even though it is a proper noun.
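A minimal sketch using NLTK’s default POS tagger (assuming the ‘averaged_perceptron_tagger’ data package has been downloaded):

```python
import nltk

tokens = "Google something on the internet".split()
print(nltk.pos_tag(tokens))
# Each token is paired with a tag; depending on context the tagger may label
# 'Google' as a proper noun (NNP) or a verb (VB).
```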
5. Named Entity Recognition (NER)
NER is the process of locating named entities in text and classifying them into a set of predefined categories. With its help, we can extract the essential information needed to understand the text. An entity is an object that is a main subject of the text, and some of the most important categories in NER are person, organisation and place/location. For example, consider the sentence:
‘Apple’s CEO Tim introduced the new iPhone 13 at the New York Central Mall’. Here, NER would label ‘Apple’ as an organisation, ‘Tim’ as a person and ‘New York Central Mall’ as a location.
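A minimal sketch using spaCy’s small English model (assuming ‘pip install spacy’ and ‘python -m spacy download en_core_web_sm’ have been run):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple's CEO Tim introduced the new iPhone 13 at the New York Central Mall.")

for ent in doc.ents:
    print(ent.text, ent.label_)
# Expect labels such as ORG for 'Apple' and PERSON for 'Tim'; the exact
# entities and labels depend on the model version.
```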
6. Chunking
Chunking picks individual pieces of information and groups them together; the put-together pieces are called chunks. Chunking is important when information such as a location or a person’s name has to be extracted from text, and the chunk size determines the level of specificity.
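A minimal sketch of noun-phrase chunking with NLTK’s regular-expression chunker, built on top of POS tags (again assuming the ‘averaged_perceptron_tagger’ data is installed; the example sentence is our own):

```python
import nltk

tagged = nltk.pos_tag("the quick brown fox jumped over the lazy dog".split())

# Chunk rule: an optional determiner, any number of adjectives, then a noun.
grammar = "NP: {<DT>?<JJ>*<NN>}"
chunker = nltk.RegexpParser(grammar)
print(chunker.parse(tagged))   # noun phrases such as 'the lazy dog' appear as NP subtrees
```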
These were the main components of Natural Language Processing. The use of NLP is vital and growing, and it now powers many devices and machines. NLP features in many services that we use in our daily lives.
Some applications of NLP are listed below:
I) Machine Translation
When translating from one language to another, context is very important. Using NLP, computers can identify keywords and the sense of the language and translate text accurately to produce an appropriate result. One of the main advantages of machine translation is that it can translate large pieces of text in a short period of time.
II) Smart Assistants
Popular examples of smart assistants are Apple’s Siri and Amazon’s Alexa. They wake up when you say a keyword such as “Hey Siri” or “Alexa” and are then ready to carry out commands. They understand the context of our statements and reply with appropriate responses. Smart assistants like Alexa can even perform basic tasks such as switching lights on and off, changing the volume or track while playing music, or updating your shopping list.
III) Virtual Assistant Chatbot
These are AI assistants that converse with users via text. Many companies and websites use them to save manpower. They are programmed to understand your problem and provide a viable solution, and they are commonly used for product assistance, brand engagement and marketing.
There are many more applications of NLP in our daily lives. As the tech industry advances, the use of NLP will grow in many sectors. As of today, we have not yet developed technology that can imitate and understand human behaviour completely, but NLP has brought us the closest to achieving it.
Frequently Asked Questions:
1. How is NLP used in robots?
Any form of communication between a human and a robot requires translation between human language and a machine-readable form. NLP is the bridge that makes each side sensible and understandable to the other.
2. What does human intent mean?
Human intent is the true meaning of our words, which may sometimes differ from the literal words used. For example, while talking, humans often include sarcasm and puns that do not match the verbal meaning of the sentence.
3. What are the various techniques of NLP?
Sentiment analysis, topic modelling, text classification, keyword extraction and named entity recognition are common techniques for natural language processing in AI.
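As a small illustration of one of these techniques, here is a minimal sentiment-analysis sketch using NLTK’s VADER analyser (assuming the ‘vader_lexicon’ data package has been downloaded; the example sentence is our own):

```python
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("I love this new phone, it works brilliantly!"))
# Returns negative, neutral, positive and compound scores; a high compound
# score indicates positive sentiment.
```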