
What is NLP?

NLP, or Natural Language Processing, is a way for computers to understand human language (e.g., English, Chinese, Indonesian) as opposed to binary data. Computers do this by encoding the words of a human language into sets of numbers arranged in such a way that the computer can capture the context of each word within a text.

Problems that we want to solve using NLP in this project

We realised that there are 2 major problems in our project that we can tackle by applying NLP.

Manual travel guide filtering by humans is not scalable

One of our most important tasks is to make sure that the audio files uploaded to the platform are valid. The initial plan was to have humans listen to each uploaded audio file and judge whether it is valid or not. This could take a long time if there are a lot of uploaded audio files. Because of this, we need an AI solution that can process all of these travel guide candidates at once. An NLP model is a scalable solution here, since we can process the text candidates in parallel.

There is no audio for users on the way to their destinations during navigation

Navigation is a new feature that was built during Release 2. It allows users to go from one place to another and listen to audio travel guides when they reach each destination on their trip. However, on the way from one place to the next (before reaching a destination), users currently have to find someone to talk to, stay silent, or listen to other entertainment sources. We want to solve this problem by introducing a chatbot that uses NLP. The initial plan is simple: make sure that this chatbot can answer users' questions about directions during navigation.

IMPLEMENTATION

How do we implement an NLP model?

As mentioned above, computers are able to understand natural languages by converting words into sets of numbers. Such a set of numbers is called a "word vector". Essentially, a word vector is a vector of numbers that represents the meaning of a word. If two words have similar meanings, they will have similar word vectors.
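As an illustration, vector similarity is often measured with cosine similarity. The sketch below uses tiny made-up vectors purely to show the idea; real word vectors have hundreds of dimensions and come from a trained model.

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: close to 1.0 for similar directions, close to 0.0 for unrelated ones.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 4-dimensional vectors with made-up values, for illustration only.
vec_hotel = np.array([0.9, 0.1, 0.3, 0.0])
vec_hostel = np.array([0.8, 0.2, 0.4, 0.1])
vec_banana = np.array([0.0, 0.9, 0.0, 0.7])

print(cosine_similarity(vec_hotel, vec_hostel))  # high: related meanings
print(cosine_similarity(vec_hotel, vec_banana))  # low: unrelated meanings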

How do we generate these word vectors?

Previously, people used algorithms such as Word2Vec. However, since deep learning has been gaining traction, we utilise neural networks to generate these word vectors.
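As a minimal sketch of what this looks like in practice, the snippet below uses the Hugging Face transformers library with the off-the-shelf bert-base-uncased checkpoint (just one possible choice; our actual model choices are discussed in the following sections) to turn a sentence into one vector per token.

import torch
from transformers import AutoTokenizer, AutoModel

# Load a pre-trained transformer; bert-base-uncased is just one possible choice.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentence = "The temple is a five minute walk from the station."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per (sub)word token for this model.
word_vectors = outputs.last_hidden_state
print(word_vectors.shape)  # (1, number_of_tokens, 768)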


Transformers

As mentioned in the previous section, we decided to use neural networks to generate the word vectors that represent words. There are a lot of neural network architectures out there that are catered to natural language processing. However, we focus on transformers for this project.

What is a transformer?

A transformer is a type of neural network that implements self-attention. Before explaining what self-attention is, we have to know what "attention" means in NLP deep learning. Attention is basically a way of knowing which words are more important than others in the context of a text. Attention was originally used for translation tasks, where there are two texts (the source language and the target language). During translation, we "attend" to the source text to know which of its words are more important than the others; notice that we don't attend to the target text that we're currently building. Self-attention does this differently: while building the word vectors, it allows the model to attend to the very text it is currently processing, rather than to a separate source text.

Benefit of self-attention

Self-attention lets us compute the attention matrices with plain matrix multiplications over the whole sequence at once. This makes the computation more parallelisable (for example on a GPU) and therefore speeds up training.
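To make the matrix view concrete, here is a small sketch of scaled dot-product self-attention with NumPy. The projection matrices are random stand-ins for learned weights, and the sizes are toy values chosen only for illustration.

import numpy as np

def self_attention(X, W_q, W_k, W_v):
    # X has one row per token; W_q, W_k, W_v are (here randomly initialised) projection matrices.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])                    # all pairwise scores in one matrix multiply
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V                                         # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                                        # 5 tokens, 8-dimensional vectors (toy sizes)
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)                  # (5, 8): one updated vector per token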

Pre-trained Transformer Models

BERT for text classification (travel guide filtering)

BERT is a bidirectional transformer that only uses the encoder part of the architecture. Because it is bidirectional, it understands the context of each word from both directions. This is useful for tasks that require understanding the full context of a text, e.g. text classification.
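A rough sketch of how this could look for our filtering problem, again assuming the Hugging Face transformers library: BERT with a freshly initialised classification head for the three classes described in the dataset section below. The head still has to be fine-tuned on our labelled data before the predictions mean anything.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["valid", "invalid - harassment", "invalid - others"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels)
)

text = "Turn left at the old market to find the hidden waterfall viewpoint."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# With an untrained head this prediction is essentially random; fine-tuning comes first.
print(labels[logits.argmax(dim=-1).item()])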

GPT for language modelling (navigation chatbot)

GPT is a transformer that only uses the decoder part. This means that it only understands the context of each word from the words that come before it. This is useful for language modelling tasks: language modelling is the task of predicting the next word, given the words that appear before it.
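As a minimal sketch of this next-word behaviour, the snippet below uses the openly available gpt2 checkpoint from Hugging Face; our navigation chatbot would be fine-tuned on dialogue data rather than used off the shelf like this.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "To reach the museum from here, you should"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # Greedily predict the next 20 tokens, one word at a time.
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))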

CHOICE OF DATASET

Travel Guide Filtering Problem (Text Classification)

Our initial thought was to use TripAdvisor's data. This made sense at the beginning, since TripAdvisor provides a lot of texts of varying lengths that could be used to fine-tune the model. However, we soon encountered a problem: TripAdvisor doesn't provide an API for us to get the data from. Hence, we decided to scrape data manually from the internet. We collected data in 3 classes: valid, invalid - harassment, and invalid - others.
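Purely as an illustration of how the scraped data could be organised for fine-tuning, here is one possible layout; the file name and column names are assumptions rather than part of the actual pipeline.

import pandas as pd

label2id = {"valid": 0, "invalid - harassment": 1, "invalid - others": 2}

# Hypothetical CSV with one scraped text per row and its class label.
df = pd.read_csv("travel_guide_candidates.csv")  # columns: text, label
df["label_id"] = df["label"].map(label2id)

print(df["label"].value_counts())  # check the class balance before fine-tuning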

Navigation Chatbot Problem (Language Model)

For this task, we plan to use the TalktheWalk dataset. This dataset can be used to train a language model to predict what should be said next based on the following input:

- the user's current position on the map

- what the user said to the AI

 

We expect this dataset to be enough to train the AI to help navigate the user verbally. A rough sketch of how these two inputs could be combined into a single prompt for the language model is shown below.
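The exact template here is an assumption; the real format would be dictated by how the TalktheWalk dialogues are structured.

def build_prompt(position, user_utterance):
    # Combine the user's current map position and their question into one text
    # prompt that the language model can continue with the guide's reply.
    return (
        f"Tourist position: {position}\n"
        f"Tourist: {user_utterance}\n"
        f"Guide:"
    )

print(build_prompt("corner of 2nd Avenue and 10th Street, facing north",
                   "Which way should I go to reach the next stop?"))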