Earlier this year I decided to build a personal chatbot. I had started spending a lot of my time on my computer because of remote school and wanted a chatbot to make things run smoother. Can you think of anything cooler than having a digital assistant? So, I made a very basic chatbot and named her Nova (since you can’t have a chatbot without a name :).
While Nova could do some handy things, she wasn’t very useful, mainly because she didn’t use any machine learning. Whenever I wanted her to execute a command, I needed to enter the exact word that was hard-coded into the program, or else Nova would have no idea what I was talking about.
This was pretty frustrating and eventually prompted me to dive into Natural Language Processing (NLP), a sub-field of AI that deals with text data as opposed to the more commonly used numerical data, so that I could significantly improve Nova using deep learning. The result was a far more intelligent and frankly more capable chatbot for my own personal use.
First, I’ll explain how I got the chatbot to understand what I am saying, even if the exact phrase isn’t hard-coded into it, and then I’ll talk about some of the cool features I implemented in the brand new version.
Another Type of Chatbot
The aim of the new version of Nova was that I would be able to give it an input, one it had never seen before, and it would be able to figure out what I’m saying and return a pre-programmed response.
This new type of chatbot is more complex and intelligent than the previous version I was using (which was really just a few if/else statements), but simpler than a generative/conversational chatbot that can hold an open-ended conversation.
I decided to use a Bag-of-Words (BoW) chatbot since, with Nova, the emphasis is on functionality, and I didn’t need her to be able to converse with me. Let’s get into how this chatbot actually works.
The Bag-Of-Words Algorithm
Here’s a quick breakdown of how the chatbot works:
- Chatbot receives input from the user which it most likely hasn’t seen before.
- Chatbot sorts the input into the category it most likely belongs in.
- Chatbot returns a pre-programmed response based on the category the input was sorted into. If applicable, the chatbot will also execute the function attached to that category.
As is the case with most ML projects, we will be needing data, and we’ll store this data in a JSON file.
As an example, let’s use just two categories: “greeting” and “games”. Each of these categories will have three “patterns”, which are examples of what a user might say in order to trigger this response (reminder: these are just patterns, and they will almost never match the exact phrase a user actually types). In addition, each of these categories will have three “responses”. The reason we have more than one response is simply to create variation, since the chatbot will randomly return one of the three responses. Here’s what the JSON file will look like:
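Here’s a minimal sketch of such a file, embedded as a string and loaded with Python’s `json` module so it can be run directly. The key names (`intents`, `tag`, `patterns`, `responses`) and every example phrase are assumptions made for illustration, not Nova’s actual data.

```python
import json

# Hypothetical contents of the intents file; key names and phrases are
# illustrative assumptions, not the real training data.
INTENTS_JSON = """
{
  "intents": [
    {
      "tag": "greeting",
      "patterns": ["Hello", "Hi there", "Good morning"],
      "responses": ["Hey!", "Hello, how can I help?", "Hi, nice to see you!"]
    },
    {
      "tag": "games",
      "patterns": ["Let's play a game", "I am bored", "Want to play something?"],
      "responses": ["Sure, let's play!", "Game time!", "Okay, what shall we play?"]
    }
  ]
}
"""

intents = json.loads(INTENTS_JSON)["intents"]

# Summarize each category: its tag and how many patterns/responses it holds.
for intent in intents:
    print(intent["tag"], "->", len(intent["patterns"]), "patterns,",
          len(intent["responses"]), "responses")
```

In a real project the same structure would live in its own `.json` file and be read with `json.load`.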
We will be training our machine learning model on the “patterns” under each category so that it can sort a brand new phrase into one of these categories. Before we do that, we need to pre-process our training data, since you can’t train a machine learning model directly on string data; you need to convert it to numerical data first. Doing so is a bit of a process, but it’s not too tricky. Using the BoW method allows us to represent this data as arrays as opposed to strings.
The very first thing we’ll do is convert all of the letters to lowercase and remove punctuation, since in this case both are irrelevant and we want our data to be as uniform as possible so that the model only picks up on relevant differences. Next, we will do something called tokenization, which is just when we take a phrase and split it up to make a list of words. For example, “Hello, how are you?” becomes the list “hello”, “how”, “are”, “you”.
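A minimal sketch of the lowercasing, punctuation removal, and splitting steps, using only the Python standard library. The helper name `tokenize` is hypothetical, and a tokenizer from NLTK could be used instead.

```python
import string

def tokenize(phrase: str) -> list[str]:
    """Lowercase a phrase, strip ASCII punctuation, and split it into words."""
    lowered = phrase.lower()
    # str.maketrans with a third argument maps those characters to None (deleted).
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return no_punct.split()

print(tokenize("Hello, how are you going?"))
# ['hello', 'how', 'are', 'you', 'going']
```

Note that `string.punctuation` also contains the apostrophe, so a contraction such as "Let's" would come out as the single token "lets".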
Now we’ll do something called stemming, which is when we take a word, such as “going”, and reduce it to what the computer treats as its root. This means that “going” will be converted to “go”. Note that stemming doesn’t change every word, since some words are already in their stemmed form, such as “hello”, which remains unchanged.
When coding this process, we can use NLTK (the Natural Language Toolkit), a library that helps us work with and process text data. This is a great resource if you’re looking to learn more about NLTK.
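As a small illustration of stemming, here’s a sketch that uses NLTK’s `PorterStemmer`. Choosing the Porter stemmer is an assumption on my part (NLTK ships several stemmers), and the helper name `stem_words` is hypothetical.

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

def stem_words(words):
    """Reduce each word to its stem, lowercasing it first."""
    return [stemmer.stem(word.lower()) for word in words]

print(stem_words(["going", "hello", "playing"]))
# ['go', 'hello', 'play']
```

The Porter stemmer works from suffix rules rather than a dictionary, so it needs no extra NLTK data downloads, but its output is not always a real English word.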
Then, to represent individual phrases as arrays, we use a process similar to one-hot encoding. We compile a list of all of the words in our entire training data set (across our two categories, no duplicates) and make a new array with a 0 for each word. To represent an individual phrase, we go through this array of zeroes and, wherever the corresponding word is present in the phrase, replace the 0 with a 1. Because a phrase can contain several words, the resulting array can contain several 1s. This array is what we are figuratively calling a “bag of words”. This is what it looks like when we encode all of our phrases:
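Here’s a minimal sketch of that encoding in plain Python. The function names and the three example phrases are hypothetical, and the phrases are assumed to be tokenized and stemmed already.

```python
def build_vocabulary(tokenized_phrases):
    """Collect every distinct word across all phrases, in sorted order."""
    return sorted({word for phrase in tokenized_phrases for word in phrase})

def bag_of_words(tokens, vocabulary):
    """Return a 0/1 list with a 1 wherever a vocabulary word occurs in tokens."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]

# Hypothetical training phrases, already tokenized and stemmed.
phrases = [
    ["hello"],
    ["hi", "there"],
    ["play", "a", "game"],
]

vocabulary = build_vocabulary(phrases)
print(vocabulary)
# ['a', 'game', 'hello', 'hi', 'play', 'there']

for phrase in phrases:
    print(phrase, "->", bag_of_words(phrase, vocabulary))
# ['hello'] -> [0, 0, 1, 0, 0, 0]
# ['hi', 'there'] -> [0, 0, 0, 1, 0, 1]
# ['play', 'a', 'game'] -> [1, 1, 0, 0, 1, 0]
```

Words that never appeared in the training data simply have no slot in the array, so a new phrase like "play chess" would be encoded using only "play".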
So, once we’ve converted our training data into an acceptable format, we simply train the machine learning model using this data. In order to do so, we can use PyTorch, a popular machine learning library.
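To give a feel for the training step, here’s a rough sketch of a tiny PyTorch classifier trained on bag-of-words arrays. Everything specific in it is an assumption for illustration: the toy data, the two-layer network, the hidden size of 16, the Adam optimizer, and the learning rate. Nova’s real model may look quite different.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the toy run repeatable

# Toy training set (assumed): bag-of-words vectors over a 6-word vocabulary
# and their category indices (0 = greeting, 1 = games).
X = torch.tensor([
    [0, 0, 1, 0, 0, 0],   # "hello"
    [0, 0, 0, 1, 0, 1],   # "hi there"
    [1, 1, 0, 0, 1, 0],   # "play a game"
    [0, 1, 0, 0, 1, 0],   # "play game"
], dtype=torch.float32)
y = torch.tensor([0, 0, 1, 1])

# A small feed-forward network: vocabulary size in, number of categories out.
model = nn.Sequential(
    nn.Linear(6, 16),
    nn.ReLU(),
    nn.Linear(16, 2),
)

loss_fn = nn.CrossEntropyLoss()  # expects raw scores and integer class labels
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

for epoch in range(300):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

# Predict the category index for each training phrase.
with torch.no_grad():
    predictions = model(X).argmax(dim=1)
print(predictions.tolist())
```

With real data you would also hold out some phrases to check that the model generalizes rather than just memorizing the patterns.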
I touch upon neural networks, and training them with PyTorch, in an article I wrote on identifying malaria in cells using computer vision, so you can check it out for more information:
Using a Convolutional Neural Network(CNN) to Diagnose Malaria
Once the trained model has predicted the tag of the input we’ve given it, the chatbot can either randomly choose one of the responses under the “responses” key in the JSON file, or run some other function for us, for example, playing rock-paper-scissors.
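Picking a random response for a predicted tag can be sketched in a few lines. The `RESPONSES` dictionary and the `pick_response` helper are hypothetical stand-ins for the data loaded from the JSON file.

```python
import random

# Hypothetical responses keyed by tag, mirroring the "responses" key in the JSON file.
RESPONSES = {
    "greeting": ["Hey!", "Hello, how can I help?", "Hi, nice to see you!"],
    "games": ["Sure, let's play!", "Game time!", "Okay, what shall we play?"],
}

def pick_response(tag, rng=random):
    """Return one of the canned responses for the predicted tag, chosen at random."""
    return rng.choice(RESPONSES[tag])

print(pick_response("greeting"))
```

Passing a seeded `random.Random` instance as `rng` makes the choice repeatable, which is handy when testing.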
Features of my chatbot!
Once I made the new version of the chatbot, it was time to add some cool new features! I made a new module for each feature in order to keep things organized. Then, whenever the chatbot sorted an input into a certain category, it would run the function belonging to that category. Here’s what that looks like:
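One simple way to wire categories to functions is a dictionary that maps each tag to a callable. This is a sketch of that idea, not Nova’s actual code: the function names, tags, and return strings are all made up, and in a real project each function would be imported from its own module.

```python
def play_rock_paper_scissors():
    return "Starting rock-paper-scissors..."

def take_note():
    return "Opening the notes file..."

# Maps a predicted tag to the function that implements that feature.
ACTIONS = {
    "games": play_rock_paper_scissors,
    "notes": take_note,
}

def handle(tag):
    """Run the feature registered for this tag, if there is one."""
    action = ACTIONS.get(tag)
    if action is None:
        return None  # no feature attached; the bot would just send a text response
    return action()

print(handle("games"))     # Starting rock-paper-scissors...
print(handle("greeting"))  # None
```

Adding a feature then only means writing one function and registering it under a new tag.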
Here is a list of all of Nova’s features:
- General knowledge search queries using the Wikipedia API
- Writing down important information to a file so that I can access it later (helpful if there’s something I need to remember in the future)
- Starting my day by opening all of the links that I need throughout the day
- Messaging contacts
- Opening links for classes
- Playing small games like rock-paper-scissors and battleship :)
Here’s a demonstration if you want to see all of these features at work!
Next Steps: Things I Might Implement in The Future
There are two main things I’m considering:
- Multi-threading, so that I can run background processes while still being able to use Nova. For example, having her text me notifications at certain times throughout the day as reminders. I would enable this feature through Nova and have her do this while simultaneously interacting with me.
- A user interface, so that I don’t have to chat with her from my terminal (lol).
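For the multi-threading idea, here’s a rough sketch of how a reminder could wait in a background thread while the main thread stays free. It only uses Python’s built-in `threading` module. The `send_reminder` function is hypothetical, and appending to a list stands in for actually sending a text.

```python
import threading
import time

def send_reminder(message, delay_seconds, sent):
    """Wait, then record the reminder (a stand-in for actually texting it)."""
    time.sleep(delay_seconds)
    sent.append(message)

sent_messages = []
worker = threading.Thread(
    target=send_reminder,
    args=("Stand up and stretch!", 0.1, sent_messages),
    daemon=True,  # do not keep the program alive just for reminders
)
worker.start()

# The main thread is free to keep handling chat input while the reminder waits.
print("Nova is still responsive while the reminder is pending.")

worker.join()  # only here so the demo can show the result before exiting
print(sent_messages)
# ['Stand up and stretch!']
```

For reminders at fixed times of day, a scheduling loop inside the worker thread would replace the single `time.sleep` call.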
Sources & Helpful References
Chatbot course: https://www.python-engineer.com/posts/chatbot-pytorch/
NLTK course: https://www.youtube.com/watch?v=FLZvO...
PyTorch course: https://www.youtube.com/watch?v=BzcBs...
Using the Wikipedia API: https://towardsdatascience.com/wikipe...