Conversational AI has taken a turn for the better in the last eight months, and it all started with ChatGPT. OpenAI’s widely deployed large language model, built on its battle-tested GPT-3.5, showed even the layman the power of AI and the sheer potential of such models across a wide range of fields, from creative tasks (writing content, paraphrasing, captions, and much more) to technical ones (debugging and annotating code, translating between programming languages, performing boilerplate tasks, and much more).
However, this sudden freedom to generate content was short-lived, as allegations mounted over the societal and ethical implications of such large language models. Microsoft’s massive investment in OpenAI ahead of ChatGPT’s launch didn’t soften the blow, as a race began among the Fortune companies. Restrictions like acute monitoring, filtering, and moderation were introduced to preserve the public image of these organizations.
In this blog, we will explore the various shortcomings of large language models built by Fortune companies, especially ChatGPT, and how the average Joe can use open alternatives to reclaim research and freedom in AI.
ChatGPT is a large language model trained by OpenAI using Reinforcement Learning from Human Feedback (RLHF) on GPT-3.5 and later GPT-4. The model stands apart from the competition due to the conversational setting it has been deployed in, giving it the ability to answer follow-up questions, rectify mistakes, and rework its responses to prompts on the go.
However, for reasons like the data it was trained on, the moderation imposed by OpenAI, and the general limits of such an LLM in real-world situations, the model can be frustrating for a professional or a student, something surface-level users may overlook. Let us take a look at some of these limitations:
1. Common sense is rare: Trained solely on data collected by OpenAI up to 2021, the model has dire lapses regarding concepts, words, and events from the last two years. Such lapses often cause these “human-like” conversations to lose meaning fast and become irrelevant to the user.
2. Honey, the internet needs to be fixed: ChatGPT has no access to the internet and replies to prompts using only its outdated training data. This drawback is why such an expensively trained model still can’t answer simple questions like the current exchange rate between INR and USD. From a content perspective, it rules out use cases like paraphrasing, synopsis generation, and other tasks that rely on importing data from a link or an online PDF.
3. Yeah, I filled the book cabinet with coffee-table books: This limitation ties into the model’s lack of internet access mentioned above. Despite the extensive knowledge reservoir ChatGPT has been trained upon, you will eventually notice that the model cannot provide a solution for prompts that fall outside its information tank.
4. Bazinga, wait, I didn’t get it: An issue faced by marketers, especially those working with Gen Z demographics, is that ChatGPT just doesn’t get humor and sarcasm. Its inability to identify the tone of prompts and counter-prompts often leads to the model acting weird and producing wrong answers.
5. Hey, why are you getting offended: In early February, once the hype for the bot had died down, hardcore testers of such chatbots surfaced examples of the bot acting offended, even going as far as insulting and gaslighting its users when faced with criticism of the solutions it had produced.
Now, we understand that such shortcomings will always exist to some extent and will be worked on gradually. But are you, the user, ready for this wait? How about not waiting for OpenAI as an organization to adapt to your requirements, and looking for alternatives instead?
Given the current limitations that ChatGPT poses, whether through technical or organizational bars, let us look at some of the aspects you need to consider while choosing the alternatives that best fit your use cases.
1. Accuracy: ChatGPT comes packed with weights trained over years on terabytes upon terabytes of data, which means the alternative you go for needs to be trained on data of nearly that scale even to approach that level of accuracy. (Psst, heard of LLaMA🦙?)
2. Creativity: At its core, what reeled everyone in and got them hooked on ChatGPT was the bot’s ability to act creatively, taking on a role and writing texts and content in different styles across a wide range of genres. Your alternative should be able to mimic that level of creativity, minus the various moderations and walls that ChatGPT comes with.
3. Privacy: The chatbot or LLM needs to be transparent about how it uses your data, because there is a high chance you are using it to help write your thesis. Beyond establishing such trust, or in the absence of any data-sharing at all, the model should be simple enough to deploy on private, closed servers.
4. Ease of Use: A conversational setting is the USP of most deployed language models, giving the user the ability to ask the bot to take up the role of a professional in a given field and answer accordingly.
5. Cost: The cost of deploying such a model needs to be taken into account before using or even hosting your own LLM. It includes various aspects such as procuring the data, training on custom data, and deploying the model through tools like LangChain or GPT Codex.
Now that we know the various metrics by which you can judge your ChatGPT alternatives, let us look at some models you can go for.
Ahead of its official launch, LLaMA, Meta’s take on large language models, was leaked on 4chan as a downloadable torrent on March 3, 2023. Suddenly, large language models were no longer just a magic trick to the average user: with trained weights of up to 65 billion parameters publicly available, a new era of open-source, readily trainable large language models began, one that not only gave the average OSS developer a chance to study them but also let enterprises around the globe experiment with and deploy them, for “research purposes only,” of course.
Let us see how this particular leak shaped the entire landscape of research-purposes-only large language models, starting with fine-tuned versions of the model trained on data it had never seen before and, wait for it, a brand-new version of the model from Meta itself.
Do note: if you do not want to go through the trouble of deploying these models yourself, how about trying them out on chat.nbox.ai now?
LLaMA 2 is Meta AI’s freely available model for research and commercial use (one of the key factors setting it apart from OpenAI’s GPT series). Following the success LLaMA enjoyed, thanks to the sheer resources and channels available for fine-tuning the model to custom use cases, over 100,000 requests to access the weights, and countless new variants in the market, Meta has decided to capitalize on the revolution by teaming up with Microsoft.
With its initial strategy of shipping different variants of LLaMA from the get-go, namely LLaMA 7B, LLaMA 33B, and LLaMA 65B, Meta AI has always placed an emphasis on smaller models trained on more tokens, a defining feature that lets even locally deployed models perform somewhat comparably to the outputs generated by OpenAI’s APIs.
So where does this already strong foundation leave LLaMA 2? LLaMA 2 was trained on over 40% more data than its predecessor and can process double the context length. The model comes in four releases: LLaMA 2 7B, LLaMA 2 13B, and LLaMA 2 70B are public, while LLaMA 2 34B remains closed-source for now. LLaMA 2 was also fine-tuned on more than 1 million human annotations.
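As a practical aside, the chat-tuned LLaMA 2 variants expect prompts in a specific template: `[INST] ... [/INST]` markers with the system message inside `<<SYS>>` tags on the first turn, as published in Meta’s reference code. A minimal sketch of building such a prompt (the helper name is our own):

```python
def build_llama2_prompt(system: str, user: str) -> str:
    """Wrap a system message and a user message in LLaMA 2's chat template:
    the first turn carries the system prompt inside <<SYS>> tags, and the
    whole turn is enclosed in [INST] ... [/INST] markers."""
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = build_llama2_prompt(
    "You are a concise assistant.",
    "Summarize the leak that kicked off open-source LLMs.",
)
print(prompt)
```

The model’s reply is expected to follow the closing `[/INST]`; deviating from this template tends to degrade the chat variants’ answer quality.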
Limitations: The lower parameter counts do bring shorter training and deployment times, but the model is still outshined by the longer prompts and context GPT-4 can handle.
WizardVicuna 30B is a model born from the collaboration of the two biggest open-source LLaMA-based LLMs. WizardLM introduces a novel method built on Evol-Instruct, an algorithm that automatically generates open-domain instructions of varying difficulty levels and skill ranges. VicunaLM is a 13-billion-parameter model rated the best free chatbot according to GPT-4.
Ehartford has fine-tuned the model, and the 4-bit version is available for download on HuggingFace. This variant is rather interesting given that it ships with no guardrails and can be fine-tuned according to one’s needs, including layering Reinforcement Learning from Human Feedback on top of the model, much like the original ChatGPT release.
However, the model in focus is the release TheBloke built on HuggingFace upon Ehartford’s variant, merging the model with KaioKen’s SuperHOT 13B to make it capable of ingesting more context for better understanding. The model also comes with LoRA and has been trained on over ~1,200 samples (~400 of them over 2,048 tokens in sequence length) to reinforce the training. Low-Rank Adaptation of Large Language Models (LoRA) is a training method that accelerates the training of large models while consuming less memory.
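To see why LoRA consumes less memory, consider the arithmetic: instead of updating a full d × k weight matrix, LoRA learns two low-rank factors of shapes d × r and r × k and freezes the original weights. A back-of-the-envelope sketch (the 4096 × 4096 projection size and rank 8 are illustrative numbers, not quoted specs of this model):

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Parameters added by a rank-r LoRA adapter on a d x k weight matrix:
    a d x r down-projection plus an r x k up-projection."""
    return d * r + r * k

# Hypothetical 4096 x 4096 attention projection, adapted at LoRA rank 8.
full_finetune = 4096 * 4096                            # 16,777,216 weights updated
lora_adapter = lora_trainable_params(4096, 4096, r=8)  # 65,536 adapter weights
print(lora_adapter / full_finetune)  # ~0.004, i.e. <0.5% of the weights train
```

With only the small adapter receiving gradients, the optimizer state shrinks proportionally, which is where most of the memory savings come from.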
Limitations: While the model comes with an 8k context window, there still exists a GPT-4 variant capable of processing 32k (though not many people have access to it).
A straightforward Alpaca-prompt-format large language model, NousHermes can take inputs ranging from instructions, questions, and contexts to any other specific details that outline the desired output. The resulting work can take the form of a single sentence, a paragraph, a captivating story, a code snippet, a formula, or any other conceivable expression within the realm of natural language.
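The Alpaca prompt format mentioned above is a fixed instruction template; a minimal sketch of filling it in Python (the template wording follows the original Stanford Alpaca release, and the helper name is our own):

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def build_alpaca_prompt(instruction: str, context: str) -> str:
    """Fill the Alpaca instruction/input template used by models like NousHermes."""
    return ALPACA_TEMPLATE.format(instruction=instruction, input=context)

prompt = build_alpaca_prompt(
    "Summarize the text below in one sentence.",
    "LLaMA's weights leaked in March 2023, sparking an open-source wave.",
)
print(prompt)
```

The model then generates its completion after the trailing `### Response:` marker, so the prompt should always end there.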
This cutting-edge language model has undergone thorough refinement on a diverse dataset containing more than 300,000 instructions, encompassing a broad spectrum of subjects and activities. Created by Nous Research, a distinguished AI research organization known for its groundbreaking solutions across multiple sectors and fields, the model’s fine-tuning process was led by Teknium and Karan4D.
Breaking down the name of this specific variant, “SuperHOT 8k” refers to the beefed-up version made public by TheBloke on HuggingFace by merging the original NousResearch model with KaioKen’s SuperHOT 13B, enabling it to process 8,000 tokens instead of the legacy 2,000 tokens possible on the base variant.
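The core trick behind SuperHOT-style context extension, per the community writeups accompanying these releases, is interpolating rotary position indices: an 8k-token sequence is linearly squeezed back into the roughly 2k positional range the base model was trained on. A minimal sketch of that rescaling (numbers are illustrative, and this omits the rotary embedding itself):

```python
def interpolate_positions(positions, trained_ctx=2048, target_ctx=8192):
    """Linearly rescale token position indices so a target_ctx-long sequence
    maps into the [0, trained_ctx] range the base model's rotary position
    embeddings were trained on."""
    scale = trained_ctx / target_ctx
    return [p * scale for p in positions]

# Positions 0, 4096, and 8192 of an 8k sequence land at 0, 1024, and 2048.
scaled = interpolate_positions([0, 4096, 8192])
print(scaled)
```

Because every position stays inside the range the model has seen during training, attention degrades far less than it would if positions simply ran past 2,048.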
Limitations: Generating accurate and coherent code or text becomes challenging when the model is presented with input involving a lengthy sequence of operations or steps.
In conclusion, the rise of big players in the tech industry and their dominance over large language models (LLMs) has led to a significant moderation and limitation of the outputs provided to users. However, the advent of open-source alternatives for chatbot applications has opened up new possibilities and empowered developers and users alike. This blog has explored various open-source options that can be used as alternatives to traditional LLMs, offering greater control and customization options.
By embracing these alternatives, we can foster innovation, transparency, and accessibility in the chatbot landscape, ensuring that users have more diverse and meaningful experiences. These open-source solutions provide a pathway to a more democratized and inclusive future, where the limitations imposed by centralized models are overcome, and the power to shape AI-driven conversations is distributed more widely.