We all humans have a constraint that we have to describe things, label them. Often with unintended consequences of degrading its true character or beauty. At the same time, it is unavoidable to escape it. Otherwise it would be rather difficult to ponder over stuff, discuss them with others, and build on top of them. And to label something, we have to stick knowns over an unknown. This makes ‘understanding via analogy’, a powerful tool.
Our topic of interest today is LLMs, the new exciting and complex piece of technology, which is generating a new use-case every day. To understand it better, lets use the tool of analogies. I will describe LLMs through the characteristics it adopts, the characteristics of five existing product and tech categories.
Characteristic of LLMs as a
LLMs as Search engine
Internet is one grand network where huge amount of content and knowledge gets created every moment. Search engines like Google have been a great multiplier of knowledge distribution, with somebody publishing a blog in Venezuela can be read by someone in Bangladesh. A Japanese coder facing issue in writing regex code, can look up a solution written by a Serbian coder.
A LLM like chatGPT takes that to a step further. Since its understanding of the world’s online knowledge is deeper than google, it is able to precisely serve the answers to input queries. Google suggests you web links that are likely to contain the answer, but chatGPT just gives you the answer.
Off-course it won’t be a complete replacement of search engine, as there will be major areas where going to web-pages will be more relevant. Especially for someone who is doing a thorough search about a topic. Bur for large number of use-cases, search engines would just feel very inefficient.
LLMs as media generators
Under the umbrella term of Generative AI, users are using AI to generate multi-format media content, whether its text, audio, images and movie clips. LLMs are being specifically used for text format content like essays, blogs, social media posts and marketing emails.
Over time, there will be increasingly complex tools combining these format specific AIs to generate composite media like presentations, movies, songs and ad campaigns.
LLMs as a developer platform
First version of chatGPT was restrictive in a sense that for answering queries, it could only use the neural-net it was trained upon. Which is very useful for general search queries or writing essays, but if we want to use chatGPT for specialised tasks like solving mathematical equations, the responses were not reliable. The reason being, LLMs like chatGPT by their very nature, have following 2 restrictions -
a) They give answers to a query from a text completion probability standpoint, but do not have explicit theoretical models in place, to apply on inputs and deduce answers
b) They are only trained on publicly available text content over web, which is vast in itself but not sufficient to answer all sorts of domain specific queries. Much of this domain knowledge resides under specific databases of scientific papers, medical research, travel data etc.
This problem was solved by release of plugins, where chatGPT can use third party tools, and hence have access to third party APIs & databases. So for a query regarding a calculation, instead of neural-net under the hood, it can use a calculator, for query asking meaning of a word, it can use a dictionary, and for checking flight timings, an airline website. This becomes powerful in the same sense that an operating system like Windows became powerful. Windows only had to provide few core applications (like docs, excel, image) to kick-start, and remaining functionalities came through third party applications built over Windows platform.
LLMs as an API
One noteworthy strategy that OpenAI has adopted is to make the GPT capabilities accessible via APIs. So any app can fine-tune the chatGPT for their specific needs, and plug it in inside its user interface. Like an airline ticketing website having a GPT-powered support chatbot answering all queries regarding a booking refund. Or a dev support chatbot of a payment gateway helping with all code integration queries.
And as we just discussed above, being a developer platform, chatGPT itself can use third party APIs, so this means there is a bi-directionality in the nature of chatGPT interacting with existing Apps.
This bi-directionality is nothing new. Google in its early days was powering intra-website search for some of the most popular websites in the world, in addition to being a search engine itself. Somehow, intra-website search never became the preferred way of discovering content, and Google went completely focused on powering the search directly.
So it will be interesting to see how this plays out in the LLM space. Will the users primarily use LLMs via in-app chatbots, or will they directly use LLMs to take app actions.
LLMs as a task assistant
On Twitter, some independent builders achieved an impressive feat, where they stitched together multiple chatGPT instances, where they gave one GPT the role of task manager, and other GPTs the role of task executors. The manager GPT ensured that task executor GPTs complete their tasks as per the original prompt. Result is a GPT Agent, which can run for hours in background till it achieves its goals. This is early version of a much more sophisticated system to come, where an AI task assistant will keep running in background and will be able to achieve more complex tasks than just simply answering queries.
And since LLMs have characteristic of a developer platform, the LLM powered assistants can execute tasks on third party apps via user prompts. So in near future, many tasks like creating an event in calendar, sending acknowledgement emails, creating goal lists, will be accomplished by these assistants. Interestingly, this happens to be the original vision behind voice assistants like Siri/Alexa.
As time proceeds, we can expect even more sophisticated assistants like a personal-finance-assistant that reports irregular expenditures, pays monthly bills, and finds new investment opportunities. Or a social media assistant, that suggests the content (text/videos/images), and posts regularly on our behalf.
What is really remarkable in the end that it is just a neural net of few GBs in size. One that can be downloaded on a single iPhone. This small size neural-net can have such profound properties is mind-blowing. So far we have been living in the world of specialised software systems, and this our first foray in the new world of softwares resembling general intelligence.