There comes a time when we decide to equip ourselves with new technology. Once we have made that decision and understood how things work, we start looking at the market to choose the right AI model. But it's not always as easy as it seems.
If we talk about PCs, once we have understood that we want, say, a notebook, we start by looking at the notebooks of certain brands, in our price range, with the performance, capacity and aesthetic characteristics we are looking for. We spend some time understanding and comparing and, after a few hours... we buy.
Does this also work for Generative AI?
I'm afraid not. The process is longer and more complex.
Let's start from an assumption: we are in a market phase in which Generative AI is booming, but we are still far from the "Productivity Plateau", that is, the phase in which a technology can be considered firmly consolidated and we simply enjoy its benefits.
What does this mean? That for this long period (and probably even after) we will have to expect a lot of instability, twists (positive and negative) and disillusionment before making a choice on which to consolidate our corporate business models.
As I tend to repeat very often, there is evidence that "we are working with the worst version of AI available", meaning that we are still far from having stable, tested products with predictable outcomes.
But just as often I repeat that this must not be an excuse to postpone the choice, to let others do it and see what happens: we must get our hands dirty today to create a mindset ready for this phase of consolidation.
We have understood that we need new approaches, what Generative (and non-Generative) AI can help us do, and in what forms it presents itself to us.
But now we will have to figure out which AI model to base our project on. And here the choice is not necessarily simple.
What is an AI Model?
Let me attempt a further definition (while we wait for some dictionary to decide to provide one):
An artificial intelligence (AI) model is a computer file that functions like a virtual brain. It has been trained on data to perform specific tasks, and it includes both that data and the algorithms that determine how it works.
Remember the old cartoons with computers that ate tons of books, digested them and could answer any question?
Well, imagine that an AI model is one of those objects sitting on your desk: someone has already loaded (almost) all the books in existence and had it 'digest' them through a training process. It will do nothing but accept your questions (in the form of text, images, sounds) and produce answers based on what, and how, it has digested.
Be careful, we're not talking about a PC! That, with its software, is a different kind of object: it does exactly what you ask of it, processing your inputs according to predictable (deterministic) rules and formulas. If you do a sum or an average in Excel, the same input will always give you the same output.
With AI, especially generative AI, we are talking about something non-deterministic (stochastic) because it thinks, more or less, based on probability (here you will find a post that explains how GPT works).
Therefore, if you ask the same question twice, you will get answers that are similar but never identical.
(I always apologize to the more technical people for my simplifications).
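To make the deterministic/stochastic difference concrete, here is a toy Python sketch (purely illustrative, not how a real LLM works internally): a spreadsheet-style function always returns the same result for the same input, while a probabilistic "next word" picker can return a different continuation on every run.

```python
import random

def excel_style_sum(values):
    # Deterministic: the same input always produces the same output.
    return sum(values)

def toy_next_word(prompt):
    # Stochastic toy: the "model" picks the next word according to probabilities,
    # so repeated calls with the same prompt can give different answers.
    # (Real LLMs work on tokens and far richer probability distributions.)
    candidates = {"cat": 0.5, "dog": 0.3, "dragon": 0.2}
    return random.choices(list(candidates), weights=candidates.values(), k=1)[0]

print(excel_style_sum([1, 2, 3]))              # always 6
print(toy_next_word("my favourite pet is a"))  # varies between runs
```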
And how does a model respond?
You have to train it. With a lot of data consistent with the type of answers we want to obtain.
And how is it trained?
Based on the education it received: that is, the series of pre-written questions and answers that gave it an idea of the world it has to represent.
By analyzing the data it is trained on: if you want it to talk about Shakespeare or to recognize faces, you have to provide it with his works or with a set of images of human faces.
By shaping its sense of precision: that is, it is designed to be more or less precise depending on the purpose for which it was created.
By allowing it to adapt: a model trained on all the literature in the world might be able to talk about love or war without referring to a particular text, because it has "distilled" the concepts.
On this last point, allow me a brief digression. The greater the number of parameters a model has, the greater its capabilities. This video (for which I lost the source, my apologies to the author) explains it very well! The more parameters, the more innate skills it gains.
Training a model can be a very time-consuming, complex and expensive task. We'll talk more about it in the next episodes.
OK, let's choose a model
Taking Hugging Face as our catalogue (the open-source community that develops tools to create, manage and train AI models): as of today, October 2023, it lists approximately… 365,000 (three hundred and sixty-five thousand) models in total!
I'll give you some time to reread and rethink the number above.
Hugging Face has a convenient configurator that allows you to filter by some features of the model you're looking for.
The selection parameters are quite varied:
The purpose for which you are choosing them (Multimodal, Computer Vision, Text analysis and generation, Audio)
The libraries on which they are based (from PyTorch to TensorFlow, but there are about fifty of them)
The reference datasets (i.e. the 'labelled' data provided to them for learning; a few hundred datasets cover the most popular ones)
The languages in which they are trained (English, Italian, Chinese etc... are more or less all there)
Usage licenses (Commercial, non-commercial, and another fifty options with various constraints)
Other parameters (an interesting one is the indication of the CO2 emissions produced by training)
Roughly counted, that's a few hundred million different combinations!
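If you prefer to explore this catalogue from code rather than through the web configurator, a minimal sketch with the huggingface_hub library might look like the following (the filter names reflect that library's documented parameters; the specific values are just illustrative):

```python
# pip install huggingface_hub  -- a sketch, assuming the list_models filters shown here
from huggingface_hub import HfApi

api = HfApi()

# A handful of English text-generation models built on PyTorch, sorted by downloads.
models = api.list_models(
    task="text-generation",
    language="en",
    library="pytorch",
    sort="downloads",
    limit=5,
)

for model in models:
    print(model.id)
```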
But, to make an informed choice, we must ask ourselves a few more questions:
Is it fit for purpose? Translating languages, summarizing or generating texts, carrying out sentiment analysis, generating or interpreting documents or images are very different activities. It is important to select a model based on the specific needs of the task at hand.
What training and fine-tuning was done? Models are usually pre-trained on a large corpus of data and may be available already tuned for specific tasks. The extent of fine-tuning needed depends on how well the pre-trained model already performs the desired task. It is also important to understand whether the training included Reinforcement Learning from Human Feedback (RLHF).
What data is it based on? The larger and more diverse the training dataset, and the more representative it is of the languages and contexts the model will handle, the more versatility and the less risk we will have. Risks? Yes: reproduction of bias, cultural inaccuracy, failure to understand local context, incorrect decisions, loss of trust, legal and reputational exposure, difficulties in personalization, rectification costs, exclusion of relevant information from the output, lack of transparency, ethical risks. Is that enough for you?
What are the performance and response latency? Models can be hosted in different data centers and can be more or less slow to respond, also depending on their intrinsic complexity. How important is it to get an answer within 2 seconds rather than 20? How many answers per second do we want to ask it for (obviously in an automation context)?
How much will it cost to implement and maintain it? The apparent simplicity of "taking an open-source model" and implementing it yourself hides hundreds of pitfalls in terms of hidden costs: analyzing the necessary computational resources (a consequence of the model's complexity), finding technicians trained for the task, selecting a suitable hosting provider, carefully estimating execution costs, avoiding surprises during the build, and scaling in case of growth are just a few.
Where do we stand on legal compliance? This topic alone deserves a separate post, which will arrive soon with a lawyer who is an expert on the subject. The issues concern legal responsibility regarding, for example, the copyright of the data used, the lawfulness of its use, and data protection regulations. I'd say that's enough for a nice headache.
Who can get their hands on it? How well is it documented? Finding suitable technicians, as mentioned above, could be your worst nightmare, including in terms of costs. Using better-documented models offers a bit more hope of getting to the end of the project.
Will I have to train it further? If your company specializes in persuasion tautology in low-hydrogen environments, you will probably need to add your own documentation to the existing training through fine-tuning processes, which are expensive and increase operating costs.
Can I integrate it with my information systems? Assuming you have open systems and available data, the model's interoperability with your systems must be evaluated. Unless you plan to wear out the CTRL+C and CTRL+V keys on all your keyboards.
Is it safe? That is, if I give it all the company's knowledge, how can I be sure it won't be used by bad actors? Is the sensitive data I will provide protected? Will I be able to sleep at night?
I'll stop here. There are others, but these at least are the issues to put on the table; otherwise the final result may not match the use case you have chosen, and you may obtain unpredictable results.
And, I repeat myself again: the fact that we are currently using the worst possible version of AI calls for further reasoning about the probable future life of each model. Investing in a branch that later dries up generates a lot of headaches.
Is everything clear?
Easy?
No?
Let's bring some order
Let's calm down and take a deep breath.
Choices of this order of complexity must be made when a model becomes part of a larger project and an extremely careful cherry-picking process is necessary.
But if your company does not deal with technology and has never entered the world of artificial intelligence, the choice parameters are decidedly fewer.
The concept of Foundation Models comes to our aid.
What are Foundation Models?
In the first article of the series I only mentioned the Foundation Models, that is, the largest and most important models, those capable of "doing the greatest number of things" (even regardless of their size).
In practice: with a model that knows how to cook, it will be easier to create a new recipe; with a model trained on a hundred languages, it will be easier to teach it the hundred-and-first; with a model trained on musical material, it will be more effective to ask it to produce a new melody. And if a model knows how to do all these things, we will be able to ask it for our recipe sung in Sardinian over the best musical backing, possibly with a reference to some film from the 1960s.
FMs are therefore models conceived and designed to provide a vast and generic variety of outputs, applicable to very different tasks, such as producing text, audio and images.
They are general models, in terms of the breadth of tasks, the range of applications and the types of output they can produce.
Some examples of Foundation Models with the most important applications that use them:
GPT-4 → API → ChatGPT Plus, Khanmigo, Duolingo Max, Bing Chat, …
GPT-3.5 → API → ChatGPT
PaLM 2 → API → Google Bard
Claude → API → Anthropic Claude
DALL-E 3 → API → ChatGPT Plus
LLaMA 2 → API → Meta AI
Mistral → API → Mistral AI
and a few others that I omit for brevity, not for lack of merit.
Note the constant presence of APIs (Application Programming Interfaces), which often add software functionality on top of the LLMs. For example, GPT-4 is not made of a single model but of a MoE (Mixture of Experts), i.e. about ten different models; among other things, the API layer automatically decides which expert model to route your prompt to. APIs also take care of integrating additional features such as plugins.
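To give a concrete idea of what "a single point of access" means in practice, here is a minimal sketch using OpenAI's Python client (the model name and client version are assumptions that will change over time): whatever happens behind the API, whether expert routing, plugins or safety layers, is invisible to the caller.

```python
# pip install openai  -- sketch based on the openai>=1.0 Python client
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4",  # the API hides whatever mixture of models sits behind this name
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "In one sentence, why is choosing an AI model hard?"},
    ],
)

print(response.choices[0].message.content)
```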
Are there still too many?
Fortunately, these models can also be further selected.
In fact, there are SOTA FMs (State-Of-The-Art Foundation Models), that is, those that, in addition to doing more things, work better than the others.
Within this list, if we want to work with a multimodal system (that is, one capable of processing text, images and sounds), we are left, in my opinion, with a rather simple one-item list:
GPT-4 by OpenAI,
because today it can support almost any type of input and output (excluding video for now) and, through a single point of access, provides a level of flexibility not yet found in other models.
I think Claude is doing a great job (but it has a lot of catching up to do on data protection compliance and API openness), the possible applications of Meta's LLaMA are emerging in interesting ways, and it seems that Google's Gemini will soon outclass everyone. But today, the only one I would recommend starting and practicing with is OpenAI's GPT-4.
Then over time your needs will become clearer and the other models will mature, become productive and stable, and you can decide to use them.
And how much does it cost?
The cost at this moment is one of the most critical points to understand.
Let's assume the following scenarios:
In the first, MAKE, we estimate how much it costs to develop your own model from scratch.
In the second, USE, we assume you adopt an open-source model (e.g. Llama 2 70B) and run it on your own hosting/cloud solution.
In the third, BUY, you only worry about interfacing your systems with the model provider's APIs.
The calculations, in general, exclude all the costs of creating your software project and the cleaning of input data if necessary.
The MAKE scenarios exclude the costs of purchasing, implementing and maintaining the hosting/cloud platform, the internal or external staff needed to implement it, and various other extras.
MAKE - New Model
Training a model from scratch is more or less expensive depending on how much data (how many tokens) is used. For example, to train GPT-3.5, 1,204 GPUs were used for 34 days of processing, with a pure computation cost of 4.6 million dollars! But for larger models we are talking about figures that easily range between 50 and 100 million, with even higher undeclared peaks.
Training smaller models (7 billion parameters), like LLaMA, instead costs only around 250,000 USD.
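As a back-of-the-envelope check on figures like these, the dominant term is simply GPU-hours multiplied by an hourly price. The sketch below reuses the numbers quoted above; the hourly GPU rate is my own illustrative assumption, not a figure from this article.

```python
# Rough training-cost estimate: GPU-hours x hourly price.
gpus = 1_204             # GPUs quoted above
days = 34                # days of processing quoted above
usd_per_gpu_hour = 4.70  # ASSUMED cloud price per GPU-hour, for illustration only

gpu_hours = gpus * days * 24
compute_cost = gpu_hours * usd_per_gpu_hour

print(f"{gpu_hours:,} GPU-hours -> ~${compute_cost / 1e6:.1f}M of pure compute")
# ~982,464 GPU-hours -> ~$4.6M, in line with the figure quoted above
```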
Choosing to start with a proprietary model, based on your own data, trained and configured as we (or our company) expect, involves a decidedly challenging upfront cost. You will then need to understand how long the model will remain usable before you have to start from scratch and develop a new one, perhaps with a new technology.
To which you must add the costs excluded above.
USE - Existing model
In this case, what we saw above applies: the models available on Hugging Face can be used on their platform, or you can decide to port them to your own infrastructure.
I won't dwell on this for long, because the costs I leave out of this analysis are exactly those excluded in the introduction to this chapter.
Implementing an existing model, perhaps open-source and free, has hidden costs that should be analyzed and depend a lot on the performance you want to achieve, the amount of data you think will be necessary, and any subsequent fine-tuning processes.
BUY
The only "easy" thing today are the monthly fee prices for chatbot-style services. With around 20 Euros/month you will have Chat GPT Plus available which includes all the advanced Open AI services.
Or between 10 and 120 Euros you can subscribe to MidJourney to produce the images you like best they like them.
But these solutions will only provide you with a small portion of the capabilities otherwise available using APIs.
Things start to get complicated when you decide to integrate a model into your business process through APIs. Maybe after doing some fine-tuning.
For images it is quite simple: depending on the desired output resolution, you pay a cost per image.
For text, bear in mind that the "traffic" of questions and answers is priced differently (an input cost for the prompt and an output cost for the answer).
Predicting traffic in tokens from the number of characters you plan to use is not difficult: divide the number of characters by 3 (to be pessimistic; in English the ratio is around 2.5, in Italian around 2.9) to obtain the tokens on which pricing is based. The ratio varies considerably depending on the languages you expect to handle; if you want to be precise, you can use Tokenizer, the tool made available by OpenAI precisely for this calculation.
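If you prefer to do this count in code rather than with the web tool, OpenAI's tiktoken library returns exact token counts for a given model; here is a minimal sketch (the sample text is made up, and the characters-per-token ratio will vary by language, as noted above):

```python
# pip install tiktoken
import tiktoken

text = "Dear candidate, thank you for sending us your curriculum vitae and cover letter."

encoding = tiktoken.encoding_for_model("gpt-4")
tokens = encoding.encode(text)

print(f"{len(text)} characters -> {len(tokens)} tokens "
      f"(~{len(text) / len(tokens):.1f} characters per token)")
```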
However, predicting the number of characters involved in traffic is not necessarily simple: it depends on the type of interactions you plan to make and the quantity of them.
For example:
suppose we want to design an automation that receives PDFs of Curriculum Vitae accompanied by a cover letter.
And that each prompt, including the CV text, is approximately 10,000 characters ≈ 3,000 tokens, with an equally long response for simplicity.
Assuming we process 100 CVs per day (I know, that's a lot 🙂) and get feedback on each through an interaction, we have 36,500 interactions per year.
With the prices of GPT-3.5 Turbo 16k, shown and explained below, we are talking about $766.00/year. Not bad, right?
The same thing done with GPT-4 32k costs $19,710.00 instead: about 25 times as much (but still much less than the team of analysts who would otherwise have to process them!).
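As a sanity check, here is a minimal sketch that reproduces the arithmetic of this example. The per-1K-token prices are the GPT-3.5 Turbo 16k and GPT-4 32k list prices at the time of writing, so treat them as assumptions that will change.

```python
# Annual cost estimate for the CV-screening example above.
tokens_in = 3_000                   # prompt (CV + cover letter), ~10,000 characters / 3
tokens_out = 3_000                  # response, assumed equal for simplicity
interactions_per_year = 100 * 365   # 100 CVs per day

# List prices in USD per 1K tokens at the time of writing -- verify before relying on them.
prices = {
    "gpt-3.5-turbo-16k": {"input": 0.003, "output": 0.004},
    "gpt-4-32k":         {"input": 0.06,  "output": 0.12},
}

for model, price in prices.items():
    per_call = (tokens_in / 1000) * price["input"] + (tokens_out / 1000) * price["output"]
    print(f"{model}: ${per_call * interactions_per_year:,.2f}/year")

# gpt-3.5-turbo-16k: $766.50/year
# gpt-4-32k:         $19,710.00/year
```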
Here you will find the calculation done on docsbot.ai, which offers an interesting tool for estimating the presumed costs of the different models. I would say you need to be careful!
For simplicity (tokens are an impractical unit to reason with today), I therefore made a simple cost calculation per GB (again with input = output), and in the graph below you can see how important it is to choose the right model, based on all the characteristics you want, using a unit of measurement that is more familiar to you.
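For the curious, the per-GB figure can be approximated the same way: take one gigabyte of text as roughly one billion characters, convert it to tokens with the pessimistic divide-by-3 rule used above, and apply the same assumed list prices (the graph itself is not reproduced here).

```python
# Rough cost of pushing 1 GB of text in and 1 GB of text out through each model,
# using the pessimistic divide-by-3 characters-to-tokens conversion.
chars_per_gb = 1_000_000_000
tokens_per_gb = chars_per_gb / 3

# Same assumed list prices (USD per 1K tokens) as in the previous sketch.
prices = {
    "gpt-3.5-turbo-16k": {"input": 0.003, "output": 0.004},
    "gpt-4-32k":         {"input": 0.06,  "output": 0.12},
}

for model, price in prices.items():
    cost = (tokens_per_gb / 1000) * (price["input"] + price["output"])
    print(f"{model}: ~${cost:,.0f} per GB of input plus GB of output")
```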
The conclusions are yours.
So?
So the choice of an AI Model, if you want to consider all the options, is not at all simple.
As reiterated in previous posts, the most 'correct' approach is to proceed in short steps, developing prototypes and maintaining expectations of small initial results.
The best way to do this today is to use a single Foundation Model capable of doing almost everything, integrating it with APIs and paying for its use to be able, over time, to evaluate alternatives.
Why?
The need for very complex and expensive selections is drastically reduced.
The risks deriving from the use of the wrong model are mitigated (in terms of performance, ethical biases, future availability of the model).
There is someone who will keep working for you on new releases (there are already projects that are no longer supported).
There is no need to fine-tune with additional data.
There is greater versatility with the use of models already available to carry out different tasks.
They offer greater interoperability with your applications, and will increasingly do so.
It is easier to explore and understand the greater number of possibilities on offer.
All of this while remaining "model agnostic", i.e. designing solutions that, while exploiting the best-known models, can be adapted over time, even for just a few use cases, to models that will emerge and prove to be better on all fronts.
As always, I invite you to reflect, comment and share ideas by passing this post on to people who might find it interesting.
To stay updated on my contents:
Read Glimpse, my novel about AI.
Or contact me here.
See you next time!
Massimiliano Turazzini