As financial institutions experiment with large language models, OpenAI and its ChatGPT product have become the most recognizable names in the artificial intelligence space. Yet alternatives exist.
So the biggest question many banks, research firms, academics and others are racing to answer is this: How does a company know which language model is right for it?
One of OpenAI's main competitors is Anthropic, which recently landed a commitment of up to $4 billion from Amazon.
In other words, to take advantage of OpenAI's model GPT-4 (the model behind the paid version of ChatGPT), Cohere's Command or Anthropic's Claude, banks must typically trust these companies with their data — and some have. For example, Morgan Stanley has provided GPT-4 with access to its content library of "hundreds of thousands of pages" of information to make it easier for employees to tap the company's collective knowledge.
Similarly, Anthropic and Cohere each allow companies to tweak their language models through their own APIs or through Amazon Web Services. Specifically, Claude and Command are available through Amazon Bedrock, AWS's managed service for accessing foundation models through a single API.
Bedrock also offers access to language models built by other companies, including Meta, AI21 and Amazon itself. These models are sometimes called foundation models, a term reflecting that developers can build on them, as on a foundation, to create customized applications.
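To make the Bedrock route concrete, the sketch below builds a request to Claude and shows how it would be sent through AWS's `bedrock-runtime` API. The request fields follow Anthropic's published Bedrock schema, but the model ID, region and prompt are illustrative; a real deployment would need AWS credentials, the `boto3` SDK and model access enabled in the account.

```python
import json

def build_claude_request(prompt, max_tokens=256):
    """Build a Bedrock request body in Anthropic's messages format.

    Field names follow Anthropic's documented Bedrock schema; the
    prompt and token limit are whatever the caller supplies.
    """
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

def invoke_claude(prompt, region="us-east-1"):
    """Send the request via AWS Bedrock (requires credentials + boto3)."""
    import boto3  # imported here so the payload builder runs without AWS

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId="anthropic.claude-v2",  # illustrative model ID
        body=build_claude_request(prompt),
        contentType="application/json",
    )
    return json.loads(response["body"].read())
```

Because the model runs inside AWS rather than on the bank's own hardware, the payload above — and whatever data it carries — still leaves the institution's systems, which is the trade-off the next sections discuss.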
Through its investment of up to $4 billion in Anthropic, the e-commerce giant can bring advanced analysis, shopping and checkout tech closer to the point of sale — and farther away from traditional transaction processors.
One downside to modifying proprietary models for a specific application is the need to relinquish some control of the data. The companies have developed some protections; for example, Amazon says customer data sent to Bedrock is encrypted and is not used to train the underlying models.
Still, financial institutions have options for building with language models without sharing their data with an AI company or racking up cloud-computing bills. Many companies offer free and open-source models that developers can download and use on their own computers.
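The "download and run locally" path typically goes through the Hugging Face `transformers` library. The sketch below shows the general shape of that workflow, using one of Cerebras' openly licensed checkpoints as an example; the model name, sampling settings and hardware assumptions (`pip install transformers torch`, enough memory for the weights) are all illustrative.

```python
# Default sampling settings; purely illustrative values.
GEN_DEFAULTS = {"max_new_tokens": 100, "do_sample": True, "temperature": 0.7}

def gen_params(**overrides):
    """Merge caller overrides into the default sampling settings."""
    return {**GEN_DEFAULTS, **overrides}

def generate_locally(prompt, model_name="cerebras/Cerebras-GPT-1.3B", **overrides):
    """Run an open-source model entirely on local hardware.

    Imports are kept inside the function so nothing is downloaded (and
    no data leaves the machine) until generation is actually requested.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, **gen_params(**overrides))
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

After the first call caches the weights, prompts and outputs never touch an outside server — the property that makes this route attractive to institutions wary of sharing data.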
A leader in open-source language models is Cerebras, which in March released Cerebras-GPT, a family of seven open-source models ranging from 111 million to 13 billion parameters.
One common way to compare models is by the parameters — akin to neural connections in a human brain — each has. Generally, the more parameters a model has, the more capable it tends to be, and the more it costs to train and run.
OpenAI released GPT-3 in June 2020 with 175 billion parameters. GPT-2, which came out in 2019, had 1.5 billion parameters and its predecessor GPT-1 had 0.12 billion parameters. The company has not disclosed how many parameters GPT-4 has; the historical trend suggests it has over a trillion parameters.
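Those parameter counts largely follow from a model's published dimensions. The back-of-the-envelope estimate below — a standard approximation for decoder-only transformers, not OpenAI's own accounting — reproduces the GPT-3 and GPT-2 figures from their papers' layer counts and widths.

```python
def estimate_params(n_layers, d_model, vocab_size):
    """Rough parameter estimate for a decoder-only transformer.

    Each layer contributes ~12 * d_model^2 weights: 4 * d_model^2 for the
    attention projections plus 8 * d_model^2 for the feed-forward block
    (with the usual 4x hidden expansion). Embeddings add vocab * d_model.
    """
    per_layer = 12 * d_model ** 2
    embeddings = vocab_size * d_model
    return n_layers * per_layer + embeddings

# Published configurations: GPT-3 (96 layers, width 12288) and
# GPT-2 (48 layers, width 1600), both with a ~50K-token vocabulary.
gpt3 = estimate_params(n_layers=96, d_model=12288, vocab_size=50257)
gpt2 = estimate_params(n_layers=48, d_model=1600, vocab_size=50257)
print(f"GPT-3 ~ {gpt3 / 1e9:.0f}B, GPT-2 ~ {gpt2 / 1e9:.2f}B")
```

The estimate lands within a few percent of the disclosed 175 billion and 1.5 billion figures, which is why parameter counts are such a convenient, if crude, yardstick.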
Elsewhere, the French-American company Hugging Face this month announced Falcon 180B, a 180-billion-parameter open-source model developed by the United Arab Emirates' Technology Innovation Institute, which Hugging Face described as the largest openly available language model.
Most other free models are not nearly as large as Falcon. The largest version of Meta's free model Llama 2 has 70 billion parameters. The largest model Cerebras released this year has 13 billion parameters.
Parameter counts provide only a rough estimate of how one model might perform compared with another. Models can have advantages and disadvantages depending on the task at hand and the context in which the task is performed. This has inspired Stanford University's Center for Research on Foundation Models to benchmark the most prominent language models in what it calls the Holistic Evaluation of Language Models (HELM).
HELM plans to evaluate each model in additional scenarios, such as fact verification and copywriting, and to add models such as Databricks' Dolly 2 and Google's PaLM to its evaluations.
While these tests developed and run by academics provide a basis on which to compare models, many companies seek to have the evaluations tailored to their particular purposes. To provide that service, Arthur AI, a New York-based AI performance company, offers an open-source product called Bench, which lets companies compare language models against criteria they define themselves.
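Whatever tool is used, the core of a tailored evaluation is simple: run each candidate model over test cases drawn from the institution's own workload and score the results. The minimal sketch below illustrates the idea; the stub "model," the banking-flavored test cases and the exact-match scoring rule are all hypothetical stand-ins for a real LLM call and a real grading scheme.

```python
def evaluate(model_fn, cases):
    """Score a model callable on (prompt, expected) pairs by exact match."""
    hits = sum(1 for prompt, expected in cases
               if model_fn(prompt).strip() == expected)
    return hits / len(cases)

def stub_model(prompt):
    """Stand-in for a real LLM call (e.g., a bank-specific FAQ assistant)."""
    canned = {"What does APR stand for?": "annual percentage rate"}
    return canned.get(prompt, "I don't know")

cases = [
    ("What does APR stand for?", "annual percentage rate"),
    ("What does APY stand for?", "annual percentage yield"),
]
score = evaluate(stub_model, cases)  # the stub gets 1 of 2 cases right
```

In practice the scoring rule is the hard part — exact match works for lookups, but open-ended tasks need fuzzier grading — which is exactly the gap that products like Bench and benchmarks like HELM try to fill.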
Ultimately, evaluating a language model's performance in any application is an ongoing effort that is bolstered by greater transparency from the creators of language models, according to the researchers behind Stanford's HELM, who advocate for "holistic, pluralistic and democratic benchmarks" for language models.
"Transparency begets trust and standards," said three HELM authors in an update on the project. "By taking a step towards transparency, we aim to transform foundation models from an immature emerging technology to a reliable infrastructure that embodies human values."