Foundation Models are a game changer and a disruptor for many industries. Especially since the release of ChatGPT, people have realized that a new era of AI has begun. In this blog I share my experience learning Foundation Models, with a focus on capabilities that IBM provides for enterprise clients and partners.
Below is a list of blog posts which I’ll update with new posts, so that this page is the entry point to everything I have done and will do in the context of Foundation Models.
Since I’m excited by the incredible capabilities that technologies like ChatGPT and Bard provide, I’m trying to better understand how they work. This post summarizes my current understanding of foundation models, transformers, BERT and GPT.
At IBM Think 2023, several exciting new Foundation Model capabilities were announced. This post explains some of my highlights.
The speed at which AI technologies are progressing is amazing. The post below summarizes how I explain to my children how AI has evolved over the past years and why many experts consider the new foundation model technique revolutionary.
Sriram Raghavan, Vice President of IBM Research AI, talks in the video below about how IBM uses Foundation Models and shares his thoughts and ideas on how foundation models will be operationalized.
As most of my readers will know, there is a huge hype around ChatGPT, generative AI, large language models (LLMs), also known as foundation models, and other related AI technologies. I’m trying to understand these technologies better and have put together below a couple of key concepts I’ve learned so far.
Training foundation models, and even fine-tuning models for custom domains, is expensive and requires lots of resources. To avoid changing the pretrained models, a new, more resource-efficient technique has emerged, called Prompt Tuning.
Training Foundation Models is expensive. Techniques like Prompt Engineering address this by freezing the models and providing context in prompts to optimize results, at the expense of some performance compared to retraining the models. Newer techniques like Prompt Tuning and Multi-task Prompt Tuning are evolving to close this gap.
Foundation Models are the foundation for different AI downstream tasks. To leverage these generic models for specific tasks, prompt engineering is a technique to optimize results without having to retrain or fine-tune the models. This post shows some samples that demonstrate why prompt engineering is important.
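To give a flavor of what prompt engineering looks like in practice, here is a minimal sketch contrasting a zero-shot prompt with a few-shot prompt for the same task. The review text and example prompts are illustrative assumptions, not samples from the post itself:

```python
# A simple sentiment-classification task, phrased two ways.
review = "The battery died after two days."

# Zero-shot: just ask the model directly.
zero_shot = f"Classify the sentiment of this review as positive or negative:\n{review}"

# Few-shot: show the model a couple of worked examples first,
# which typically steers a frozen model toward the desired output format.
few_shot = (
    "Classify the sentiment of each review as positive or negative.\n"
    "Review: I love this phone. Sentiment: positive\n"
    "Review: The screen cracked immediately. Sentiment: negative\n"
    f"Review: {review} Sentiment:"
)

print(few_shot)
```

Both prompts would be sent to the same frozen model; only the wording changes, which is exactly why prompt engineering can improve results without any retraining.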
One of the most impressive features of Large Language Models is the ability to answer questions in fluent language. This post describes some of the underlying techniques and how to avoid hallucinations.
Large Language Models can improve search results significantly, since they don’t look for exact word matches but for the passages of text that best fit the question. The post below explains high-level information retrieval concepts and IBM’s leading state-of-the-art model, Dr.Decr.
As Large Language Models have been trained on massive amounts of data, they can provide impressively fluent answers. Unfortunately, the answers are not always correct. Passing context along with questions helps reduce hallucinations significantly.
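The idea of passing context can be sketched as a simple prompt template: retrieved passages are placed before the question, and the model is instructed to answer only from them. The template wording and the sample context below are my own assumptions for illustration, not taken from the post:

```python
def build_grounded_prompt(context: str, question: str) -> str:
    """Build a prompt that grounds the model's answer in the given context."""
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say 'I don't know.'\n\n"
        f"Context: {context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_grounded_prompt(
    "IBM watsonx was announced at Think 2023.",
    "When was watsonx announced?",
)
print(prompt)
```

The explicit instruction to fall back to “I don’t know” is what discourages the model from inventing an answer when the context doesn’t contain one.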
Transformer-based AI models can generate amazing answers to users’ questions. While the underlying Large Language Models are not retrained, the performance of Question Answering AI can be improved by running experiments with different hyperparameters.
To find the best possible models and parameters for Question Answering via Generative AI, many experiments need to be run. While some techniques have proven successful, other approaches need to be tried out. Some findings are even discovered coincidentally via trial and error. This post describes how experiments can be run automatically.
Large Language Models can improve the user experience of virtual assistants like Watson Assistant by providing answers rather than lists of links. With Watson Assistant’s ‘Bring your own Search’ capability, these generative capabilities can easily be added via OpenAPI and custom HTML responses.
Generative foundation models like ChatGPT can handle fluent conversations pretty well. IBM Watson Assistant can be extended with NeuralSeek to search and provide answers from enterprise knowledge bases like IBM Watson Discovery.
Developing high-quality software is neither trivial nor cheap. To help developers, there are several attempts to leverage AI and special foundation models optimized for source code. This post gives a quick overview of popular projects, with a focus on an open-source project from IBM.
While there are several playgrounds for trying Foundation Models, sometimes I prefer running everything locally during development and for early trial-and-error experiments. This post explains how to set up the Anaconda environment via Docker and how to run the small Flan-T5 model locally.
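As a teaser for what running Flan-T5 locally looks like, here is a minimal sketch using the Hugging Face transformers library (assuming `transformers` and `torch` are installed in the environment; the prompt is my own example). On first run this downloads the `google/flan-t5-small` weights:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the small Flan-T5 checkpoint (about 80M parameters, fine for a laptop).
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

# Flan-T5 is instruction-tuned, so the task is stated directly in the prompt.
inputs = tokenizer("Translate English to German: How old are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)
```

Everything runs on the CPU by default, which is slow for large models but perfectly usable for quick experiments with a model of this size.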