Model Distillation is an interesting technique for building small models that are almost as effective as larger models for specific tasks. This post describes the concept in general and how it can b...
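The core idea behind distillation can be shown in a few lines: train the small student model to match the teacher's softened output distribution rather than hard labels. This is a minimal sketch; the temperature, logits, and plain-Python softmax are illustrative, not taken from the post.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits; a temperature > 1 softens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) between two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative logits for one training example (3 classes).
teacher_logits = [3.0, 1.0, 0.2]
student_logits = [2.5, 1.2, 0.3]

# A higher temperature exposes more of the teacher's relative preferences
# between "wrong" classes, which is the signal distillation exploits.
T = 2.0
loss = kl_divergence(softmax(teacher_logits, T), softmax(student_logits, T))
```

In a real training loop this KL term is minimized (often mixed with the ordinary cross-entropy loss on hard labels) by updating only the student's weights.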
Fine-tuning small LLMs with Output from large LLMs
Innovation in the AI space continues at an incredible speed. This post describes a recently developed technique that allows smaller fine-tuned models to almost reach the perform...
Foundation Models, Transformers, BERT and GPT
Since I’m excited by the incredible capabilities that technologies like ChatGPT and Bard provide, I’m trying to understand better how they work. This post summarizes my current understanding about...
Synergizing Reasoning and Acting (ReAct) in LLMs
Large Language Models are extremely powerful, but they can only return data that existed when they were trained, and they cannot invoke APIs and business logic. The technique ReAct combines Chain-of...
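The thought → action → observation loop that ReAct adds on top of a plain LLM can be sketched in a few lines. Everything here is a mock: the tool name, the lookup data, and the scripted trace stand in for what a real LLM would generate step by step.

```python
def lookup_population(city):
    """Stand-in for an API call or business-logic lookup the LLM alone cannot do."""
    data = {"Berlin": 3_700_000, "Munich": 1_500_000}
    return data[city]

TOOLS = {"lookup_population": lookup_population}

# A scripted reasoning trace; a real system would generate each step with an
# LLM conditioned on the previous thoughts and observations.
scripted_steps = [
    ("thought", "I need the population of Berlin."),
    ("action", ("lookup_population", "Berlin")),
    ("finish", "Berlin has about {obs:,} inhabitants."),
]

def run_react(steps):
    """Execute a ReAct-style trace: actions call tools, observations feed back in."""
    observation = None
    for kind, payload in steps:
        if kind == "action":
            tool, arg = payload
            observation = TOOLS[tool](arg)  # ground the model in live data
        elif kind == "finish":
            return payload.format(obs=observation)
    return None

answer = run_react(scripted_steps)
```

The key point is that the observation returned by the tool flows back into the model's context before it produces the final answer.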
Reinforcement Learning from Human Feedback (RLHF)
Fine-tuning of Large Language Models optimizes models for certain AI tasks and/or improves performance for smaller and less resource intensive models. This post describes how to further improve mod...
Memory-efficient Fine-tuning with QLoRA
LoRA-based fine-tuning of Large Language Models freezes the original weights and only trains a small number of parameters making the training much more efficient. QLoRA goes one step further and re...
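The LoRA idea that QLoRA builds on can be illustrated with tiny matrices: the frozen weight matrix W is never updated; only two small low-rank matrices A and B are trained, and the effective weight is W + B·A. The sizes and values below are made up for illustration (real adapters attach to attention layers, and QLoRA additionally quantizes W to 4 bits).

```python
def matmul(a, b):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_effective_weight(W, A, B):
    """Return W + B @ A; W stays frozen, only A and B are trained."""
    delta = matmul(B, A)
    return [[W[i][j] + delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# A 4x4 frozen weight matrix (16 values) with a rank-1 adapter:
# only 4 + 4 = 8 trainable values. For real layer sizes the savings
# are dramatic, e.g. rank 8 on a 4096x4096 matrix.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
A = [[0.1, 0.2, 0.3, 0.4]]         # 1 x 4, trainable
B = [[0.5], [0.0], [0.0], [0.0]]   # 4 x 1, trainable

W_eff = lora_effective_weight(W, A, B)
```

Because W is frozen, the optimizer only needs gradients and state for A and B, which is what makes the training memory-efficient; QLoRA pushes this further by also storing W in quantized form.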
Preparing Data for Fine-tuning of Large Language Models
Fine-tuning large language models with instructions is a great technique to customize models efficiently. This post explains briefly how data can be turned into instructions. In my earlier post In...
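Turning raw records into instruction samples usually means filling a prompt template. This is a rough sketch in the Alpaca-style instruction/input/response layout; the template wording and the example record are assumptions, not taken from the post.

```python
# Hypothetical template; real projects tune this wording per model.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input.\n"
    "### Instruction:\n{instruction}\n"
    "### Input:\n{input}\n"
    "### Response:\n{output}"
)

def to_instruction_sample(record):
    """Map a raw (question, context, answer) record to one training text."""
    return PROMPT_TEMPLATE.format(
        instruction=record["question"],
        input=record["context"],
        output=record["answer"],
    )

sample = to_instruction_sample({
    "question": "What does LoRA freeze during fine-tuning?",
    "context": "LoRA trains small adapter matrices.",
    "answer": "The original model weights.",
})
```

At training time the text before "### Response:" serves as the prompt and the answer after it is what the model learns to generate.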
Text Generation Inference for Foundation Models
Serving AI models is resource intensive. There are various model inference platforms that help operate these models as efficiently as possible. This post summarizes two platforms for classic ML a...
Language Support for Large Language Models
Many of the leading Large Language Models currently support only a limited set of languages, especially open-source models and models built by researchers. This post describes some options for getting these ...
Fine-tuning Models for Question Answering
Question Answering is one of the most interesting scenarios for Generative AI. While base models have often been trained with massive amounts of data, they have not always been fine-tuned for speci...