Watsonx.ai is IBM’s AI platform built for business. It is provided as SaaS and as software which can be deployed on multiple clouds and on-premises. This post describes how to deploy custom fine-tu...
Understanding the Watsonx.ai API
Watsonx.ai is IBM’s enterprise studio for AI builders to train, validate, tune and deploy Large Language Models. It comes with multiple open source and IBM LLMs which can be accessed via REST API. ...
Running Mistral on CPU via llama.cpp
With quantization, LLMs can run faster and on smaller hardware. This post describes how to run Mistral 7b on an older MacBook Pro without a GPU. Llama.cpp is an inference stack implemented in C/C++ to...
Generating synthetic Data with Mixtral
Fine-tuning and aligning language models to follow instructions requires a large quantity of high-quality data. IBM published a paper that describes how synthetic data can be generated wit...
Mixtral Agents with Tools for Multi-turn Conversations
Larger Large Language Models like ChatGPT can be prompted to behave as agents for specific use cases. They can return output in certain formats, and they can return instructions to invoke code. Thi...
Deploying LLMs via Hugging Face on IBM Cloud
With the Text Generation Inference toolkit from Hugging Face, Large Language Models can be hosted efficiently. This post describes how to run open-source models or fine-tuned models on IBM Cloud. T...
Fine-tuning LLMs via Hugging Face on IBM Cloud
The speed of innovation in the AI community is amazing. What didn’t seem to be possible a year ago is standard today. Fine-tuning is a great example. With the latest progress, you can fine-tune sm...
Highlights of my technical Work in 2023
What a great year 2023 has been! When ChatGPT was published at the end of 2022, I knew it would change the world. I wanted to learn and understand this technology. Fortunately, through my network ...
Evaluating LoRA Fine-Tuning Results
After Large Language Models have been fine-tuned, the quality needs to be evaluated. This post describes a simple example utilizing a custom evaluation mechanism. For standard LLM tasks there ar...
Fine-Tuning LLMs with LoRA on a small GPU
Smaller and/or quantized Large Language Models can be fine-tuned on a single GPU. For example, for FLAN T5 XL (3b) an Nvidia V100 GPU with 16GB is sufficient. This post demonstrates a simple example ...