Roundtable on LLM and RAG

udg

Expert
Licensed User
Longtime User
Hi all,
After some weeks spent playing with "toys" like ChatGPT and similar, made available both as online web pages and as APIs by their owners, I came to the conclusion that the real business is creating specialized LLMs augmented by RAG (Retrieval-Augmented Generation).
I mean, chatting for a while with a generic chatbot on the most diverse themes is fascinating. Letting a tool create an image (or even a short video) from given prompts is awesome. The same goes for TTS functionality and Assistant-like tools.

But what about a system tailored to a specific task? Something that knows almost everything about a product, or about the often-changing timetables of buses/trains/planes.
From what I've read so far, we need a good LLM coupled with a RAG "module".
Now, generating a proprietary LLM is out of the question due to the power and costs involved. Maybe a RAG could be feasible (mainly if limited in scope).

So, the goal of this thread is to open a discussion on how to build a system based on the above concepts (and others that may emerge here).
Which tools to use? Any hands-on experience to share?
Is there already a system that loops over the documents (PDFs, text, spreadsheets...) in a folder and builds a RAG on them? Can it work in conjunction with a free (or low-cost) LLM to bring up a complete, professional-looking solution on a specific subject?
Can we host the final "object" on a dedicated VPS, making it available to a (more or less) large audience by means of APIs?
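
To make the "folder of documents" idea concrete, here is a minimal retrieval sketch assuming the sentence-transformers package; PDF/spreadsheet parsing, chunking, and the final LLM call are left out for brevity:

```python
# a minimal "folder -> RAG" sketch; the folder name, file types and the
# embedding model are illustrative assumptions
from pathlib import Path
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly

# 1. index: embed every text file in the folder
docs = [p.read_text(encoding="utf-8") for p in Path("docs").glob("*.txt")]
doc_emb = embedder.encode(docs, convert_to_tensor=True)

# 2. retrieve: embed the question and pick the best-matching documents
question = "When does the last bus leave on Sundays?"
q_emb = embedder.encode(question, convert_to_tensor=True)
hits = util.semantic_search(q_emb, doc_emb, top_k=3)[0]
context = "\n\n".join(docs[h["corpus_id"]] for h in hits)

# 3. generate: pass question + retrieved context to any (local or hosted) LLM
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```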

As said, this thread is intended as a 360-degree collection of ideas and experiences. How does B4X come into play? Well, don't underestimate B4X's power and flexibility... :)
 

Erel

B4X founder
Staff member
Licensed User
Longtime User

udg

Expert
Licensed User
Longtime User
That's the reason for my last sentence.. :)
 

Daestrum

Expert
Licensed User
Longtime User
You can add Agents to a lot of LLMs now. These are Python functions that allow the model to interact with the world outside its training data.
For example, you have an LLM whose data cut-off date was 2023. You can create an Agent that allows it to query Google to 'fill in' its missing knowledge with current data.
Some agents (like ToolCallingAgent) will create their own Python code for functions they need and then use them. (I tried a few and found I needed to put a limit in the prompt on what they were allowed to do - security-wise they could do anything - a jarring thought.)
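
For reference, a minimal sketch of the idea using Hugging Face's smolagents library (where ToolCallingAgent comes from); the search tool and hosted model here are illustrative assumptions, not necessarily what was used above:

```python
# a minimal tool-calling agent sketch, assuming the smolagents package
# (pip install smolagents); tool and model choices are illustrative
from smolagents import ToolCallingAgent, DuckDuckGoSearchTool, HfApiModel

model = HfApiModel()  # a hosted model via the HF Inference API by default
agent = ToolCallingAgent(tools=[DuckDuckGoSearchTool()], model=model)

# the agent decides by itself to call the search tool for post-cutoff facts
print(agent.run("Who won the most recent Formula 1 world championship?"))
```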
 

LucaMs

Expert
Licensed User
Longtime User
[O.T.]

All great, but I would like to know one little thing.

Say I create a piece of software (desktop or mobile) that uses ChatGPT (via API).
I need to have an account, get an API key, and pay based on the number of requests.
What do you do? Do you use your own ChatGPT account? If so, when your software gets installed by thousands of people or, worse, millions, will you pay for all of them? Or will you force the user to create their own account?

P.S. Also, if you use your own account, it will receive too many requests at the same time.
 

udg

Expert
Licensed User
Longtime User
Or will you force the user to create their own account?
Looks like the way to go. On software start-up you could check for a config file. If it's empty, you instruct the user to register with OpenAI and paste his/her key(s) into your config panel.
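
A hedged sketch of that start-up check; the file name and JSON layout are invented conventions, not a standard:

```python
# minimal "bring your own key" check at start-up; config file name and the
# JSON key are illustrative assumptions
import json
from pathlib import Path

CONFIG = Path("config.json")

def load_api_key():
    if CONFIG.exists():
        return json.loads(CONFIG.read_text()).get("openai_api_key")
    return None

key = load_api_key()
if not key:
    # first run: point the user to https://platform.openai.com to create a
    # key, then store whatever they paste into the settings panel
    key = input("Paste your OpenAI API key: ").strip()
    CONFIG.write_text(json.dumps({"openai_api_key": key}))
```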
 

Daestrum

Expert
Licensed User
Longtime User
Or use a local LLM that doesn't cost a penny to use. I currently run a few on my laptop just for testing with PyBridge.
 

LucaMs

Expert
Licensed User
Longtime User
Or use a local LLM that doesn't cost a penny to use. I currently run a few on my laptop just for testing.
Now, generating a proprietary LLM is out of the question due to the power and costs involved
🤪

 

Daestrum

Expert
Licensed User
Longtime User
It's easiest to start with a small pre-trained model (~4GB), then tune it for the specific use. Start with one that understands conversations, then build onto that.
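
A hedged sketch of one cheap way to do that tuning: LoRA adapters via the Hugging Face peft library (my assumption of what "tune it" could mean here; the base model is just an illustrative small chat model):

```python
# low-cost tuning sketch via LoRA adapters, assuming the transformers and
# peft packages; the base model choice is illustrative
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "Qwen/Qwen2-1.5B-Instruct"  # a small (~3-4 GB) chat-tuned model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# train small low-rank adapter matrices instead of all 1.5B weights
lora = LoraConfig(r=8, lora_alpha=16,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% trainable
# ...then train on your domain-specific conversations with transformers' Trainer
```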
 

LucaMs

Expert
Licensed User
Longtime User
It's easiest to start with a small pre-trained model (~4GB), then tune it for the specific use. Start with one that understands conversations, then build onto that.
I assume it is not simple, and the result will be much less powerful than the already poor and disappointing ChatGPT.

Or use a local LLM that doesn't cost a penny to use. I currently run a few on my laptop just for testing with PyBridge.
Can you give us some pointers on how you did it (preferably in a new thread)?
 

Daestrum

Expert
Licensed User
Longtime User
Can you give us some pointers on how you did it (preferably in a new thread)?
Do you mean how to set up a local LLM on a PC?
 

udg

Expert
Licensed User
Longtime User
While waiting for @LucaMs's reply, I think his response could be: "partly yes".
Maybe open a Tutorial thread where you teach us (directly, or by pointing to links and docs on the Internet) how to set up a PC with a local LLM.
Then expand it with functions that fetch runtime data (from a DB or other sources) - see the sketch at the end of this post.
Basically, what you already did so far.

In the current thread it would be nice to hear about your experience with some LLMs: which one looked more promising (for general or specific goals), how difficult it was to integrate RAG or other means of digesting personal documents, or, alternatively, the interaction with your own DB.

In my first post I used the timetables of buses, trains and airplanes as an example. While at first those appear to be static info, they in fact become dynamic when you consider delays, strikes, ...
Once you have, say, the timetable of a metro/subway, you should be able to ask a question like "how do I go from A to B?", considering that it may well happen that you have to stop at C and wait a few minutes before going on to B. So the response could include the total journey time (and cost) or other info based on the hour of the day. Well, imagination is the limit.. :)
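
To make the "runtime data" idea concrete, a hedged sketch of a plain Python function that a tool-calling LLM could invoke to fetch live departures; the database file, table and columns are invented for illustration:

```python
# a DB lookup exposed as a tool for the LLM; schema and file name are
# illustrative assumptions
import sqlite3

def next_departures(stop: str, after: str) -> list:
    """Return (line, time) pairs departing from `stop` after `after` (HH:MM)."""
    with sqlite3.connect("timetable.db") as db:
        return db.execute(
            "SELECT line, time FROM departures "
            "WHERE stop = ? AND time > ? ORDER BY time LIMIT 5",
            (stop, after),
        ).fetchall()

# the LLM answers "how do I go from A to B?" by calling this tool for live
# data (delays, strikes) instead of the static timetable baked into the RAG
```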
 

Daestrum

Expert
Licensed User
Longtime User
OK, this is a list of LLMs/SLMs I have played with locally on my laptop.

microsoft/Phi-3.5-mini-instruct (chat)
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B (chat)
Qwen/Qwen2.5-Coder-1.5B-Instruct (chat)
Qwen/Qwen2-1.5B-Instruct (chat)
Intel/ldm3d-4c (image generator)
CompVis/stable-diffusion-v1-4 (image generator or vision; can't remember)
EleutherAI/pythia-70m-deduped (chat)
Intel/dynamic_tinybert (chat)

There were some others, but they took far too long to respond (I only have 8GB VRAM, so they spilled into system RAM) as they were a tad large.
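
For anyone wanting to try one of the smaller chat models above, a hedged local-inference sketch with the Hugging Face transformers pipeline (my sketch; not necessarily how these are run via PyBridge):

```python
# minimal local inference sketch, assuming the transformers and torch
# packages; device_map="auto" falls back to CPU if there is no GPU
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Qwen/Qwen2-1.5B-Instruct",  # one of the small chat models above
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain RAG in two sentences."}]
out = pipe(messages, max_new_tokens=120)
print(out[0]["generated_text"][-1]["content"])  # last message is the reply
```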
 

LucaMs

Expert
Licensed User
Longtime User
OK, this is a list of LLMs/SLMs I have played with locally on my laptop. [...]
Could you suggest the lightest and fastest of them, and how to install and use it?
Not for me: I only have a laptop from 1882, with Windows 7 (which will die soon, and you will not have to suffer because you will never read my posts again 😁 :confused:).

While at first those appear to be static info, they in fact become dynamic when you consider delays,
Any reference to the current brilliant Italian transport minister is purely coincidental 🤬
 