New search engine

Erel · Nov 5, 2024

A new search engine is now running on the forum. It is quite sophisticated and is based on a large language model. More information here: https://github.com/stanford-futuredata/ColBERT
The language model allows the engine to better "understand" the query and not just find the specific terms. This is not ChatGPT nor Google, but based on my tests it provides better results than the previous search engine, especially for longer queries.
I've also tuned it a bit differently and made it less focused on tutorials.

The previous search engine, which was quite good and served us well, is still there and will be used in case of failures.
Feedback is welcome. This is the first version and I'm sure that there are many ways to further improve it.

You can see whether it is the new search engine running with the small label at the bottom.

LucaMs · Nov 5, 2024

I had nothing to search for, I tried almost randomly.
The result seems very good to me, although I can't compare it to the base engine.

JohnC · Nov 5, 2024

Since this new search engine now allows us to type in a longer query and not just short terms, would it be possible to add maybe a second magnifier lens icon next to the current magnifier lens icon with maybe a "+" symbol added to it so its visually different and allow us to click on it and it will open up an advanced search window where the query textbox is much bigger so we can see our full prompt instead of that tiny textbox in the forum header.

Erel · Nov 5, 2024

Note that you can add the search engine to the browser directly. It is very convenient. Set a shortcut letter and then make the search directly. I use it all the time.

LucaMs · Nov 10, 2024

What happened?

P.S. The new engine doesn't like searching for the word "Singleton"

Sandman · Nov 10, 2024

LucaMs said:
P.S. The new engine doesn't like searching for the word "Singleton"

I can confirm I also get these results. (Here's how I search the site from time to time, as the built-in search engine doesn't index all the subforums: https://duckduckgo.com/?q=site:b4x.com+singleton)

Erel · Nov 11, 2024

It looks like it doesn't find any meaningful result. Note that the new index only holds posts that were created or last edited since 2015.

Maybe a score threshold is needed for such cases.

peacemaker · Nov 11, 2024

Just feedback - IMHO, search now is much better than old one.

Sandman · Nov 11, 2024

Erel said:
It looks like it doesn't find any meaningful result.

Just looking at some of the results it gave for "singleton", and from what I can tell there's no connection at all to the term there. I don't mean to sound stupid, but I feel I have to ask: When ColBERT finds nothing, is it somehow still forced to produce results, so it picks random threads? (Which might explain why the chosen snippets from the posts are somewhat insane too, because there is no relevant text to use.)

peacemaker said:
search now is much better than old one

FWIW, I have not seen any noticable change, in either direction.

aeric · Nov 11, 2024

If understand correctly, LLM needed to be trained.
How it is trained? Do we need to provide more searches?
Do we need to master and apply some kind of "prompt" skills?
It seems I am not clear how to use the search engine to produce the desired results I wanted.

byz · Nov 11, 2024

So far, the new search seems to understand what I want, and the search results are more accurate than before.

byz · Nov 11, 2024

What is the maximum number of search terms? If I ask for a solution, will the new search give you all sorts of relevant and interesting results? If so, then I would like to call it a b4x expert.

Erel · Nov 12, 2024

Sandman said:
When ColBERT finds nothing, is it somehow still forced to produce results, so it picks random threads?

It actually finds something but the relevance is too low to be meaningful.

aeric said:
How it is trained? Do we need to provide more searches?

I'm using a generic pretrained model. It is possible to fine tune it and I might experiment with it in the future.

byz said:
What is the maximum number of search terms? If I ask for a solution, will the new search give you all sorts of relevant and interesting results?

The search engine isn't built as a Q/A bot, such as ChatGPT. It is a token + context based search. The "context" is the real improvement over standard search engines.

The query can be up to about 250 terms. I don't know whether it will return good results with such queries. We are all learning

Erel · Tuesday at 9:07 AM

The "similar threads" feature at the bottom of most threads is now powered by ColBERT. This feature helps with discovery of related content.

Sandman · Tuesday at 9:35 AM

I'm not sure if this is something being worked on, I just wanted to highlight that Colbert still produce somewhat anemic results for some queries.

Search Results

www.b4x.com

With that said, I saw no problem using the old search engine and would prefer if not too much time is being spent on the new search engine.

Erel · Tuesday at 9:54 AM

Sandman said:
I'm not sure if this is something being worked on, I just wanted to highlight that Colbert still produce somewhat anemic results for some queries.

This happens when there are no valid results. The search engine currently doesn't index this subforum so there are no relevant results. I will check the thresholds at some point.

Sandman said:
would prefer if not too much time is being spent on the new search engine

1. I'm pretty confident that the new search engine is better than the old one and this by itself is very important.
2. I hope that in the future I will be able to find more fruitful usages for LLMs / AI assistants in the context of B4X. This is the first step.

Sandman · Tuesday at 10:31 AM

Erel said:
1. I'm pretty confident that the new search engine is better than the old one and this by itself is very important.

I will trust your judgment here. As a single data point, I will again say that I have not detected any noticable change. (Speaking as a fairly heavy user of the forum and the search.)

Erel said:
2. I hope that in the future I will be able to find more fruitful usages for LLMs / AI assistants in the context of B4X. This is the first step.

Conceptually I have absolutely no problem with this type of research. My issue is that other things will not move forward while you focus on LLM/AI. If you had a team of 2-3 people, I would even encourage you to dedicate a person now and then to projects like this. But that's not where we are, and judging by previous discussions, that's not where you want B4X to be. No need to repeat that discussion again. The bottom line is that you're effectively the only developer for B4X, and if you spend time on LLM/AI, you're not spending time on posted wishes, or other things that have a direct and huge impact for your users and customers.

byz · Tuesday at 11:07 AM

I think the progress Erel has made in LLM/AI will ultimately benefit the B4X community. include but not limited to: Language development, IDE optimization, use of programming assistants, etc. It will also reduce his workload.

josejad · Tuesday at 3:23 PM

byz said:
Erel has made in LLM/AI will ultimately benefit the B4X community

Finally, Erel's LLM/AI becomes self-aware at 02:14 am Eastern Time after its activation on November 5, 2024 and launches nuclear missiles against other development platforms who, in a panic, tried to disconnect it.

New search engine

Erel

B4X founder

LucaMs

Expert

JohnC

Expert

Erel

B4X founder

LucaMs

Expert

Sandman

Expert

Erel

B4X founder

peacemaker

Expert

Sandman

Expert

aeric

Expert

byz

Active Member

byz

Active Member

Erel

B4X founder

Erel

B4X founder

Sandman

Expert

Search Results

Erel

B4X founder

Sandman

Expert

byz

Active Member

josejad

Expert