I have been working on a library for our company to use LLMs (like OpenAI's) in B4J. Using OpenAI is pretty simple with their REST API, but it quickly becomes quite costly, as they make you pay not only for the output but also for the input. Enter RAG (Retrieval Augmented Generation): it prepares your input data for the question you are going to ask by sending only the parts of your documents that are relevant to it, which reduces the input tokens significantly. By introducing Query Routers in the RAG, the quality of the reduced input is also much better.
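To make the retrieval step a bit more concrete, here is a minimal B4X sketch of the general idea. It is not ABLLM's actual implementation: it assumes every document segment already has an embedding vector (a Double array), compares the question's vector to each of them with cosine similarity, drops segments below a minimum score and keeps the best few. CosineSimilarity, SelectSegments and ScoredSegment are hypothetical names; maxResults and minScore play the role that LLM.RAG_MAX_RESULTS and LLM.RAG_MIN_SCORE presumably play in the library.

```b4x
Sub Process_Globals
	' hypothetical type holding one candidate segment and its similarity score
	Type ScoredSegment (Text As String, Score As Double)
End Sub

' cosine similarity between two embedding vectors of equal length
Sub CosineSimilarity(a() As Double, b() As Double) As Double
	Dim dot, normA, normB As Double
	For i = 0 To a.Length - 1
		dot = dot + a(i) * b(i)
		normA = normA + a(i) * a(i)
		normB = normB + b(i) * b(i)
	Next
	If normA = 0 Or normB = 0 Then Return 0
	Return dot / (Sqrt(normA) * Sqrt(normB))
End Sub

' returns the texts of at most maxResults segments scoring at least minScore, best first
Sub SelectSegments(questionVector() As Double, segmentTexts As List, segmentVectors As List, maxResults As Int, minScore As Double) As List
	Dim scored As List
	scored.Initialize
	For i = 0 To segmentTexts.Size - 1
		Dim vec() As Double = segmentVectors.Get(i)
		Dim score As Double = CosineSimilarity(questionVector, vec)
		If score >= minScore Then
			Dim candidate As ScoredSegment
			candidate.Initialize
			candidate.Text = segmentTexts.Get(i)
			candidate.Score = score
			scored.Add(candidate)
		End If
	Next
	' highest score first
	scored.SortType("Score", False)
	Dim selected As List
	selected.Initialize
	For i = 0 To Min(maxResults, scored.Size) - 1
		Dim best As ScoredSegment = scored.Get(i)
		selected.Add(best.Text)
	Next
	Return selected
End Sub
```

Only the selected segments (plus the question) would then be sent to the model, which is where the input token savings come from.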
We can also add tools written in B4J to the chain. This allows the LLM to call, when necessary, one or more of the available tools, usually defined by the developer. A tool can be anything: a web search, a call to an external API, the execution of a specific piece of code, etc. LLMs cannot actually call the tools themselves; instead, they express the intent to call a specific tool in their response (instead of responding in plain text). We, as developers, then execute that tool with the provided arguments and report the results back to the LLM.
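The "execute the tool and report back" step can be sketched in B4X with SubExists and CallSub2, assuming the model's response has already been parsed into a tool name and a JSON argument string. This only illustrates the pattern and is not ABLLM's code; ExecuteTool is a hypothetical name.

```b4x
' Hypothetical helper: runs the tool the model asked for and returns its result
' as text, so it can be sent back to the model in the next request.
Sub ExecuteTool(tools As Object, toolName As String, argumentsJson As String) As String
	' guard against a tool name that does not exist in the tools object
	If SubExists(tools, toolName) = False Then
		Return "Unknown tool: " & toolName
	End If
	' each tool sub receives the JSON arguments as one String and returns a String
	Dim toolResult As String = CallSub2(tools, toolName, argumentsJson)
	Return toolResult
End Sub

' Usage, e.g. with the AllMyTools class from the example below:
' Dim reply As String = ExecuteTool(MyTools, "MyMovieTool", $"{"MovieName":"The Matrix","Year":"1999"}"$)
```

Passing the arguments as a single JSON string keeps every tool sub to the same signature, so one generic dispatcher can serve all of them.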
Tools make the possibilities of LLMs endless.
And it has to be pretty simple to use, my boss said...
In the example below, we reduced the cost of making 2 calls to OpenAI from $0.603065 to only $0.017345, a reduction of approximately 97.12%.
EDIT: My boss just pointed out that my calculation was wrong: for 2 separate questions, without RAG we would actually send the full input twice. So without RAG it would have cost $1.20613, and with RAG still only $0.017345, a cost reduction of approximately 98.56%.
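The arithmetic behind these numbers can be written as two small B4X subs. This is a sketch assuming the gpt-4o prices used in the example code ($5 per million input tokens, $15 per million output tokens); EstimateCost and ReductionPercent are hypothetical names.

```b4x
' cost in dollars, assuming $5 per 1M input tokens and $15 per 1M output tokens
Sub EstimateCost(inputTokens As Long, outputTokens As Long) As Double
	Return inputTokens * 5.0 / 1000000.0 + outputTokens * 15.0 / 1000000.0
End Sub

' percentage saved when going from costWithout to costWith
Sub ReductionPercent(costWithout As Double, costWith As Double) As Double
	Return (1 - costWith / costWithout) * 100
End Sub

' Without RAG, every question resends all documents, so the document tokens
' are multiplied by the number of questions:
'   EstimateCost(documentTokens * numberOfQuestions, outputTokens)
' With RAG, only the retrieved segments are sent, so the per-question input
' tokens are simply summed:
'   EstimateCost(sumOfRAGInputTokens, outputTokens)
' e.g. Log(NumberFormat(ReductionPercent(0.603065 * 2, 0.017345), 1, 2)) ' logs 98.56
```

For the 3-question run further below, EstimateCost(791 + 2418 + 970, 67 + 168 + 330) gives about $0.02937, matching the logged cost with RAG.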
An example of using our LLM library (I turned off the sources output here with showSources = False to keep the B4J log readable; the sources contain snippets of the input documents relevant to the question):
B4X:
Dim question As String
Dim LLM As ABLLM
' We are going to use RAG (Retrieval Augmented Generation) in the chain.
' LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data up to a specific point in time
' that they were trained on. If you want to build AI applications that can reason about private data or data introduced after
' a model's cutoff date, you need to augment the knowledge of the model with the specific information it needs.
LLM.Initialize(True)
' the OpenAI keys
LLM.OPENAI_APIKEY = "sk-proj-Qus6to4w01WofSs6z0_2g3tN4T7ekRkWYWuS3..."
LLM.OPENAI_ORGANIZATIONID = "org-OeNM4oDm..."
' path where to store the vectorized documents
LLM.STORE_PATH = "K:/ABLLMStore"
' some other parameters can be set (these are the defaults)
LLM.LOG_REQUESTS = False
LLM.LOG_RESPONSES = False
LLM.OPENAI_MODELNAME = "gpt-4o"
LLM.OPENAI_IMAGEMODELNAME = "dall-e-3"
LLM.OPENAI_TEMPERATURE = 0.7
LLM.OPENAI_MAXTOKENS = 512
LLM.RAG_MAX_RESULTS = 3
LLM.RAG_MIN_SCORE = 0.7
LLM.RESPONSE_MAX_MESSAGES = 3
LLM.SPLITTER_SEGMENT_SIZE = 300
LLM.SPLITTER_SEGMENT_OVERLAP = 0
Dim NoRAGTokens As Long
' adding five documents: One Two Work use cases, The Matrix, BANano, the Team Cup ranking and the B4J language
' the RAG router will only use the segments that are relevant to the question, reducing the input tokens significantly.
' You pay OpenAI for the number of tokens you input and output, so reducing them is important
NoRAGTokens = NoRAGTokens + LLM.AddDocument("https://www.onetwowork.com/usecases", "description of use cases of One Two Work", "Work2020", False) ' the file is 108KB, or about 26,562 tokens.
NoRAGTokens = NoRAGTokens + LLM.AddDocument("K:\TheMatrix.txt", "description of the movie The Matrix", "Matrix", False) ' the file is 108KB, or about 26,562 tokens.
NoRAGTokens = NoRAGTokens + LLM.AddDocument("K:\BANanoEssentialsV1.09.pdf", "description of the programming language BANano for B4J", "BANano", False) ' the file is 390KB, or about 92,971 tokens.
NoRAGTokens = NoRAGTokens + LLM.AddDocument("K:\TeamCup-Ranking.xlsx", "scoring of the Team cup", "TeamCup", False)
NoRAGTokens = NoRAGTokens + LLM.AddDocument("K:\OpenAISources\B4xBasicLanguageV1_2.docx", "description of the B4J language", "B4J", False)
' load the store
NoRAGTokens = LLM.LoadStore
' you can add tools to the chain that do tasks or return extra information to the LLM in the chain.
Dim MyTools As AllMyTools
MyTools.Initialize
' the parameters we want the LLM to return to our function. They are always in JSON format.
Dim ParamDescriptions As Map
ParamDescriptions.Initialize
ParamDescriptions.Put("MovieName", "the name of the movie")
ParamDescriptions.Put("Year", "the year the movie was released if it is known")
ParamDescriptions.Put("BoxOffice", "the box office results if it is known")
' describe what the tool does. If it is relevant to the question, it will be used.
LLM.AddTool(MyTools, "MyMovieTool", ParamDescriptions, "saves some details about the movie in a database")
' start the LLM chain
LLM.Start
Log("Chain started...")
Dim AllTokens As List
AllTokens.Initialize
' describe an image
' question = "What do you see?"
question = "Read barcodes"
Log("QUESTION: " & question)
Log("------------------------------------------------------------------------")
Dim result As ABLLMResult = LLM.QueryImage(question, "K:\barcodes.png", False)
Dim VisionTokens As TokensUsed = LogResult(result, False)
If VisionTokens.Input > 0 Then
AllTokens.Add(VisionTokens)
End If
' generate an image
question = "Donald Duck in New York, cartoon style"
Log("QUESTION: " & question)
Log("------------------------------------------------------------------------")
Dim result As ABLLMResult = LLM.GenerateImage(question, False)
Dim ImageTokens As TokensUsed = LogResult(result, False)
If ImageTokens.Input > 0 Then
AllTokens.Add(ImageTokens)
End If
question = "Summarize the use cases of One Two Work"
Log("QUESTION: " & question)
Log("------------------------------------------------------------------------")
Dim result As ABLLMResult = LLM.Chat(question, True, False)
Dim Q1Tokens As TokensUsed = LogResult(result, False)
If Q1Tokens.Input > 0 Then
AllTokens.Add(Q1Tokens)
End If
' this will only use an extract of the text segments (by using the RAG) of The Matrix file,
' reducing token cost significantly. Because we ask it to save some details, the MyMovieTool function will be called.
question = "What is the plot of the movie The Matrix and save the details."
'question = "Give Matrix (film) plot and save"
Log("QUESTION: " & question)
Log("------------------------------------------------------------------------")
Dim result As ABLLMResult = LLM.Chat(question, True, False)
Dim Q2Tokens As TokensUsed = LogResult(result, False)
If Q2Tokens.Input > 0 Then
AllTokens.Add(Q2Tokens)
End If
' this will only use an extract of the text segments (by using the RAG) of the BANano file,
' reducing token cost significantly
question = "What is BANano?"
Log("QUESTION: " & question)
Log("------------------------------------------------------------------------")
Dim result As ABLLMResult = LLM.Chat(question, True, False)
Dim Q3Tokens As TokensUsed = LogResult(result, False)
If Q3Tokens.Input > 0 Then
AllTokens.Add(Q3Tokens)
End If
Dim TotalINTokens As Long
Dim TotalOUTTokens As Long
Dim NumOfRAGQuestions As Long = AllTokens.Size
For i = 0 To NumOfRAGQuestions - 1
Dim tmpTokens As TokensUsed = AllTokens.Get(i)
TotalINTokens = TotalINTokens + tmpTokens.Input
TotalOUTTokens = TotalOUTTokens + tmpTokens.Output
Next
Dim NoRAG As Long = NoRAGTokens * NumOfRAGQuestions + TotalOUTTokens
Dim WithRAG As Long = TotalINTokens + TotalOUTTokens ' TotalINTokens is already summed over all questions, so it is not multiplied again
Dim PriceNoRAG As Double = (NoRAGTokens * NumOfRAGQuestions * 5.0/1000000.0) + (TotalOUTTokens * 15.0/1000000.0)
Dim PriceWithRAG As Double = (TotalINTokens * 5.0/1000000.0) + (TotalOUTTokens * 15.0/1000000.0)
LogError("Without RAG, asking these " & NumOfRAGQuestions & " questions would've used " & NoRAG & " of our allowed tokens, cost: $" & PriceNoRAG)
LogError("With RAG, asking these " & NumOfRAGQuestions & " questions only used " & WithRAG & " of our allowed tokens, cost: $" & PriceWithRAG)
The code of the LogResult() method
B4X:
public Sub LogResult(result As ABLLMResult, showSources As Boolean) As TokensUsed
Dim tokens As TokensUsed
tokens.Initialize
If result.ResultType = result.RESULT_FAILED Then
Log("Question (or shortened question) is empty?")
Else
If result.InputTokenUsage > 0 Then
Log("INPUT: " & result.InputTokenUsage & " after RAG")
tokens.Input = result.InputTokenUsage
tokens.Output = result.OutputTokenUsage
Log("OUTPUT: " & result.OutputTokenUsage)
Log("TOTAL: " & result.TotalTokenUsage)
End If
If showSources Then
' Shows the result of the RAG and which TextSegments were actually sent to OpenAI
Dim sources As List = result.Sources
If sources.Size > 0 Then
Log("SOURCES USED:")
For i = 0 To sources.Size - 1
Log(sources.Get(i))
Next
End If
End If
Log("FINISHED REASON: " & result.FinishedReason)
Log("------------------------------------------------------------------------")
Log("ANSWER:")
Log(result.Text)
Log("========================================================================")
End If
Return tokens
End Sub
The code of MyMovieTool in the AllMyTools class:
B4X:
public Sub MyMovieTool(jsonString As String) As String
Log("In the MyMovieTool, parameter returned by OpenAI: " & jsonString)
Log("------------------------------------------------------------------------")
' Here you can save it in a database for example
' ...
' Tell the chain we're done so it can continue answering the question
Return "The details are now saved in the database"
End Sub
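As the result log shows, the arguments arrive as a JSON string with the keys defined in ParamDescriptions. A possible variant of MyMovieTool that parses them first could look like this. It is a sketch that assumes the jJSON library (JSONParser) is referenced; the database insert itself is left as a comment, as in the original.

```b4x
' Hypothetical variant of MyMovieTool that parses the JSON arguments before storing them
Public Sub MyMovieTool(jsonString As String) As String
	Dim parser As JSONParser
	parser.Initialize(jsonString)
	Dim params As Map = parser.NextObject
	' keys match the ParamDescriptions passed to LLM.AddTool; Year and BoxOffice may be missing
	Dim movieName As String = params.GetDefault("MovieName", "")
	Dim movieYear As String = params.GetDefault("Year", "unknown")
	Dim boxOffice As String = params.GetDefault("BoxOffice", "unknown")
	Log("Saving: " & movieName & " (" & movieYear & "), box office " & boxOffice)
	' placeholder: insert movieName, movieYear and boxOffice into your database here
	' tell the chain we're done so it can continue answering the question
	Return "The details of " & movieName & " are now saved in the database"
End Sub
```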
The result:
B4X:
https://www.onetwowork.com/usecases skipped. Already found in Work2020
K:\TheMatrix.txt skipped. Already found in Matrix
K:\BANanoEssentialsV1.09.pdf skipped. Already found in BANano
K:\TeamCup-Ranking.xlsx skipped. Already found in TeamCup
K:\OpenAISources\B4xBasicLanguageV1_2.docx skipped. Already found in B4J
Store Work2020 loaded...
Store Matrix loaded...
Store TeamCup loaded...
Store BANano loaded...
Store B4J loaded...
Chain started...
QUESTION: Read barcodes
------------------------------------------------------------------------
FINISHED REASON: N/A
------------------------------------------------------------------------
ANSWER:
The image contains various types of barcodes and corresponding types are labeled above each barcode. Below each barcode, the numbers or data encoded in the barcode are displayed as text. Here are the details:
1. **DataMatrix**: This barcode is a 2D matrix type, and the data it encodes is not visible in text form in this specific barcode.
2. **QR Code**: Another 2D barcode type, which also does not show the data it encodes visibly here.
3. **PDF417**: A type of stacked barcode; the data it encodes is not visible here.
4. **Codabar**: Encodes the numbers "0123456789".
5. **Code11**: Encodes the numbers "0123456789".
6. **Code25Standard**: Encodes the numbers "01234567".
7. **Code25Interleaved**: Encodes the numbers "0123456789".
8. **Code39**: Encodes the numbers "0123456".
9. **Code93**: Encodes the numbers "0123456789".
10. **Code128**: Encodes the numbers "0123456".
11. **Code39Extended**: Encodes the numbers "0123456".
12. **Code93Extended**: Encodes the numbers "0123456789".
Each barcode type is utilized for different applications based on the amount of data they can store and the environment in which they will be used.
========================================================================
QUESTION: Donald Duck in New York, cartoon style
------------------------------------------------------------------------
FINISHED REASON: N/A
------------------------------------------------------------------------
ANSWER:
https://oaidalleapiprodscus.blob.core.windows.net/private/org-OeNM4oDmnnq1QgOo9zdoNhKX/user-7b4mRza9oK9qZB13SyFe891w/img-ybwn8O48Wc3Jqpc0yYW6EF1W.png?st=2024-10-17T08%3A51%3A50Z&se=2024-10-17T10%3A51%3A50Z&sp=r&sv=2024-08-04&sr=b&rscd=inline&rsct=image/png&skoid=d505667d-d6c1-4a0a-bac7-5c84a87759f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2024-10-16T23%3A17%3A48Z&ske=2024-10-17T23%3A17%3A48Z&sks=b&skv=2024-08-04&sig=rJ/4tbDHuDvna65VQgVf2CkUNwCkci1wk0BrZ%2BNll54%3D
========================================================================
QUESTION: Summarize the use cases of One Two Work
------------------------------------------------------------------------
INPUT: 791 after RAG
OUTPUT: 67
TOTAL: 858
FINISHED REASON: STOP
------------------------------------------------------------------------
ANSWER:
One Two Work is a time recording application that provides users with valuable insights into the hours spent on various tasks, enabling them to make informed decisions. The platform offers a series of real-life examples demonstrating its practical applications in different scenarios. By accurately tracking time, One Two Work helps users optimize their productivity and manage their time more effectively.
========================================================================
QUESTION: What is the plot of the movie The Matrix and save the details.
------------------------------------------------------------------------
In the MyMovieTool, parameter returned by OpenAI: {"MovieName":"The Matrix","Year":"1999","BoxOffice":"$467.6 million"}
------------------------------------------------------------------------
INPUT: 2418 after RAG
OUTPUT: 168
TOTAL: 2586
FINISHED REASON: STOP
------------------------------------------------------------------------
ANSWER:
**The Matrix** is a 1999 science fiction action film written and directed by the Wachowskis. It stars Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano. The film is set in a dystopian future where humanity is unknowingly trapped inside the Matrix, a simulated reality created by intelligent machines to distract humans while using their bodies as an energy source. The protagonist, Thomas Anderson, also known as "Neo," is a computer programmer who discovers the truth and joins a rebellion against the machines with other people who have been freed from the Matrix.
The details of the movie have been saved in the database.
========================================================================
QUESTION: What is BANano?
------------------------------------------------------------------------
INPUT: 970 after RAG
OUTPUT: 330
TOTAL: 1300
FINISHED REASON: STOP
------------------------------------------------------------------------
ANSWER:
BANano is a set of methods and tools designed to assist in writing web code using B4J. It includes a Transpiler, which is a key component that helps convert B4J code into web-compatible code. BANano provides various objects and features that mimic typical JavaScript functionalities, allowing developers to write web applications in B4J while still having access to JavaScript-like capabilities.
Some key features of BANano include:
1. **Transpiler**: This is used to convert B4J code into web-compatible code. The `AppStart` method is unique as it is not transpiled and runs in the B4J IDE like regular B4J code, allowing developers to set directions for building a web project.
2. **JavaScript and CSS Writing**: BANano allows developers to write raw JavaScript and CSS directly within their B4J code, providing flexibility for custom solutions when needed.
3. **BANano Objects**: These include a variety of JavaScript-like objects such as `BANanoConsole`, `BANanoWindow`, `BANanoHistory`, `BANanoLocation`, `BANanoNavigator`, `BANanoScreen`, and more. These objects offer methods and properties typical of JavaScript, enabling developers to perform actions akin to what they would in a JavaScript environment.
4. **Special Features**: BANano also includes special components like Background Workers, Router, and WebComponent, enhancing its capability to build complex web applications.
Overall, BANano serves as a bridge between B4J and web development, allowing developers to leverage B4J's strengths while still creating robust web applications.
========================================================================
Without RAG, asking these 3 questions would've used 529975 of our allowed tokens, cost: $2.655525
With RAG, asking these 3 questions only used 13102 our allowed tokens, cost: $0.02937
The input barcodes.png to read the barcodes from:
The output Donald Duck:
A.I. is a fascinating world with a lot more to explore, but we're getting the hang of it.
Alwaysbusy