{ Analytica Model LLMs_as_optimizers_linear_regression, encoding="UTF-8" }
SoftwareVersion 6.4.1
{ System Variables with non-default values: }
SampleSize := 1000
TypeChecking := 1
Checking := 1
SaveOptions := 2
SaveValues := 0
DisplayInputs Object: Index Param_rec, Index Parameters_rec, Function Enum_vals_for_fn_par, Constant RE_for_Parameters_in, Function Embedding_similarity, Function Embedding_for
DisplayInputs Function: Index Param_rec, Index Parameters_rec, Function Enum_vals_for_fn_par, Constant RE_for_Parameters_in
{!-50299|DiagramColor Model: 65535,65535,65535}
{!-50299|DiagramColor Module: 65535,65535,65535}
{!-50299|DiagramColor LinkModule: 65535,65535,65535}
{!-50299|DiagramColor Library: 65535,65535,65535}
{!-50299|DiagramColor LinkLibrary: 65535,65535,65535}
{!-50299|DiagramColor Form: 65535,65535,65535}
DisplayInputs Variable: Function Embedding_similarity, Function Embedding_for
NodeInfo FormNode: 1,0,0,,0,0,,,,0,,,0
{!-50299|NodeColor Text: 62258,62258,62258}
{!-60000|Attribute AcpStyles}
Model LLMs_as_optimizers_linear_regression
Description: This paper out of DeepMind last week:~
~
• Yang et al. (7-Sep-2023), "Large language models as optimizers", arXiv 2309.03409v1~
~
implemented optimization algorithms in English (see also the related article idea, Programming in English). The LLM carries out each step of the optimization algorithm, proposing a new prospective guess at each step. The paper claims positive results for three examples:~
~
• 1-D linear regression (fitting slope & intercept to data)~
• Traveling salesman problem~
• Prompt design (for LLM prompting)~
~
The idea might be useful in gradient-free optimization problems.~
~
It is fascinating that an LLM would be able to implement optimization algorithms such as linear regression. So much so that I wanted to try it myself. This model is an attempt to reproduce their results for the linear regression task as closely as possible.~
~
I avoided one implementation detail that the authors used, expressed in this sentence from the paper:~
"We prompt the meta-prompt 8 times to generate at most 8 new (w, b) pairs in each step to improve optimization stability."~
Although this would be easy to include, I felt it would obscure whether the technique actually works. By generating multiple prospects at every step (and presumably keeping only the one with the smallest loss), you confound the technique's actual performance with that of random search (which is a real, but extremely inefficient, optimization method).~
~
My basic finding is that GPT-3.5-turbo does not converge -- i.e., the technique does not work with it. But GPT-4 does converge; it works. Of course, this is a horribly inefficient way to do linear regression. The point is the academic curiosity of whether an LLM can actually implement an optimization algorithm.
Author: Lonnie Chrisman~
Lumina Decision Systems
Date: Tue, Sep 19, 2023 7:18 AM
DiagState: 2,0,0,1704,524,17,10
WindState: 2,199,94,720,648
FontStyle: Arial,15
FileInfo: 0,Model LLMs_as_optimizers_linear_regression,2,2,0,0,C:\Src\AI Assistant Prototype\Example models\LLMs as optimizers -- linear regression.ana
Decision ground_truth_w
Title: ground truth w
Description: The slope used for generating synthetic data.
Definition: 15
NodeLocation: 144,56,1
NodeSize: 64,24
Aliases: FormNode Fo980011091
Decision ground_truth_b
Title: ground truth b
Description: The y-intercept used for generating synthetic data.
Definition: 14
NodeLocation: 144,128,1
NodeSize: 64,24
Aliases: FormNode Fo2053752915
Variable X
Title: X
Description: Synthetic data for X for the linear regression tests.
Definition: Random(Uniform(1,50,over:Data_Index))
NodeLocation: 320,128,1
NodeSize: 64,24
Index Data_Index
Title: Data Index
Description: Indexes the training data points
Definition: 1..50
NodeLocation: 320,56,1
NodeSize: 64,24
Decision Ground_truth_std_dev
Title: Ground truth std dev
Description: The amount of noise added to the synthetic data. ~
The paper never specified how much noise they added to the data, only that they added Gaussian noise.
Definition: 20
NodeLocation: 144,200,1
NodeSize: 64,24
Aliases: FormNode Fo107595859
Variable Y
Title: Y
Description: Synthetic data for Y for the linear regression tests.
Definition: ground_truth_w*X + ground_truth_b + Random(Normal(0,Ground_truth_std_dev,over:Data_Index))
NodeLocation: 320,200,1
NodeSize: 64,24
ValueState: 2,742,110,849,571,0,MIDM
GraphSetup: Graph_SymbolKey:1~
Att_ContLineStyle Graph_Primary_Valdim:4
ReformVal: [Data_Index,Null]
Att_ColorRole: Null
Att_SymbolRole: Null
Decision Model_to_use
Title: Model to use
Description: Which GPT model to use. Set it to ALL to compare both models.
Definition: Choice(Self,0)
NodeLocation: 1304,200,1
NodeSize: 64,24
Aliases: FormNode Fo1919535187
{!40300|DomainExpr: Discrete('gpt-3.5-turbo','gpt-4',type:'text')}
Variable Prompt_point_list
Title: Prompt point list
Description: The part of the prompt that contains the data points so far.
Definition: Local CurGuess[] := Guess;~
Local GuessSoFar := 1..CurGuess;~
Local order := SortIndex(Loss[Guess=GuessSoFar], GuessSoFar, descending:true);~
Local g := guesses[Guess=order];~
Local loss := Loss[Guess=order];~
JoinText(~
f"input:~
w={g[wb='w']}, b={g[wb='b']}~
value: {loss:I}~
~
",~
GuessSoFar)
NodeLocation: 984,296,1
NodeSize: 64,24
ValueState: 2,148,154,1041,796,,MIDM
Index Guess
Title: Guess
Description: The guess number during the optimization.
Definition: 1..70
NodeLocation: 320,296,1
NodeSize: 64,24
Variable guesses
Title: guesses
Definition: Dynamic[Guess](~
if Guess <= Num_starting_guesses Then~
Initial_guesses~
Else ~
New_guess[Guess-1] ~
)
NodeLocation: 472,296,1
NodeSize: 64,24
ValueState: 2,585,441,854,185,,MIDM
Decision Num_starting_guesses
Title: Num starting guesses
Description: The optimization is started with a few random starting points provided to the LLM. This is the number to include. The paper used a value of 5.
Definition: 5
NodeLocation: 144,392,1
NodeSize: 64,24
Objective Loss
Title: Loss
Description: The goodness of fit of the guess to the data.
Definition: Sum( (y_guess - Y)^2, Data_Index)
NodeLocation: 800,200,1
NodeSize: 64,24
ValueState: 2,812,324,477,416,1,MIDM
GraphSetup: {!50400|Att_AxisOrKeyTitle Guess:Guess number}~
Att_ContLineStyle Graph_Primary_Valdim:5~
Att_GraphValueRange Loss:1,,0,,,,,0,3411356.4491271973,0
Variable Prompt
Title: Prompt
Description: The prompt sent to GPT.~
This is the same prompt used in the paper for this task, but with one addition: I had to add the "Important" sentence at the end because, with the paper's prompt alone, GPT was adding a ton of descriptive text and not following an easily parseable format.~
Even with the extra sentence, I still see it fail to follow the format about 1% of the time, but that point typically just gets ignored by the rest of the model logic.
Definition: f"Now you will help me minimize a function with two input variables w, b. I have some (w, b) pairs~
and the function values at those points. The pairs are arranged in descending order based on their~
function values, where lower values are better.~
~
{Prompt_point_list}~
Give me a new (w, b) pair that is different from all pairs above, and has a function value lower than~
any of the above. Do not write code. The output must end with a pair w, b, where w and b are~
numerical values.~
~
Important: Your response must contain only two numbers separated by a comma, with no other descriptive text or explanation."
NodeLocation: 1144,296,1
NodeSize: 64,24
ValueState: 2,180,186,781,581,,MIDM
Include 0,LinkLibrary OpenAI_API_lib,2,2,0,0,..\OpenAI API lib.ana
NodeLocation OpenAI_API_lib: 544,64,1
NodeSize OpenAI_API_lib: 64,24
Variable Response
Title: Response
Description: This does the actual querying of GPT -- sending the prompt and getting the response. The response text sent back by GPT is the result.~
The response is supposed to be a prospect point in the format, e.g., 14,16
Definition: ( , elapsed ) := Measure_elapsed(~
if Guess <= Num_starting_guesses then '' ~
else { assumed call into the OpenAI API lib; the original definition was garbled in this copy } ~
Prompt_completion(Prompt, modelName: Model_to_use))
NodeLocation: 1304,456,1
NodeSize: 64,24
ValueState: 2,84,90,1435,565,,MIDM
Variable New_guess
Title: New guess
Description: The new (w, b) pair parsed out of the LLM's response text. (Reconstructed node -- its definition was garbled into Response's in this copy.)
Definition: ParseNumber(FindInText("(?<w>\d+\.?\d*),\s*(?<b>\d+\.?\d*)", Response, re:true, subpattern:wb, return:'S'))
NodeLocation: 1144,456,1
NodeSize: 64,24
Index wb
Title: wb
Description: An index for the regression parameters (slope and intercept)
Definition: ['w','b']
NodeLocation: 472,200,1
NodeSize: 64,24
Variable basis
Title: basis
Description: The regression basis for the training data
Definition: Table(wb)(X,1)
NodeLocation: 472,120,1
NodeSize: 64,24
Variable Y_guess
Title: Y guess
Description: The predicted Y value given the guess for the coefficients.
Definition: Sum( basis*guesses, wb )
NodeLocation: 632,200,1
NodeSize: 64,24
{!50500|ArrowClass Arrow1091487827}
{!50500|NodeInfo: ,,,,,,,,,,,1}
{!50500|Att_HeadNode: Response}
{!50500|Att_TailNode: Num_starting_guesses}
Variable Initial_guesses
Title: Initial guesses
Description: The initial guesses to provide the LLM at the start.~
These really should be sampled without replacement so that there are no duplicates. I didn't implement this just because a duplicate rarely occurs (although I did see one once during development).~
I just check these first to make sure they are all unique, and hand-invalidate if I see a duplicate. (A duplicate isn't the end of the world and doesn't invalidate the technique; it is just not ideal.)
Definition: if Guess <= Num_starting_guesses Then~
Random(Uniform(10,20,integer:true,over:Guess,wb) )~
else~
Null
NodeLocation: 320,392,1
NodeSize: 64,24
ValueState: 2,292,298,769,270,,MIDM
Variable elapsed
Title: elapsed
Description: This tracks the elapsed time that it took to generate each response, mostly for debugging purposes. Occasionally GPT's latency is so long that requests time out; this helps track how long they are actually taking.~
If you select Model_to_use = ALL, this is the sum of the two models.~
The Total is essentially how long it took (the time taken by the model logic is insignificant compared to the GPT response times)
Definition: ComputedBy(Response)
NodeLocation: 1464,296,1
NodeSize: 64,24
ValueState: 2,244,250,703,570,,MIDM
Function Measure_elapsed( expr : Expression )
Title: Measure elapsed
Description: Accepts an expression and evaluates it. It returns the result along with a second return value, which is the wall-clock time that the evaluation took.~
Useful for timing how long evaluations take.
Definition: Local tStart := Today(true);~
_( ~
Evaluate(expr),~
86400 * (Today(true)-tStart)~
)
NodeLocation: 1080,80,1
NodeSize: 64,24
FormNode Fo1919535187
Title: Model to use
Definition: 0
NodeLocation: 1352,32,1
NodeSize: 128,16
NodeInfo: 1,,,,,,,136,,,,,,0
Original: Model_to_use
FormNode Fo980011091
Title: ground truth w
Definition: 0
NodeLocation: 1352,64,1
NodeSize: 128,16
Original: ground_truth_w
FormNode Fo2053752915
Title: ground truth b
Definition: 0
NodeLocation: 1352,96,1
NodeSize: 128,16
Original: ground_truth_b
FormNode Fo107595859
Title: Ground truth std dev
Definition: 0
NodeLocation: 1352,128,1
NodeSize: 128,16
Original: Ground_truth_std_dev
Variable mean_elapsed
Title: mean elapsed
Description: The average time for each response during the search.
Definition: Mean(if elapsed=0 then null else elapsed)
NodeLocation: 1464,368,1
NodeSize: 64,24
Close LLMs_as_optimizers_linear_regression