OpenAI API library






Requires the Analytica Enterprise edition or better

The OpenAI API library is a collection of Analytica functions that interface with generative A.I. models from within your Analytica model. You can leverage the flexibility of large language models (LLMs) to perform tasks that would be hard to do in a formal program or model, and you can generate images from text. The library is also a great way to learn about generative A.I. from within Analytica. This page is a reference to the functions in the library. It is accompanied by a Tutorial on using the library, which is a good starting point for learning about LLMs.

Download: OpenAI API lib.ana (v. 0.7) --- Release notes for v0.7.

Requirements

To use this library, you must have:

To get an OpenAI API key

  1. Go to https://platform.openai.com/ and log in or sign up for an account.
  2. Click on your profile picture in the top right and select View API keys
  3. Click Create new secret key.
  4. Copy this key to the clipboard (or otherwise save it)

Getting started

  1. Download the library and save it to your "C:\Program Files\Lumina\Analytica 6.4\Libraries" folder.
  2. Launch Analytica
  3. Load your model, or start a new model.
  4. Select File / Add Library..., select OpenAI API library.ana [OK], select Link [OK].
  5. Open the OpenAI API lib library module.
  6. Press
    • either Save API key in env var to save the key on your computer, so you can use it from any model on this computer,
    • or Save API key with your model to include the key in the model, so that anyone can run this model (using your OpenAI account). You should use this option if you plan to post the model on Lumina's ACP server.
  7. In each case, it shows a message box asking you to enter (or paste in) your API key.

You should now be able to call the API without having to re-enter your key each time. View the result of Available models to test whether the connection to OpenAI is working. This shows you the list of OpenAI models that you have access to.

Text generation from a prompt

A common way to interact with an AI Large Language Model (LLM) is to send it a prompt, to which it provides a "completion", i.e., a text response.

Function Prompt_completion( prompt, modelName, «optional parms» )

Returns a response (a "text completion") given a «prompt». For example:

Prompt_completion("The little red corvette is a metaphor for") → "sensuality and excitement"

The function has multiple return values:

  1. The main return value is the textual response (the "completion"). If you specify a «Completion_index», it returns multiple responses ("completions") indexed by «Completion_index».
  2. The finish_reason. Usually Null, but may be "stop" if it meets one of the «stop_sequences».
  3. Number of tokens (words and parts of words) in the prompt.
  4. Number of tokens in the response.
  5. A reference to the full response from the API call.

Examples

Prompt_completion("Translate 'happy birthday' into Chinese") → "生日快乐 (shēng rì kuài lè)"


Local ( completion_text, finish_reason, prompt_tokens, completion_tokens, raw_response )
        := Prompt_completion("The little red corvette is a metaphor for") Do
[ completion_text, finish_reason, prompt_tokens, completion_tokens, raw_response ]
   
completion_text "freedom, desire, and youthfulness"
finish_reason "stop"
prompt_tokens 16
completion_tokens 15
raw_response «ref»

Optional parameters

The function has many optional parameters:

  • «modelName»: The OpenAI model to use. It must support Chat. 'gpt-3.5', 'gpt-3.5-turbo' and 'gpt-4' are common choices.
  • «functions» : One or more functions that the LLM can call while generating its completions.
  • «temperature»: A value between 0 and 2.
    Smaller values make the output more focused and deterministic; higher values make it more random. Default=1.
  • «top_p»: A value 0<top_p<=1. An alternative to sampling temperature.
    Do not specify both «temperature» and «top_p».
    A value of 0.1 means only tokens comprising the top 10% of probability mass are considered.
  • «Completion_index»: Specify an index if you want more than one alternative completion.
    The results will have this index if specified. The length of this index specifies how many completions to generate.
  • «stop_sequences»: You can specify up to 4 stop sequences.
    When it generates one of these sequences, it stops and returns the completion up to that point.
  • «max_tokens»: The maximum number of tokens to generate in the chat completion.
  • «presence_penalty»: Number between -2.0 and 2.0.
    Positive values penalize new tokens based on whether they appear in the text so far.
  • «frequency_penalty»: Number between -2.0 and 2.0.
    Positive values penalize new tokens based on their existing frequency in the text so far.
  • «seed»: An integer. When you set this, the model tries to return predictable, "almost" deterministic output, resulting in the same response if called the same way.

See the tutorial on using this library for more details. Also see the Function callbacks section below.
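To build intuition for «temperature» and «top_p», the following Python sketch applies both to a made-up next-token distribution. The token names and scores are invented for illustration, and the code mimics the general sampling technique rather than OpenAI's actual implementation.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("this sketch needs temperature > 0")
    scaled = [score / temperature for score in logits.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {tok: e / total for tok, e in zip(logits, exps)}

def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their cumulative mass reaches top_p, then renormalize."""
    kept = {}
    mass = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        mass += p
        if mass >= top_p:
            break
    return {tok: p / mass for tok, p in kept.items()}

# Hypothetical scores for the token that follows "The little red corvette is a metaphor for"
logits = {"freedom": 2.0, "desire": 1.5, "youth": 1.0, "taxes": -1.0}

cool = softmax_with_temperature(logits, temperature=0.2)  # concentrates on "freedom"
warm = softmax_with_temperature(logits, temperature=2.0)  # spreads probability out
nucleus = top_p_filter(softmax_with_temperature(logits, 1.0), top_p=0.5)
```

With these numbers, the low-temperature distribution puts most of its mass on a single token, while the top_p filter discards the unlikely tail before sampling. This is why the two parameters are alternatives: each narrows the set of plausible next tokens in a different way.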

Managing a chat

A chat is a conversation with several back-and-forth interactions with the LLM. To use a chat, you need to store the chat history in variables within your model. The Append_to_chat function makes it easy to manage your conversation history over successive interactions. The Chat_completion function processes the next response in a conversation.

A chat is encoded by three nodes:

  • The chat index, usually 1..n for n interactions so far.
  • A message history, a Table containing the prompts and responses, indexed by the chat index.
  • A role history, a Table containing the role for each message, indexed by the chat index. Each interaction has one of three possible roles: 'system', 'user' or 'assistant'. The first two mark text that you, your end-user, or your model creates, whereas 'assistant' marks the responses from the LLM. Typically you will have one 'system' prompt at the beginning of the chat with the instructions for the LLM.

Function Append_to_chat(messageHistory, roleHistory, ChatIndex, message, role)

Destructively appends a new «message» and «role» to the end of a conversation. A conversation consists of three globals:

  • A «messageHistory» variable, defined as a Table («ConversationHistory»)
  • A «roleHistory», defined as a Table( «ConversationHistory» )
  • A «ChatIndex», usually 1..n for n interactions so far.

«message» is the new message to append. «role» should be either "user", "assistant" or "system".

Because this destructively changes global variables, it must be called from an event handler like OnClick or OnChange, and cannot be called while an evaluation is in progress.
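The three-part encoding can be pictured outside Analytica as two parallel lists whose positions play the role of the chat index. The Python sketch below is only an analogy: `append_to_chat` and `to_api_messages` are hypothetical helpers, not library functions, and unlike Append_to_chat they return new lists instead of modifying globals. The role/content layout of each message is the one used by OpenAI's chat endpoint.

```python
VALID_ROLES = {"system", "user", "assistant"}

def append_to_chat(message_history, role_history, message, role):
    """Return new, longer histories with one more (message, role) pair."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return message_history + [message], role_history + [role]

def to_api_messages(message_history, role_history):
    """Pair each message with its role, in chat-index order."""
    return [{"role": r, "content": m} for m, r in zip(message_history, role_history)]

messages, roles = [], []
messages, roles = append_to_chat(messages, roles, "You are a terse assistant.", "system")
messages, roles = append_to_chat(messages, roles, "What is the capital of France?", "user")

chat_index = list(range(1, len(messages) + 1))  # 1..n for n interactions so far
payload = to_api_messages(messages, roles)
```

Keeping messages and roles in separate, equally indexed structures mirrors the two Tables described above; the pairing into role/content records happens only when the request is assembled.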

Function Chat_completion(messages, roles, Chat_index, modelName, «more optional parameters»)

This returns the next response in a Chat given the history of messages so far.

Required parameters:

  • «messages» : The sequence of messages so far, in order, indexed by «Chat_index».
  • «roles» : Each message has a role, which must be one of 'system', 'user', 'assistant', or 'function'. Indexed by «Chat_index».
  • «Chat_index»: Indexes the sequence of messages in the conversation

The function has multiple return values:

  1. The main return value is the textual response (the content). If «Completion_index» is specified, this is a set of completions indexed by «Completion_index».
  2. The finish_reason. Usually Null, but may be "stop" if one of the «stop_sequences» is encountered.
  3. Number of tokens in the prompt, which includes all the tokens in «messages».
  4. Number of tokens in the response.
  5. A reference to the full response from the API call.

Optional parameters:

  • «modelName»: The OpenAI model to use. It must support Chat. 'gpt-3.5', 'gpt-3.5-turbo' and 'gpt-4' are common choices.
  • «functions» : One or more functions that the LLM can call during its completions.
  • «temperature»: A value between 0 and 2.
    Smaller values make the output more focused and deterministic; higher values make it more random. Default=1.
  • «top_p»: A value 0<top_p<=1. An alternative to sampling temperature.
    Do not specify both «temperature» and «top_p».
    A value of 0.1 means only tokens comprising the top 10% of probability mass are considered.
  • «Completion_index»: Specify an index if you want more than one alternative completion.
    The results will have this index if specified. The length of this index specifies how many completions are generated.
  • «stop_sequences»: You can specify up to 4 stop sequences.
    When one of these sequences is generated, the API stops generating.
  • «max_tokens»: The maximum number of tokens to generate in the chat completion.
  • «presence_penalty»: Number between -2.0 and 2.0.
    Positive values penalize new tokens based on whether they appear in the text so far.
  • «frequency_penalty»: Number between -2.0 and 2.0.
    Positive values penalize new tokens based on their existing frequency in the text so far.
  • «seed»: An integer. Specify if you want a deterministic response -- the same response if called twice with identical parameters.

Function callbacks

Callbacks are User-Defined functions that you supply to Prompt_completion and Chat_completion that they can call while generating the response to your prompt. For example, the LLM could use one to gather results from your model to incorporate into the conversation. You can also use this to provide tools for it to use for things it is not very good at on its own, such as arithmetic.

Your callback functions should have only simple parameters, accepting scalar text or numbers. The language models do not have a way to pass arrays or indexes. It is a good idea to qualify each parameter as either Text or Number. It is important to include a Description attribute which gives the language model guidance on when and how to use your function.

For example:

Function get_current_weather(location: text; unit: text optional)
Description: Get the current weather in a given location
Parameter Enumerations:
   unit
       "celsius"|
       "fahrenheit"|
Definition: AskMsgText(f"What is the current weather in {location}?", "API function call")

To allow it to use this function, pass the function identifier in the «functions» parameter, e.g.,

Prompt_completion("Do I need an umbrella today? I'll be taking a hike in Portland, Oregon", functions: get_current_weather)

When you (or an LLM) call the function, it shows a message box on the screen asking "What is the current weather in Portland, Oregon?". This message box appears when the LLM's function call causes the AskMsgText in get_current_weather to be evaluated.

If you type "Drizzly with occasional thunder showers" into the message box, the final return value is

"Yes, it is recommended to bring an umbrella today as there are occasional thunder showers in Portland, Oregon."

Using a meaningful parameter name can help the language model understand what value to pass for that parameter, but the LLM will often benefit from including an additional description of each parameter. To do this, add the parameter descriptions inside your function Description using the following format:

Description:
Get the current weather in a given location.
Parameters:
  • location: The location in the format City, State. E.g., "Tucson, AZ".
  • unit: Units to use for temperature.

If a parameter description has more than one line, you should indent each continuation line (using TAB). To find the parameter descriptions, the library looks for this format: the title "Parameters", followed by lines where the first character is a bullet (either * or •; the latter is typed with the keystrokes \bullet[TAB]), followed by the parameter name, a colon, then the description. The parameter name may optionally appear inside chevrons, e.g., «location», in which case the bullet is optional.
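The format just described is regular enough to be recognized mechanically. The Python sketch below is an illustration of that format only; it is not the library's parser, and `parse_parameter_descriptions` is a hypothetical name.

```python
import re

# A bullet (* or •) followed by a name, or a name in «chevrons» with an optional bullet.
_PARAM_LINE = re.compile(r"^(?:[*•]\s*(?:«(\w+)»|(\w+))|«(\w+)»)\s*:\s*(.*)$")

def parse_parameter_descriptions(description):
    """Extract {parameter name: description} from the text of a function Description."""
    params = {}
    current = None
    in_section = False
    for line in description.splitlines():
        if not in_section:
            # Everything before the "Parameters" title is the general description.
            in_section = line.strip().rstrip(":") == "Parameters"
            continue
        match = _PARAM_LINE.match(line.strip())
        if match:
            current = match.group(1) or match.group(2) or match.group(3)
            params[current] = match.group(4).strip()
        elif current and line[:1] in (" ", "\t") and line.strip():
            # An indented line continues the previous parameter's description.
            params[current] += " " + line.strip()
    return params

sample = """Get the current weather in a given location.
Parameters:
• location: The location in the format City, State.
\tE.g., "Tucson, AZ".
«unit»: Units to use for temperature.
"""

parsed = parse_parameter_descriptions(sample)
```

The sample exercises all three cases: a bulleted name, a TAB-indented continuation line, and a name in chevrons without a bullet.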

Use the ParameterEnumeration attribute to specify the allowed values for parameters that expect one of a fixed set of values. You may need to make this attribute visible first. The parameter name should appear on its own line; then each enumerated value should appear on a separate indented line. Each value should be followed by a bar (|) and then an optional description of the value (this description isn't passed to the LLM, but is used by Expression Assist). Text values should appear with explicit quotes.
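For orientation, OpenAI's chat API receives each callable function as a JSON Schema description. The Python sketch below builds such a description for the get_current_weather example, including the enumerated values for unit. How the library itself translates an Analytica function into this form is not documented on this page, so `describe_function` and its argument layout are assumptions made for illustration.

```python
import json

def describe_function(name, description, params, required):
    """Build an OpenAI-style function description.

    params maps each parameter name to (json_type, description, enum_values or None).
    """
    properties = {}
    for pname, (ptype, pdesc, enum) in params.items():
        prop = {"type": ptype, "description": pdesc}
        if enum:
            prop["enum"] = list(enum)  # allowed values, as in ParameterEnumeration
        properties[pname] = prop
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": list(required),  # optional parameters are left out
        },
    }

schema = describe_function(
    "get_current_weather",
    "Get the current weather in a given location",
    {
        "location": ("string", 'The location in the format City, State. E.g., "Tucson, AZ".', None),
        "unit": ("string", "Units to use for temperature.", ["celsius", "fahrenheit"]),
    },
    required=["location"],
)
print(json.dumps(schema, indent=2))
```

Seen this way, the Description, the per-parameter descriptions, and the enumerated values are all the information the LLM has when deciding whether and how to call the function, which is why they are worth writing carefully.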

Similarity embeddings

A similarity embedding captures the semantic content of a chunk of text as a numeric vector. Embedding vectors can be compared to other embedding vectors to judge how semantically similar the topics in two chunks of text are. Chunk sizes typically range from a few words up to a few hundred words.

Similarity embeddings have many uses. One of the most common is Retrieval Augmented Generation, where your code finds a small number of reference text chunks with embeddings similar to the user's question. You then include these in the LLM prompt, along with the actual question, when calling Prompt_completion or Chat_completion.
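The retrieval step can be sketched in a few lines of Python. The chunks and similarity scores below are made up and stand in for the result of comparing real embeddings; `top_k_chunks` and `build_prompt` are hypothetical helpers, and the prompt wording is just one possible choice.

```python
def top_k_chunks(chunks, scores, k):
    """Return the k chunks with the highest similarity scores, best first."""
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:k]]

def build_prompt(question, references):
    """Place the retrieved reference text ahead of the user's question."""
    context = "\n\n".join(
        f"Reference {i}: {text}" for i, text in enumerate(references, start=1)
    )
    return f"Answer using only the references below.\n\n{context}\n\nQuestion: {question}"

# Made-up reference chunks and their similarity to the question.
chunks = ["Refund policy: 30 days.", "Office hours: 9 to 5.", "Refunds require a receipt."]
scores = [0.91, 0.42, 0.88]

best = top_k_chunks(chunks, scores, k=2)
prompt = build_prompt("How do I get a refund?", best)
```

Only the two most relevant chunks reach the prompt, which keeps the token count (and cost) down while still giving the LLM the text it needs.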

Function Embedding_for(txt)

Returns an embedding vector for the text «txt». If you pass it an array of text strings, it passes them all in a single API call. Each embedding result is indexed by Ada_embedding. It uses the text-embedding-ada-002 OpenAI model, which is tuned for fast similarity embedding. The price charged by OpenAI for each embedding is extremely low.

Function Embedding_similarity(x, y)

Compares two embedding vectors, «x» and «y», and returns their similarity. A larger number means that they are more similar.
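This page does not say which measure Embedding_similarity computes. A common choice for comparing embedding vectors is cosine similarity, sketched below in Python under that assumption. The three-component vectors are invented for illustration (real text-embedding-ada-002 vectors have 1536 components).

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two vectors: 1 means same direction, 0 means orthogonal."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Toy "embeddings" for a query and two candidate texts.
query = [0.9, 0.1, 0.0]
gpu_text = [0.8, 0.2, 0.1]
cpu_text = [0.1, 0.9, 0.3]

sim_gpu = cosine_similarity(query, gpu_text)
sim_cpu = cosine_similarity(query, cpu_text)
```

Here the first candidate points in nearly the same direction as the query and so scores higher, matching the rule that a larger number means more similar.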

Example:

Index Record_num := 1..10
Variable Processor_name ::= Table(Record_num) { CPU & GPU names in inconsistent formats }
Variable Proc_name_embedding ::= Embedding_for(Processor_name)
Variable Query ::= "Graphics card for gamer"
Variable Similarity_to_query ::= Embedding_similarity(Embedding_for(query), Proc_name_embedding)

[Figure: Similarity to processorName query.png]

To generate this plot:
  1. In Graph Setup / Chart type,
    1. select Bar chart,
    2. Swap horizontal and vertical
    3. Sort by data spread.
  2. Press the XY button
    1. check Use another variable
    2. add Processor_name.
  3. Set the vertical axis pivot to Processor_name.
  4. Right-click on the key for Record_num and select Hide Key.

Image generation

Function Generate_image( prompt, RepeatIndex, size )

  • «prompt»: Textual prompt describing what you want an image of.
  • «RepeatIndex»: (optional) Specify an index if you want multiple variants of the same «prompt». One variant is generated for each element of the index, up to a maximum of 10.
  • «size»: The size of the image. It must be one of '256x256', '512x512' or '1024x1024'. If unspecified, the default is '1024x1024'.

Generates an image from a textual description.

Example:

Generate_image("Lollipops in the sky atop of clouds", size:'256x256')

[Image: Lollipops in the sky atop of clouds.png]

Text to speech

Function Text_to_speech( text )

Speak «text» out loud on your default speaker output.

Currently this works only in Desktop Analytica, not yet in ACP.

The return value, which is usually ignored, is the temporary file path used, if any. The function does not wait for the audio playback to complete before returning, but it does wait for any previous utterance to complete before starting to pronounce the text, so that it doesn't cut off the previous utterance.

Optional parameters:

  • «voice»: Select from several different voices. Available voices: 'alloy', 'echo', 'fable', 'onyx', 'nova', and 'shimmer'.
  • «model»: select either
    • 'tts-1': Lower latency but lower quality. May have some background static.
    • 'tts-1-hd': Slower but higher quality
  • «format»: The sound file format used in the transfer and for playback. You probably don't want to change this, but the possible values are:
    • 'mp3': (default) The most universally supported choice.
    • 'opus': For internet streaming and communications, low latency. This one is NOT supported by the media player used here in Desktop Analytica.
    • 'aac': For digital audio compression, preferred by YouTube, Android, iOS.
    • 'flac': For lossless audio compression, favored by enthusiasts for archiving.

Function Stop_current_speech_playback( )

Aborts any speech that is in the process of being pronounced via a previous call to Text_to_speech.

API Errors

When calling OpenAI's API, you will encounter various errors from the API server (in the form of HTTP error codes). Unfortunately, these are common even when you don't have a bug in your own code, and are often hard to deal with in a graceful or automatic way. Future revisions of this library will hopefully incorporate enhancements for reducing the frustrations of these errors as we get more experience with ways to deal with them.

Two common error sources that we have observed are:

  • Intermittent errors from server. These are often codes like "page not found". The exact same set of queries may work when repeated, but it is unclear why the server reports the error in one run but not the next. We have seen these more often when sending a rapid sequence of calls, which might reflect a cause that is somehow correlated to bursts of calls, or it might just be that we notice it more when executing code that requires lots of calls.
  • Rate limit exceeded errors when using the gpt-4 model. These occur even when a very small number of queries are sent in quick succession, such as when breaking a large page into 20 chunks and sending 20 queries in succession to process each chunk separately. The actual query rate appears to be far below the rate limits published by OpenAI, yet the errors still occur.

These errors can be especially frustrating because they both tend to occur while array abstracting over a problem (where each part is solved by a separate call). If an error causes the calculation to abort, the results for the earlier calls in that iteration are not retained. If you get the rate-limit errors with gpt-4, you are charged by OpenAI for the tokens in the queries that fail. (GPT-4 queries are fairly expensive -- for example, it can be on the order of $1 to process the text of a long article, whereas gpt-3.5-turbo queries are dirt cheap).
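One general way to soften both problems is to retry failed calls with an increasing delay and to keep the results of calls that already succeeded. The Python sketch below illustrates that idea only; it is not part of the library, and `TransientAPIError`, `call_with_retry`, and `process_chunks` are hypothetical names. The simulated `flaky_upper` function fails on every other call to stand in for intermittent server errors.

```python
import time

class TransientAPIError(Exception):
    """Stand-in for an intermittent or rate-limit error from the server."""

def call_with_retry(func, *args, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(*args); after a transient error, wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return func(*args)
        except TransientAPIError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** attempt)

def process_chunks(chunks, func, cache, **retry_options):
    """Process each chunk once; results stay in cache even if a later call fails."""
    for i, chunk in enumerate(chunks):
        if i not in cache:
            cache[i] = call_with_retry(func, chunk, **retry_options)
    return [cache[i] for i in range(len(chunks))]

# Simulated API call that fails on every odd-numbered invocation.
call_count = {"n": 0}

def flaky_upper(text):
    call_count["n"] += 1
    if call_count["n"] % 2 == 1:
        raise TransientAPIError("simulated server error")
    return text.upper()

delays = []   # record the requested waits instead of actually sleeping
cache = {}
results = process_chunks(["first chunk", "second chunk"], flaky_upper, cache, sleep=delays.append)
```

Because the cache is filled one chunk at a time, rerunning `process_chunks` with the same cache after a failure repeats only the calls that have not yet succeeded, so tokens already paid for are not spent twice.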

Errors like these that are issued by OpenAI's server tend to be cryptic and not very informative about the cause. At present, these errors and the frustration of dealing with them are probably the dominant limitation of the current library.

We expect the library to evolve with time, with issues like these being a big area for improvements, so you should be prepared to update to newer versions of the library.

See Also

Examples
  • Using GPT to clean inconsistent data.ana
    Illustrates how to use GPT to clean data that has been entered in extremely inconsistent ways. Takes you through the steps one-by-one for one column of data in an actual set from Kaggle.
    Assumes you have already downloaded the library.
  • LLMs as optimizers - linear regression.ana
    This model reproduces an experiment from a paper out of DeepMind:
    Can you implement and run an optimization algorithm entirely in English? This experiment explores whether GPT-3.5-turbo or GPT-4 can implement the optimization required for 1-D linear regression. The paper reports a positive result for both (with GPT-4 doing a lot better), whereas this model finds that it works with GPT-4 but does not work in GPT-3.5-turbo.
    Assumes you have already downloaded the library (ver 0.5 or later).
  • Chess with GPT.ana
    Play chess against the GPT models. Featured in this video.
    Make sure you have at least ver 0.6 of the API library, and Analytica Enterprise or better.