ReadFromUrl
Reads a page or text file from a URL and returns the result as a text value.
Requires Analytica Enterprise
ReadFromUrl(url)
This function can read text data from an HTTP or HTTPS page (a standard or secure web page), from FTP, or from GOPHER. You can also submit data as would occur on a web page when a user submits an HTTP form, or use it to "call" a web service to obtain data. Its result is a text value. You can use functions like ParseCSV() to parse the text into data.
There are several optional parameters:
- «method»
- The HTTP method, usually "GET" or "POST". More esoteric methods include "HEAD", "PUT", "DELETE", "OPTIONS". The verbs "CONNECT", "PATCH" and "TRACE" cannot be used.
- «formFields», «formValues», «formIndex»
- Data for form field values to submit to a web page. Usually, «formValues» is an array indexed by «formFields», or both are arrays that share a single common index. If «formValues» and «formFields» have multiple indexes in common, you should specify the common index as «formIndex».
- «httpHeaders»
- Additional HTTP headers (separate with Chr(13)).
- «httpContent»
- Custom submitted HTTP content.
- «fieldHeaders»
- Used only when sending multi-part requests. In multipart messages, the parts and the content for each part are passed in the «formFields» and «formValues» parameters, and can be text, images or binary blobs. But in general you also can specify separate httpHeaders for each part, which are passed in this parameter. The «httpHeaders» is for the full message, whereas the headers here (each is multi-line text) are the headers for each part. «fieldHeaders» is indexed by «formIndex». By specifying this parameter, the message will be configured as multipart.
- «connectTimeoutMs», «sendTimeoutMs», «receiveTimeoutMs»
- Timeouts for the key stages of connecting to the server, sending the request, and receiving the response, each expressed in milliseconds.
- «connectionType»
- One of 'asynchWait' (default), 'synchronous', 'asynchronous', 'SSE', or 'websocket'. When 'asynchronous', 'SSE' or 'websocket', a «webConnection» object is returned instead of a response.
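As a minimal sketch of how the optional parameters combine, the call below passes the timeouts by name. The URL and the timeout values are hypothetical and chosen only for illustration.

```analytica
{ Hypothetical endpoint and illustrative timeout values }
ReadFromUrl( "https://example.com/api/report",
    method: "GET",
    connectTimeoutMs: 5000,     { up to 5 seconds to connect }
    sendTimeoutMs: 5000,        { up to 5 seconds to send the request }
    receiveTimeoutMs: 30000 )   { up to 30 seconds to receive the response }
```

Passing the optional parameters by name lets you skip the ones you don't need.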
Return values
ReadFromUrl has multiple return values:
- The response. Could be text, image, or binary.
- HTTP status code. This is 200 when successful.
- HTTP status code textual description.
- The HTTP response headers
- The content-type (if specified)
For example, to capture the status code, use:
Local ( response, code ) := ReadFromUrl(....);
or, to avoid a warning and handle a code<>200 yourself:
Local ( response, code, errMsg, headers, contentType ) := IgnoreWarnings(ReadFromUrl(....));
When «connectionType» is 'asynchronous', 'SSE' or 'websocket', the return value is a «webConnection» object and there are no auxiliary return values (they aren't available yet). In these cases, ReadFromUrl returns immediately without waiting for the response. You then use the methods _WebConnectionStatus, _WebConnectionRead, _WebConnectionSend (websockets only), and _WebConnectionClose to receive the content.
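The following sketch shows one way to act on the status code yourself when using a synchronous connection type. The URL is hypothetical, and the choice to raise an error through the Error function on any code other than 200 is an assumption about what your model needs, not a requirement of ReadFromUrl.

```analytica
{ Hypothetical endpoint; IgnoreWarnings suppresses the built-in warning for codes >= 400 }
Local ( response, code, statusText ) := IgnoreWarnings( ReadFromUrl( "https://example.com/api/data" ) );
If code = 200
Then response
Else Error( "Request failed with status " & code & ": " & statusText )
```

You could equally return a default value or retry instead of raising an error.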
Reading from a Web page
To read the contents of a web page, usually in HTML format, just supply the URL, e.g.:
ReadFromUrl("http://lumina.com")
If you omit the "http://" part, it defaults to an http query, e.g.:
ReadFromUrl("lumina.com")
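When the URL points to a text data file rather than an HTML page, the returned text can be passed on to a parsing function such as ParseCSV. A minimal sketch, assuming a hypothetical CSV file at the URL shown:

```analytica
{ Hypothetical URL of a CSV text file }
Local csvText := ReadFromUrl( "https://example.com/data/prices.csv" );
ParseCSV( csvText )
```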
Obtaining an image from a web page
ReadFromUrl() can download images in standard formats (e.g., JPG, PNG, GIF) from HTTP pages by supplying the URL, e.g.:
ReadFromUrl("analytica.com/wp-content/uploads/2023/07/analytica-img.png")
The result is a picture value object. You can assign the result to the Pict attribute of an object on a diagram to view the image in the model (and to cause the image to be saved with the model file).
Submitting Data to a Web Page
To simulate the submission of HTML form data when querying a web page, you can submit the information using either GET or POST methods. With a GET method, you normally see the parameters appear on the URL itself. With a POST method, you normally don't see the parameters -- they are passed in the body of the HTTP request.
To submit form data, you can set up a table of field values indexed by field names, usually with the index defined as a list of labels. The function call uses these parameters:
- ReadFromUrl(url, method, formValues, formFields, formIndex)
You only need to specify the «formIndex» parameter if «formValues» and «formFields» are arrays that may have more than one index in common.
For example, the following queries Google for "Analytica":
INDEX fieldNames := ["hl", "q"];
VARIABLE form := Array(fieldNames, ["en", "Analytica"]);
ReadFromUrl("http://google.com/search", "GET", form, fieldNames)
It returns the result in HTML format.
You don't have to worry about URL-encoding the field names or values. If either contains non-alphanumeric characters, ReadFromUrl encodes them before they are submitted.
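The same pattern applies to a POST submission, where the fields travel in the request body rather than on the URL. This sketch mirrors the GET example above. The URL, field names and values are hypothetical. The second value deliberately contains characters that need URL-encoding, which is left to ReadFromUrl as described.

```analytica
{ Hypothetical form endpoint and field names }
INDEX fieldNames := ["user", "comment"];
VARIABLE form := Array(fieldNames, ["jdoe", "Rates & fees: 5%"]);
ReadFromUrl("https://example.com/submit", "POST", form, fieldNames)
```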
Submitting multi-part data
Requires Analytica 6.4
Data sent in an HTTP POST request can usually be formatted in two ways:
- Content-Type:application/x-www-form-urlencoded
- Content-Type:multipart/form-data
When a form value contains binary data or an image, you must use the multipart format. By default, ReadFromUrl uses Content-Type:application/x-www-form-urlencoded when possible, but uses Content-Type:multipart/form-data when a form value requires a multipart format.
To force a multipart format when it is not otherwise required (e.g., if the web service you are calling requires a multipart format), include the header explicitly, e.g.:
ReadFromURL( url, method:'POST', formValues:vals, formIndex:I, httpHeaders: 'Content-Type:multipart/form-data' )
Or you can specify the «fieldHeaders» parameter (even by passing it an empty string). In either case, it generates a boundary string between parts. If you provide your own boundary, it will use it, but you need to be careful that this exact sequence of bytes does not appear in any of the content. For example
ReadFromURL( url, method:'POST', formValues:vals, formIndex:I, httpHeaders: 'Content-Type:multipart/form-data;boundary=MyBoundary' )
would malfunction if one of your vals contains the byte sequence "MyBoundary".
When sending multi-part data, you may also want to add custom headers to individual items. You can do this using the «fieldHeaders» parameter, which also uses the «formIndex». One example is when you want a field to appear to be the contents of a file and you want to provide the file name.
LocalIndex Field := ['Date','Upload'];
Local fileContents := ReadBinaryFile( filename, typeFlags: 7 {binary data blob} );
Local values:= Array( Field, [ Today(), fileContents ] );
Local specialHeader := f"Content-Disposition: form-data; name=""File""; filename=""{filename}""";
Local fieldHeaders := Array( Field, [ "", specialHeader ] );
ReadFromURL( url, 'Post', formValues: values, formIndex: Field, fieldHeaders: fieldHeaders )
In this code, specialHeader creates a header field for the form item 'Upload' which includes the file name. The server that receives this will think you used an HTML file upload field to select a file named filename. In addition, the server will see the field name as being "File" (from the specialHeader), overriding the name "Upload" in the «formIndex» parameter.
Note that if you are not adding a filename or altering the usual Content-Disposition, you don't have to specify it as a fieldHeader. If you don't specify it, the function automatically creates Content-Disposition for each part.
A second common case where you might need to customize the «fieldHeaders» is when you need to customize the Content-Type. For example, suppose you upload a video file, which you read into memory as a binary data term. If you don't specify the content-type, it will default to Content-Type: application/octet-stream. Thus, you might specify it for the field:
ReadFromURL(url, "Post", vals, fields, I, fieldHeaders: If I="video clip" then "Content-Type: video/mp4" else "" )
If you are sending only a single image or a single binary data term, it is usually simpler to pass it as the «httpContent» rather than using form fields at all. In that case the message is a single part (not multipart) body, assuming you also control what the receiving side expects.
Obtaining content from an FTP site
To obtain content from an FTP site, use:
ReadFromUrl("ftp://site.com/directory/file.txt")
The content must be textual. Binary files will be corrupted as they are read into a text value (specifically, '\0' characters will be converted to spaces).
Authentication
FTP sites usually require user and password authentication, as do web services and some web pages. You can embed the user name and password in the URL like this:
ReadFromUrl("http://user:password@www.site.com/dir/page.htm")
ReadFromUrl("ftp://user:password@www.site.com/dir/page.htm")
If the user or password contains any of the special characters @ : / ? #, these must be percent encoded. Thus an email user name, john.doe@gmail.com, would be
ReadFromURL("https://john.doe%40gmail.com@domain.com")
Here are the % codes for the other special chars that might appear in a username or password:
| Char | Code |
|---|---|
| Space | %20 |
| @ | %40 |
| : | %3A |
| / | %2F |
| ? | %3F |
| # | %23 |
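If the credentials are held in variables, you can apply the codes above before building the URL. The sketch below does this with TextReplace for just the characters that occur in a hypothetical user name and password. It assumes TextReplace accepts an «all» parameter to replace every occurrence. The TextCharacterEncode function listed under See Also may offer a more general route.

```analytica
{ Hypothetical credentials; only the special characters present are replaced }
Local user := "john.doe@gmail.com";
Local password := "pa:ss/word";
Local encUser := TextReplace( user, "@", "%40", all: True );
Local encPassword := TextReplace( TextReplace( password, ":", "%3A", all: True ), "/", "%2F", all: True );
ReadFromUrl( "https://" & encUser & ":" & encPassword & "@domain.com" )
```

If a credential could itself contain a % character, you would need to encode that first (as %25) so the later replacements are not double-encoded.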
Sending data to CGI scripts and proprietary web services
If you call a CGI service, the CGI program will accept input in its own format, which could be arbitrary. You can send this data as follows:
ReadFromUrl("http://server.com/cgi-bin/myProgram.cgi", httpContent: data)
In this example, data contains the data that you are submitting to the CGI script. This data will form the body of the HTTP request. The «httpContent» parameter may only be used with HTTP requests, not with FTP or Gopher requests.
Various web services fall into this category as well, where data being submitted via HTTP may be in a proprietary format.
In some cases, you may also need to include additional HTTP headers in your request. You can insert these using the optional «httpHeaders» parameter. If you have more than one HTTP header, separate them using a CR character, Chr(13). If you enter these into a definition (e.g., with quotes), you can just type a new line, or if you enter them into a single edit table cell, press Alt+Enter to insert a CR (new-line) into the cell. Otherwise, you can use the & operator to concatenate each header line with Chr(13).
ReadFromUrl("http://somehost.com", httpHeaders: "Accept:text/xml"&Chr(13)&"User-Agent:MyModel.ana")
The content value is passed directly with no special encoding of characters.
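Combining «httpContent» with «httpHeaders» is a common way to call a web service that expects a JSON body. The sketch below assumes a hypothetical endpoint that accepts a JSON POST. The URL and the JSON payload are illustrative only.

```analytica
{ Hypothetical JSON web service; the body is sent exactly as written }
Local body := '{"query": "Analytica", "limit": 10}';
ReadFromUrl( "https://example.com/api/search",
    method: "POST",
    httpHeaders: "Content-Type: application/json" & Chr(13) & "Accept: application/json",
    httpContent: body )
```

Because the content is passed with no special encoding, any escaping required by the receiving format (here, JSON) is your responsibility.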
HTTP Status Codes
During an HTTP request, if the status code returned by the server is greater than or equal to 400, ReadFromUrl issues a warning displaying the status code and status text, unless the Show Result Warnings preference is turned off.
Proxy Servers
ReadFromUrl uses information stored in the system registry to determine whether a proxy server should be used to access the internet. The configuration can be set up using the Internet Explorer or Chrome web browsers (but not through Firefox). The proxy configuration is stored in the system registry in the hive:
HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings
in the registry values ProxyEnable, ProxyServer and ProxyOverride.
It should work with TIS, SOCKS and CERN style proxy servers as long as Internet Explorer is installed (SOCKS support requires IE).
Connection types
In most situations, ReadFromUrl sends a request to a remote server then waits until it has received the response. It then returns the response. This is synchronous operation. This occurs when you omit the «connectionType» parameter, or when you specify it as 'asynchWait' or 'synchronous'.
The 'synchronous' option is fully synchronous, turning control over to the operating system while it waits for a response. The downside of this is that no window events arrive to the Analytica process during this time, which means the screen cannot redraw and the application cannot respond to Ctrl+Break, etc. In Analytica 6.4 and earlier, this is the mode that was used by ReadFromUrl. This fully synchronous connection type is the lowest in complexity, so it can serve as a fallback if you discover an unexpected problem in the default 'asynchWait' method.
The 'asynchWait' method is synchronous as far as your model and code are concerned, but it allows Windows events to arrive and be processed. It can be aborted by pressing Ctrl+Break or Stop calculation, and the screen can redraw while you are waiting. But as with fully synchronous operation, the function call does not return until the full response is received, and the response is then returned directly as the return value of ReadFromUrl.
The remaining connection types are asynchronous. In these cases, ReadFromUrl returns immediately without waiting for the response from the server. These are each discussed in the next subsections.
Asynchronous connections
With an 'asynchronous' «connectionType», ReadFromUrl returns immediately, allowing your code to work on something else while the server works on your request. This would be applicable to RESTful web services that take a long time to process AND where your code has something it can work on concurrently, or when you want to launch multiple concurrent web service calls so that they can process concurrently. If you aren't computing while waiting or launching multiple simultaneous calls, then you probably don't want to use an 'asynchronous' connection type, simply because it is more complex for your own code.
When you specify connectionType: 'asynchronous', ReadFromUrl returns a «webConnection» object immediately. You can then call
_WebConnectionStatus( wc )
to test the status, which moves through the stages: 'Connecting', 'Waiting', 'ResponseReady', 'Done' and 'Closed'.
You can receive the message with
_WebConnectionRead(wc)
which blocks until the message arrives. Or you can use
_WebConnectionRead(wc, block:false)
which returns immediately. If no message is available (i.e., _WebConnectionStatus(wc) is something other than 'ResponseReady'), it returns Null.
An asynchronous connection is one-directional, so _WebConnectionSend does nothing. Also, the entire message, when ready, is received in a single call to _WebConnectionRead. If you were to call it multiple times (in the 'Done' or 'Closed' states), it would simply return Null.
When you are done, you can close the connection by releasing the «webConnection», or by calling _WebConnectionClose(wc).
Once you have stored (cached) the «webConnection» object, so that it is not released, then stop-calculation events, such as when you press Ctrl+Break, do not cancel the communication. However, if you are only holding the «webConnection» in a local variable and the calculation aborts, then the «webConnection» will, in general, be released, thus closing the connection.
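A minimal sketch of the concurrent use case described in this section: two requests are launched before either response is read, so the servers can work on both at once. The URLs are hypothetical, and the comment in the middle marks where your own concurrent computation would go.

```analytica
{ Hypothetical slow endpoints; both requests are launched before either is read }
Local wc1 := ReadFromUrl( "https://example.com/slow/a", connectionType: 'asynchronous' );
Local wc2 := ReadFromUrl( "https://example.com/slow/b", connectionType: 'asynchronous' );
{ ... other computation could run here while the servers work ... }
Local a := _WebConnectionRead( wc1 );   { blocks until the first response arrives }
Local b := _WebConnectionRead( wc2 );   { blocks until the second response arrives }
_WebConnectionClose( wc1 );
_WebConnectionClose( wc2 );
[ a, b ]
```

The total wait is roughly that of the slower request rather than the sum of both.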
Streaming responses (SSE)
Server-Sent Events (SSE) is a standard used when the HTTP server streams the response rather than sending it all at once. The response is a sequence of messages (events), and you can process these message-by-message as they arrive by setting «connectionType» to 'sse' (or to 'stream', which is a synonym). SSE is still a uni-directional communication channel, with the server sending the events only as a stream. After the initial request, the client cannot send additional data to the server.
Most people have used ChatGPT and thus seen an example of streaming SSE in action. In the case of ChatGPT, the server sends one token at a time to the UI, and you see the response appear token-by-token without having to wait for the entire response to appear.
To use SSE streaming, specify connectionType: 'sse' (or synonymously, connectionType: 'stream'). A «webConnection» is returned immediately. Use
_WebConnectionStatus(wc)
to monitor the status of incoming messages. The status moves through the states: 'Connecting', 'Waiting', 'ResponseReady', 'Waiting', 'ResponseReady', 'Waiting', ..., and 'Closed'. If the next message arrives before you have read the current one, the 'Waiting' state is skipped.
To read the next incoming message, use
_WebConnectionRead(wc)
If the next message has not yet arrived, this waits (blocks) until it arrives. Or you can call
_WebConnectionRead(wc, block:false)
which returns Null if a full message has not yet been received.
You read each event message successively by iterating over calls to _WebConnectionRead. The SSE standard does not define a way to flag the end of the stream, but many specific applications agree on a common convention. For example, the OpenAI API sends "[DONE]" as the last message. However, with or without a special closing message, either side might close the connection. Once closed, _WebConnectionRead will simply return Null, so to avoid an infinite loop you should also check that _WebConnectionStatus is not equal to 'Closed'.
Because SSE is not bi-directional, the _WebConnectionSend method does nothing with an 'SSE' connection.
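The read loop described above might be sketched as follows. The URL is hypothetical, and the sketch assumes that the server marks the end of the stream with a message whose text is exactly "[DONE]", in the style of the convention mentioned above. It concatenates the messages as they arrive, whereas a real application would more likely parse each one.

```analytica
{ Hypothetical streaming endpoint; assumes a "[DONE]" end-of-stream message }
Local wc := ReadFromUrl( "https://example.com/stream", connectionType: 'sse' );
Local text := "";
Local finished := False;
While ( ( Not finished ) And ( _WebConnectionStatus( wc ) <> 'Closed' ) ) Do (
    Local msg := _WebConnectionRead( wc );   { blocks until the next event arrives }
    If IsNull( msg ) Then finished := True        { connection closed while waiting }
    Else If msg = "[DONE]" Then finished := True  { assumed end-of-stream marker }
    Else text := text & msg
);
_WebConnectionClose( wc );
text
```

Checking both the Null result and the 'Closed' status guards against the infinite loop mentioned above.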
Websocket connections
A Websocket is a bi-directional communication standard, which starts with an HTTP request. Your client model can send and receive messages with the server in any order, and the connection can remain open for an extended period of time.
Server URLs that expect a websocket connection usually start with ws:// or wss://, but this is only a convention and not required. The way to tell ReadFromUrl to open a websocket connection is to specify connectionType:'websocket', as in
ReadFromUrl( "ws://acme.org/interactive", connectionType: 'websocket' )
The ReadFromUrl call returns after the header response has been received, and if the server has agreed to the websocket upgrade, it returns a «webConnection» object. If the server does not upgrade, then it returns the response in the same way an 'asynchWait' call would (i.e., as text in most cases). There is a good chance that this response contains error message information from the server.
Use _WebConnectionStatus to examine the current status of the connection, which moves through the states 'Waiting' and 'ResponseReady' multiple times as messages are received and read. Eventually if either side closes the connection, the status becomes 'Closing'.
To read the next incoming message, you might optionally check the _WebConnectionStatus to make sure it is 'ResponseReady', and then call
_WebConnectionRead(ws)
to receive the message. To send a message, call
_WebConnectionSend(ws, "Hello out there")
Messages are asynchronous in both directions, so the client and server don't need to alternate.
To close the connection, release the «webConnection» object, or call _WebConnectionClose(ws).
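Putting the websocket calls together, a simple exchange of one message each way might look like the sketch below. It reuses the illustrative URL from above and assumes the server accepts the websocket upgrade and replies to each message. If the upgrade is refused, ws would hold the server's ordinary response rather than a «webConnection», and the later calls would not apply.

```analytica
{ Assumes the server agrees to the upgrade and answers each message it receives }
Local ws := ReadFromUrl( "ws://acme.org/interactive", connectionType: 'websocket' );
_WebConnectionSend( ws, "Hello out there" );
Local reply := _WebConnectionRead( ws );   { waits for the next incoming message }
_WebConnectionClose( ws );
reply
```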
Limitations
With the 'synchronous' connection type (the only mode in Analytica 6.4 and earlier), the function waits until the full response from the server has been received and does not respond to Ctrl+Break. Some requests (e.g., if there is a problem) may take a while to time out before you can return to using Analytica.
On computers that aren't always connected to the internet, or need to dial up a modem or other connection to obtain access, the function does nothing to attempt to establish a connection. It will simply fail with an error.
Several possible error conditions return a cryptic message with only an error code number.
It cannot download binary data. It can only obtain text or pictures from HTTP sources.
It cannot access file system files, e.g., you can't use a url starting with "file://...".
History
ReadFromUrl was introduced in Analytica 4.2.
Analytica 6.4 added features useful for some web service calls, including:
- the transfer of binary data
- multi-part message payloads in both directions
- Explicit control over timeouts
- Access to the HTTP status code and text, the HTTP response headers, and the content type (as aux return values).
Analytica 6.5 added asynchronous connection types (WebConnections), and the 'asynchWait' connection type that responds to Ctrl+Break and Windows events (like redraw events) while waiting for a lengthy web service call.
See Also
- WebConnections
- OpenURL
- ReadTextFile
- RunConsoleProcess
- ReadImageFile
- TextCharacterEncode
- Retrieving Content From the Web
- Links or URL in model attributes
- Analytica example model: "Function Examples/Map images from internet.ana"