ReadFromUrl

Revision as of 01:12, 19 December 2023



Reads a page or text file from a URL and returns the result as a text value.

Requires Analytica Enterprise

ReadFromUrl(url)

This function can read from an HTTP or HTTPS page (i.e., a standard or secure web page), an FTP site, or a Gopher server. It can also submit data the way a web page does when a user submits an HTML form, and it can "call" web services to obtain data. The result is returned as a text value, which you can parse with functions such as ParseCSV().

There are several optional parameters:

method
The HTTP method, usually "GET" or "POST". More esoteric methods include "HEAD", "PUT", "DELETE", "TRACE", "OPTIONS", "CONNECT", and potential custom service methods.
formFields, formValues, formIndex
Data for form field values to submit to a web page. Usually, «formValues» is an array indexed by «formFields», or the two arrays share a single common index. If «formValues» and «formFields» have more than one index in common, specify the common index as «formIndex».
httpHeaders
Additional HTTP headers (separate with Chr(13)).
httpContent
Custom content to submit as the body of the HTTP request.
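These parameters correspond to parts of an ordinary HTTP request. The following Python sketch uses the standard urllib.request module to illustrate that mapping. It is not Analytica code and not how ReadFromUrl is implemented. The hypothetical helper build_request assembles a request from a method, CR-separated header lines, and raw content, but does not send it:

```python
import urllib.request


def build_request(url, method="GET", http_headers="", http_content=None):
    """Build, but do not send, an HTTP request from ReadFromUrl-style arguments.

    http_headers follows the httpHeaders convention: "Name:value" lines
    separated by carriage returns (Chr(13) in Analytica).
    http_content plays the role of httpContent: text sent as the request body.
    """
    headers = {}
    for line in http_headers.split("\r"):
        if line.strip():
            name, _, value = line.partition(":")
            headers[name.strip()] = value.strip()
    data = http_content.encode("utf-8") if http_content is not None else None
    return urllib.request.Request(url, data=data, headers=headers, method=method)


req = build_request(
    "http://server.com/cgi-bin/myProgram.cgi",
    method="POST",
    http_headers="Accept:text/xml\rUser-Agent:MyModel.ana",
    http_content="custom payload",
)
print(req.get_method(), req.full_url, req.data)
# POST http://server.com/cgi-bin/myProgram.cgi b'custom payload'
```

Note that urllib normalizes header names (for example, "User-Agent" is stored as "User-agent"). HTTP header names are case-insensitive, so this does not change the meaning.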

Reading from a Web page

To read the contents of a web page, which is usually in HTML format, just supply the URL, e.g.:

ReadFromUrl("http://lumina.com")

If you omit the "http://" part, it defaults to an HTTP request, e.g.:

ReadFromUrl("lumina.com")
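The following Python sketch illustrates this scheme defaulting. default_to_http is a hypothetical helper, and it assumes that any text containing "://" already has a scheme:

```python
def default_to_http(url: str) -> str:
    """Prefix "http://" when the URL has no scheme, as described above."""
    if "://" in url:
        return url
    return "http://" + url


print(default_to_http("lumina.com"))          # http://lumina.com
print(default_to_http("https://lumina.com"))  # https://lumina.com
```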

Obtaining an image from a web page

ReadFromUrl() can download images in standard formats (e.g., JPG, PNG, GIF) from HTTP sources; just supply the image's URL, e.g.:

ReadFromUrl("lumina.com/images/lum_AnalyticaLogo_Tagline_Snagged.png")

The result is a picture value object. You can assign the result to the Pict attribute of an object on a diagram to view the image in the model (and to cause the image to be saved with the model file).
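This page doesn't say how ReadFromUrl decides that a download is an image. One common technique is to compare the first bytes of the content against the known "signature" bytes of each format. The Python sketch below shows that technique; it is an assumption, not documented ReadFromUrl behavior, and the table and helper are hypothetical:

```python
# Leading "magic" bytes of the common image formats mentioned above.
IMAGE_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
}


def sniff_image_format(data: bytes):
    """Return the image format name for data, or None if it is not recognized."""
    for signature, name in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None
```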

Submitting Data to a Web Page

To simulate the submission of HTML form data when querying a web page, you can submit the information using either GET or POST methods. With a GET method, you normally see the parameters appear on the URL itself. With a POST method, you normally don't see the parameters -- they are passed in the body of the HTTP request.
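You can see the difference between GET and POST by building both forms of a request by hand. This Python sketch uses the standard urllib.parse.urlencode function with the field names from the Google example below. form_request_parts is a hypothetical helper:

```python
from urllib.parse import urlencode


def form_request_parts(url, method, fields):
    """Return (request_url, body) showing where form data goes for GET vs POST."""
    encoded = urlencode(fields)
    if method.upper() == "GET":
        return url + "?" + encoded, None  # parameters appear on the URL
    return url, encoded.encode("ascii")  # parameters travel in the request body


fields = {"hl": "en", "q": "Analytica"}
print(form_request_parts("http://google.com/search", "GET", fields))
# ('http://google.com/search?hl=en&q=Analytica', None)
print(form_request_parts("http://google.com/search", "POST", fields))
# ('http://google.com/search', b'hl=en&q=Analytica')
```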

To submit form data, set up a table of field values indexed by the field names, usually an index defined as a list of labels. The function call uses these parameters:

ReadFromUrl(url, method, formValues, formFields, formIndex)

You only need to specify the «formIndex» parameter if «formValues» and «formFields» are arrays that may have more than one index in common.

For example, the following queries Google for "Analytica":

INDEX fieldNames := ["hl", "q"];
VARIABLE form := Array(fieldNames, ["en", "Analytica"]);
ReadFromUrl("http://google.com/search", "GET", form, fieldNames)

It returns the result in HTML format.

You don't have to URL-encode the field names or values yourself. If either contains non-alphanumeric characters, ReadFromUrl encodes them before submitting them.
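For reference, here is how standard form encoding (application/x-www-form-urlencoded) handles names and values that contain spaces and punctuation, shown in Python. This page does not document the exact escaping rules ReadFromUrl applies, so treat this only as an illustration of the format:

```python
from urllib.parse import urlencode

# Spaces, "&" and "/" in names or values would otherwise break the query syntax.
fields = {"q": "R&D cost/benefit", "lang code": "en"}
encoded = urlencode(fields)
print(encoded)  # q=R%26D+cost%2Fbenefit&lang+code=en
```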

Submitting multi-part data

Requires Analytica 6.4

Data sent in an HTTP POST request is, in general, formatted in one of two ways:

  • Content-Type:application/x-www-form-urlencoded
  • Content-Type:multipart/form-data

When a form value contains binary data or an image, you must use the multipart format. By default, it uses Content-Type:application/x-www-form-urlencoded when possible, but uses Content-Type:multipart/form-data when a form value requires a multipart format.
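This default choice can be written as a small rule. In the Python sketch below, choose_content_type is a hypothetical helper. A bytes value stands in for an image or binary data term, and force_multipart stands in for explicitly requesting multipart:

```python
def choose_content_type(form_values, force_multipart=False):
    """Pick the POST encoding following the default rule described above."""
    needs_multipart = any(isinstance(v, (bytes, bytearray)) for v in form_values)
    if force_multipart or needs_multipart:
        return "multipart/form-data"
    return "application/x-www-form-urlencoded"
```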

To force the multipart format when it is not otherwise required (e.g., because the web service you are calling requires it), include the header explicitly, e.g.:

ReadFromURL( url, method:'POST', formValues:vals, formIndex:I, httpHeaders: 'Content-Type:multipart/form-data' )

Alternatively, specify the «fieldHeaders» parameter (even passing it an empty string forces multipart). In either case, a boundary string between parts is generated automatically. If you provide your own boundary, it is used, but you must make sure that this exact sequence of bytes does not appear in any of the content. For example

ReadFromURL( url, method:'POST', formValues:vals, formIndex:I, httpHeaders: 'Content-Type:multipart/form-data;boundary=MyBoundary' )

would malfunction if one of your vals contains the byte sequence "MyBoundary".
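If you choose a boundary yourself, one way to avoid such collisions is to generate a random boundary and check it against every part. This Python sketch uses hypothetical helper names and a hypothetical prefix. ReadFromUrl's own boundary generator is not documented here:

```python
import secrets


def boundary_is_safe(boundary, parts):
    """True if the boundary's bytes occur in none of the parts."""
    marker = boundary.encode("ascii")
    return not any(marker in part for part in parts)


def make_boundary(parts, prefix="----SketchBoundary"):
    """Return a random boundary that does not occur in any part."""
    while True:
        # 32 random hex digits make an accidental match very unlikely;
        # the loop simply retries in the rare case that one occurs.
        boundary = prefix + secrets.token_hex(16)
        if boundary_is_safe(boundary, parts):
            return boundary


parts = [b"2023-12-19", b"...contains MyBoundary somewhere..."]
print(boundary_is_safe("MyBoundary", parts))          # False
print(boundary_is_safe(make_boundary(parts), parts))  # True
```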

When sending multi-part data, you may also want to add custom headers to individual items. You can do this using the «fieldHeaders» parameter, which also uses the «formIndex». One example is when you want a field to appear to be the contents of a file and you want to provide the file name.

LocalIndex Field := ['Date','Upload'];
Local fileContents := ReadBinaryFile( filename, typeFlags: 7 {binary data blob} );
Local values:= Array( Field, [ Today(), fileContents ] );
Local specialHeader := f"Content-Disposition: form-data; name=""File""; filename=""{filename}""";
Local fieldHeaders := Array( Field, [ "", specialHeader ] );
ReadFromURL( url, 'Post', formValues: values, formIndex: Field, fieldHeaders: fieldHeaders )

In this code, specialHeader creates a header for the form item 'Upload' that includes the file name. The server that receives it treats the item as though you had used an HTML file-upload field to select a file named filename. In addition, the server sees the field name as "File" (from specialHeader), which overrides the name "Upload" from the «formIndex».

If you are not adding a file name or otherwise altering the usual Content-Disposition, you don't need to specify one in «fieldHeaders»; the function automatically creates a Content-Disposition header for each part.

A second common case where you might need to customize the «fieldHeaders» is when you need to customize the Content-Type. For example, suppose you upload a video file, which you read into memory as a binary data term. If you don't specify the content-type, it will default to Content-Type: application/octet-stream. Thus, you might specify it for the field:

ReadFromURL(url, "Post", vals, fields, I, fieldHeaders: If I="video clip" then "Content-Type: video/mp4" else "" )
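The following Python sketch assembles a multipart/form-data body by hand so you can see what such a message looks like. It loosely mirrors the behavior described above. Each part gets a generated Content-Disposition unless a custom one is supplied, and binary parts default to application/octet-stream unless a Content-Type is supplied. The function name and the dictionary-based field_headers argument are hypothetical, not ReadFromUrl's API:

```python
import secrets


def build_multipart(fields, field_headers=None):
    """Build a multipart/form-data body.

    fields: dict mapping field name -> str or bytes value.
    field_headers: optional dict mapping field name -> extra header lines
    (CRLF-separated) for that part.
    Returns (content_type_header_value, body_bytes).
    """
    field_headers = field_headers or {}
    # A long random boundary makes a collision with the content very unlikely.
    boundary = "----SketchBoundary" + secrets.token_hex(16)
    body = bytearray()
    for name, value in fields.items():
        is_binary = isinstance(value, (bytes, bytearray))
        data = bytes(value) if is_binary else str(value).encode("utf-8")
        lines = [h for h in field_headers.get(name, "").split("\r\n") if h]
        # Generate Content-Disposition only when no custom one was given.
        if not any(h.lower().startswith("content-disposition:") for h in lines):
            lines.insert(0, f'Content-Disposition: form-data; name="{name}"')
        # Binary parts default to application/octet-stream.
        if is_binary and not any(h.lower().startswith("content-type:") for h in lines):
            lines.append("Content-Type: application/octet-stream")
        body += f"--{boundary}\r\n".encode("ascii")
        body += ("\r\n".join(lines) + "\r\n\r\n").encode("utf-8")
        body += data + b"\r\n"
    body += f"--{boundary}--\r\n".encode("ascii")
    return f"multipart/form-data; boundary={boundary}", bytes(body)


content_type, body = build_multipart(
    {"Date": "2023-12-19", "Upload": b"\x00\x01\x02"},
    {"Upload": 'Content-Disposition: form-data; name="File"; '
               'filename="clip.mp4"\r\nContent-Type: video/mp4'},
)
```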

If you are sending only a single image or a single binary data term, it is usually simpler to pass it as the «httpContent» rather than using form fields at all. In that case the message is a single part (not multipart) body, assuming you also control what the receiving side expects.

Obtaining content from an FTP site

To obtain content from an FTP site, use:

ReadFromUrl("ftp://site.com/directory/file.txt")

The content must be textual. Binary files will be corrupted as they are read into a text value (specifically, '\0' characters will be converted to spaces).
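The NUL-to-space substitution alone makes binary content unrecoverable, as this short Python sketch shows. nul_to_space is a hypothetical helper, and it models only this one documented change:

```python
def nul_to_space(raw: bytes) -> bytes:
    """Model the documented conversion of NUL (zero) bytes to spaces."""
    return raw.replace(b"\x00", b" ")


# The first bytes of a PNG file contain NULs, so the conversion alters them.
png_start = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"
print(nul_to_space(png_start) == png_start)  # False
```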

Authentication

FTP sites often require user and password authentication, as do web services and some web pages. You can embed the user name and password in the URL like this:

ReadFromUrl("http://user:password@www.site.com/dir/page.htm")
ReadFromUrl("ftp://user:password@www.site.com/dir/page.htm")
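In standard URL syntax, the user name and password form the "userinfo" part before "@". Characters such as "@" or ":" inside a password must be percent-encoded. The Python sketch below uses only the standard library. It shows how that part can be extracted and how credentials are carried in an HTTP Basic Authorization header. Whether ReadFromUrl uses Basic authentication for HTTP URLs is an assumption, and the helper names are hypothetical:

```python
import base64
from urllib.parse import quote, unquote, urlsplit


def credentials_from_url(url):
    """Return (user, password, url_without_credentials) from a URL."""
    parts = urlsplit(url)
    user = unquote(parts.username) if parts.username else None
    password = unquote(parts.password) if parts.password else None
    host = parts.hostname or ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    clean = parts._replace(netloc=netloc).geturl()
    return user, password, clean


def basic_auth_header(user, password):
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


# A password containing "@" and ":" must be percent-encoded in the URL.
url = "ftp://me:" + quote("p@ss:1", safe="") + "@www.site.com/dir/page.htm"
print(credentials_from_url(url))
# ('me', 'p@ss:1', 'ftp://www.site.com/dir/page.htm')
```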

Sending data to CGI scripts and proprietary web services

If you call a CGI service, the CGI program will accept input in its own format, which could be arbitrary. You can send this data as follows:

ReadFromUrl("http://server.com/cgi-bin/myProgram.cgi", httpContent: data)

In this example, data contains the data that you are submitting to the CGI script; it forms the body of the HTTP request. The «httpContent» parameter can only be used with HTTP requests, not with FTP or Gopher requests.

Many web services fall into this category as well: the data they accept via HTTP may be in a proprietary format.

In some cases, you may also need to include additional HTTP headers in your request. You can insert these using the optional «httpHeaders» parameter. If you have more than one HTTP header, separate them with a CR character, Chr(13). If you type them into a definition as a quoted text value, you can just type a new line between headers; if you enter them into a single edit table cell, press Alt+Enter to insert a new line in the cell. Otherwise, use the & operator to concatenate the header lines with Chr(13):

ReadFromUrl("http://somehost.com", httpHeaders: "Accept:text/xml"&Chr(13)&"User-Agent:MyModel.ana")

The content value is passed directly with no special encoding of characters.

HTTP Status Codes

During an HTTP request, if the status code returned by the server is greater than or equal to 400, ReadFromUrl issues a warning displaying the status code and status text, unless the Show Result Warnings preference is turned off.

Proxy Servers

ReadFromUrl uses information stored in the system registry to determine whether a proxy server should be used to access the internet. You can set up this configuration through the Internet Explorer or Chrome web browsers (but not through Firefox). The proxy configuration is stored in the registry key:

HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings

in the registry values ProxyEnable, ProxyServer and ProxyOverride.

It should work with TIS, SOCKS and CERN style proxy servers as long as Internet Explorer is installed (SOCKS support requires IE).
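To check the values ReadFromUrl consults, you can read the same three registry values yourself. This Python sketch uses the standard winreg module (Windows only). It assumes the usual formats for these values. ProxyServer is either a single host:port or a semicolon-separated list such as http=host:port;https=host:port. ProxyOverride is a semicolon-separated bypass list in which <local> stands for local addresses. The helper names are hypothetical:

```python
import sys


def parse_proxy_server(value):
    """Parse a ProxyServer string into a {scheme: "host:port"} dict ("*" = all)."""
    if "=" not in value:
        return {"*": value.strip()}
    result = {}
    for item in value.split(";"):
        if "=" in item:
            scheme, _, address = item.partition("=")
            result[scheme.strip().lower()] = address.strip()
    return result


def parse_proxy_override(value):
    """Split ProxyOverride into its bypass entries (e.g. "<local>")."""
    return [entry.strip() for entry in value.split(";") if entry.strip()]


def read_proxy_settings():
    """Read ProxyEnable, ProxyServer and ProxyOverride on Windows; None elsewhere."""
    if sys.platform != "win32":
        return None
    import winreg

    path = r"Software\Microsoft\Windows\CurrentVersion\Internet Settings"
    settings = {}
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, path) as key:
        for name in ("ProxyEnable", "ProxyServer", "ProxyOverride"):
            try:
                settings[name], _ = winreg.QueryValueEx(key, name)
            except FileNotFoundError:  # value not present in the registry
                settings[name] = None
    return settings
```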

Limitations

The function waits until the full response from the server has been received. It does not respond to Ctrl+Break. Some requests (e.g., when there is a problem) may take a while to time out before you can return to using Analytica.

On computers that aren't always connected to the internet, or that need to dial up a modem or open another connection to get access, the function does not attempt to establish a connection. It simply fails with an error.

Several possible error conditions return a cryptic message with only an error code number.

It cannot download binary data; apart from pictures from HTTP sources, the content must be text.

It cannot access local file system files, e.g., you can't use a URL starting with "file://...".

History

ReadFromUrl was introduced in Analytica 4.2.
