Difference between revisions of "ReadBinaryFile"

Revision as of 17:59, 13 June 2018

Requires Analytica 5.0 Enterprise or higher

ReadBinaryFile(filename, bytesPer, typeFlags, offset, resultIndex..., showDialog, title)

Reads data directly from a binary (i.e., non-text) file.

Flat data files are usually read using ReadTextFile. When you read text files, end-of-line characters are interpreted in various ways for you, conversions from character sets such as UTF-8 or Unicode may occur, and certain special byte values such as the zero byte or end-of-file cannot occur in the file. In contrast, reading from a binary file gives you access to the bytes of the file exactly as they appear.

ReadBinaryFile is most commonly used when you have a data file in a binary data format from another application, and need to access the internal bytes at particular locations. It can also be used in conjunction with WriteBinaryFile as a fast and convenient method to dump an Analytica array to a file and load it back in later.

Result Value

The result is the contents read from the file. This is either an array indexed by the index(es) you specify in the «resultIndex» parameter, or a list of items if you omit the «resultIndex» parameter.

Reading bytes

The usage

ReadBinaryFile(filename)

returns the contents of the file as bytes, a list of integer values between 0 and 255.

Reading N values

To limit the number of items read, pass an index to the «resultIndex» parameter having a length equal to the maximum number of items to read. The actual number read will be less if the end-of-file is reached. For example,

ReadBinaryFile(filename, resultIndex: I)

Parameters

filename

Name of file to read from.

File name paths can be either relative or absolute. An absolute path specifies the full path of the file from a root drive, such as

"C:\Users\Drice\Documents\Projects\FatigueAnalysis\Results.bin"

A relative filename incompletely specifies a file path, and is interpreted relative to the CurrentDataDirectory.

I...

The index(es) that you want as index(es) of the result. If you provide no index, it returns an unindexed list, so the variable calling ReadBinaryFile will appear as the index. If you provide multiple indexes, the last index listed varies fastest and the first varies slowest as items are read.

The number of bytes to read is the product of the sizes of the indexes and the number of bytes per cell, which can vary if the «bytesPer» parameter varies by one or more of the indexes. If the total is less than the number of bytes in the file, it reads only the amount of data needed. If the total is more than the file size, it pads the excess cells with Null.
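The ordering and padding rules above can be illustrated outside Analytica. The following Python sketch is not Analytica code: the helper name `read_cells`, the use of `None` in place of Null, and the treatment of a partially available cell as missing are all assumptions made for illustration. It unpacks fixed-width unsigned little-endian integers so that the last dimension varies fastest.

```python
from math import prod

def read_cells(data: bytes, shape: tuple, bytes_per: int) -> list:
    """Hypothetical helper: unpack `data` into a nested list of the given shape.

    The last dimension varies fastest; cells past the end of the data are None.
    """
    n_cells = prod(shape)
    cells = []
    for k in range(n_cells):
        chunk = data[k * bytes_per:(k + 1) * bytes_per]
        # Assumption: a short or empty chunk means the data ended before this cell.
        if len(chunk) == bytes_per:
            cells.append(int.from_bytes(chunk, "little"))
        else:
            cells.append(None)
    # Fold the flat list into nested lists, innermost dimension first.
    for size in reversed(shape[1:]):
        cells = [cells[i:i + size] for i in range(0, len(cells), size)]
    return cells

# Ten bytes at 2 bytes per cell give five values; a 2 x 3 result needs six cells.
grid = read_cells(bytes([1, 0, 2, 0, 3, 0, 4, 0, 5, 0]), (2, 3), 2)
```

Here `grid` holds two rows of three cells, with the final cell padded because the data ran out.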

If the indexes are in a different order in the file from the canonical order within Analytica, it reforms (transposes) the array to the internal form needed. If the file is very large (e.g., hundreds of MB), this reform process can be quite time-consuming.

bytesPer

Specifies the number of bytes per item read. For example, when you specify bytesPer: 4, then each 4 bytes becomes one cell of the result. When this is 1, then each byte is read and returned as a separate cell. Using 8 for «bytesPer» will result in the most memory-efficient encoding of contents when it is read into memory, since there is 8 times as much information per cell as would be the case for single bytes (and the bytes required per cell in memory would be the same in both cases).

When reading integer values (as is the default when «typeFlags» is omitted), this specifies the number of bytes per integer word, and can be 1 (for single bytes), 2 (for 16-bit words), 4 (for 32-bit integers) or 8 (for 64-bit integers).

When reading floating point real numbers, «bytesPer» can be 4 (for 32-bit IEEE 754 reals, often called floats), or 8 (for 64-bit IEEE 754 reals, often called doubles). When «typeFlags» has 2 (floating point), the default for «bytesPer» is 8 if not specified.
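As a rough illustration of these widths (in Python, not Analytica), the standard `struct` module uses the same sizes for its fixed-width integer and IEEE 754 codes. The sketch below also shows that the same eight bytes yield different cells depending on the width chosen; the variable names are illustrative only.

```python
import struct

# Standard sizes in bytes; "<" selects little-endian layout without padding.
int_sizes = {code: struct.calcsize("<" + code) for code in "BHIQ"}
float_sizes = {code: struct.calcsize("<" + code) for code in "fd"}

raw = struct.pack("<d", 1.0)               # eight bytes holding the double 1.0
as_double = struct.unpack("<d", raw)[0]    # read back as one 8-byte real
as_two_words = struct.unpack("<2I", raw)   # same bytes as two 4-byte unsigned integers
```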

For the «typeFlags» option of 3, nine bytes per item are used, so you can either omit «bytesPer» or set it to 9. Any other value throws an error.

For the «typeFlags» option of 4, the number of bytes per item may vary (depending on the length of text values), and so «bytesPer» must be omitted or an error will be issued.

typeFlags

Controls how each item read gets converted into a value.

0 = Unsigned integer (positive only)

1 = Signed integer (positive and negative numbers)

2 = IEEE 754 Floating point real

3 = Value (numbers, dates and null). 9 bytes per item.

4 = Value (including text). Variable bytes per text item.

32 = Big endian flag.

The one-byte integer with hex value 0xC1 can be read as either 193 or -63. The distinction is whether the raw binary data is interpreted as an unsigned integer (193) or a signed integer (-63). Signed integers use two's complement, so both negative and positive values fall within the representable range. Use typeFlags: 0 when you want integers to be interpreted as unsigned, or 1 to include negative values. When «bytesPer» is 8, integers are read in as signed regardless.
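The signed and unsigned readings of a byte pattern can be checked with a short Python sketch (not Analytica code). It uses the built-in `int.from_bytes`, which applies two's complement when `signed=True`; the variable names are illustrative.

```python
raw = bytes([0xC1])

# The same single byte under the two interpretations.
unsigned_value = int.from_bytes(raw, "little", signed=False)
signed_value = int.from_bytes(raw, "little", signed=True)

# A two-byte little-endian word with every bit set except the lowest.
word = bytes([0xFE, 0xFF])
unsigned_word = int.from_bytes(word, "little", signed=False)
signed_word = int.from_bytes(word, "little", signed=True)
```

For an n-bit word, the signed reading equals the unsigned reading minus 2^n whenever the top bit is set.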

A standard Intel convention writes numbers (integers or floating point values) with the least significant byte first and the most significant byte last, which is termed "little-endian format". This is the default byte order unless the 32 flag is included in «typeFlags». "Big endian" means that the most significant byte is written first.

You can add the 32 "big endian" option to the data type selection. So, for example, a value of 2 + 32 would specify a floating point format (2) with the byte order reversed (32).
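The effect of byte order can be sketched in Python (not Analytica code) with the standard `struct` module, where the prefixes `<` and `>` select little-endian and big-endian layouts. The byte values are arbitrary samples.

```python
import struct

raw = bytes([0x00, 0x00, 0x01, 0x00])

# The same four bytes read as an unsigned 32-bit integer in each byte order.
little = struct.unpack("<I", raw)[0]
big = struct.unpack(">I", raw)[0]

# A double written big-endian reads back correctly only with the matching order.
packed = struct.pack(">d", 2.5)
matched = struct.unpack(">d", packed)[0]
mismatched = struct.unpack("<d", packed)[0]
```

Reading with the wrong byte order does not raise an error; it silently yields a different number, so the flag has to match how the file was written.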

The Value type is a variant data type in which a leading byte encodes the internal data type of the value and is followed by an 8-byte value. The «typeFlags» option of 3 only writes Analytica data types that can be fully encoded in 8 bytes, which includes integers, fixed-point reals, floating point reals, date and date-time numbers, and the special value Null. When using this type, Null values ARE written to the file. The «typeFlags» option of 3 cannot be used to write text values, since these vary in length. Each cell uses 9 bytes, so cells are spaced every 9 bytes in the file.
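The fixed 9-byte spacing can be pictured with the Python sketch below (not Analytica code). It only splits a buffer into a leading type byte and an 8-byte payload per record. The helper name `split_value_records` is hypothetical, the tag values in the sample are arbitrary placeholders, and the meaning of each type byte is not covered here.

```python
RECORD_SIZE = 9  # one type byte followed by an 8-byte value

def split_value_records(data: bytes) -> list:
    """Hypothetical helper: return (type_byte, payload) pairs for each full record."""
    records = []
    for start in range(0, len(data) - RECORD_SIZE + 1, RECORD_SIZE):
        tag = data[start]                              # leading type byte, left uninterpreted
        payload = data[start + 1:start + RECORD_SIZE]  # the 8-byte value
        records.append((tag, payload))
    return records

# Two records with placeholder tags 1 and 2 and arbitrary payload bytes.
sample = bytes([1]) + bytes(range(8)) + bytes([2]) + bytes(range(8, 16))
records = split_value_records(sample)
```

Because every record has the same size, the k-th item starts at byte 9·k, which is what allows this format to be read at arbitrary positions.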

The «typeFlags» option of 4 extends option 3 by allowing text values. When a text value occurs, the first byte indicates that the value is text, the next two bytes encode the length, n, and the next n bytes encode the characters in UTF-8. When this format is used, the «bytesPer» parameter must be omitted. The locations of items in the file are not spaced at any predictable interval, since text items vary in length.

offset

Specifies the file position to start reading from. A negative value specifies the distance (in bytes) from the end of the file, and a positive or zero value specifies the distance in bytes from the start of the file. You can use the «offset» parameter to read data from a particular location inside the file.

When «offset» varies with the result index(es), you can use Null values in the «offset» array to indicate that the item resides directly after the previously read item. Hence, set to non-null integer values only for those items that start at a new offset.

You will likely want to limit the number of items read. To do this, specify a result index. The length of the result index is used as the maximum number of items read.
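The positive and negative offset conventions, combined with a cap on the amount read, resemble seeking in an ordinary file. The Python sketch below is not Analytica code; the helper name `read_at` is hypothetical, and an in-memory buffer stands in for a file.

```python
import io
import os

def read_at(stream, offset: int, count: int) -> bytes:
    """Hypothetical helper: read up to `count` bytes starting at `offset`.

    A negative offset is measured back from the end of the stream.
    """
    if offset < 0:
        stream.seek(offset, os.SEEK_END)
    else:
        stream.seek(offset, os.SEEK_SET)
    return stream.read(count)

buf = io.BytesIO(bytes(range(10)))   # ten bytes with values 0..9
from_start = read_at(buf, 2, 3)      # starts 2 bytes after the beginning
from_end = read_at(buf, -4, 2)       # starts 4 bytes before the end
```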

showDialog

This is a flag to force or suppress the file selector dialog. When not specified, the dialog only displays if needed, for example, if the file name is blank or the file doesn't exist. Setting «showDialog» to false suppresses the dialog from appearing. Setting «showDialog» to true forces the dialog to display, using «filename» as the initial default.

title

The text to use as caption of file selector dialog.

Examples

Fast Writing and reading of Analytica arrays

WriteBinaryFile() and ReadBinaryFile() are the fastest ways to write data from a model to a file or read from a file, because the numeric data does not need to be converted to text when writing or parsed when reading.

Assume that a is an array indexed by indexes I, J, K, and L. Assume also that it contains only numbers, dates and null values, with no text, handles, references, etc.

Write the array to the file:

WriteBinaryFile("a.dat", a, I, J, K, L, bytesPer: 9, typeFlags: 3)

Read the array back in:

ReadBinaryFile("a.dat", bytesPer: 9, typeFlags: 3, resultIndex: I, J, K, L)

Reading a complex binary structure

Binary Shape files (*.shp) and Shape index files (*.shx) have the following header layout for their first 100 bytes:

To read this, create an index for the items of the header:

Index Header_item :=

['File Code','Unused','Unused','Unused','Unused','Unused','File length','Version','Shape type','Xmin','Ymin','Xmax','Ymax','Zmin','Zmax','Mmin','Mmax']

The bytes per item varies:

Variable BytesPerItem := Table(Header_item)(4,4,4,4,4,4,4,4,4,8,8,8,8,8,8,8,8)

A big-endian unsigned integer is «typeFlags» 32, a little-endian unsigned integer is 0, and a little-endian double is 2. So the item types are:

Variable ItemType := Table(Header_item)(32,32,32,32,32,32,32,0,0,2,2,2,2,2,2,2,2)

The full header is then read using:

ReadBinaryFile("filename.shx", bytesPer: BytesPerItem, typeFlags: ItemType, resultIndex: Header_item)
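For comparison, the same 100-byte layout can be decoded outside Analytica. The Python sketch below is not Analytica code: it follows the item names, widths and byte orders listed above (seven big-endian 4-byte integers, two little-endian 4-byte integers, eight little-endian doubles). The function name `parse_header` is hypothetical, and the sample header is synthetic data built only for this illustration.

```python
import struct

HEADER_ITEMS = ["File Code", "Unused", "Unused", "Unused", "Unused", "Unused",
                "File length", "Version", "Shape type",
                "Xmin", "Ymin", "Xmax", "Ymax", "Zmin", "Zmax", "Mmin", "Mmax"]

def parse_header(header: bytes) -> list:
    """Hypothetical helper: return (item name, value) pairs for a 100-byte header."""
    big_ints = struct.unpack_from(">7I", header, 0)      # bytes 0..27, big-endian
    little_ints = struct.unpack_from("<2I", header, 28)  # bytes 28..35, little-endian
    doubles = struct.unpack_from("<8d", header, 36)      # bytes 36..99, little-endian
    return list(zip(HEADER_ITEMS, big_ints + little_ints + doubles))

# Synthetic header with made-up lengths and bounding-box values.
sample = (struct.pack(">7I", 9994, 0, 0, 0, 0, 0, 50)
          + struct.pack("<2I", 1000, 5)
          + struct.pack("<8d", -1.0, -2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0))
parsed = parse_header(sample)
```

A list of pairs is used rather than a dictionary because the item name 'Unused' repeats. The widths sum to 7·4 + 2·4 + 8·8 = 100 bytes, matching the header size.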

The following model embellishes this with a user-interface for selecting the items in a general binary data record and using this to read the structure. For complex binary structures, this may simplify the coding.

Model file: Reading binary data file.ana
Try the model with this binary data file: NHDWaterbody.shx
