Database access

Analytica User GuideIntegration with data and applicationsDatabase access

An overview of functions for querying and modifying an external databases using Open Database Connectivity (ODBC). ODBC is a widely used standard for connecting to relational databases. It uses queries in Structured Query Language (SQL, pronounced “sequel,”) to read from and write to databases.

Database access is available in Analytica Enterprise, Optimizer, and ACP.

Overview of ODBC

SQL is a widely used language to read data from and write data to a relational database. A relational database organizes data in two-dimensional tables, where the columns of a table serve as fields or labels, and the rows correspond to records, entries, or instances. In Analytica, it is more natural to refer to the columns as labels and rows as records. For instance, an address book table might have the columns or labels LastName, FirstName, Address, City, State, Zip, Phone, Fax, and E-mail, and each individual would occupy one row or record in that table.

The result of an SQL query is a two-dimensional table, called a result table. The rows are the records matching the criteria specified by the query. The columns are the requested fields.

Analytica Enterprise provides functions that accept an SQL query, using standard SQL syntax, as a text-valued parameter. These functions return the result of the query as an array with two dimensions, with its rows indexed by a record index, and columns indexed by a label index. So, the basic structure of an Analytica model for retrieving a result table is this.

Each of these three nodes could require the information from the Result_Table. For example, the definition of the record index would require knowing how many records (rows) are in the result table; the label index might need to read the names of the columns — although, often they are known in advance; and of course, the Result_Table needs to read the table. The Database library provides the functions, DbQuery, DbLabels, and DbTable to define these variables. These functions work in concert to perform the query only once (when the record index is evaluated), and share the result table between the nodes.

Suppose as an example that we have a database containing the addresses individuals. To ensure the titles are meaningful, we name the indexes Individuals and Address_fields. The query is then encoded in the indexes and variable as follows.

Index Individuals := DbQuery(Data_source,'SELECT*FROM Addresses')

Index Address_fields := DbLabels(Individuals)

Variable Address_fields := DbTable(Individuals, Address_fields)

In the above example, the record index is defined using DbQuery(), the label index is defined using DbLabels(), and the result table is defined using DbTable(). Each function is described below.

To specify a data source query, two basic pieces of information must always be known. These are the data source identifier and the SQL query text. These two items are the parameters to the DbQuery() function, and are discussed in the following two subsections.

DSN and data source

A data source is described by a text value, which can contain the Data Source Name (DSN) of the data source, login names, passwords, etc. Here, we describe the essentials of how to identify and access a data source. These follow standard ODBC conventions. For more details, consult one of the many texts on ODBC.

Tip

You must have a DSN already configured on your machine. If not, consult with your Network Administrator. See Configuring a DSN.

The general format of a data source identification text is (the single quotes are Analytica’s text delimiters):

'attr1 = value1; attr2 = value2; attr3 = value3;'

For example, the following data source identifier specifies the database called 'Automobile Data', with a user login 'John' and a password of 'Lightning':

'DSN = Automobile Data; UID = John; PWD = Lightning'

If a database is not password protected, then a data source descriptor might be as simple as:

'DSN = Automobile Data'

If a default data source is configured on your machine (consult your database administrator), you can specify it as:

'DSN = DEFAULT'

Some systems might require one login and password for the server, and another login and password for the DBMS. In this case, both can be specified as:

'DSN = Automobile Data; UID = John;

PWD = Lightning; UIDDBMS = JQR; PWDDBMS = Thunder'

You can use the DRIVER attribute to specify explicitly which driver to use, instead of letting it be determined automatically by the data source type. For example:

'DSN = Automobile Data; DRIVER = SQL Server'

Instead of embedding a long data source connection text inside the DbQuery() statement, you can define a variable in Analytica whose value is the appropriate text value. The name of this variable can then be provided as the argument to DbQuery(). Another alternative is to place the connection information in a file data source (a .DSN file). Such a file would consist of lines such as:

DRIVER = SQL Server

UID = John

PWD = Lightning

DSN = Automobile Data

Assuming this data is in a file named MyConnect.DSN, the connection text can be specified as:

'FILEDSN = MyConnect.DSN'

In some applications, you might wish to connect directly to a driver rather than a registered data source. Some drivers allow this as a way to access a data file directly, even when it is not registered. Also, some drivers provide this as a way o f interrogating the driver itself. To perform such a connection, use the driver keyword. For example, if the Paradox driver accepts the directory of the data files as an argument, you can specify:

'DRIVER={Paradox Driver};DIRECTORY='D:\CARS'

The specific fields used here (UID, PWD, UIDDBMS, PWDDBMS, DIRECTORY, etc.) are interpreted by the ODBC driver, and therefore depend on the specific driver used. Any fields interpreted by your driver are allowed.

If you do not wish to embed the full DSN in the connection text, a series of dialogs pop up when the DbQuery() function is evaluated. For example, you can leave the UID and PWD (user name and password) out of your model. When the model is evaluated, Analytica prompts you to enter the required information. Explicitly placing information in your model eliminates the extra dialog. A blank connection text can even be used, in which case you need to choose among the data sources available on your machine when the model is being evaluated. Although the user can form the DSN via the graphical interface at that point, the result is not automatically placed in the definitions of your Analytica model. However, you might be able to store the information in a DSN file (depending on which drivers and driver manager you are using). You might also be able to register data sources on your machine from that interface.

Tip

In 64-bit Windows, ODBC drivers may be either 64-bit drivers or 32-bit drivers. When you are using Analytica 64-bit, you can only make use of 64-bit ODBC drivers. If you do not have a 64-bit driver for the database you are using installed on your computer, you will not be able to query that database from Analytica 64-bit. Likewise, the 32-bit editions of Analytica can only make use of 32-bit ODBC drivers, so the appropriate 32-bit driver must be installed on your computer. Note that 64-bit versions of the Microsoft JET drivers do not exist. These drivers are installed with Microsoft Access and include the Access ODBC driver, the Excel ODBC driver, and the flat file ODBC driver. Microsoft apparently has no plans to release 64-bit versions of these drivers, and has indicated it wants to phase out the use of these drivers entirely. These drivers therefore cannot be utilized from Analytica 64-bit. Most other major database drivers are available in both 32- and 64-bit.

Configuring a DSN

To access a database using ODBC, you must have a Data Source Name (DSN) already configured on your machine. In general, configuring a DSN requires substantial database administration expertise as well as the appropriate access permissions on your computer and network. To configure a data source, you should consult with your Network Administrator or your database product documentation. The general task of configuring a DSN is beyond the scope of this manual.

If you find you must configure a DSN yourself, the process usually involves the following steps (assuming your database already exists):

Select the ODBC icon from the Windows Control Panel.
Select the User DSN, System DSN, or File DSN tab depending on your needs. Most likely, you will want System DSN. Click the Add button.
Select the driver. For example, if your database is a Microsoft Access database, select the Microsoft Access Driver and click Finish.
You are led through a series of dialogs specific to the driver you selected. These include dialogs that allow you to specify the location of your database, as well as the DSN name that you will use from your Analytica model. An example is shown here.

Specifying an SQL query

You can use any SQL query as a text parameter within an Analytica database function. SQL queries can be very powerful, and can include multiple tables, joins, splits, filters, sorting, and so on.

We give only a few simple examples here. If you are interested in more demanding applications, please consult one of the many excellent texts on SQL.

The SQL expression to select a complete table in a relational database, where the table is named VEHICLES, would be:

'SELECT * FROM vehicles'

Tip

SQL is case insensitive, but Analytica is case sensitive for labels of Column names.

To select only two columns (make and model) from this same table and sort them by make:

'SELECT make, model FROM vehicles ORDER BY make'

These examples provide a starting point. When using multiple tables, one detail to be aware of is that it is possible in SQL to construct a result table with two columns containing the same label. For example:

'SELECT * FROM vehicles, companies'

where both tables for vehicles and companies contain a column labeled Id. In this case, you can only access one (the first) of the two columns using DbTable(). Thus, you should take care to ensure that duplicate column labels do not result. This can be accomplished, for example, using the AS keyword, for example:

'SELECT vehicles.Id AS vid, companies.Id AS cid, * FROM vehicles, companies'

For users that are unaccustomed to writing SQL statements, products exist that allow SQL statements to be constructed from a simple graphical user interface. Many databases allow queries to be defined and stored in the database. For example, from Microsoft Access, one can define a query by running Access and using the Query Wizard graphical user interface. The query is given a name and stored in the database. The name of the query can then be used where the name of a table would normally appear, for example:

'SELECT * FROM myQuery'

Retrieving an SQL result table

To retrieve a result table from a data source, you need:

The data source connection text.
The SQL query. These are discussed in the previous two sections. For illustrative purposes, suppose the connection text is 'DSN = Automobile Data', and the SQL statement is

'SELECT * FROM vehicles'. Obtain the relational Result_table

Thus:

Index Records := DbQuery('DSN=Automobile Data','SELECT * FROM vehicles')

Index Labels := DbLabels(Records)

Variable Result_table := DbTable(Records, Labels)

You can now display Result_table to examine the results.

This basic procedure can be repeated for any result table. The structure of the model stays the same, and just the connection text and SQL query text change.

Separating columns of a database table

It is often more convenient for further modeling to create a separate variable for each column of a database table. Each column variable uses the same record index. For example, we might create separate variables for Make, Year, and Car model from the vehicles database table.

In this case, the record index is still defined using DbQuery(), and each column is defined using DbTable(). The actual SQL query is issued only once when the record index is evaluated.

Suppose you wished to have Make, Model, Year, MPG, etc., as separate Analytica variables, each a one-dimensional array with a common index. For example:

Index Records := DbQuery('DSN = Automobile Data',

'SELECT * FROM vehicles')

Variable Make := DbTable(Records, 'make')

Variable Model_Year := DbTable(Records, 'year')

Variable Car_Model := DbTable(Records, 'model')

Since Model is a reserved word in Analytica, we named the variable Car_Model instead of just Model. But, the second parameter to DbTable() specifies the name of the column as stored in the database. This does not have to be the same as the name of the variable in Analytica.

Alternatively, you can construct a table containing a subset of the columns in a result table. For example, if vehicles has a large number of columns, you might create this variable with only the three columns you are interested in:

Variable SubCarTable := DbTable(Records, ['make','model','year'])

This table is indexed by Records and by an implicit index (a.k.a. a Null index). The first argument to DbTable() must always be an indexed defined by DbQuery() — remember the SQL query is defined in that node, and this is how DbTable() knows which table is being retrieved.

DbWrite(): Writing to a database

You can use SQL to change the contents of the external data source from within an Analytica model. Using the appropriate SQL statements, you can add or delete records from an existing database table. You can also add columns, and create or delete tables, if your data source driver supports these operations.

DbQuery() cannot alter the data source, because it processes the SQL statement in read-only mode. Instead, use DbWrite(), which is identical to DbQuery() except that it processes the SQL statement in read-write mode. DbWrite() can make any change to the database that can be expressed as an SQL statement, and is supported by the ODBC driver.

To send data from your model into the database, you must convert that data into a text value — more precisely, into an SQL statement. Analytica offers some tools to help this process. Here, we illustrate a common case — writing a multi-dimensional array to a table in a database. We use the ODBC_Library.ana library distributed with Analytica.

Suppose you want to write the value of variable A, which is a three dimensional array indexed by I, J, and K, into a relational table named TableA, so that other applications can use the data.

First, we need to convert the 3D array into the correct relational table form. Then we convert the table into the SQL text to write to the database.

Our approach is to first convert the three-dimensional array A into a two dimensional table, which we store into TableA. TableA needs the two indexes ARowIndex and ALabelIndex. These three variables are defined as follows:

Index ALabelIndex := Concat(IndexNames(A), ['A'])

Index ARowIndex := sequence(1, Size(A))

Variable TableA := MdArrayToTable(A, ARowIndex, ALabelIndex)

See MdArrayToTable() and MdArrayToTable(a, row, col) (pure relational transformation). ALabelIndex evaluates to ['I', 'J', 'K', 'A'], and ARowIndex sets aside one row for each element of A. TableA is then a table with one row for each element of A, where the value of each index for that element is listed in the corresponding column, and the value of that element appears in the final column.

Next, set up TableA in the database with the same columns. This is most easily done using the front end provided with your database. For example, if you are using MS Access, start the MS Access program, and from there, create a new table. Alternatively, you could issue the statement:

DbWrite(DB,'CREATE TABLE TableA(I <text>, J <text>, K <text>, A <text>)')

from an Analytica expression (replacing <text> with whatever type is appropriate for your application). Be sure that the column labels in the database table have the same names as the labels of ALabelIndex in the Analytica model.

Tip

If you want to use column labels in the database that are different from the Analytica index names, define ALabelIndex to be a 1D array, self indexed. Set the domain of ALabelIndex to be the database labels, and the values of the array to the index names. (The last value is arbitrary.)

Our data is now in the form of a 2D table as needed for a database table. Next we construct the SQL text to write the table to the database. You must choose whether you want to append rows to the existing database table, or replace the table entirely. Or you can replace only selected entries. Your choice affects how you construct the SQL statement. Here, we totally replace any existing data with the new data, so after the operation, the database table is exactly the same as TableA in the Analytica model. The SQL statements for performing the write is:

DELETE * FROM TableA

INSERT INTO TableA(I, J, K, A) VALUES ('i1', 'j1', 'k1', 'a111')

INSERT INTO TableA(I, J, K, A) VALUES ('i1', 'j1', 'k2', 'a112')

...

The first statement removes existing data, since we are replacing it. We follow this by one INSERT INTO statement for each row of TableA. The data to the right of the VALUES keyword is replaced by the specific values for indexes I, J, K, and array A (the example above assumes the values are all text values). If your values are numeric, you should note that MSAccess adds quotes around them automatically.

Since writing the table requires a series of SQL statements, we have two options: Evaluate a series of DbWrite() functions, or lump the series of SQL statements into one long text value and issue one DbWrite() statement. In Analytica, the second option is much more efficient for two reasons. First, the overhead of connecting with the database occurs only one time. Second, intermediate result tables do not have to be read from the ODBC driver, while if you issued separate DbWrite() statements, each one would go through the effort of acquiring the result table, only to be ignored.

Important feature (double semicolon)

To allow multiple SQL statements in a single DbWrite() function (or in a single DbQuery() function), Analytica provides an extension to the SQL language. The double semicolon separates multiple statements. For example:

'DELETE * FROM TableA ;; SELECT * FROM TableA'

This first deletes the data from the table, and then reads the (now empty) table. When ;; is used, only the last SQL statement in the series returns a result table. Most statements that write to a database return an empty result table.

We are now ready to write the Analytica expression that constructs the SQL statement to write the table to the database. The function to do this already exists in the ODBC_Library. First, use the Add Module item on the File menu to insert the ODBC_Library into your model; then use the WriteTableSql function, which returns the SQL statement (as a text value) for writing the table to the database. The function requires that I and L contain no duplicates (which should be the case anyway).

Finally, define:

Variable Write_A_to_DB := DbWrite(DB, WriteTableSql(A, RowIndex, LabelIndex, 'TableA'))

Creating an output node to write to a database

Write_A_to_DB writes array A to the database whenever it is evaluated. But, this happens when the model user causes Write_A_to_DB to be evaluated, not necessarily whenever A changes. To make it easy for the end user to perform the write, we suggest you make an output node for WriteAtoDB:

Select node Write_A_to_DB in its diagram.
Select the Make Output Node command on the Edit menu.
Move the new output node to a convenient place in the user interface of the model.

Initially, the output node shows the Calc button. When you click it, it writes A to the database. It also displays the result of evaluating DbWrite(), usually an empty window, not very interesting to the user. To avoid this, append “; 'Done' ” to its definition:

Write_A_to_DB := DBWrite(DB, WriteTableSql(A, RowIndex, LabelIndex,'TableA'); 'Done'

Now, when you or an end user of the model, clicks Write_A_to_DB, after writing A to the database, it shows 'Done' in the output node. It reverts to the Calc button, whenever A changes.