Database access

Revision as of 04:48, 24 December 2015 by Jhernandez3 (talk | contribs) (Created page with "Category:Analytica User Guide <breadcrumbs>Analytica User Guide > {{PAGENAME}}</breadcrumbs> Analytica Enterprise provides several functions for querying external databas...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Analytica Enterprise provides several functions for querying external databases using Open Database Connectivity (ODBC). ODBC is a widely used standard for connecting to relational databases, on either local or remote computers. It uses queries in Structured Query Language (SQL), pronounced “sequel,” to read from and write to databases.

Overview of ODBC

SQL is a widely used language to read data from and write data to a relational database. A relational database organizes data in two-dimensional tables, where the columns of a table serve as fields or labels, and the rows correspond to records, entries, or instances. In Analytica, it is more natural to refer to the columns as labels and rows as records. For instance, an address book table might have the columns or labels LastName, FirstName, Address, City, State, Zip, Phone, Fax, and E-mail, and each individual would occupy one row or record in that table.

The result of an SQL query is a two-dimensional table, called a result table. The rows are the records matching the criteria specified by the query. The columns are the requested fields.

Analytica Enterprise provides functions that accept an SQL query, using standard SQL syntax, as a text-valued parameter. These functions return the result of the query as an array with two dimensions, with its rows indexed by a record index, and columns indexed by a label index. So, the basic structure of an Analytica model for retrieving a result table is this.

Db access 1.png

Each of these three nodes could require the information from the Result_Table. For example, the definition of the record index would require knowing how many records (rows) are in the result table; the label index might need to read the names of the columns — although, often they are known in advance; and of course, the Result_Table needs to read the table. The Database library provides the functions, DBQuery, DBLabels, and DBTable to define these variables. These functions work in concert to perform the query only once (when the record index is evaluated), and share the result table between the nodes.

Suppose as an example that we have a database containing the addresses individuals. To ensure the titles are meaningful, we name the indexes Individuals and Address_fields. The query is then encoded in the indexes and variable as follows.

Index Individuals := DBQuery(Data_source,'SELECT*FROM Addresses') Index Address_fields := DBLabels(Individuals) Variable Address_fields := DBTable(Individuals, Address_fields)

In the above example, the record index is defined using DBQuery(), the label index is defined using DBLabels(), and the result table is defined using DBTable(). Each function is described below.

To specify a data source query, two basic pieces of information must always be known. These are the data source identifier and the SQL query text. These two items are the parameters to the DBQuery() function, and are discussed in the following two subsections.

DSN and data source

A data source is described by a text value, which can contain the Data Source Name (DSN) of the data source, login names, passwords, etc. Here, we describe the essentials of how to identify and access a data source. These follow standard ODBC conventions. For more details, consult one of the many texts on ODBC.

Tip
You must have a DSN already configured on your machine. If not, consult with your Network Administrator. See “Configuring a DSN” below.

The general format of a data source identification text is (the single quotes are Analytica’s text delimiters):

'attr1=value1; attr2=value2; attr3=value3;'

For example, the following data source identifier specifies the database called 'Automobile Data', with a user login 'John' and a password of 'Lightning':

'DSN=Automobile Data; UID=John;PWD=Lightning'

If a database is not password protected, then a data source descriptor might be as simple as:

'DSN=Automobile Data'

If a default data source is configured on your machine (consult your database administrator), you can specify it as:

'DSN=DEFAULT'

Some systems might require one login and password for the server, and another login and password for the DBMS. In this case, both can be specified as:

'DSN=Automobile Data; UID=John; PWD=Lightning; UIDDBMS=JQR; PWDDBMS=Thunder'

You can use the DRIVER attribute to specify explicitly which driver to use, instead of letting it be determined automatically by the data source type. For example:

'DSN=Automobile Data; DRIVER=SQL Server'

Instead of embedding a long data source connection text inside the DBQuery() statement, you can define a variable in Analytica whose value is the appropriate text value. The name of this variable can then be provided as the argument to DBQuery(). Another alternative is to place the connection information in a file data source (a .DSN file). Such a file would consist of lines such as:

DRIVER = SQL Server UID = John PWD = Lightning DSN = Automobile Data

Assuming this data is in a file named MyConnect.DSN, the connection text can be specified as:

'FILEDSN=MyConnect.DSN'

In some applications, you might wish to connect directly to a driver rather than a registered data source. Some drivers allow this as a way to access a data file directly, even when it is not registered. Also, some drivers provide this as a way o f interrogating the driver itself. To perform such a connection, use the driver keyword. For example, if the Paradox driver accepts the directory of the data files as an argument, you can specify:

'DRIVER={Paradox Driver};DIRECTORY='D:\CARS'

The specific fields used here (UID, PWD, UIDDBMS, PWDDBMS, DIRECTORY, etc.) are interpreted by the ODBC driver, and therefore depend on the specific driver used. Any fields interpreted by your driver are allowed.

If you do not wish to embed the full DSN in the connection text, a series of dialogs pop up when the DBQuery() function is evaluated. For example, you can leave the UID and PWD (user name and password) out of your model. When the model is evaluated, Analytica prompts you to enter the required information. Explicitly placing information in your model eliminates the extra dialog. A blank connection text can even be used, in which case you need to choose among the data sources available on your machine when the model is being evaluated. Although the user can form the DSN via the graphical interface at that point, the result is not automatically placed in the definitions of your Analytica model. However, you might be able to store the information in a DSN file (depending on which drivers and driver manager you are using). You might also be able to register data sources on your machine from that interface.

Tip
In 64-bit Windows, ODBC drivers may be either 64-bit drivers or 32-bit drivers. When you are using Analytica 64-bit, you can only make use of 64-bit ODBC drivers. If you do not have a 64-bit driver for the database you are using installed on your computer, you will not be able to query that database from Analytica 64-bit. Likewise, the 32-bit editions of Analytica can only make use of 32-bit ODBC drivers, so the appropriate 32-bit driver must be installed on your computer. Note that 64-bit versions of the Microsoft JET drivers do not exist. These drivers are installed with Microsoft Access and include the Access ODBC driver, the Excel ODBC driver, and the flat file ODBC driver. Microsoft apparently has no plans to release 64-bit versions of these drivers, and has indicated it wants to phase out the use of these drivers entirely. These drivers therefore cannot be utilized from Analytica 64-bit. Most other major database drivers are available in both 32- and 64-bit.

Configuring a DSN

To access a database using ODBC, you must have a Data Source Name (DSN) already configured on your machine. In general, configuring a DSN requires substantial database administration expertise as well as the appropriate access permissions on your computer and network. To configure a data source, you should consult with your Network Administrator or your database product documentation. The general task of configuring a DSN is beyond the scope of this manual.

If you find you must configure a DSN yourself, the process usually involves the following steps (assuming your database already exists):

  1. Select the ODBC icon from the Windows Control Panel.
  2. Select the User DSN, System DSN, or File DSN tab depending on your needs. Most likely, you will want System DSN. Click the Add button.
  3. Select the driver. For example, if your database is a Microsoft Access database, select the Microsoft Access Driver and click Finish.
  4. You are led through a series of dialogs specific to the driver you selected. These include dialogs that allow you to specify the location of your database, as well as the DSN name that you will use from your Analytica model. An example is shown here.

Db access 2.png

Specifying an SQL query

You can use any SQL query as a text parameter within an Analytica database function. SQL queries can be very powerful, and can include multiple tables, joins, splits, filters, sorting, and so on.

We give only a few simple examples here. If you are interested in more demanding applications, please consult one of the many excellent texts on SQL.

The SQL expression to select a complete table in a relational database, where the table is named VEHICLES, would be:

'SELECT * FROM vehicles'

Tip
SQL is case insensitive, but Analytica is case sensitive for labels of Column names.

To select only two columns (make and model) from this same table and sort them by make:

'SELECT make, model FROM vehicles ORDER BY make'

These examples provide a starting point. When using multiple tables, one detail to be aware of is that it is possible in SQL to construct a result table with two columns containing the same label. For example:

'SELECT * FROM vehicles, companies'

where both tables for vehicles and companies contain a column labeled Id. In this case, you can only access one (the first) of the two columns using DBTable(). Thus, you should take care to ensure that duplicate column labels do not result. This can be accomplished, for example, using the AS keyword, for example:

'SELECT vehicles.Id AS vid, companies.Id AS cid, * FROM vehicles, companies'

For users that are unaccustomed to writing SQL statements, products exist that allow SQL statements to be constructed from a simple graphical user interface. Many databases allow queries to be defined and stored in the database. For example, from Microsoft Access, one can define a query by running Access and using the Query Wizard graphical user interface. The query is given a name and stored in the database. The name of the query can then be used where the name of a table would normally appear, for example:

'SELECT * FROM myQuery'

Retrieving an SQL result table

To retrieve a result table from a data source, you need:

  1. The data source connection text.
  2. The SQL query. These are discussed in the previous two sections. For illustrative purposes, suppose the connection text is 'DSN=Automobile Data', and the SQL statement is

'SELECT * FROM vehicles'. Obtain the relational Result_table thus: Index Records := DBQuery('DSN=Automobile Data', 'SELECT * FROM vehicles') Index Labels := DBLabels(Records) Variable Result_table := DBTable(Records, Labels)

You can now display Result_table to examine the results.

This basic procedure can be repeated for any result table. The structure of the model stays the same, and just the connection text and SQL query text change.

Separating columns of a database table

It is often more convenient for further modeling to create a separate variable for each column of a database table. Each column variable uses the same record index. For example, we might create separate variables for Make, Year, and Car model from the vehicles database table.


Comments


You are not allowed to post comments.