DbQuery


(Requires Analyica Enterprise or Power Player)

DbQuery( connectionString, sql )

Used to define an index Variable. The definition of the index should contain only one DbQuery() statement. ConnectionString specifies a data source (e.g., 'DSN=MyDatabase'). The sql parameter defines an SQL query.

When placed as the definition of an index Variable, DbQuery() will be evaluated as soon as the definition is complete. When it is evaluated, the actual query is performed. The resulting result table is cached inside Analytica, to subsequently be accessed by [[DbTable]() or DbLabels().

DbQuery() returns a sequence 1..n, where n is the number of records (rows) in the result table.

DbQuery() should appear only once in a definition, and if it is embedded in an expression, the expression must return a list with n elements.

DbQuery() processes the sql statement in read-only mode, so that the data source cannot be altered as a result of executing this statement. To alter the data source, use DbWrite().

New to 4.0

In Analytica 4.0 (but not earlier releases), a local index can hold the record set. This means you can create a user-defined function that bundles several database query steps. For example:

 Function DbFullTable(connectionStr, sql: text)
 Definition:
    Index Rows := DbQuery(connectionStr, sql);
    Index Cols := DbLabels(Rows);
    DbTable(Rows, Cols)

This user-defined function returns the full relational table in a single call, using local indexes, Rows and Cols.

Connection String Syntax

Using a configured DSN

The connectionString parameter specifies the data source that you will be querying. The most common syntax for the connection string is:

"DSN=«name»"

where «name» is the name of a system or user data source that you have previously configured in the Data Sources (ODBC) wizard (accessable in Windows via

Start → Programs → Administrative Tools → Data Sources (ODBC).

You can augment a DSN-based connection string with additional keyword values. These are specific to the database driver you are using, and the names of these additional keywords must be found from other documentation. For example, some databases require a login and password, which can be added to the connection string as:

"DSN=MyDatabase; UID=John; PWD=Lightning"

Some databases require two logins, one for the server, and another for the DBMS. In this case, both can be specified:

"DSN=MyDatabase; UID=John; PWD=Lightning; UIDDBMS=JQR; PWDDBMS=Thunder"

There are a couple of advantages to using a DSN to specify your data source. First, the wizard in Windows makes it relatively easy to select the appropriate driver and configure the parameters used to access the data source. Second, the connectionString parameter in your model remains very simple. And third, the model is not reliant on the precise location of the database, so that if someone else uses your model on a different machine, the connection string can remain valid. On the downside, it does mean that whenever you share your model with someone else, they must configure a DSN before using it.

Directly accessing the database file

You can also access your database directly, without configuring a DSN. To do this, at a minimum you need to specify the ODBC driver name and the database location in the connection string.

The driver name can be discovered by evaluating

SqlDriverInfo("")

from within Analytica.

Many drivers use the keyword DBQ for specifying the file path to the database. Some database drivers (examples are Microsoft Access and Excel) store the database in a single file, so the path specifies the actual file. Other drivers (e.g., Paradox, Text driver) store their database in many files in a single directory, in which case DBQ locates the directory. Here are some example of connection strings:

"Driver=Microsoft Access Driver (*.mdb); DBQ=C:\MyData\MyDatabase.mdb"
"Driver=Microsoft Excel Driver (*.xls); DBQ=" & CurrentModelDirectory() & "\SourceData.xls"
"Driver=Microsoft Paradox Driver (*.db); DBQ=C:\MyData\MyDatabaseDirectory"
"Driver=Microsoft Text Driver (*.txt; *.csv); DBQ="C:\MyData\MyCsvTables"

Note that when specifying the location, you must use absolute path names. The ODBC drivers do not know anything about Analytica's current directories. You may find CurrentModelDirectory and CurrentDataDirectory functions useful for computing full path names when your data source is in the same directory as your model, or a subdirectory.

SQL Syntax

Databases contain tables, each table having a name. The simplest and most common SQL statement is one that retrieve all the data from a given table:

"SELECT * FROM tableName"

Many databases also allow you to define queries, which may join several underlying tables to create a new computed table results. The names of such predefined queries can usually be used in place of tableName as well.

The SELECT statement is SQL's workhorse for reading data. You can expand on this simple form by selecting only a subset of the columns, selecting only a subset of the rows using a search condition, joining multiple tables, sorting the selected rows, eliminating duplicate rows, or grouping (aggregating) the rows. The following shows the general syntax used by most SQL databases, where brackets surround optional parts:

SELECT column_list
[ FROM list_of_tables ]
[ WHERE search_condition ]
[ ORDER BY column_list [ASC|DESC] ]
[ GROUP BY group_by_expression ]
[ HAVING group_search_conditon ]

If you are creating a complex query, involving the joining of multiple tables and complex WHERE or GROUP BY criteria, a convenient way to do this by using an SQL designer application. Graphical user interfaces that help you create your SQL queries can be found in Microsoft Access, in a program called Microsoft Query, in Microsoft SQL Server management studio, etc.

You can often bring entire tables into Analytica and "join" them from within Analytica, using Analytica expressions. However, if you are dealing with huge databases, then use of more sophisticated SQL can be used to reduce the amount of data that needs to be transferred and held in Analytica's memory.

Note that DbQuery cannot be used to modify the database -- it executes its SQL in a read-only model. Other SQL statements can be used to modify a database -- creating new tables, adding or removing data to/from tables, etc. To utilize these SQL statements, you must use DbWrite.

ODBC to read Excel data

ODBC can be used to read data from Excel spreadsheets, at least in a rudimentary way. When using it in this fashion, data that you might read should be organized as a table, where the first row of the table contains column labels. If you have a single table on a worksheet, you can use:

"Select * from [Sheet1$]".

The Excel ODBC driver wants the the name of the sheet to be followed by a dollar sign, and because of this, you must surround the sheet name and dollar sign with square brackets.

If you name a region in your workbook, you can read the contents of that region using:

"Select * from [RegionName]

In this case, you don't have to specify a worksheet name, and you don't include the trailing dollar sign. The first row of your region needs to have column labels. If the first row does not have column labels, it will omit the first row from the result and return placeholder labels like ["F1","F2","F3"].

You can query a cell range using, e.g.:

"Select * from [Sheet1$D17:F55]"

Again, the first row (in this example, D17:F17), of the range needs to have labels. I do not know of a way of reading a transposed table (i.e., where the labels are row headers, and the fields for each record are in the same column). If the block of data don't have the labels for the first row, as long as it doesn't start in row 1, you can always start your query from one row previous. E.g., if you want cells F9:H12, then query F8:H12.

Library

Database Functions

Additional Help

  • A bug has been identified in 4.0.1.1 that causes the following message to appear incorrectly in some circumstances:
First arg, <rowIndex>, to DBLabels must be defined with a DBQuery( ) statement

The bug only occurs when the node holding the DbQuery is an index node that has just been defined. If you use a Variable node, rather than an index node, the problem does not occur, or if a model is saved and reloaded the problem does not occur. If you change the index to a variable node, you must force the definition to be re-evaluated, such as by inserting a space in the definition. If you encounter this problem when it should not be appearing, our recommendation is to save and reload the model, which will correct the problem.

See Also

Comments


You are not allowed to post comments.