Channel: SCN : Blog List - SAP SQL Anywhere

From the Archives: Mixing SQL Dialects


In this post, originally written by Glenn Paulley and posted to sybase.com in May of 2009, Glenn talks about SQL Anywhere's built-in support for two dialects of SQL: Transact-SQL and Watcom SQL, and how mixing dialects can potentially cause problems within your stored procedures.

 

 

In November 1995 we launched the first version of SQL Anywhere (version 5.0) that offered support for Transact-SQL, in addition to SQL Anywhere's existing dialect, which we continue to call Watcom SQL. (By the way, SQL Anywhere 5.0 was my first SQL Anywhere release, as I had joined the firm just a month earlier). There are many SQL constructions common between the two SQL dialects, but there are also significant and important differences. Certainly a significant difference between the two is the use of statement delimiters: in Watcom SQL, a semicolon is used to delimit each statement, whereas in the Transact-SQL dialect supported by Sybase Adaptive Server Enterprise, no statement delimiters are specified between statements or within a BEGIN...END block.

Supporting two SQL dialects, one without statement delimiters, is a significant technical challenge. The SQL Anywhere server must be able to parse constructions of either dialect - since it has no idea what an application might send - and recognize when one dialect is being used, or the other. Why? Perhaps the most important semantic difference between the two dialects is how errors are handled. The SQL Anywhere documentation states:

Default procedure error handling is different in the Watcom SQL and Transact-SQL dialects. By default, Watcom SQL dialect procedures exit when they encounter an error, returning SQLSTATE and SQLCODE values to the calling environment. Explicit error handling can be built into Watcom SQL stored procedures using the EXCEPTION statement, or you can instruct the procedure to continue execution at the next statement when it encounters an error, using the ON EXCEPTION RESUME statement. When a Transact-SQL dialect procedure encounters an error, execution continues at the following statement. The global variable @@error holds the error status of the most recently executed statement. You can check this variable following a statement to force return from a procedure. For example, the following statement causes an exit if an error occurs.
IF @@error != 0 RETURN
When the procedure completes execution, a return value indicates the success or failure of the procedure. This return status is an integer, and can be accessed as follows:
DECLARE @Status INT
EXECUTE @Status = proc_sample
IF @Status = 0   PRINT 'procedure succeeded'
ELSE   PRINT 'procedure failed'

Hence it is important that the server properly recognize which SQL dialect is being used.
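For contrast with the Transact-SQL @@error idiom quoted above, here is a minimal sketch of explicit error handling in a Watcom SQL dialect procedure; the procedure name is illustrative, and the Employees table is the one from the SQL Anywhere sample database.

CREATE PROCEDURE check_sample()
BEGIN
    DECLARE emp_surname VARCHAR(64);
    -- Under Watcom SQL semantics the procedure would normally exit on the first error;
    -- the EXCEPTION clause below handles errors explicitly instead.
    SELECT Surname INTO emp_surname FROM Employees WHERE EmployeeID = 102;
    MESSAGE 'Found: ' || emp_surname TO CLIENT;
    EXCEPTION
        WHEN OTHERS THEN
            MESSAGE 'Error, SQLSTATE = ' || SQLSTATE TO CLIENT;
END;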

 

Syntactic clues

On input, the SQL Anywhere server parser may expect a Watcom SQL batch, a Transact-SQL batch, a Watcom SQL dialect statement (e.g. CREATE PROCEDURE), or a Transact-SQL one. There are several specific syntactic constructions that indicate to the SQL Anywhere parser that the dialect of the batch, procedure, or trigger is Transact-SQL. These include:

  • The use of the AS clause before the procedure body, as in the following Transact-SQL procedure:
    CREATE PROCEDURE showdept @deptname varchar(30)
    AS
      SELECT Employees.Surname, Employees.GivenName
      FROM Departments, Employees
      WHERE Departments.DepartmentName = @deptname
        AND Departments.DepartmentID = Employees.DepartmentID;
    Watcom-dialect procedures use a BEGIN...END block to denote the procedure body, as does the ANSI/ISO SQL standard.
  • For a trigger, the lack of a trigger-time (BEFORE, AFTER, INSTEAD OF, or RESOLVE). In the Transact-SQL dialect supported by Adaptive Server Enterprise 15.0, all triggers are statement-level triggers and in SQL Anywhere such triggers are created as AFTER STATEMENT triggers. The supported Transact-SQL syntax is:
    CREATE TRIGGER [owner .]trigger_name
    ON [owner .]table_name
    FOR [ INSERT | UPDATE | DELETE ]
    AS ...
    SQL Anywhere's Transact-SQL support does not (yet) include support for Transact-SQL INSTEAD OF triggers, which were recently introduced in the Adaptive Server Enterprise 15.5 release.
  • In a procedure, trigger, function, or SELECT statement, the use of the Transact-SQL '=' operator for aliasing SELECT list expressions, or variable assignment:
    SELECT @var = 'literal string'
    rather than the Watcom SQL dialect's SET statement:
    SET @var = 'literal string';
  • Use of '=' to denote a default value in the argument to a stored procedure, rather than the Watcom SQL dialect's DEFAULT clause.
  • Use of OUTPUT or OUT after the specification of a stored procedure parameter:
    CREATE PROCEDURE showdept @deptname varchar(30) OUTPUT
    AS ...
    rather than the Watcom SQL syntax
    CREATE PROCEDURE showdept ( OUT @deptname varchar(30) )
    BEGIN ...
  • Use of the Transact-SQL statements COMMIT TRANSACTION, ROLLBACK TRANSACTION, or PREPARE TRANSACTION.

Conversely, there are two instances where specific syntax identifies the statement(s) as being in the Watcom-SQL dialect:

  • the CREATE [OR REPLACE] VARIABLE statement, and
  • a Watcom-SQL dialect BEGIN...END block with optional label, variable declaration(s), and an EXCEPTION clause where a semicolon is used to separate the individual statements.

An example: Common Table Expressions

The SQL:2008 standard, and SQL Anywhere, support the SQL construction known as common table expressions, which use the WITH keyword to declare what is effectively an in-lined view definition. WITH RECURSIVE is the syntax used to construct a recursive query. Supporting common table expressions in a SQL dialect that does not utilize statement delimiters is difficult, because of the use of the WITH keyword in various other SQL constructions. For example, with SQL Anywhere, a common table expression definition would conflict with the use of the optional Transact-SQL WITH clause on a constraint definition. Hence SQL Anywhere does not support common table expressions in a Transact-SQL procedure, though they can be used when embedded within a derived table in a query's FROM clause (where a grammar conflict is not an issue).
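As a point of reference, here is a minimal sketch of the WITH RECURSIVE construction in the Watcom SQL dialect (standard SQL:2008 syntax); it simply generates the integers 1 through 10:

WITH RECURSIVE counter( n ) AS
  ( SELECT 1
    UNION ALL
    SELECT n + 1 FROM counter WHERE n < 10 )
SELECT n FROM counter
ORDER BY n;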

 

As an aside, Microsoft SQL Server 2008 supports Transact-SQL procedures that optionally contain statement delimiters (semicolons), and has in fact deprecated the original (no-semicolons) syntax of Transact-SQL. Unlike Adaptive Server Enterprise, Microsoft SQL Server does support common table expressions, but requires, for example, that when used in a batch the preceding statement be terminated by a semicolon - no doubt because of the grammar conflicts that would otherwise arise in an LALR(1) - or even LALR(2) - parser.

 

How SQL Anywhere parses SQL input

As mentioned above, SQL Anywhere has to support both the Watcom SQL and Transact-SQL dialects. Since the server has no idea what the application might send, the SQL Anywhere parser iteratively attempts to parse the input in multiple ways, termed "parse goals": as a Watcom SQL batch, as a Transact-SQL batch, or as a single SQL statement. For greater efficiency, the dialect tried first is the dialect of the last successfully parsed statement for that connection.

 

What happens when there is an error? In that case, the server will try an alternative goal. To illustrate, suppose we have the following compound statement:


begin
  declare @var int
  select @var = 100
  WITH CountEmployees( DepartmentID, n ) AS
    ( SELECT DepartmentID, COUNT( * ) AS n
      FROM Employees GROUP BY DepartmentID )
  SELECT DepartmentID, n
  FROM CountEmployees
  WHERE n <= @var
end;


Here, we have a Transact-SQL compound statement, as it lacks statement delimiters. The server, however, does not know this a priori and first attempts to parse it as a single SQL statement, which fails. The server then tries to parse the block as a Watcom SQL dialect compound statement, but this fails because the parser does not understand the sequence DECLARE @var INT SELECT, since there is no semicolon to delimit the end of the DECLARE statement. It then tries to parse it as a Transact-SQL batch; this too fails, however, because of the common table expression, which isn't supported in Transact-SQL in SQL Anywhere.


So which error gets returned from these three separate attempts? The short answer is "the best one". The metric that is used in SQL Anywhere is that the further along the parser proceeded, the greater the match with the anticipated dialect. So the error that is returned in this example is for the Transact-SQL attempt, which is "Syntax error near 'WITH' on line 4". The Watcom SQL attempt yielded the error "Syntax error near 'DECLARE' on line 3", which the server suppressed after trying the Transact-SQL dialect.


The point of this example is to illustrate that when mixing SQL dialects, the error message returned to the client is based on the server's best attempt at parsing the input, and consequently the particular error may be counter-intuitive. In the example above, there is nothing wrong per se with the common table expression used in the BEGIN block; it's just that common table expressions are not supported in the SQL Anywhere implementation of the Transact-SQL dialect. The Transact-SQL dialect was assumed because the parser proceeded further through the compound statement using Transact-SQL grammar. If one had added semicolons to the block above:

BEGIN
  DECLARE @var int;
  SELECT @var = 100;
  WITH CountEmployees( DepartmentID, n ) AS
    ( SELECT DepartmentID, COUNT( * ) AS n
      FROM Employees GROUP BY DepartmentID )
  SELECT DepartmentID, n
  FROM CountEmployees
  WHERE n <= @var;
END;


then this compound statement, too, would yield an error. The statement delimiters would quickly terminate parsing the compound statement as Transact-SQL, yet the use of the Transact-SQL syntax for variable assignment (SELECT @var = 100) would contradict parsing the statement using the Watcom SQL dialect. This latter error would be the one returned in this case.


From the Archives: Keywords and Upgrades


In this post, originally written by Glenn Paulley and posted to sybase.com in April of 2010, Glenn talks about how keywords are recognized by the SQL Anywhere parser.



Each new SQL Anywhere release brings additional SQL functionality, some resulting from enhancements to the SQL standard, and some resulting from our own innovations to the product. In the forthcoming Version 12, for instance, SQL Anywhere will support the "distinct predicate" from the SQL/2008 standard, which has the syntax X IS [ NOT ] DISTINCT FROM Y and permits one to compare two expression values while treating NULLs as comparing equal. The distinct predicate can be useful in situations, particularly involving stored procedure parameters, where one doesn't want to have to specify a different SQL construction when an expression can be NULL. (Aside: while the distinct predicate may have some utility, this deviation from SQL's "normal" 3-valued logic arguably adds to the already confusing issues surrounding NULLs that have been debated at length elsewhere.)
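A small sketch of the new predicate, using the one-row dummy table; the point is that IS NOT DISTINCT FROM treats two NULLs as equal, whereas an ordinary '=' comparison evaluates to UNKNOWN:

-- Returns one row, because NULL IS NOT DISTINCT FROM NULL is TRUE.
SELECT 'values match' FROM dummy
WHERE CAST( NULL AS INT ) IS NOT DISTINCT FROM CAST( NULL AS INT );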

 

In the case of the distinct predicate, adoption and implementation were straightforward because its syntax - arguably somewhat clumsy - at least doesn't contain any new reserved words. Unfortunately, in SQL it is more often the rule that additional functionality requires additional keywords, and occasionally these keywords must become reserved words in order to avoid ambiguities when parsing the statement. With the SQL Anywhere server's custom-built implementation of YACC, we have long offered the "non_keywords" connection option, which permits a user or application to turn off a specific keyword so that it can be used as an identifier. For example, one would be able to specify:


SET OPTION non_keywords = 'TRUNCATE, SYNCHRONIZE';


In the Version 12 release of SQL Anywhere, we've taken this flexibility one step further, and we now support an additional connection option, "reserved_keywords". The motivation behind this new option is to make server upgrades easier out of the box by automatically excluding keywords from the SQL grammar when the likelihood of conflicts with customer applications is high. As a concrete example, SQL Anywhere 12 supports the LIMIT and OFFSET clauses familiar to those who develop MySQL applications. LIMIT and OFFSET are nearly identical in functionality to SELECT TOP ... START AT. However, introducing LIMIT to the SQL grammar would require it to be a reserved keyword, potentially breaking existing applications that use "limit" as an identifier. Consequently, we've introduced support for LIMIT but left it disabled by default. To enable the use of the LIMIT clause, one can enable its status as a keyword via:


SET OPTION PUBLIC.reserved_keywords = 'LIMIT';
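With the keyword enabled, the two paginated queries below should be equivalent - a hedged sketch over the sample Employees table; note that START AT is 1-based, so skipping 20 rows means starting at row 21:

SELECT * FROM Employees ORDER BY EmployeeID
LIMIT 10 OFFSET 20;

SELECT TOP 10 START AT 21 * FROM Employees
ORDER BY EmployeeID;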


In the server, whether a word is identified as a keyword is determined by the following (in order of precedence):

  • It appears in the SQL Anywhere list of reserved words.
  • It has been turned on with the reserved_keywords option.
  • It has been turned off using the non_keywords option.

The reserved_keywords option offers additional flexibility in how we offer support for SQL language extensions, and our intent is to utilize this option as we offer additional SQL functionality in forthcoming releases of SQL Anywhere.

From the Archives: Self-Healing Statistics in SQL Anywhere


In this post, originally written by Glenn Paulley and posted to sybase.com in November of 2009, Glenn talks about statistics management in SQL Anywhere, a key feature that enables SQL Anywhere to operate without the need for a DBA to tune performance on the database.



SQL Anywhere has offered autonomic, self-managing statistics collection since 1992 [1]. In SQL Anywhere 12, we are going one step further with the introduction of a "statistics governor" that makes SQL Anywhere column histograms both self-monitoring and self-healing. Self-healing statistics management in SQL Anywhere 12 includes the:

  • recording and categorization of selectivity estimation errors in queries;
  • automatic, autonomous correction of statistics errors with low overhead; and
  • autonomic monitoring and determination of column histogram maintainability.

Determining error

In SQL Anywhere 12, the server monitors the amount of estimation error for each predicate in every search condition executed during query processing. The difference between actual and estimated selectivities is slightly biased against errors with small selectivities, to avoid needless corrections. From this adjusted error, the server computes an error metric for each column histogram based on the number of predicates over that column and the amount of error. The metric is biased towards predicates that encounter significant differences between estimated and actual values, with one step function applied when the error is between 20 and 35%, and another when the error exceeds 35%. If this computed metric is greater than a threshold value, and the server has encountered more than 20 predicates involving that column, then the column histogram is considered a candidate for repair.

Repairing a column histogram

Once the server determines that a column histogram needs repair, the server has three methods at its disposal to attempt to correct the problem. They are (in order of attempts):

  1. Piggybacking column statistics upon other queries over the same table data;
  2. re-creating the column histogram from scratch using index statistics (for indexed columns); and
  3. automatically performing a sampling table scan during periods that the server is relatively idle using a daemon process.

With (1), the server will attempt to gather column statistics by utilizing a scan operator from another query. If the (table or index) scan operator processes at least 70% of the rows in the table, the existing histogram is replaced outright; if less, the existing histogram is adjusted based on the selectivities of the actual values encountered, adjusted for the unobserved percentage of the table. For string histograms, the replacement and correction policies are different, since the underlying histogram implementation differs substantially from that used for numeric values.

 

If the server is unable to gather updated statistics via a piggybacked scan - perhaps because the application workload does not contain a query that scans the table in its entirety - then the server will attempt to recreate the column histogram using the upper levels of a primary key, foreign key, or secondary index. If the column is not a leading column of any index, then as a last resort the server will scan the table using a background process. This background process performs a stratified scan over a sample of 100 pages plus 1% of the table's pages. Sampling is done by partitioning the table pages into n equal-size blocks, where n is the number of pages to be sampled; the server then randomly selects one sample page from each block to use in computing a replacement histogram.

Self-monitoring statistics usage

The server automatically persists updated histograms - which are stored in the ISYSCOLSTAT catalog table - at least every 30 minutes (and during every CHECKPOINT). This is performed by a daemon known as the statistics flusher.

In addition to the above self-healing mechanisms, SQL Anywhere 12 includes a statistics monitoring daemon, called the statistics cleaner, that tracks statistical errors over time. If the monitoring daemon determines that a particular histogram continuously requires rebuilding because its error metric remains high - a situation primarily caused by excessive concurrent "churn" in the data from simultaneously-executing connections - the daemon will automatically execute a DROP STATISTICS statement and will disable auto-creation for that histogram. While the histogram is unavailable, the query optimizer relies exclusively on index statistics or magic values for selectivity estimation, rather than utilizing the (erroneous) values contained in the histogram. At a later point, the server will automatically re-create the column histogram, using one of the above three reconstruction methods, in an effort to correct the anomaly.

In our experience, SQL Anywhere's self-managing statistics collection is robust and capable of handling a wide variety of workloads and update scenarios. Nonetheless, various aspects of statistics collection can be controlled, if necessary, by the DBA, and this is also true of both the statistics flusher and statistics cleaner processes. The behaviours of both tasks are customizable through the use of the sa_server_option system procedure.
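As a minimal sketch of that last point - and assuming the sa_server_option property name 'StatisticsCleaner', which should be verified against the documentation for your release - the cleaner can be suspended and resumed at run time:

CALL sa_server_option( 'StatisticsCleaner', 'Off' );   -- suspend the statistics cleaner
CALL sa_server_option( 'StatisticsCleaner', 'On' );    -- resume it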

 

 

[1] I. T. Bowman et al. (April 2007). SQL Anywhere: A Holistic Approach to Database Self-Management. In Proceedings, 2nd International IEEE Workshop on Self-Managing Database Systems, Istanbul, Turkey.

From the Archives: Analyzing Clustered Indexes


In this post, originally written by Glenn Paulley and posted to sybase.com in September of 2010, Glenn talks about how the optimizer takes advantage of clustered indices and describes some of the statistics associated with clustered indices.  Understanding this information can help you to determine whether or not a declared clustered index is being used effectively and to decide whether a rebuild of the index is a good idea to improve performance.


 

SQL Anywhere has offered support for clustered indexes since the 8.0.2 release. There is no physical difference between clustered and non-clustered indexes in SQL Anywhere; any index can be marked as clustered, including those indexes implicitly created for the maintenance of referential integrity constraints. Moreover, one can apply or remove the "clustering" attribute of an index using the ALTER INDEX statement. Since clustering is a hint, sorting is still required if a query contains an ORDER BY clause.

 

If an index is marked as clustered - and there can be at most one clustered index per table - then the server will attempt to ensure that the physical ordering of the rows in the table's pages matches, as closely as possible, the ordering of the corresponding index entries in that clustered index when the rows are first inserted. The advantage, of course, of a clustered index is that the server can take advantage of its clustering property during range scans, as clustered indexed retrieval during query processing should require reading the minimal number of table pages.

 

 

Clustering statistics


Beginning with the SQL Anywhere 10 release, the server maintains clustering statistics about each index in the database regardless of whether or not the index is declared CLUSTERED. These statistics can be found in the SYS.SYSPHYSIDX system view, and can be queried using a query such as the one below:

 

SELECT tbl.creator, tbl.table_name, ix.table_id, ix.index_name,
       COALESCE( (IF tbl.clustered_index_id = ix.index_id THEN 'Y' ELSE 'N' ENDIF), 'N' ) AS clustered,
       pix.depth, pix.seq_transitions,
       pix.rand_transitions, pix.rand_distance,
       pix.key_value_count, pix.leaf_page_count
FROM SYS.SYSIDX ix
     JOIN SYS.SYSPHYSIDX pix ON ( ix.table_id = pix.table_id AND ix.phys_index_id = pix.phys_index_id )
     JOIN SYS.SYSTAB tbl ON ( tbl.table_id = ix.table_id )
WHERE tbl.creator = 1 AND pix.depth > 1
ORDER BY tbl.table_name;

 

The statistics in the SYS.SYSPHYSIDX system view are correct as of the last database checkpoint.

 

The values depth, key_value_count and leaf_page_count are straightforward; these represent the number of levels in the index, the number of distinct key values, and the number of leaf pages in the index respectively. The other values queried above, namely seq_transitions, rand_transitions, and rand_distance, are defined as follows.

 

Consider any two adjacent (in lexicographic order) index entries in an index. If the two base table rows corresponding to these index entries are on the same table page, then the index entries are said to involve zero transitions. If the two base table rows are on adjacent table pages, then these two rows constitute a sequential transition. If they are further apart than adjacent pages, the two rows constitute a random transition. The number of table pages between the rows is termed the random distance.

 

With these statistics, the query optimizer can assess the clustering characteristics of any index, declared "clustered" or not, and adjust its cost estimates accordingly. Here's an example:

 

Querying clustering statistics for all indexes in a database

 

 

The indexes on lines 4, 5, and 6 in the display above are over the same base table, which has approximately five million rows. The index on line 4, which contains 4,956,540 distinct key values, is quite clustered even though it isn't declared as such: there are 41,747 sequential transitions and only 5,134 random ones (the missing number is the number of zero transitions). With these statistics, the query optimizer will assume the clustering property holds at execution time, resulting in the minimal number of table page retrievals when accessing the data through that index.

 

In contrast, consider the index on line 6, which is declared CLUSTERED. This index is problematic; it has only 6,123 sequential transitions and 4,806,149 random ones. Hence this index isn't really clustered at all; if we take the random distance and average it over all of the key values, each pair of adjacent index entries points to base table rows that are, on average, 2 pages apart. This extra retrieval cost is taken into account by the optimizer when determining the cost of an access plan that uses this index. As such, the DBA may determine that this index is a candidate for reorganization, either via the REORGANIZE INDEX statement or via the ALTER INDEX REBUILD statement.
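As a hedged sketch of the second option - the index and table names below are hypothetical, and the exact clause ordering should be checked against the ALTER INDEX documentation:

ALTER INDEX ix_ship_date ON GROUPO.SalesOrders REBUILD;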

 

The index on line 5 has approximately equal numbers of sequential and random transitions, indicating an equal mix of clustered and non-clustered rows. Reorganization may be beneficial in this case as well, though not to the same degree as the index displayed on line 6.

 

 

Index density and skew


The index statistics described above are readily available from the SYS.SYSPHYSIDX system view, as they are computed on-the-fly during index maintenance operations. There are three other statistics that can be retrieved from a SQL Anywhere server, but these involve scanning index and/or table pages in real time to determine their values.

 

The first is the output of the dbinfo utility when run with the "-u" switch. If you specify "-u" on the dbinfo command line, dbinfo will compute page usage statistics for the entire database, which can give the DBA an idea of the amount of internal page fragmentation that the database contains.

 

The second and third statistics are index density and index skew, which are returned by the sa_index_density() system procedure. Calling this procedure results in a complete index scan and the procedure returns a result set containing the index density and skew values for the specified index.
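Because the procedure scans every index, it can be expensive on a large database. A minimal invocation (parameter-free, as in the example that follows) looks like this:

-- Returns one row per index, with its density and skew values.
CALL sa_index_density( );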

 

Index density is the fraction (between 0 and 1) of the amount of space utilized in each index page. For B+tree indexes a density of 0.60 to 0.80 is typical, depending on a variety of factors including database page size, the size (and variance of the size) of the key, and the amount of key compression that can be attained.

 

Index skew is a measure of the "balance" of an index. Like many other B+tree index implementations, in SQL Anywhere indexes are not re-balanced upon delete operations. Consequently, in a table with significant "churn" it is possible for portions of the index to become relatively sparse, whereas index pages for other ranges of values may be nearly 100% full. The skew measure that is returned by sa_index_density() is the standard deviation of the natural logarithm of the number of index entries per page. By definition, skew cannot be less than 1.0.

 

Here's an example:

 

Calling sa_index_density() to determine index skew

 

In the above example, we called sa_index_density() with no parameters so that density and skew statistics were returned for every index in the database.

 

Line 259 in the display above corresponds to the CLUSTERED index described above (line 6). The density is 0.54, meaning that approximately 50% of each index page is being utilized, which is below average. On the positive side, the skew is approximately 1.27, which is not too bad - recall that the natural logarithm of 100 is 4.605 and of 500 is 6.21, so a standard deviation of 1.27 isn't entirely unreasonable.

 

However, line 260 in this display represents the same unclustered index as on line 5 above, which has merely 24 unique values in five million rows. Here, the density is also low - 0.54 - but more importantly the skew is 2.105, indicative of an index that is more unbalanced and perhaps requiring reorganization.

Introducing the SQL Anywhere Wiki


Did you know that SQL Anywhere has an associated Wiki site? It’s true! Check it out here:

 

 

     http://wiki.scn.sap.com/wiki/display/SQLANY/

Why have a SQL Anywhere Wiki site?

 

The SQL Anywhere Wiki site is intended to complement the existing SQL Anywhere Documentation ( http://dcx.sap.com/ ) by providing walkthroughs, examples, and explanations of how different components of SQL Anywhere can be used to form parts of larger solutions.

The current SQL Anywhere Wiki site already has a number of older articles migrated from Sybase.com, and we are continuing to migrate more. We are also posting a number of new articles (including a GPIO tutorial for SQL Anywhere on Linux ARM!) and are continuously adding new content. The primary intent is for the Wiki to be a dynamic reference that helps all customers build and deploy SQL Anywhere component configurations with greater ease and speed.

 

Would you like to help us build up this Wiki resource? We would love to have your help and feedback! There are two ways you can currently help contribute to the SQL Anywhere Wiki:

1. Submit a new Wiki article

 

Do you have a tutorial / Wiki topic that isn’t currently covered by the existing Wiki articles or the SQL Anywhere documentation and would like to share it with the wider SQL Anywhere community? The first step to creating a new Wiki article is to sign in to SCN:

 


 

New wiki pages are always created in the SQL Anywhere staging area so that they can be moderated before being published:

 

                http://wiki.scn.sap.com/wiki/display/stage/Staging+Area+for+SQL+Anywhere

 

To get to the SQL Anywhere staging area to create a new Wiki document, go to the right-hand side and click on one of the “Staging Area” links:

 


 

Once in the Staging Area, click on “Create” and then “Knowledge Management Template”:

 


 

You can always return to a document in the draft area and continue editing it at a later time.

 

Alternatively, you can click on the following link directly to create a new article in the staging area:

 

          http://wiki.sdn.sap.com/wiki/pages/createpage-entervariables.action?spaceKey=stage&fromPageId=361234695&templateId=353435649

Once you have finished creating the article, send a message to one of the Wiki moderators with the page URL in order to have the article reviewed and moved to the main Wiki site:

 

 

 

2. Edit an existing Wiki article

 

Is there a problem with an existing article and can you make the changes to correct it? Thank you for your assistance! It’s very simple to do – first, sign in to SCN:

 


 

Once logged in, navigate to the page that requires an update and click on ‘Edit’ or type ‘e’:

 


 

Edit the page contents in the Wiki editor, then click ‘Save’ or type ‘Ctrl + S’:

 


 

Thank you for helping to create and maintain the SQL Anywhere Wiki!

From the Archives: Snapshot Isolation and Materialized Views


In this post, originally written by Glenn Paulley and posted to sybase.com in February of 2011, Glenn talks about the effect of using snapshot isolation in combination with materialized views.


 

Snapshot isolation and materialized views are two important features that have been part of the SQL Anywhere server since the Version 10 release, which first shipped in September 2006. In the past, I've written articles that explain the tradeoffs of using snapshot isolation and presented material on the space-time tradeoffs of materialized views.

 

In one article on snapshot isolation, I wrote:

Of course, snapshot isolation doesn't come for free. It is necessary for the database system to construct archive copies of changed data in anticipation of new snapshot transactions. With SQL Anywhere, copies of snapshot rows are managed automatically, written to the temp file (which grows on demand) as necessary. However, though the management impact is near zero, query performance can suffer as snapshot rows may need to be fetched individually from the snapshot row store in the temp file, based on the snapshot semantics of the transaction. The degree of performance degradation depends entirely on the application and its workload, and will be worse with update-intensive workloads.

 

In this article, I'll describe the interaction between snapshot isolation and materialized views, particularly as it pertains to the SQL Anywhere server's transaction log, and discuss the inherent tradeoffs.

 

Refreshing materialized views

 

With deferred maintenance materialized views, modifications to the materialized view's underlying base tables proceed without any additional locking or (immediate) maintenance overhead. Put another way, update transactions modify the values or rows of base tables, and upon COMMIT these changes are made persistent. However, in this situation the materialized view contents then become stale; whether or not an SQL statement can exploit the stale contents of the materialized view is controlled by setting options for that query's connection. To refresh the view's contents, one issues the REFRESH MATERIALIZED VIEW statement on that view, which effectively performs a TRUNCATE TABLE on the base table containing the view, and then immediately performs an INSERT ... FROM SELECT to re-populate the materialized view.
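As a hedged illustration of the option-controlled staleness behaviour - assuming the materialized_view_optimization connection option, whose exact name and permitted values should be confirmed in the documentation - a connection can tell the optimizer that stale deferred views are acceptable:

SET TEMPORARY OPTION materialized_view_optimization = 'Stale';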

 

On the other hand, with immediately-maintained materialized views a REFRESH MATERIALIZED VIEW statement is required only to initially populate the view. After that, any modifications to underlying base tables are correspondingly applied to the immediately-maintained materialized view(s) that refer to those base tables within the same transaction. When the transaction completes, either the transaction performs a COMMIT to make the changes permanent, or issues a ROLLBACK to undo them.

 

Example

Consider the following simple, contrived example - we create a simple, single-table immediately-maintained materialized view over the Products table in the sample DEMO database:

 

CREATE MATERIALIZED VIEW groupo.shirt_products( prod_id, prod_name, prod_description, prod_size, prod_color, prod_quantity, prod_unit_price)
AS
SELECT "id", "name", "description", "size", color, quantity, unitprice
FROM Products
WHERE "name" LIKE '%shirt%'

 

To make the view immediately-maintained, we first define a unique index on the table instantiating the view:

 

CREATE UNIQUE INDEX products ON groupo.shirt_products (prod_id ASC);

and then specify that the view should be maintained immediately:

 

ALTER MATERIALIZED VIEW groupo.shirt_products IMMEDIATE REFRESH

and, finally, initialize the materialized view:

 

REFRESH MATERIALIZED VIEW groupo.shirt_products

 

With the creation of the materialized view now complete, we start a new transaction and modify some of the underlying rows of the Products base table:

UPDATE products SET description = 'Modified' WHERE "name" LIKE '%Tee Shirt%';
COMMIT

If we look at the contents of the materialized view, we can see that the modifications to the underlying Products table are now reflected in the view:

 

Contents of the shirt_products materialized view

 

Materialized views and the transaction log

If we look at the contents of the transaction log for the DEMO database at this point - using the DBTRAN utility - we see the following:

 

--CONNECT-1010-0000975095-DBA-2011-02-02 11:38
--BEGIN TRANSACTION-1010-0000975106
BEGIN TRANSACTION
go
--SQL-1010-0000975109
begin
  set temporary option first_day_of_week = '7';
  set temporary option date_order = 'YMD';
  set temporary option nearest_century = '50';
  set temporary option date_format = 'YYYY-MM-DD';
  set temporary option timestamp_format = 'YYYY-MM-DD HH:NN:SS.SSS';
  set temporary option time_format = 'HH:NN:SS.SSS';
  set temporary option default_timestamp_increment = '1';
  set temporary option timestamp_with_time_zone_format = 'YYYY-MM-DD HH:NN:SS.SSS+HH:NN';
  create materialized view groupo.shirt_products( prod_id,prod_name,prod_description,prod_size,prod_color,prod_quantity,prod_unit_price )
    as select products.id,products.name,products.description,products.size,products.color,products.quantity,products.unitprice
      from GROUPO.products
      where products.name like '%shirt%';
  set temporary option first_day_of_week = ;
  set temporary option date_order = ;
  set temporary option nearest_century = ;
  set temporary option date_format = ;
  set temporary option timestamp_format = ;
  set temporary option time_format = ;
  set temporary option default_timestamp_increment = ;
  set temporary option timestamp_with_time_zone_format = ;
end
go
--COMMIT-1010-0000976304
COMMIT WORK
go
--BEGIN TRANSACTION-1010-0000976307
BEGIN TRANSACTION
go
--SQL-1010-0000976310
comment to preserve format on view groupo.shirt_products is  'create materialized view groupo.shirt_products( prod_id, prod_name, prod_description, prod_size, prod_color, prod_quantity, prod_unit_price)
as select "id", "name", "description", "size", color, quantity, unitprice
from products
where "name" like ''%shirt%'''
go
--COMMIT-1010-0000976661
COMMIT WORK
go
--BEGIN TRANSACTION-1010-0000976664
BEGIN TRANSACTION
go
--SQL-1010-0000976667
create unique index products on groupo.shirt_products(prod_id asc)
go
--COMMIT-1010-0000976745
COMMIT WORK
go
--BEGIN TRANSACTION-1010-0000976748
BEGIN TRANSACTION
go
--SQL-1010-0000976751
alter materialized view groupo.shirt_products immediate refresh
go
--COMMIT-1010-0000976826
COMMIT WORK
go
--CHECKPOINT-0000-0000976829-2011-02-02 11:39
--BEGIN TRANSACTION-1010-0000976859
BEGIN TRANSACTION
go
--SQL-1010-0000976862
refresh materialized view groupo.shirt_products
go
--COMMIT-1010-0000976921
COMMIT WORK
go
--BEGIN TRANSACTION-1010-0000976924
BEGIN TRANSACTION
go
--UPDATE-1010-0000977130
UPDATE GROUPO.Products  SET Description='Modified'
WHERE ID=300
go
--UPDATE-1010-0000977152
UPDATE GROUPO.Products  SET Description='Modified'
WHERE ID=301
go
--UPDATE-1010-0000977174
UPDATE GROUPO.Products  SET Description='Modified'
WHERE ID=302
go
--COMMIT-1010-0000977494
COMMIT WORK
go

There are a few things I'd like to point out regarding the transaction log contents above:

  • Note that the CREATE MATERIALIZED VIEW statement appears as part of a batch that includes statements to re-establish the pertinent connection option settings that were in effect at the time the CREATE MATERIALIZED VIEW statement was issued.
  • The REFRESH MATERIALIZED VIEW statement is logged as a separate statement in the transaction log. By default, the REFRESH MATERIALIZED VIEW statement uses WITH SHARE MODE locking if the WITH clause is not specified. With immediately-maintained views, only WITH SHARE MODE, WITH EXCLUSIVE MODE, and WITH ISOLATION LEVEL SERIALIZABLE are permitted lock modes.
  • Row modifications to the shirt_products materialized view do not appear in the transaction log. Since the view is immediately maintained, it is sufficient for the transaction log to contain only the base table modifications caused by the UPDATE statement. If the log is replayed during recovery, the UPDATE statements in the transaction log will automatically invoke the corresponding modifications to the materialized view.

 

Impact of snapshot isolation

Enabling a SQL Anywhere database for snapshot isolation is done by setting the option allow_snapshot_isolation as follows:

 

SET OPTION PUBLIC.allow_snapshot_isolation = 'On';

When snapshot isolation is enabled, the default isolation setting for the REFRESH MATERIALIZED VIEW statement is WITH ISOLATION LEVEL SNAPSHOT, so that the connection executing the REFRESH statement is not blocked by other concurrent update transactions that are simultaneously modifying the underlying base tables referenced in the materialized view.
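For illustration, the refresh mode can also be requested explicitly in the WITH clause. The view name below is a hypothetical deferred-maintenance view, since, as noted above, snapshot is not a permitted mode for immediately-maintained views:

REFRESH MATERIALIZED VIEW groupo.sales_summary WITH ISOLATION LEVEL SNAPSHOT;
REFRESH MATERIALIZED VIEW groupo.sales_summary WITH EXCLUSIVE MODE;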

 

While this is advantageous, there is a tradeoff. The transaction log must contain enough detail about the modifications to the database that we can be confident that applying the log(s) to an arbitrary backup during recovery will re-establish the contents of the database as precisely as possible. With a REFRESH MATERIALIZED VIEW WITH ISOLATION LEVEL SNAPSHOT statement, the statement "sees" the rows of the underlying base tables according to snapshot semantics - that is, the REFRESH statement does not "see" any rows modified by transactions that committed after the REFRESH statement started. To properly replay that REFRESH statement from a transaction log during recovery, the server would need to retain the state of every row of every underlying base table so that the REFRESH statement could compute the identical contents of the materialized view.

 

Similar phenomena can occur if the REFRESH MATERIALIZED VIEW statement is run at any isolation level other than SERIALIZABLE, SHARE MODE, or EXCLUSIVE MODE. Rather than attempt to provide all of this necessary context in the transaction log, when a weaker isolation level is used with a REFRESH MATERIALIZED VIEW statement the server logs both the REFRESH statement and each individual row inserted into the materialized view, rather than the REFRESH statement alone.

 

In this case, if one looks at the SQL statements generated by the transaction log utility DBTRAN, the DBTRAN output will only include the REFRESH MATERIALIZED VIEW statement, since the REFRESH statement is all that is necessary when translating a log to SQL to apply it to a different database. On recovery, however, the server ignores the REFRESH MATERIALIZED VIEW statement, and utilizes the individual INSERT statements in the log to recover the materialized view contents.

We are considering enhancements to DBTRAN so that the INSERT statements logged in these cases appear in the output as comments, so that their existence can be verified.

From the Archives: Seven Deadly Sins of Database Application Performance


In this post, originally written by Glenn Paulley and posted to sybase.com in February of 2011, Glenn introduces a list of application architecture components that, if poorly designed/handled, can have significant negative impact on the performance of the application.

 

 

Inevitably, at some point performance becomes an issue for many database applications. Performance analysis is often problematic simply because there are so many variables, which include the characteristics of the hardware, the workload, physical database design, and application design, and because these considerations have tradeoffs and side-effects - there are usually no right answers.

 

Some time ago SQL Anywhere consultant Breck Carter wrote an article entitled How to Make SQL Anywhere Slow, possibly one of my all-time favourite posts. Breck's article enumerates 38 different database design, application design, and server configuration settings that can lead to poor performance. In a forthcoming series of articles, which I've somewhat brashly called the Seven Deadly Sins of Database Application Performance, I'll write at length about seven specific issues that I believe are deserving of additional explanation.

 

The Seven Deadly Sins of Database Application Performance are:

  1. Poor physical database design decisions: schema design issues, table column order, indexing, database page size.
  2. Lock contention: due to hot rows or running at higher ANSI isolation levels.
  3. Iterative, nested-iteration execution, including but not limited to the use of nested queries, user-defined functions, and client-side joins.
  4. Performing more work than necessary in a query, either due to the amount of data accessed or retrieved, or to overly-complex queries that are difficult to optimize.
  5. Inefficient client-server interactions, possibly involving prefetch settings, wide inserts and/or fetches, the use of prepared statements, and the re-fetching of the same information over and over from the server.
  6. Choice of optimization goal for a SELECT statement.
  7. Choice of transaction model - particularly the use of auto COMMIT.

From the Archives: The First Deadly Sin


In this post, originally written by Glenn Paulley and posted to sybase.com in April of 2011, Glenn talks about some of the critical components of database design and how they can impact overall application performance.

 

 

I previously introduced the Seven Deadly Sins of Database Application Performance. Our first deadly sin concerns physical database design, which includes schema design issues, table column order, indexing, and choice of database page size, to name but a few important factors that can adversely impact performance. In this article, I want to explore a particular point surrounding database design in SQL Anywhere, and that concerns the concept of domains.

 

 

What are domains?

Domains are part of the original relational model of data as first developed by E. F. Codd. In his seminal text [1, pp. 81], Chris Date defines a domain as follows:

Next, we define a domain to be a named set of scalar values, all of the same type. For example, the domain of supplier numbers is the set of all possible supplier numbers, the domain of shipment quantities is the set of all integers greater than zero and less than 10,000 (say). Thus domains are pools of values, from which actual attribute values are drawn.

In a nutshell, Codd defined domains in the relational model much as we might define strong typing in a programming language. In a way similar to that which occurs in programming languages, one intention behind the use of domains is to prevent application developers from making mistakes, such as comparing invoice numbers to customer numbers. While both sets of values may be (say) integers, comparing an invoice number to a customer number would - normally - not make a great deal of sense.

 

While that's the theory, supporting domains in commercial relational database products has not gained significant traction over the past forty years. Sure, virtually every commercial database system, including SQL Anywhere, supports the definition of DOMAINs - sometimes referred to as user-defined data types - but the implementation of these DOMAINs and the intent of domains in the relational model are two very different things. All commercial RDBMS systems permit loose typing; SQL Anywhere, for example, will happily execute an SQL statement that contains a comparison between a numeric column and a string. Ivan Bowman has written a whitepaper that outlines the semantics of the implicit type conversions in SQL Anywhere that take place when a mismatched comparison must be evaluated. (Aside: commercial products each have their own implicit conversion rules, and the semantics of such comparisons are implementation-defined.)

 

Such flexibility may seem advantageous to application developers, but it's not. It's a sin. Let me explain why.

 

SQL is not "C" - or any other programming language

It is exceedingly important that application developers remember that SQL is a data sub-language based on tuple relational calculus. In a nutshell, that means that SQL is based on first-order predicate logic, where the query specifies what is to be computed, but not how. (Aside - I well realize that the SQL language now embodied in the SQL/2008 standard contains a mix of relational calculus and relational algebra - INTERSECT, EXCEPT and UNION are set-level algebraic operators - but in the main an SQL SELECT block still is calculus-based.)

 

By basing its semantics on predicate logic, SQL permits an implementation to translate the original calculus-based query into an access plan, hopefully an efficient one, of algebraic operations. Different algebraic rewritings are of course possible, so long as the ordering of those operations yields an equivalent result. As an example, suppose we have the following nested query:

 

SELECT *
FROM CORPORATE_SUPPLIERS AS CS
WHERE CS.SUPPLIER_ID IN ( SELECT S.SUPPLIER_ID
                          FROM LOCAL_SUPPLIERS AS S
                          WHERE S.ADDRESS LIKE '%Ontario%'
                            AND S.REGION = 'Eastern Canada' )

Here, we are in search of suppliers who are listed as Eastern Canadian suppliers in the LOCAL_SUPPLIERS table, and who are in turn extant in the CORPORATE_SUPPLIERS table.

 

So far, so good. We have written this query as a nested query, where the semantics seem clear: first, find those suppliers in the LOCAL_SUPPLIERS table who are from Ontario, and then use that intermediate result as a filter so that only the suppliers from the CORPORATE_SUPPLIERS table with the same key are returned.

 

With a small amount of background knowledge, it should be clear that the above query should be equivalent to this SQL query, written as a join:

 

SELECT DISTINCT CS.*
FROM CORPORATE_SUPPLIERS AS CS, LOCAL_SUPPLIERS AS S
WHERE CS.SUPPLIER_ID = S.SUPPLIER_ID AND
      S.ADDRESS LIKE '%Ontario%' AND S.REGION = 'Eastern Canada'

This sort of transformation is at the very root of query optimization. The query optimizer is free to re-arrange the query and evaluate the predicates in any order as long as the original semantics are maintained. In the main, the sophistication of a query optimizer is its ability to perform these kinds of rewritings, amongst other things, to determine the least expensive access plan for this SQL request - or, more correctly, the least expensive plan it can find in a relatively short, finite amount of time.

 

The fly in the ointment, however, is weak typing.

 

What if the supplier numbers in the CORPORATE_SUPPLIERS table were numeric, but the supplier numbers in the LOCAL_SUPPLIERS table were a mix of values, some numeric and some alphanumeric? Suppose the application developer knows that suppliers in Eastern Canada are guaranteed to have numeric identifiers; then, if the query is executed using the semantics of nested iteration, all will be well. However, if the query is executed as a join, the optimizer may choose to evaluate the query's predicates in an order that does not guarantee that only the Eastern Canadian (Ontario) suppliers will be joined to the CORPORATE_SUPPLIERS table - and, because SQL Anywhere uses the numeric domain to compare a number and a string, and an alphanumeric string will not convert properly, a data exception may result.
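To make the hazard concrete, here is a hedged sketch of the scenario just described, assuming SUPPLIER_ID is an INTEGER in CORPORATE_SUPPLIERS but a VARCHAR in LOCAL_SUPPLIERS, where it contains values such as the hypothetical 'A-1047':

SELECT CS.*
FROM CORPORATE_SUPPLIERS AS CS JOIN LOCAL_SUPPLIERS AS S
     ON CS.SUPPLIER_ID = S.SUPPLIER_ID          -- INTEGER compared with VARCHAR
WHERE S.ADDRESS LIKE '%Ontario%'
  AND S.REGION = 'Eastern Canada';
-- If the optimizer evaluates the join predicate before the REGION filter, the implicit
-- conversion of 'A-1047' to a number raises a data exception at run time.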

 

I chose this example deliberately, because entity-type hierarchies often lead to these sorts of problems, which are extremely difficult to discover during testing because the generation of the error is access-plan specific. The real issue - or sin, if you will - is the poor choice of data types for the identifiers of the two tables. There are many reasons to utilize surrogate keys in a schema - I have described some of these issues elsewhere in this blog - but standardizing domains (or data types, at a bare minimum) is important to ensure that queries return expected results, and that the query optimizer can use the appropriate indexes, if they exist, to speed retrieval of rows that match columns or values of the same type (i.e. same domain). Otherwise, the application developer is forced to resort to "SQL gymnastics", such as the use of IF-expressions, to try to prevent predicate re-orderings that can lead to such anomalies. Unfortunately, IF-expressions and similar constructions are often complex to write, difficult to maintain, expensive to evaluate, and are not sargable.

 

[1] C. J. Date (1995). An Introduction to Database Systems, Sixth Edition. Addison-Wesley.


SQL Anywhere 16.0.0 SP32 Build 2087 has been released

From the Archives: Update Statements and Lower Isolation Levels


In this post, originally written by Glenn Paulley and posted to sybase.com in July of 2011, Glenn talks about the trade-offs when using different isolation levels and updating data in the database.

 


Routinely, application developers trade off serializable transaction semantics in favour of better execution time performance by limiting the potential for lock contention. Few and far between are applications that execute at ISO/ANSI SQL isolation level 3, SERIALIZABLE. Indeed, the SQL Anywhere default isolation level is zero - READ UNCOMMITTED - except for JDBC applications, where the default is READ COMMITTED.

 

At the READ UNCOMMITTED isolation level with SQL Anywhere, only schema locks and write row locks are acquired by a transaction during its operation; read row locks are never acquired, and so at READ UNCOMMITTED write transactions do not block read transactions. On the flip side, however, SQL Anywhere does not guarantee consistent semantics at the READ UNCOMMITTED isolation level. To use the common parlance, you get what you pay for. With many applications, the risk and/or impact of reading uncommitted rows is low; sometimes this can lead to complacency about what READ UNCOMMITTED really means. In this post, I want to illustrate an example where the impact is more obvious.

 

 

Set-level UPDATE operations


In the SQL Anywhere Version 5.5 release, circa 1997, we introduced full support for set-level UPDATE statements that could modify columns that were part of a table's PRIMARY KEY, UNIQUE constraint, or part of a unique index, in support of the ISO SQL/1992 standard which was the current SQL standard at that time. To illustrate, suppose we have the following table:

 

CREATE TABLE updkey ( a INTEGER PRIMARY KEY, b INTEGER UNIQUE, c VARCHAR(500) )

 

populated by the following INSERT statement:

 

INSERT INTO updkey(a,b,c) SELECT row_num, row_num, 'test string' FROM rowgenerator WHERE row_num <= 10


In this example, we desire to renumber the "b" values of all ten rows using a single atomic statement. We can do so as follows:

 

UPDATE updkey SET b = 11-b, c = 'New value'

 

Processing the UPDATE statement row-by-row clearly won't do, since the update of any single row in the updkey table will immediately violate the uniqueness constraint on column "b". (Aside: you may be thinking the WAIT_FOR_COMMIT connection option might help here, but WAIT_FOR_COMMIT only affects referential integrity constraints, not uniqueness constraints.) Consequently, Version 5.5 of SQL Anywhere provided a different mechanism to perform the update, and it has implications for lower levels of concurrency control, as we shall see.

 

 

The HOLD temporary table


When the SQL Anywhere server processes an UPDATE or MERGE statement and encounters a uniqueness constraint violation on a primary key, unique index, or unique constraint, the server automatically creates an unnamed "hold" temporary table to temporarily store the problematic rows. The temporary table contains both the before and after values of a row, so that AFTER row and AFTER statement triggers can work correctly. Processing the rows is done row-by-row as follows:

  1. If the row can be modified without a uniqueness constraint violation, the update proceeds normally.
  2. If the modification causes a uniqueness constraint violation, then
    1. the row's contents, along with its new values, are copied to the hold temporary table;
    2. the row, along with its index entries, is - temporarily - deleted from the base table. No DELETE triggers are fired for this temporary deletion.
  3. any appropriate AFTER row triggers are fired for this row.


Once all of the rows have been processed, any deleted rows that have been copied to the hold temporary table are then re-inserted into the base table, with the modified values from the UPDATE or MERGE statement. The order in which the rows from the hold temporary table are processed is not guaranteed. If the re-insertion of any of the saved rows still causes a uniqueness violation, then the entire UPDATE or MERGE statement is rolled back, and the uniqueness constraint violation is reported back to the application.

 

Only if all row modifications are successful are any AFTER statement triggers fired for the request.

 

 

Implications


The effect of deleting rows during the execution of an UPDATE or MERGE statement can impact the results of

  • an SQL statement that queries the same table, issued within an AFTER row trigger that is fired for the UPDATE or MERGE statement that initiated the action; or
  • any other connection, including event handlers, that are not executing at the SERIALIZABLE or SNAPSHOT isolation levels.

The semantics of this processing of set-level update operations is somewhat counter-intuitive, since on the surface you might expect that another connection concurrently querying the table would either "see" the old row values, or the new row values. However, with set-level update operations on tables with uniqueness constraints, there is the possibility that other connections will not see a particular row at all, depending on the isolation level being used. If the other connection is executing at the SERIALIZABLE isolation level, it will block until the transaction doing the update issues a COMMIT or ROLLBACK. If the other connection is executing at SNAPSHOT isolation, that transaction will continue to see the original values of the modified rows for the duration of that transaction.
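As a hedged illustration of that last point, a reader that must not observe the transient deletions could run under snapshot isolation (assuming snapshot isolation has been enabled for the database via the allow_snapshot_isolation option):

SET TEMPORARY OPTION isolation_level = 'snapshot';
-- This count sees either the pre-update or the post-update contents of updkey,
-- never an intermediate state in which some rows are temporarily missing.
SELECT COUNT(*) FROM updkey;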

 

This detailed, complex behaviour has previously been undocumented. It will appear in the standard documentation in the next major release of SQL Anywhere.

SQL Anywhere 12.0.1 SP88 Build 4231 Win64 has been released.


SQL Anywhere 12.0.1 SP88 Build 4231 for the Win64 platform is now available for download from http://support.sap.com

 

(see guys, I'm learning!)

 

Cheers,

-bret

From the Archives: Limitations of Proxy Tables


In this post, originally written by Glenn Paulley and posted to sybase.com in March of 2012, Glenn talks about some of the limitations related to the SQL Anywhere remote data access functionality.

 

Proxy tables, sometimes referred to as Remote Data Access or OMNI, are a convenient way to query or modify tables in different databases all from the same connection. SQL Anywhere's proxy tables are an implementation of a loosely-coupled multidatabase system. The underlying databases do not have to be SQL Anywhere databases - any data source that supports ODBC will do, so the underlying base table for the proxy can be an Oracle table, a Microsoft SQL Server table, even an Excel spreadsheet. Once the proxy table's schema is defined in the database's catalog, the table can be queried just like any other table, as if it were defined as a local table in that database.

 

That's the overall idea, anyway; but there are some caveats that get introduced as part of the implementation, and I'd like to speak to one of these in particular. My post is prompted by a question from a longstanding SQL Anywhere customer, Frank Vestjens, who in early February in the NNTP newsgroup sybase.public.sqlanywhere.general queried about the following SQL batch:

 

begin
  declare dd date;
  declare tt time;
  declare resultaat numeric;
  //
  set dd = '2012-06-07';
  set tt = '15:45:00.000';
  //
  message dd + tt type info to console;
  //
  select first Id into resultaat
  from p_mmptankplanning
  where arrivalDate + IsNull(arrivaltime,'00:00:00') <= dd + tt
  order by arrivaldate + arrivalTime, departuredate + departureTime;
end


The batch works fine with a local table p_mmptankplanning but gives an error if the table is a proxy table; the error is "Cannot convert 2012-06-0715:45:00.000 to a timestamp".

 

 

Operator overloading


In SQL Anywhere, multidatabase requests are decomposed into SQL statements that are shipped over an ODBC connection to the underlying data source. In many cases, the complete SQL statement can be shipped to the underlying server, something we call "full passthrough mode" as no post-processing is required on the originating server - the server ships the query to the underlying DBMS, and that database system returns the result set which is percolated back to the client. Since the originating server is a SQL Anywhere server, the SQL dialect of the original statement must be understood by SQL Anywhere. If the underlying DBMS isn't SQL Anywhere, then the server's Remote Data Access support may make some minor syntactic changes to the statement, or try to compensate for missing functionality in the underlying server.

 

The SQL statement sent to the underlying DBMS, whether or not the statement can be processed in full passthrough mode or in partial passthrough mode, is a string. Moreover, SQL Anywhere can ship SELECT, INSERT, UPDATE, DELETE and MERGE statements to the underlying DBMS - among others - but lacks the ability to ship batches or procedure definitions.

 

So in the query above, the problem is that the query refers to the date/time variables dd and tt, and uses the operator + to combine them into a TIMESTAMP. Since SQL Anywhere lacks the ability to ship an SQL batch, what gets shipped to the underlying DBMS server is the SQL statement

 

 

select first Id into resultaat
from p_mmptankplanning
where arrivalDate + IsNull(arrivaltime,'00:00:00') <= '2012-06-07' + '15:45:00.000'
order by arrivaldate + arrivalTime, departuredate + departureTime;


and now the problem is more evident: in SQL Anywhere, the '+' operator is overloaded to support both operations on date/time types and operations on strings; with strings, '+' is string concatenation. When the statement above gets sent to the underlying SQL Anywhere server, it concatenates the two date/time strings to form the string '2012-06-0715:45:00.000' - note no intervening blank - and this leads directly to the conversion error. Robust support for SQL batches would solve the problem, but we have no plans to introduce such support at this time. A workaround is to compose the desired TIMESTAMP outside the query, so that when converted to a string the underlying query will give the desired semantics. However, even in that case care must be taken to make sure that the DATE_ORDER and DATEFORMAT option settings are compatible across the servers involved.
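A minimal sketch of that workaround, reusing the variables from Frank's batch (the surrounding declarations are assumed to be unchanged), is to build the TIMESTAMP into a single variable so that only one literal is shipped to the remote server:

begin
  declare dd date;
  declare tt time;
  declare ts timestamp;
  declare resultaat numeric;
  set dd = '2012-06-07';
  set tt = '15:45:00.000';
  -- Combine the two values locally; the remote statement then contains a single
  -- TIMESTAMP value instead of two strings joined with '+'.
  set ts = dd + tt;
  select first Id into resultaat
  from p_mmptankplanning
  where arrivalDate + IsNull(arrivaltime,'00:00:00') <= ts
  order by arrivaldate + arrivalTime, departuredate + departureTime;
end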

 

My thanks to my colleague Karim Khamis for his explanations of Remote Data Access internals.

From the Archives: The Second Deadly Sin


In this post, originally written by Glenn Paulley and posted to sybase.com in May of 2012, Glenn talks about concurrency control and the consequences of using the various options available with SQL Anywhere.


Back in 2011 I wrote an article entitled "The seven deadly sins of database application performance" and I followed that introductory article in April 2011 with one regarding the first "deadly sin" that illustrated some issues surrounding weak typing within the relational model.

 

In this article I want to discuss the implications of concurrency control and, in particular, the tradeoffs in deciding to use the weaker SQL standard isolation levels READ UNCOMMITTED and READ COMMITTED.

 

 

Contention through blocking


Most commercial database systems that support the SQL Standard isolation levels [3] of READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE use 2-phase locking (2PL), commonly at the row level, to guard against update anomalies by concurrent transactions. The different isolation levels affect the behaviour of reads but not of writes: before modifying a row, a transaction must first acquire an exclusive lock on that row, which is retained until the transaction performs a COMMIT or ROLLBACK, thus preventing further modifications to that row by other transactions. Those are the semantics of 2PL.

 

Consequently, it is easy to design an application that intrinsically enforces serial execution. One that I have written about previously - Example 1 in that whitepaper - is a classic example of serial execution. In that example, the application increments a surrogate key with each new client to be inserted, yielding a set of SQL statements like:

 

 

UPDATE surrogate SET @x = next_key, next_key = next_key + 1 WHERE object_type = 'client';
INSERT INTO client VALUES(@x, ...);
COMMIT;

 

Since the exclusive row lock on the 'client' row in the surrogate table is held until the end of the transaction, this logic in effect forces serialization of all client insertions. Note that testing this logic with one, or merely a few, transactions will likely fail to trigger a performance problem; it is only at scale that this serialization becomes an issue, a characteristic of most, if not all, concurrency control problems except for deadlock.

 

Hence lock contention, with serialization as one of its most severe forms, is difficult to test, because the issues caused by lock contention are largely performance-related. It is also difficult to solve by increasing the application's degree of parallelism, which typically yields only additional waiting threads, or by throwing additional compute power at the problem; as my former mentor at Great-West Life, Gord Steindel, sometimes put it: all CPUs wait at the same speed.

 

 

Why not lower isolation levels?


With 2PL, write transactions block read transactions executing at READ COMMITTED or higher. The number, and scope, of these read locks increase as one moves up to the SERIALIZABLE isolation level, which offers serializable semantics at the expense of concurrent execution in a mixed workload of readers and writers. Consequently it is logical to trade off the server's guarantee of serializable transaction schedules for better performance by reducing the number of read locks to be acquired, and hence reduce the amount of blocking - a strategy that makes sense for many applications with a typical 80-20 ratio of read transactions to write transactions.
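For reference, the isolation level in SQL Anywhere is a per-connection option, so this trade-off is usually expressed with a statement like the following (the level shown here is only an example):

-- 0 = READ UNCOMMITTED, 1 = READ COMMITTED, 2 = REPEATABLE READ, 3 = SERIALIZABLE;
-- the option also accepts 'snapshot' for snapshot isolation.
SET TEMPORARY OPTION isolation_level = 1;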

 

But this tradeoff is not free; it is made at the expense of exposing the application to data anomalies that occur as the result of concurrent execution with update transactions. This exposure is, again, very hard to quantify: how would one attempt to measure the risk of acting on stale data in the database, or of overwriting a previously-modified row (often termed the "lost update" problem)? Once again, the problem is exacerbated at scale, which makes analysis and measurement of this risk difficult during a typical application development cycle.

 

Some recent work [1] that explores these issues was on display at the 2012 ACM SIGMOD Conference held last week in Phoenix, Az. At the conference, graduate student Kamal Zellag and his supervisor, Bettina Kemme, of the School of Computer Science at McGill University in Montreal demonstrated ConsAD, a system that measures the number of serialization graph cycles that develop within the application at run time - where a cycle implies a situation involving either stale data, a lost update, or both. A full-length paper [2] presented at last year's IEEE Data Engineering Conference in Hannover, Germany provides the necessary background; here is the abstract:

 

While online transaction processing applications heavily rely on the transactional properties provided by the underlying infrastructure, they often choose to not use the highest isolation level, i.e., serializability, because of the potential performance implications of costly strict two-phase locking concurrency control. Instead, modern transaction systems, consisting of an application server tier and a database tier, offer several levels of isolation providing a trade-off between performance and consistency. While it is fairly well known how to identify the anomalies that are possible under a certain level of isolation, it is much more difficult to quantify the amount of anomalies that occur during run-time of a given application. In this paper, we address this issue and present a new approach to detect, in realtime, consistency anomalies for arbitrary multi-tier applications. As the application is running, our tool detect anomalies online indicating exactly the transactions and data items involved. Furthermore, we classify the detected anomalies into patterns showing the business methods involved as well as their occurrence frequency. We use the RUBiS benchmark to show how the introduction of a new transaction type can have a dramatic effect on the number of anomalies for certain isolation levels, and how our tool can quickly detect such problem transactions. Therefore, our system can help designers to either choose an isolation level where the anomalies do not occur or to change the transaction design to avoid the anomalies.

 

The Java application system described in the paper utilizes Hibernate, the object-relational mapping toolkit from JBoss. ConsAD has two parts: a "shim", called ColAgent, that captures application traces and is implemented by modifying the Hibernate library used by the application; and DetAgent, an analysis piece that analyzes the serialization graphs produced by ColAgent to look for anomalies. In their 2011 study, the authors found that the application under test, RUBiS, suffered from anomalies when it used Hibernate's built-in optimistic concurrency control scheme (termed JOCC in the paper), 2PL using READ COMMITTED, or (even) PostgreSQL's implementation of snapshot isolation (SI). This graph, from the 2011 ICDE paper, illustrates the frequency of anomalies for the RUBiS "eBay simulation" with all three concurrency-control schemes. Note that in these experiments snapshot isolation consistently offered the fewest anomalies at all benchmark sizes, a characteristic that application architects should study. But SI is not equivalent to serializability, something other authors have written about [4-7], and it still produced low-frequency anomalies during the test.

 

[Figure: ConsAD results - frequency of consistency anomalies for the RUBiS benchmark under JOCC, 2PL (READ COMMITTED), and snapshot isolation]

 

 

The graph is instructive not only in illustrating that anomalies occur with all three concurrency control schemes, but also in showing that the frequency of these anomalies increases dramatically with scale. Part of the issue lies with Hibernate's use of caching; straightforward row references will result in a cache hit, whereas a more complex query involving nested subqueries or joins will execute against the (up-to-date) copies of the row(s) in the database, leading to anomalies involving stale data. As such, these results should serve as a warning to application developers using ORM toolkits, since it is quite likely that they have little, if any, idea of the update and/or staleness anomalies that their application may encounter when under load.

 

It would be brilliant if Kamal and Bettina expanded this work to cover application frameworks other than Hibernate, something I discussed with Kamal at length while in Phoenix last week. Hibernate's mapping model makes this sort of analysis easier than (say) unrestricted ODBC applications, but if such a tool existed it would be very useful in discovering these sorts of anomalies for other types of applications.

 

 

References


[1] K. Zellag and B. Kemme (May 2012). ConsAD: a real-time consistency anomalies detector. In Proceedings of the 2012 ACM SIGMOD Conference, Phoenix, Arizona, pp. 641-644.
[2] K. Zellag and B. Kemme (April 2011). Real-Time Quantification and Classification of Consistency Anomalies in Multi-tier Architectures. In Proceedings of the 27th IEEE Conference on Data Engineering, Hannover, Germany, pp. 613-624.
[3] H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil, and P. O'Neil (May 1995). A critique of ANSI SQL isolation levels. In Proceedings of the ACM SIGMOD Conference, San Jose, California, pp. 1-10.
[4] A. Fekete (January 1999). Serialisability and snapshot isolation. In Proceedings of the Australian Database Conference, Auckland, New Zealand, pp. 201-210.
[5] A. Fekete, D. Liarokapis, E. J. O'Neil, P. E. O'Neil, and D. Shasha (2005). Making snapshot isolation serializable. ACM Transactions on Database Systems 30(2), pp. 492-528.
[6] S. Jorwekar, A. Fekete, K. Ramamritham, and S. Sudarshan (September 2007). Automating the detection of snapshot isolation anomalies. Proceedings of the 33rd International Conference on Very Large Data Bases, Vienna, Austria, pp. 1263-1274.
[7] A. Fekete, E. O'Neil, and P. O'Neil (2004). A read-only transaction anomaly under snapshot isolation. ACM SIGMOD Record 33(3), pp. 12-14.

Announcing SQL Anywhere 17!


SQL Anywhere 17 Announced!

 

I am excited to announce today that the next major evolution of SQL Anywhere is now available! Enhancements in this release continue our long-running themes of improving performance, security, availability, and developer friendliness, while remaining easy to use and easy to embed.

 

SQL Anywhere 17 has been in development for over two years.  There are literally hundreds of new features, but I would like to highlight a few important ones:

 

Tooling

  • SQL Anywhere Cockpit – a web-based interface that shows the availability, capacity, and performance of your server.
  • SQL Anywhere Profiler – a development tool that pulls together all the various performance and tuning data the server has available, to enable easier tuning and troubleshooting of your application and database.

 

 

Performance

  • Auto-Detect CPU changes - SQL Anywhere can now detect newly added (or removed) CPUs on the system and make use of those CPUs appropriately without the need for a server restart.
  • Mirroring performance - Mirror or copy node checkpoint performance has improved, especially in cases of many inserts, updates, or deletes on a primary server.
  • Interfaces - all the various SQL Anywhere interfaces, including ODBC, .NET, JDBC, Ruby, PHP, OLEDB, JavaScript, Node.js, Perl, etc., were analyzed and performance anomalies were addressed, resulting in continued improvements for inserting and fetching data.
  • Parallel recovery - SQL Anywhere can now make better use of multi-core machines during the database recovery phase.
  • Prefetch – Improved prefetching, which assists performance by cutting down on client/server round trips and increases throughput by making many rows available without a separate request to the server for each row or block of rows.

 

Security

  • Increased password protection – The default DBA user name and password (DBA/sql) are no longer used by dbinit and the CREATE DATABASE statement, and the default minimum password length has been changed from 3 to 6. The DBA user name and password must be provided when creating a new database. In addition, a new option has been added to allow the user to specify the minimum password length.
  • Database isolation - Support has been added for enabling database isolation for a database server. When database isolation is turned on, each database behaves as though it is the only database running on the database server.

 

Availability

  • Alter/drop procs - Previously, attempting to alter, replace, or drop a procedure that was being executed would result in an error. Now, the alter, replace, or drop succeeds. Current executions use the procedure definition from when the procedure started executing. New calls to the procedure after an alter, replace, or drop, use the new definition.
  • Online rebuild - You can now rebuild a production database while the database is running, which reduces downtime.
  • Point-in-time recovery – You can now restore a database to a specified time stamp or to an offset in the transaction log.
  • Dynamically start/stop TCP/IP and HTTP - Support added for starting and stopping connection listeners while the database server is running (database upgrade or rebuild required) - i.e. you can now start and stop HTTP, HTTPS, TCP/IP, and shared memory connection listeners without having to restart the database server.

 

Language Enhancements

  • PIVOT/UNPIVOT - You can now pivot and unpivot table data using two clauses, PIVOT and UNPIVOT, in the FROM clause of a query to create pivoted- or unpivoted-derived tables. Pivoting rotates column data into rows and aggregates data in a meaningful way for your business needs. Unpivot is a similar operation, but rotates row data into columns. Unpivoting is useful for normalizing un-normalized data, such as several columns of similar data that must be joined with other data.
  • DECLARE VARIABLE LIKE – You can use the %TYPE and %ROWTYPE attributes to define data types based on the data type of other objects. When creating schema objects such as columns, use the %TYPE attribute to set the data type of the object you are creating or altering to the data type of a column in a table or view. Use the %ROWTYPE attribute to set the data type to the composite data type for a row in a table or view. When creating variables, you can also use the %TYPE and %ROWTYPE attributes to set the data type to the data type of temporary objects such as variables and cursors (see the sketch after this list).
  • FETCH INTO <row var> - The SELECT statement now includes the INTO VARIABLE clause to support specifying row variables and the INTO TABLE clause to support explicitly creating a new table. The FETCH statement now allows you to specify a row variable in the INTO clause.
  • User locks - Build user-defined mutexes and semaphores into your application logic to achieve locking behaviour and to control and communicate the availability of resources.
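To make the %TYPE item above concrete, here is a small, hedged sketch (the table and column come from the SQL Anywhere sample database; treat the specific names and values as assumptions):

BEGIN
    -- v_surname takes on whatever type Employees.Surname currently has, so the
    -- declaration keeps working if the column's type is later altered.
    DECLARE v_surname Employees.Surname%TYPE;
    SELECT Surname INTO v_surname FROM Employees WHERE EmployeeID = 102;
    MESSAGE v_surname TO CLIENT;
END;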

 

Developer Enhancements

  • OData Enhancements - The SQL Anywhere database server can act as an OData server. This functionality replaces the OData Server utility. In addition, the OData producer now supports a greater subset of the OData Service Definition Language (OSDL).
  • JavaScript external environment - SQL Anywhere now includes support for JavaScript stored procedures and functions.
  • Node.js and Python drivers - The SQL Anywhere Node.js driver allows users to connect and perform queries on the database using JavaScript on Joyent's Node.js platform. In addition, the three SQL Anywhere Python drivers are now available through PyPI, the official Python package index.
  • Consolidated db – MobiLink has support for SAP (ASE, IQ, SQL Anywhere), Oracle, Microsoft SQL Server, IBM DB2, and MySQL back-end databases. We continue to add support for newer versions of these databases. In v17, we added support for Oracle 12.1, ASE 16, SQL Server 2014, MySQL 5.6.20, and IBM DB2 10.5.
  • HANA Integration – with our SAP HANA Remote Data Sync product, MobiLink becomes more tightly integrated into the HANA platform infrastructure (assigned port numbers, lifecycle management, name server integration, license management, monitoring integration).
  • SAP Passport support and NCSLib logging

 

Complete details of all the enhancements in SQL Anywhere version 17 are available in our online documentation system: Doc Comment Exchange (DCX), located here: DocCommentXchange

As well, you may find Laura's blog post of interest in navigating the documentation: SQL Anywhere 17 Documentation - At your fingertips!

 

You can download the Developer Edition for Windows and Linux here: https://www.sap.com/cmp/syb/crm-xm15-dwn-dt015/index.html

 

The entire SAP SQL Anywhere team thanks you for your dedication to the SQL Anywhere product!

SQL Anywhere 17 - Autocommit Enhancements


The recently released SQL Anywhere 17 contains a wide variety of new features that improve performance across a range of activities, improve the security and robustness of the database server and clients, and provide some new tools to improve developer productivity.  You can find a nice overview of version 17 here.

 

In order to fully appreciate the application and implications of some of the changes in version 17, I have decided to post some detailed analysis of select features, comparing version 17 to previous versions, highlighting differences and improvements, and attempting to provide best practices in relation to the subject matter where possible.  The first topic I have chosen to cover is autocommit.

 

Auto-commit is a database client connection option that governs when commits are issued for transactions.  When it comes to best practices in SQL Anywhere, the recommendation is that auto-commit be turned off wherever possible.  This is because a commit operation causes an I/O, and I/O is one of the most expensive things you can do in a traditional RDBMS.  Not using auto-commit generally results in fewer commits, and therefore better performance of the database server.  In addition, creating and managing transactions that line up with your business processes allows you to better control when commits occur, and helps to ensure a more consistent state of your database in relation to your business rules.

 

Even though not using auto-commit is preferred, most applications are actually built with auto-commit enabled.   There are a couple of reasons for this.  It is often easier to commit after every operation, and it is appealing to developers to get immediate feedback as to whether their database change was successful or not. In addition, many database interfaces turn auto-commit on by default, meaning application developers also build auto-commit style applications by default.  Once built, it is sometimes very difficult to re-architect an application to do explicit commits.

 

Prior to version 17, if an application turned on auto-commit, the client API (e.g. ODBC, JDBC, etc.) would issue an explicit “COMMIT” after every database request. To avoid this extra communication with the server and improve overall performance, a new option was added to version 17 called (surprise!) auto_commit. The option is OFF by default and can only be set locally for the duration of a connection (i.e. you cannot set a PUBLIC auto_commit option).  However, if an application sets the auto_commit option to ON, then the SQL Anywhere server will automatically commit after every request.

 

All of the SQL Anywhere client interfaces have been updated to take advantage of this new option automatically. The version 17 SQL Anywhere ODBC, JDBC and OLEDB drivers automatically set the new auto_commit option if connected to a version 17 server when the application issues the corresponding AutoCommit API call for each of the drivers. These drivers revert back to handling auto-commit on the client side if the target server is version 16 or below. For APIs (for example ESQL) where auto-commit was not previously supported, applications can now use the new auto_commit option to have the server automatically commit after each execution if desired.

 

Performance Implications

The following example tells the server to auto commit after every request:

SET TEMPORARY OPTION auto_commit = 'ON'

To determine the performance impact of this option, I ran a simple test using the PerformanceInsert sample that ships with SQL Anywhere.  I used a table with a single integer primary key column and inserted 100000 rows on my laptop.

CREATE TABLE ins( c1 integer NOT NULL PRIMARY KEY );

First I ran the test with 1 commit (at the end of the test):

instest -cdba,sql -o ins.out -r 100000 -x -v 1

My second test run was committing after every row:

instest -cdba,sql -o ins.out -r 100000 -x -v 1 -m 1

For my third test, I altered the source code for instest and added a command to turn on auto_commit after connecting to the database:

EXEC SQL EXECUTE IMMEDIATE 'SET TEMPORARY OPTION auto_commit=ON';

…and then I ran the test with no explicit commits:

instest -cdba,sql -o ins.out -r 100000 -x -v 1

 

Here are the results:

Local machine Connection                | Insert Time (s) | Commit Time (s) | Total Time (s)
1 Commit                                | 9.669           | 0.18            | 9.849
Commit every row from the client        | 11.185          | 22.911          | 34.096
Commit every row with auto_commit='ON'  | *               | *               | 30.029

 

As we can see, not committing after every operation can have a huge impact on performance.  However, if your application does use auto_commit, then over a same-machine connection the test demonstrates a ~13% improvement in performance between having the client issue a commit after every operation and having the server automatically do the commit after every operation.  If we use a network server and TCP/IP, the difference is even more pronounced, at a ~17% difference in performance (chart below).

TCP/IP Connection                       | Insert Time (s) | Commit Time (s) | Total Time (s)
1 Commit                                | 13.486          | 0.246           | 13.732
Commit every row from the client        | 14.615          | 26.467          | 41.082
Commit every row with auto_commit='ON'  | *               | *               | 35.222

*Because the server is executing the commit operation, the instest program cannot separate the commit and insert execution times

Lab testing of other scenarios (which include more mixed workload activities) has shown as much as a 25-30% improvement in performance of the commit operation, when the server-side auto_commit option is used as opposed to the client based auto_commit operations.

 

A Final Note - Chained vs. Autocommit

The auto_commit option is very different from the database chained option. Setting chained=OFF forces the server to commit after each statement executed in the server, including each individual statement within a procedure, whereas setting auto_commit to ON forces the server to commit only after each statement from the client. In the case of a procedure call, when auto_commit is on, the commit happens after the entire procedure has completed execution.
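A small sketch of that difference (the procedure and table here are hypothetical):

-- Hypothetical procedure performing two inserts into a hypothetical table t.
CREATE PROCEDURE add_two_rows()
BEGIN
    INSERT INTO t VALUES ( 1 );
    INSERT INTO t VALUES ( 2 );
END;

-- With chained = 'Off', each INSERT inside the procedure is committed individually.
SET TEMPORARY OPTION chained = 'Off';
CALL add_two_rows();

-- With chained back at its default and auto_commit = 'On', the server issues a
-- single commit after the CALL statement completes.
SET TEMPORARY OPTION chained = 'On';
SET TEMPORARY OPTION auto_commit = 'On';
CALL add_two_rows();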


SQL Anywhere 17 - Database Isolation


By default, when you run a database server, you have access to all features of the server, regardless of whether or not you actually use them.  Some of these features can also be leveraged by 3rd parties (malicious or not) to do things you did not intend as part of your application deployment.  For example, a 3rd party could compromise the security of a machine by using the database server's abilities to access the file system, send e-mail, or call web services.

 

If you are using these features in your application, you are most likely aware of the possibilities for mis-use, and can take appropriate steps to mitigate them, but if you are not using these features, you may not even be aware of them until a problem occurs.

 

In version 10, in order to prevent 3rd parties from supplanting a deployed SQL Anywhere server and mis-using it, SQL Anywhere added the ability for developers to better control what features of the server are available by default when they deploy their applications. This is called the “Secure Feature” feature.  It allows you to enable/disable specific features of the database server.  For example, if you never use the xp_startsmtp/xp_stopsmtp/xp_sendmail procedures, you can disable that feature when you run the server.

 

Secure features are enabled via a command-line switch (-sf <features>).  You can also provide a key (-sk <key>), which can be used to enable/disable specific features at runtime.  The features that can be controlled are discussed here:

Secure Features

 

To demonstrate how the Secure Feature feature works, we will take a look at a new secure feature added to SQL Anywhere 17 that allows you to enable database isolation.  If enabled, each database running on a database server acts as if it is the only database running on that database server. Connections to one database are not allowed to query properties or connection information for any other database running on the same server, nor are they allowed to start or stop other databases.

 

A new server command line option -edi (enable database isolation) has been added to turn database isolation on. There is also a new secure feature database_isolation which allows applications to turn database isolation off for a given connection. This new secure feature is on by default and can only be turned off for a specific database, not globally.

 

For this example I've created 2 databases, demo.db and demoprime.db. If we start a server with these two databases running (without database isolation):

 

dbsrv17 -n J demo.db demoprime.db

Connecting to it from dbisql and running the statement “SELECT * FROM sa_db_info()” shows us that we can “see” both databases running on the server:

[Screenshot: sa_db_info() output listing both databases when database isolation is disabled]

 

A user can leverage the knowledge of what databases are running on a server to look up information about the other databases running (database and connection properties and information about the database file for example).  To prevent this from happening, we can run the server with database isolation enabled:

 

dbsrv17 -n J -edi demo.db demoprime.db

Connecting to the demo database from dbisql and issuing the same “SELECT * FROM sa_db_info()” statement only tells us about the database we are connected to.

 

[Screenshot: sa_db_info() output listing only the connected database when database isolation is enabled]

 

The results would be similar for other system procedures like sa_conn_info(), sa_db_properties(), etc… There is no way for a connected user to ‘see’ any information for other databases running on the server.

 

But what if you wanted to allow an admin user to access this information?  The way the server is currently being run, there is no way to disable database isolation without restarting the server with different options.  To allow the database isolation feature to be disabled for a connection, you must use the sf/sk options to set a key.  The key can be used to modify secure features in a running server.

 

dbsrv17 -n J -edi -sf none -sk mysfkey demo.db demoprime.db

 

With our server running and database isolation enabled, we cannot access information on any other databases running on the server.  To disable database isolation for a connection, we can execute the following procedure call to allow us to manage keys for secure feature access.

 

CALL sp_use_secure_feature_key( 'system', 'mysfkey' );

 

We then create a key to associate with the database_isolation feature:

 

CALL sp_create_secure_feature_key( 'dbiso', 'dbisokey', 'database_isolation' );

 

Now, we can distribute the key to our administrative users, and they can use it to disable the database_isolation feature for their connection:

 

CALL sp_use_secure_feature_key( 'dbiso', 'dbisokey' );

 

In order to be able to re-enable database isolation for the connection, we can create another key, and use it:

 

CALL sp_use_secure_feature_key( 'system', 'mysfkey' );
CALL sp_create_secure_feature_key( 'disabledbiso', 'dbisokey', '-database_isolation' );
CALL sp_use_secure_feature_key( 'disabledbiso', 'dbisokey' );

 

Note that with database isolation turned on applications can still connect to the utility database with the appropriate utility userid and password and can perform utility actions; however, starting and stopping databases while connected to the utility database will still be restricted and applications will still need to use the manage_database_isolation secure feature in order to start and stop databases.

SQL Anywhere 17 - Good-bye DBA/SQL


From its first release, SQL Anywhere has used a default user id and password for newly created databases: DBA/sql

 

While it is considered best practice to not use the DBA user and default password in your production database, we have found that customers do occasionally still release applications where the default connection uses DBA.  Fortunately, it is relatively uncommon to find a customer production database that is using the default password.

 

In order to further encourage the use of a non-DBA user ID, the dbinit utility and the CREATE DATABASE statement will no longer use a default.  You must specify a user ID and password when you create a new database.  Dbinit now requires you to specify the option “-dba <userid>,<password>” to specify the DBA user and password for a new database.  The CREATE DATABASE statement requires that you use the clauses “DBA USER <userid>” and “DBA PASSWORD <password>”.
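For example (the user ID, password, and file name here are made up), creating a new database now looks something like this:

dbinit -dba admin_user,S3cretPwd demo17.db

or, in SQL (treat the exact quoting as a sketch of the documented clauses above):

CREATE DATABASE 'demo17.db' DBA USER 'admin_user' DBA PASSWORD 'S3cretPwd';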

 

Furthermore, the default minimum password length has been increased from 3 to 6.  However, you can control this to make it longer or shorter. The dbinit command-line option “-mpl” and the CREATE DATABASE clause “MINIMUM PASSWORD LENGTH” allow you to specify a minimum password length in your new database.  You can also change the minimum password length after the database is created by changing the database option min_password_length.  Note that the password for the utility_db database must now be at least 6 characters in length.

 

A small change was also made to dbunload, since it can no longer use a default user id and password in the reload.sql script to rebuild a database.  It has been changed to use a 16 byte, randomly generated value instead.

 

The utility database also no longer requires DBA as the sole username. The -su server option has changed from "-su <password>" to "-su <userid>,<password>" or "-su <password>" or "-su none".  The personal server continues to default to allowing connections to utility_db with user DBA and any password if the -su server option is not specified.

SQL Anywhere 17 - Enhanced Auditing


Auditing database interactions provides the ability to see who did what and when they did it in the database.  This is useful in scenarios where the data stored is sensitive (eg. salary information, proprietary recipes) and you need to track who has accessed the data over a specific period of time. It can also be useful when trying to mitigate a breach or to determine what happened in the case of data that is changed/deleted improperly.  SQL Anywhere auditing (when enabled) tracks all activity performed on a database, including:

  • login attempts (including the user information for both successes and failures),
  • timestamps of all events,
  • permissions checks (including object information for both successes and failures),
  • all actions that require system privileges, and
  • custom audit strings added to the audit log via the sa_audit_string(…) stored procedure.

What exactly is audited can be managed using the sa_enable_auditing_type(…) system procedure. Auditing can be enabled and disabled at login time by setting the “conn_auditing” option.  All of the auditing information is stored in the transaction log in version 16.
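For example, the scope of auditing can be narrowed before it is turned on; a small sketch (treat the specific type names as assumptions drawn from the documented list):

-- Audit only failed connection attempts and denied permission checks,
-- instead of everything.
CALL sa_enable_auditing_type( 'connectFailed,permissionDenied' );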

Version 17 introduces a new option, AUDIT_LOG, which allows you to specify one or more different targets for the auditing information.  Potential targets include:

  • TRANSLOG (default) – as in version 16, audit data is stored in the transaction log
  • SYSLOG – audit information is logged to the system event tracing log (e.g. the Windows event log on Windows, or syslog on Linux/Unix)
  • FILE – audit information is logged to the file specified (the server must have write access to it) using the server ETD (Event Trace Data) format

You may choose to use a different target for many reasons: for example, you may not wish to use a transaction log for the database, you may have existing tools for examining the operating system logs, or you may simply wish to separate the auditing information about database operations from the actual recording of the execution of those operations.

Setting the audit_log option requires the SET ANY SECURITY OPTION privilege.  If there are any problems with auditing targets, the server does its best to continue auditing, reverting to using the transaction log if other targets are unavailable.

Here is an example of how one might use the audit_log option and what the resulting logs would look like.

 

First we connect as a user with appropriate security administration privileges and turn on auditing and set the audit_log option to log audit details to a file:

SET OPTION PUBLIC.audit_log='FILE(filename_prefix=C:\work\issues\v17testing\auditing\)';
SET OPTION PUBLIC.auditing = 'On';

 

We can log a custom message to the audit log at any time, using the sa_audit_string() stored procedure:

CALL sa_audit_string( 'Auditing has begun.' );
GRANT CONNECT TO testuser identified by testuser;

 

Next, we connect as the newly created testuser, and execute the following statement, which will fail with a permissions error:

SELECT * FROM groupo.Employees;

 

Switching back to our original connection, we will turn off auditing, and log a corresponding message to the audit log.

CALL sa_audit_string( 'Auditing is ending.' );
SET OPTION PUBLIC.auditing = 'Off';

 

After completing the above, we have an audit file generated in the specified directory, called “_0.etd” by default

We can run:

dbmanageetd -o auditlog.out _0.etd

 

This creates a file “auditlog.out” that will contain a section that looks similar to the following:

 

First we see our custom audit string, which we can use to bookmark our test:

[2015-09-08T10:28:10.254-04:00] SYS_Audit_String  text=[Auditing has begun.] connid=25 username=[DBA] recnum=17

 

Next we see various permissions checks related to calls made by DBISQL:

[2015-09-08T10:28:10.301-04:00] SYS_Audit_PermCheck  success=1 perm_type=[Execute] detail1=[dbo.sa_locks] detail2=[NULL] connid=25 username=[DBA] recnum=18

[2015-09-08T10:28:10.301-04:00] SYS_Audit_PermCheck  success=1 perm_type=[Execute] detail1=[dbo.sa_locks] detail2=[NULL] connid=25 username=[DBA] recnum=19

[2015-09-08T10:28:10.301-04:00] SYS_Audit_PermCheck  success=1 perm_type=[MONITOR] detail1=[NULL] detail2=[NULL] connid=25 username=[DBA] recnum=20

 

Next we see various permissions checks related to calls made by DBISQL from the testuser connection:

[2015-09-08T10:28:12.064-04:00] SYS_Audit_PermCheck  success=1 perm_type=[Execute] detail1=[dbo.sa_locks] detail2=[NULL] connid=8 username=[testuser] recnum=21

[2015-09-08T10:28:12.064-04:00] SYS_Audit_PermCheck  success=0 perm_type=[MONITOR] detail1=[NULL] detail2=[NULL] connid=8 username=[testuser] recnum=22

[2015-09-08T10:28:12.064-04:00] SYS_Audit_PermCheck  success=1 perm_type=[Execute] detail1=[dbo.sa_locks] detail2=[NULL] connid=8 username=[testuser] recnum=23

[2015-09-08T10:28:12.064-04:00] SYS_Audit_PermCheck  success=0 perm_type=[MONITOR] detail1=[NULL] detail2=[NULL] connid=8 username=[testuser] recnum=24

 

Now we have the permissions failure for the select statement (signified by “success=0” in the audit record)

[2015-09-08T10:28:13.094-04:00] SYS_Audit_PermCheck  success=0 perm_type=[Select] detail1=[GROUPO.Employees] detail2=[NULL] connid=8 username=[testuser] recnum=25

[2015-09-08T10:28:13.094-04:00] SYS_Audit_PermCheck  success=0 perm_type=[Select] detail1=[GROUPO.Employees] detail2=[***] connid=8 username=[testuser] recnum=26

 

Finally, we see the custom message for the end of our audit:

[2015-09-08T10:28:19.256-04:00] SYS_Audit_PermCheck  success=1 perm_type=[Execute] detail1=[dbo.sa_audit_string] detail2=[NULL] connid=25 username=[DBA] recnum=35

[2015-09-08T10:28:19.256-04:00] SYS_Audit_PermCheck  success=1 perm_type=[MANAGE AUDITING] detail1=[NULL] detail2=[NULL] connid=25 username=[DBA] recnum=36

[2015-09-08T10:28:19.256-04:00] SYS_Audit_String  text=[Auditing is ending.] connid=25 username=[DBA] recnum=37

SQL Anywhere 17 - Enhanced Password Protection


To improve security, SQL Anywhere 17 has made a number of new changes related to how passwords are managed and accessed via the various tools and utilities.  I have included here a brief description of the changes.

 

Direct Access to Password Hashes in System Tables

As a best practice for database security, access to password hash values in the database should require two separate actors: an administrator (a user with the SELECT ANY TABLE privilege) and a security officer (a user with the ACCESS USER PASSWORD admin privilege).

In order to better protect passwords stored in the database, in version 17 the server will no longer return even the hashes for passwords in queries.  The following views will have password-containing column(s) replaced with ‘***’ for all users:

  • SYSUSER
  • SYSUSERPERM
  • SYSUSERAUTH
  • SYSEXTERNLOGIN
  • SYSLDAPSERVER
  • SYSSYNC2

 

The following synchronization-related views will now select from SYSSYNC2 view rather than ISYSSYNC table and will, as a result, have sensitive columns replaced with ‘***’:

  • SYSSYNCS
  • SYSSYNCUSERS
  • SYSSYNCPUBLICATIONDEFAULTS
  • SYSSYNCSUBSCRIPTIONS
  • SYSSYNCPROFILE

[Screenshot: DBISQL query against the system views showing the password-containing columns returned as '***']

 

Access to the actual password hashes and password values stored in the database will now require two privileges: SELECT ANY TABLE privilege and the new ACCESS USER PASSWORD privilege.  The “ACCESS USER PASSWORD” privilege allows a user to access views that contain password hashes (see list below), and perform operations that involve accessing passwords, such as unloading, extracting, or comparing databases.

The privilege ACCESS USER PASSWORD along with SELECT ANY TABLE is required to access the following views that report sensitive information or passwords:

  • SYSSYNC
  • SYSSYNCPROFILE
  • SYSUSERPASSWORD
  • SYSLDAPSERVERPASSWORD
  • SYSEXTERNUSERPASSWORD

 

As a result of these changes, you may notice a difference in the behaviour of the Sybase Central schema diff utility, as well as Copy/Paste options for user, external login, LDAP server and synchronization definition options.

 

DBUnload/DBXTRACT Changes

By default, passwords are now no longer unloaded by DBUnload or DBXtract.  DBUnload should only attempt to unload password hashes when the result is to be reloaded into a database with the purpose of recreating the database with the same (or slightly modified) schema and data.  When a database is unloaded without passwords, GRANT CONNECT, CREATE EXTERNLOGIN and CREATE LDAP SERVER statements will not have the IDENTIFIED BY clause specified. A GRANT CONNECT statement for the default DBA user will be added (with the default password).

 

The DBUnload and DBXtract utilities will unload passwords if the -up option is specified and the user performing the unload/extract has the appropriate privileges (see below).  The -up option is implied if you use any of the reload options (-ac, -an, -aob, etc.).

DBUnload with the -no option (used when performing database schema comparisons) will never unload password hashes and values containing passwords.
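As a rough illustration (the connection parameters, file names, and directory are hypothetical), an unload that preserves password hashes for a later reload might look like this:

dbunload -c "UID=secadmin;PWD=S3cretPwd;DBF=demo.db" -up -r reload.sql c:\unload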

 

For an unload with a reload into a new database (-an, -ao, -aob), you must have the SELECT ANY TABLE, SERVER OPERATOR, and ACCESS USER PASSWORD system privileges. For an unload with a reload into an existing database (-ac) you do not need the SERVER OPERATOR system privilege.  It is recommended that the user doing the rebuild only be granted both SELECT ANY TABLE and ACCESS USER PASSWORD temporarily for the duration of the rebuild process.  Note that the compatibility DBA user has all roles necessary to perform any of the above operations.

 

 

Improvements to the EncryptedPassword Connection Parameter

While the best practice for security is to never store passwords as part of a DSN definition (or elsewhere on a client computer), many developers do this in order to make application deployment and management easier.  In these cases, in order to prevent a user from seeing a plain text password on their local machine (in a DSN definition, for example), SQL Anywhere provides the EncryptedPassword (ENP) connection parameter, which can be used as a substitute for the Password (PWD) connection parameter.

 

The intent of the ENP connection parameter is to disguise the actual password used to authenticate to a database. However, prior to version 17, the encrypted password was subject to abuse.  For example, it was merely obfuscated and so could be decrypted with a little effort.  In addition, the ODBC administrator could be used to convert an encrypted password back to a plain text password.


Note: While described as a new feature for version 17, the changes to EncryptedPassword described here were actually made available in version 16 build 2039 and later.

 

For version 17, encrypted password support has been enhanced so that a database administrator can restrict database access to a user on a particular computer without revealing the actual plain text password to the user. It also prevents the current password from being decrypted to memory and consequently subject to inspection. Since successful decryption can now be restricted to a particular computer or computer/user combination, displaying the encrypted password in plain text is much less of an issue.

For example, in the following connection string, the encrypted password (ENP=) cannot be used by anyone other than the specific computer/user combination for whom it was created.

dbping -d -c "Host=server-pc;Server=DemoServer;UID=DBA;ENP=05a17731bca92f97002100c39d906b70f3272fe76ad19c0e8bd452ad4f9ea9"

 

To better secure the EncryptedPassword connection parameter, the following changes were made:

  1. Better encryption algorithms are used to ensure that the encrypted password cannot be easily decrypted.
  2. EncryptedPassword can be restricted to a particular computer or a particular computer/user combination, such that
    1. the encrypted password can only be decrypted on that specific computer. It cannot be used on any other computer. However, anyone who can log on to the computer can still use the encrypted password and corresponding user ID to authenticate to a database.
    2. the encrypted password can only be decrypted on that specific computer for that specific user. It cannot be used on any other computer by the same or other user.
  3. Plaintext passwords can no longer be reverse-engineered from the EncryptedPassword value using the ODBC Configuration for SQL Anywhere dialog. The Encrypt password option is no longer a checkbox but is now used to select from different encryption options including
    1. none,
    2. for use on any computer,
    3. for use on this computer only,
    4. for use on this computer and this user only.

 

[Screenshot: ODBC Configuration for SQL Anywhere dialog showing the Encrypt password options]

The dialog can no longer be used to change the level of password encryption for an existing password, unless it was previously unencrypted. If the level of encryption is to be changed, then the password must be reentered.

 

 

DBDSN Changes

The Data Source utility (dbdsn) supports a new option, -pet a|c|u, specifying how the encrypted password may be used (an example follows the note below).

  • If -pet a is specified, the password is encrypted for use on any computer.
  • If -pet c is specified, the password is encrypted for use on this computer only.
  • If -pet u is specified, the password is encrypted for use on this computer by this user only.  This option should not be used if your client application is running as a Windows service.  If your client application runs as a service, use the -pet c option instead.
  • The pre-existing -pe option which provides simple obfuscation continues to be supported; however, its use is deprecated.

 

Note that encryption for options -pet c and -pet u must be performed on the computer or computer/user for which it is intended to be used (decrypted).  You cannot export the DSN definitions to another machine and continue to use the EncryptedPassword.
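For instance (the DSN name and connection string below are made up), a data source whose password can only be decrypted on the computer where the command is run could be created like this:

dbdsn -w MyAppDSN -c "UID=appuser;PWD=S3cretPwd;Server=DemoServer;Host=localhost" -pet c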


Note: The new encrypted password features are not supported by client libraries that are older than 17.0.0.1272 and 16.0.0.TBD.

 

 

DBFHide Changes

The File Hiding utility (dbfhide) can be used to encrypt an entire connection string to a file for use by most of the database tools that accept connection strings (e.g., dbping -d -c @credentials.hidden).  It now supports the new options -wm (computer-only) and -w (computer/user-only).
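A hedged sketch of that workflow (the file names are made up, and credentials.txt is assumed to contain a complete connection string):

dbfhide -wm credentials.txt credentials.hidden
dbping -d -c @credentials.hidden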

SQL Anywhere 17 - A More Worldly sa_validate() procedure


The sa_validate() procedure can be used to validate various aspects of your database in order to catch any data corruption and allow you to address it before you get into a production-down situation.  Performing validation as part of the backup process is generally a good idea, and can be automated by leveraging the SQL Anywhere event system.  Here is an example of a backup event which validates the database before the backup is taken.  If there are any problems, an e-mail is sent to the administrator.

 

CREATE EVENT "DBA"."BackupDatabase" DISABLE 
AT ALL HANDLER 
BEGIN
  DECLARE res_validate VARCHAR(250);
  DECLARE res_backup VARCHAR(250);
  DECLARE backup_dir VARCHAR(250);
  DECLARE crsr_validate DYNAMIC SCROLL CURSOR FOR CALL sa_validate();

  -- First validate the database to make sure we are not backing up a corrupted database
  OPEN crsr_validate;
  FETCH NEXT crsr_validate INTO res_validate;
  IF res_validate <> 'No error detected' THEN
    CALL xp_startsmtp('mailuser','mailserver.xyz.com');
    CALL xp_sendmail('admin@xyz.com','Database Backup Failed!',NULL,NULL,NULL,'Validation failed for database: ' || res_validate);
    CALL xp_stopsmtp();
    RETURN;
  END IF;

  -- Once we are satisfied the database is in good condition, we back up the database and log.
  -- We use a 7 day set of rolling backups.
  SET backup_dir = 'c:\backup\' + dayname(today());
  BACKUP DATABASE DIRECTORY backup_dir;
  EXCEPTION WHEN OTHERS THEN
    SELECT errormsg() INTO res_backup;
    CALL xp_startsmtp('mailuser','mailserver.xyz.com');
    CALL xp_sendmail('admin@xyz.com','Database Backup Failed!',NULL,NULL,NULL,'Backup failed for database: ' || res_backup);
    CALL xp_stopsmtp();
END; 
--Add a schedule for the backup event to run it every day 
ALTER EVENT "BackupDatabase" ADD SCHEDULE "BackupSched" START TIME '23:00:00' ON ('Sunday','Saturday','Friday','Thursday','Wednesday','Tuesday','Monday')

 

One oddity you might notice about the above example is that in order to detect a validation failure, we have to look for a specific string:

  IF res_validate <> 'No error detected'

 

With older versions of SQL Anywhere, the only way to determine whether there were any validation errors was to look for the string 'No error detected' in the result set. A compounding problem is that this string is also localized to all of the languages that SQL Anywhere supports, so there is no straightforward way to code this check for all languages.

 

In version 17, we are addressing this by adding two new columns to the sa_validate() result set in addition to the existing “Messages” column, which make it possible to check for success consistently, regardless of deployment language.

The first column is “IsValid”, a bit value that is set to 1 if the validation is clean and 0 if there are any validation errors.

The second column is “ObjectName”, which is empty if the validation is clean.  If there is a validation error, this column contains the name of the database or the table that failed validation.
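Since sa_validate() can be referenced in a FROM clause, a quick manual check is also possible; a small sketch (assuming the IsValid semantics described above):

SELECT IsValid, ObjectName, Messages FROM sa_validate();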

 

The above backup event can now be updated to include the two new columns in the sa_validate() result set and simply check the “IsValid” column to determine success/failure of the validation.

CREATE EVENT "DBA"."BackupDatabase" DISABLE 
AT ALL HANDLER 
BEGIN
  DECLARE res_backup VARCHAR(250);
  DECLARE backup_dir VARCHAR(250);
  DECLARE res_messages VARCHAR(250);
  DECLARE res_isvalid INTEGER;
  DECLARE res_object VARCHAR(250);
  DECLARE crsr_validate DYNAMIC SCROLL CURSOR FOR CALL sa_validate();

  -- First validate the database to make sure we are not backing up a corrupted database
  OPEN crsr_validate;
  FETCH NEXT crsr_validate INTO res_messages, res_isvalid, res_object;
  IF res_isvalid = 0 THEN -- validation failed
    CALL xp_startsmtp('mailuser','mailserver.xyz.com');
    CALL xp_sendmail('admin@xyz.com','Database Backup Failed!',NULL,NULL,NULL,'Validation failed for database: ' || res_object || '\n ' || res_messages );
    CALL xp_stopsmtp();
    MESSAGE 'Validation failed for database: ' || res_object || '\n ' || res_messages;
    RETURN;
  END IF;

  -- Once we are satisfied the database is in good condition, we back up the database and log.
  -- We use a 7 day set of rolling backups.
  SET backup_dir = 'c:\backup\' + dayname(today());
  BACKUP DATABASE DIRECTORY backup_dir;
  EXCEPTION WHEN OTHERS THEN
    SELECT errormsg() INTO res_backup;
    CALL xp_startsmtp('mailuser','mailserver.xyz.com');
    CALL xp_sendmail('admin@xyz.com','Database Backup Failed!',NULL,NULL,NULL,'Backup failed for database: ' || res_backup);
    CALL xp_stopsmtp();
END; 
--Add a schedule for the backup event to run it every day 
ALTER EVENT "BackupDatabase" ADD SCHEDULE "BackupSched" START TIME '23:00:00' ON ('Sunday','Saturday','Friday','Thursday','Wednesday','Tuesday','Monday')