
From the Archives: Materialized View Terminology


In this post, originally written by Glenn Paulley and posted to sybase.com in April of 2009, Glenn discusses the different terminology used by other vendors to talk about what we call materialized views in SQL Anywhere.

I also want to note that in the SAP HANA architecture, materialized views should be unnecessary because all data is available in main memory and can be processed/aggregated on the fly.

 

Every so often we encounter a situation where a prospective customer asks whether or not SQL Anywhere offers support for materialized views - or, more commonly, supports "materialized views" by another name. Below is a list of synonyms for materialized views across different commercial RDBMS products:

 

DBMS             Terminology                  When                     How                      Who
MS SQL Server    Indexed Views                Immediate                Incremental              System
IBM DB2          Materialized Query Tables    Immediate and Deferred   Incremental or Rebuild   System or User
Oracle           Materialized Views           Deferred                 Incremental or Rebuild   System or User
SQL Anywhere     Materialized Views           Immediate and Deferred   Incremental or Rebuild   System or User

 

In the above table, the "when" column refers to when a materialized view is updated with respect to changes made to one or more underlying base table rows. In a nutshell the choices are:

  • Immediate maintenance:
    • Update materialized view as part of the same (update) transaction;
    • Propagate base table changes to the view data in a manner consistent with underlying base tables;
    • View data is never stale;
    • Has the potential to cause high rates of locking conflicts and deadlocks, together with degraded concurrency for update transactions.
  • Lazy, 'just in time', maintenance:
    • Apply changes to base tables without updating dependent materialized views;
    • May log changes if incremental updates are possible;
    • At query execution time, a view can be used to provide results if the view is (already) fresh; otherwise, a separate synchronous transaction applies logged changes or re-computes the materialized view: query execution waits.
  • Deferred, on demand, maintenance:
    • Apply changes to base tables without updating dependent materialized views;
    • May log changes if incremental updates are possible;
    • Use an independent asynchronous process to update views, typically by complete recomputation;
    • Applications are typically allowed to control the staleness of the view.

The "how" column refers to how the view is kept up-to-date in the face of changes to the view's underlying base tables. The choices are:

  • Rebuild: re-compute the view from scratch;
  • Incremental: apply individual updates to the materialized view without complete re-computation. This is not always possible; there are efficiency tradeoffs. In practice, immediate view maintenance is always incremental. (A DDL sketch of both maintenance styles follows.)
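To make the "when" and "how" choices concrete, here is a minimal sketch of how they surface in SQL Anywhere's DDL. The view and index names are illustrative, and immediate views carry additional restrictions not shown here; consult the SQL Anywhere documentation for details.

CREATE MATERIALIZED VIEW EmpCountByDept AS
    SELECT DepartmentID, COUNT( * ) AS n_employees
    FROM Employees
    GROUP BY DepartmentID;

-- Deferred maintenance is the default: the view is stale until refreshed.
REFRESH MATERIALIZED VIEW EmpCountByDept;

-- Immediate (incremental) maintenance requires a unique index on the view;
-- thereafter, changes to Employees propagate within each updating transaction.
CREATE UNIQUE INDEX EmpCountByDept_ukey ON EmpCountByDept( DepartmentID );
ALTER MATERIALIZED VIEW EmpCountByDept IMMEDIATE REFRESH;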

 

The term "materialized view" isn't standardized; since materialized views are a performance optimization feature, like indexes, materialized views aren't covered by the ANSI/ISO SQL standards. We chose "materialized views" for SQL Anywhere not (simply) to match Oracle, but also that "materialized views" is in relatively common use in the academic literature.

 

Some may question the omission of Sybase IQ from the above table. Actually, Sybase IQ doesn't offer support for materialized views. Instead, Sybase IQ supports a related (and older) idea called a join index. Join indices were originally described in a 1987 TODS paper [1] by Patrick Valduriez, then at MCC and now at INRIA. Similar to a materialized view, a join index materializes the result of an inner | outer | full-outer join between two tables. The advantage of a full-outer join index is that it can be used to answer any inner | outer | full-outer join query simply through applying restriction. The difference is that a materialized view is more generalized, able to materialize n-way inner | outer | full-outer joins along with restriction, projection, and grouping. In fact, batch-update materialized views can contain arbitrary relational algebra operations. My thanks to my colleague Anil Goel for putting this material together for a presentation on materialized views given at Techwave 2008.


[1] Patrick Valduriez (June 1987). Join indices. ACM Transactions on Database Systems 12(2), pp. 218-246.


From the Archives: SELECT over an UPDATE statement


In this post, originally written by Glenn Paulley and posted to sybase.com in May of 2009, Glenn talks about using SELECT over various DML statements, a feature that was added to SQL Anywhere in version 12.



Few, I think, would argue with the statement that SQL is a complex language, and there is considerable "bulk" in ANSI/ISO standard SQL that is of debatable value. What I'd like to describe in this post, however, is an extension to standard SQL that I think is really quite useful - so much so that I wish I'd thought of it, but I have to give credit to the people at IBM, Krishna Kulkarni in particular. The extension is another form of "table reference" in the FROM clause of a SELECT statement: a table function that contains an update statement (INSERT, UPDATE, MERGE, or DELETE). It's easiest to illustrate this extension with an example:

 

SELECT T_updated.*
FROM NEW TABLE ( UPDATE T SET x=7 WHERE y=6 ) AS T_updated
WHERE T_updated.pk IN ( SELECT S.fk FROM S where S.z = 8 )

 

What does this SQL statement do? First, the embedded UPDATE is executed, creating a virtual derived table consisting of the modified rows of table T, as specified by the NEW keyword. Second, the quantifier T_updated is evaluated over these virtual derived table rows, and the SELECT statement processes them as if they were from any other table reference; in the example, only the modified rows of T that satisfy the WHERE clause on line 3 become part of the SELECT's result set. With the above model, it is straightforward to join the modified rows to other tables, return the modified rows to the application via a cursor, output the modified rows to a file, and so on. So simple, it's brilliant. Without this extension, one would have to define an AFTER or BEFORE trigger to copy the modified rows to another (different) table, manage the contents of that other table (handle concurrent updaters), and execute a separate SELECT statement over the trigger-inserted table (only) after the UPDATE statement had been executed. That's a fair amount of work just to output what changes an update statement made. Here is the grammar for this form of table reference:

 

<table primary> ::= <table or query name> [ [ AS ] <correlation name> [ ( <derived column list> ) ] ]
        | other forms of table references, such as derived tables
        | <data change delta table> [ AS ] <correlation name> [ ( <derived column list> ) ]

<data change delta table> ::= <result option> TABLE ( <data change statement> )

<data change statement> ::= <searched delete statement>
        | <insert statement>
        | <merge statement>
        | <searched update statement>

<result option> ::= NEW | OLD | FINAL

 

NEW signifies that the virtual derived table will contain new or modified rows as a result of an INSERT, UPDATE, or MERGE statement. OLD signifies that the virtual derived table will contain copies of rows prior to their modification or deletion. FINAL causes an exception to be raised if the rows that are modified are altered by any AFTER trigger.
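As a further sketch, OLD works analogously for deletions: the derived table exposes copies of the rows as they were before the DELETE removed them (the table and column names here are hypothetical):

SELECT T_deleted.*
FROM OLD TABLE ( DELETE FROM T WHERE y = 6 ) AS T_deleted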

 

In addition to the above semantics, there are additional restrictions on how the update statement can be utilized, largely a result of avoiding problems due to the order of evaluation. One such restriction is that only one "data change delta table" can be specified in a single request.

 

The ability to specify a "data change delta table" is an important extension to standard SQL, and has been available in IBM's DB2 product since the DB2 8.1.4 release. In March 2008, this feature was adopted as an enhancement to ANSI/ISO standard SQL, and should appear as part of the forthcoming SQL:2011 standard.

 

One shortcoming of the syntax supported by both DB2 and the forthcoming SQL standard is that one cannot directly refer to both the original and modified versions of the same row, in the same way one can in a row-level trigger using the REFERENCING [ OLD | NEW ] AS ... syntax. Instead, one has to construct two common table expressions in a WITH clause: one to construct a derived table containing the "after" version via the update DML statement, and the other to return the table's existing values. The query must then join these two quantifiers together to get the old and new versions of the same row. Jan-Eike Michels of IBM's Santa Teresa laboratory was kind enough to provide an example, which I've modified slightly:

 

WITH temp1 ( T1PK, T1C1 ) AS
     ( SELECT PK, C1 FROM T ),
     temp2 ( T2PK, T2C1 ) AS
     ( SELECT PK, C1 FROM NEW TABLE ( UPDATE T SET C1 = C1 + 2 ) )
SELECT T2PK AS keyvalue, T1C1 AS old, T2C1 AS new, T2C1 - T1C1 AS delta
FROM temp1 JOIN temp2 ON ( temp1.T1PK = temp2.T2PK )

 

At the moment, SQL Anywhere does not support "data change delta tables" - at least, not yet (note: As mentioned in the introduction, this feature was added to SQL Anywhere version 12). My thanks to Jan-Eike for his example and for answering some technical questions surrounding this SQL extension, and to Krishna for bringing it forward to the INCITS DM32.2 SQL standards committee.

From the Archives: Query processing challenges with Object-Relational Mapping toolkits


In this post, originally written by Glenn Paulley and posted to sybase.com in May of 2009, Glenn talks about ORMs, the difficulties they pose to the SQL Anywhere query optimizer, and the increasing impact they are having on all RDBMSs. As Glenn mentions in his post, I recognize the attractiveness of ORMs, but I am not a big fan of the complexity they introduce into applications.


Object-relational mapping toolkits such as Hibernate/NHibernate, LINQ, and others permit one to develop object-oriented database applications in the paradigm offered by the object-oriented language (Java, C#, C++), working around the impedance mismatch between the application's abstractions and the persistent relational store. My colleague Jason Hinsperger has written previously regarding the additional complexity that ORM toolkits bring to the application. That additional complexity has far-reaching tentacles, and impacts database management system implementations as well. Let me explain what I mean by that.

 

While ORM software toolkits insulate application programmers from dealing directly with a relational database, and the paradigm mismatch that results from its use, at the same time the application is made much more complex through the addition of the ORM layer. As an example, Java Hibernate consists of 266 packages, 1938 classes, 18,680 functions, and over 118K lines of code. From the programmer's standpoint, however, a characteristic that the object-relational mapping layer provides is that much of that complexity is hidden during program development. One can create a very sophisticated set of mappings that result in considerably complex SQL statements, but from within the program the method call that results in such a statement can appear the same as any other.

 

Here's an example, provided by my colleague Ani Nica. The query below is the SQL generated by an example query from the Microsoft Entity Framework test suite:

 

SELECT 
[Project9].[ContactID] AS [ContactID],
[Project9].[C1] AS [C1],
[Project9].[C2] AS [C2],
[Project9].[ContactID1] AS [ContactID1],
[Project9].[SalesOrderID] AS [SalesOrderID], 
[Project9].[TotalDue] AS [TotalDue] 
FROM ( SELECT [Distinct1].[ContactID] AS [ContactID],
              1 AS [C1],
              [Project8].[ContactID] AS [ContactID1],
              [Project8].[SalesOrderID] AS [SalesOrderID],
              [Project8].[TotalDue] AS [TotalDue],
              [Project8].[C1] AS [C2]
       FROM ( SELECT DISTINCT [Extent1].[ContactID] AS [ContactID]
              FROM [DBA].[Contact] AS [Extent1]
                   INNER JOIN [DBA].[SalesOrderHeader] AS [Extent2]
                   ON EXISTS ( SELECT cast(1 as bit) AS [C1]
                               FROM ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable1]
                                    LEFT OUTER JOIN ( SELECT [Extent3].[ContactID] AS [ContactID]
                                                      FROM [DBA].[Contact] AS [Extent3]
                                                      WHERE [Extent2].[ContactID] = [Extent3].[ContactID] ) AS [Project1]
                                    ON cast(1 as bit) = cast(1 as bit)
                                    LEFT OUTER JOIN ( SELECT [Extent4].[ContactID] AS [ContactID]
                                                      FROM [DBA].[Contact] AS [Extent4]
                                                      WHERE [Extent2].[ContactID] = [Extent4].[ContactID] ) AS [Project2]
                                    ON cast(1 as bit) = cast(1 as bit)
                               WHERE ([Extent1].[ContactID] = [Project1].[ContactID])
                                  OR (([Extent1].[ContactID] IS NULL) AND ([Project2].[ContactID] IS NULL)) )
            ) AS [Distinct1]
            LEFT OUTER JOIN
            ( SELECT [Extent5].[ContactID] AS [ContactID],
                     [Extent6].[SalesOrderID] AS [SalesOrderID],
                     [Extent6].[TotalDue] AS [TotalDue],
                     1 AS [C1]
              FROM [DBA].[Contact] AS [Extent5]
                   INNER JOIN [DBA].[SalesOrderHeader] AS [Extent6]
                   ON EXISTS ( SELECT cast(1 as bit) AS [C1]
                               FROM ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable2]
                                    LEFT OUTER JOIN ( SELECT [Extent7].[ContactID] AS [ContactID]
                                                      FROM [DBA].[Contact] AS [Extent7]
                                                      WHERE [Extent6].[ContactID] = [Extent7].[ContactID] ) AS [Project5]
                                    ON cast(1 as bit) = cast(1 as bit)
                                    LEFT OUTER JOIN ( SELECT [Extent8].[ContactID] AS [ContactID]
                                                      FROM [DBA].[Contact] AS [Extent8]
                                                      WHERE [Extent6].[ContactID] = [Extent8].[ContactID] ) AS [Project6]
                                    ON cast(1 as bit) = cast(1 as bit)
                               WHERE ([Extent5].[ContactID] = [Project5].[ContactID])
                                  OR (([Extent5].[ContactID] IS NULL) AND ([Project6].[ContactID] IS NULL)) )
            ) AS [Project8]
            ON ([Project8].[ContactID] = [Distinct1].[ContactID])
               OR (([Project8].[ContactID] IS NULL) AND ([Distinct1].[ContactID] IS NULL))
     ) AS [Project9]
ORDER BY [Project9].[ContactID] ASC, [Project9].[C2] ASC

Did you note the join conditions containing the EXISTS predicates? In days gone by, when application programs and their accompanying SQL statements were composed by hand, an attempt at placing such a query into production would be met with, at a minimum, a severe reprimand of the application programmer. With ORM toolkits, however, the programmer is insulated from such gory details; they may only realize that their application is "slow". Poor performance must be the database system's fault.

 

At the same time, it is typical for object-oriented applications to process relational data a row-at-a-time, in keeping with the navigational paradigm offered by object orientation. In Hibernate parlance, such applications embody the "N+1 SELECTS problem" as they implement client-side joins from within the application. And herein lies the problem: like other commercial and open-source database management systems, over the years SQL Anywhere has been significantly enhanced to provide state-of-the-art support for complex query processing: OLAP functionality, the MERGE statement, materialized views, sophisticated query processing methods such as hash anti-semijoin, computed columns, alternative join techniques, and so on. However, each additional piece of analysis within the query optimizer yields additional overhead for every statement. Consequently, it is extremely difficult for a database management system to offer precisely the same level of query processing performance for simple statements from release to release; every additional bit of SQL input analysis incurs additional CPU cost. Hence migrating from one release to the next of a commercial DBMS may yield reduced performance.
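To make the "N+1 SELECTS" pattern concrete, here is a sketch (table and column names are hypothetical) of the statement stream such an application generates, contrasted with the single join a hand-written application would issue:

-- Query 1: fetch the parent rows.
SELECT id FROM SalesOrders;

-- Queries 2 through N+1: issued once per parent row, from the application loop.
SELECT * FROM SalesOrderItems WHERE order_id = ?;

-- The equivalent single request, with the join performed on the server:
SELECT so.id, soi.*
FROM SalesOrders so JOIN SalesOrderItems soi ON ( soi.order_id = so.id );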

 

In response, what you see offered by the various vendors are mechanisms designed to mitigate these performance issues, primarily through various forms of caching: the caching of SQL result sets, caching of access plan strategies, caching of expressions, and so on, all leading to additional sophistication and complexity within each DBMS, and higher and higher utilization of memory. Moreover:

  • the characteristics of SQL queries generated by ORM toolkits are driving query optimization enhancements in relational database management systems;
  • technologies that offer pre-optimization of generated SQL statements, such as that provided by Semmle with their .QL query language, will become increasingly important, because they can exploit underlying domain and mapping information far more ably than can a "generic" SQL optimizer in a relational database management system;
  • how precisely to provide such optimization, and where in the application stack to implement it, is an interesting question worthy of significant research.

In summary, the proliferation of ORM toolkits is having an impact on every relational database system vendor. I expect that impact to continue unabated in the near term, particularly as such toolkits become ever more popular.

From the Archives: Why Snapshot Isolation is so Useful


In this post, originally written by Glenn Paulley and posted to sybase.com in May of 2009, Glenn talks about MVCC and snapshot isolation (which is also part of SAP HANA) and why it is so useful in providing a consistent level of performance as the number of concurrent users increases.


Many commercial and open-source database management systems, including Microsoft SQL Server, Oracle, MySQL (with InnoDB or Falcon storage engines), PostgreSQL, Firebird, H2, Interbase, Sybase IQ, and SQL Anywhere support multi-version concurrency control, abbreviated as MVCC and often referred to as snapshot isolation.

 

Why is support for snapshot isolation so important? Well, snapshot isolation provides another widget in the DBA's toolkit: reasonable semantics that avoid various types of update anomalies, without incurring the considerable overhead and contention of serializable query execution strategies.

 

The term serializability characterizes execution schedules in which the interleaved operations of two or more database transactions occur as if those transactions were executed serially (one following another). Prior to the idea of snapshot isolation, documented for the first time in reference [1] and subsequently implemented in Oracle, the way to achieve serializable transaction execution was through strict two-phase locking. However, the vast majority of database applications cannot tolerate the lack of concurrency that serializable execution entails; almost always, application developers are willing to trade off serializable semantics for improved concurrency (i.e. weaker transaction isolation), and snapshot isolation is an important type of weaker concurrency control.

 

The ANSI/ISO SQL Standard defines isolation levels in terms of anomalies that may be avoided. They are (SQL:2008, Section 4.35.4, pp. 124-5):

The isolation level specifies the kind of phenomena that can occur during the execution of concurrent SQL transactions. The following phenomena are possible:
  1. P1 ("Dirty read"): SQL-transaction T1 modifies a row. SQL-transaction T2 then reads that row before T1 performs a COMMIT. If T1 then performs a ROLLBACK, T2 will have read a row that was never committed and that may thus be considered to have never existed.
  2. P2 ("Non-repeatable read"): SQL-transaction T1 reads a row. SQL-transaction T2 then modifies or deletes that row and performs a COMMIT. If T1 then attempts to reread the row, it may receive the modified value or discover that the row has been deleted.
  3. P3 ("Phantom"): SQL-transaction T1 reads the set of rows N that satisfy some search condition. SQL transaction T2 then executes SQL statements that generate one or more rows that satisfy the search condition used by SQL-transaction T1. If SQL-transaction T1 then repeats the initial read with the same search condition, it obtains a different collection of rows.

It is well-known [1] that the above definitions are lacking in describing all of the anomalies that may occur at isolation levels lower than SERIALIZABLE; the paper by Berenson et al. [1] is highly recommended for DBAs and application programmers alike to help understand the anomalous behaviour that can be encountered at lower isolation levels. However, a common characteristic of ANSI isolation levels 1 through 3 is that a writer will block a reader; as with many other DBMS implementations, SQL Anywhere's concurrency control is based on locking, and write locks cause blocking, except for read transactions at isolation level 0 (READ UNCOMMITTED), which offers no correctness guarantees. Moreover, writers always block other writers: SQL Anywhere does not permit "dirty writes" - termed P0 in reference [1] - at any isolation level, due to the ROLLBACK and recovery issues that would ensue if dirty writes were permitted.

 

The SQL standard does not specify how these P1, P2, and P3 anomalies are to be avoided; every database management system is free to implement its own solutions. For example, beginning with SQL Anywhere Version 10, SQL Anywhere utilizes intent locks with updateable cursors to prevent concurrent updates; intent locks permit read transactions to process the row, at least until the row is actually modified and the intent lock is upgraded to a write lock.

 

In addition to expanding on the ANSI isolation levels, reference [1] defined snapshot isolation: the basic idea is for each transaction to "see" a consistent snapshot of the database as of transaction start, and that this snapshot remains unaffected by other concurrent update transactions. Because of some quirks of terminology, many believe that snapshot isolation offers serializable semantics. However, snapshot isolation is not serializable [3] and some researchers have over the years proposed modifications to snapshot isolation so that it does offer serializability [4].

 

Snapshot isolation as proposed in [1] and implemented in Oracle is based on first-committer-wins; that is, if two transactions modify the same row, which is permitted in this scheme, the first transaction to COMMIT "wins", and other transactions in conflict will be unable to COMMIT, and must ROLLBACK. In contrast, snapshot isolation in SQL Anywhere is based on first-writer-wins, which (still) forces writers to block writers. This has the advantage of simplifying an application's COMMIT logic, but the disadvantage of being subject to greater risk of deadlock. However, SQL Anywhere's snapshot isolation retains the benefits of writers not blocking readers. This permits an application to "see" a consistent state of the database as of the start of the transaction, making it straightforward, for example, for a read-only transaction to analyze an entire database without regard to updates made by concurrent transactions. This is a very powerful benefit of snapshot isolation.
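As a minimal sketch of how this surfaces in SQL Anywhere (option names as I recall them; check the documentation for your release), snapshot isolation must be enabled for the database before a connection can request it:

SET OPTION PUBLIC.allow_snapshot_isolation = 'On';

-- On a reporting connection: every statement in the transaction sees the
-- database as of the snapshot's start, unaffected by concurrent COMMITs.
SET TEMPORARY OPTION isolation_level = 'snapshot';
SELECT COUNT( * ) FROM Customers;
COMMIT;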

 

Of course, snapshot isolation doesn't come for free. It is necessary for the database system to construct archive copies of changed data in anticipation of new snapshot transactions. With SQL Anywhere, copies of snapshot rows are managed automatically, written to the temp file (which grows on demand) as necessary. However, though the management impact is near zero, query performance can suffer, as snapshot rows may need to be fetched individually from the snapshot row store in the temp file, based on the snapshot semantics of the transaction. The degree of performance degradation depends entirely on the application and its workload, and will be worse with update-intensive workloads. Nonetheless, those same update-intensive workloads may not perform well with traditional ANSI isolation levels based on locking, because of the lock contention that may be incurred, and the greater potential for deadlock. Hence careful capacity planning should be undertaken prior to deploying such an application in a production setting.

 

NB. Links to papers are to freely available, public preprint versions.

[1] Hal Berenson, Phil Bernstein, Jim Gray, Jim Melton, Elizabeth O'Neil, and Patrick O'Neil (June 1995). A Critique of ANSI SQL Isolation Levels. Proceedings of the 1995 ACM SIGMOD Conference, San Jose, California, pp. 1-10. Also available as Microsoft Research Technical Report MSR-TR-95-51.

[2] Atul Adya, Barbara Liskov, and Patrick O'Neil (March 2000). Generalized Isolation Level Definitions. In Proceedings of the 2000 IEEE International Conference on Data Engineering, San Diego, California, pp. 67-78.

[3] Alan Fekete, Elizabeth O'Neil, and Patrick O'Neil (September 2004). A Read-only Transaction Anomaly Under Snapshot Isolation. ACM SIGMOD Record 33(3), pp. 12-14.

[4] Alan Fekete, Dimitrios Liarokapis, Elizabeth O'Neil, Patrick O'Neil, and Dennis Shasha (June 2005). Making snapshot isolation serializable. ACM Transactions on Database Systems 30(2), pp. 492-528.

From the Archives: Customizing SQL Requests With Hints


In this post, originally written by Glenn Paulley and posted to sybase.com in June of 2009, Glenn talks about customizing query semantics via the use of hints.  While hints can be incredibly useful in some situations, it is important to note that the vast majority of the time, SQL Anywhere query optimization and execution does an incredible job of doing 'the right thing', taking into account all of the context in which it is running in order to provide performant execution of queries and DML.

 

There are a number of mechanisms one can use to affect the precise semantics of an SQL query, in particular to insulate (or expose) the effects of concurrent transactions on the results of a particular SQL statement. One such mechanism is the type of cursor that is used. As an example, INSENSITIVE cursors materialize the entire result set of a SQL query at OPEN time, insulating the query from the effects of concurrent updates, even from the same transaction, prior to the first FETCH of that result. On the other hand, SQL Anywhere KEYSET-DRIVEN (or SCROLL) cursors memoize rows as they are FETCHed by the application, and return a warning (error) back to the application if the same result row is re-FETCHed and has been concurrently updated (deleted).
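Here is a minimal Watcom-SQL sketch of the INSENSITIVE case, using the sample database's Customers table (variable names and sizes are illustrative):

BEGIN
    DECLARE v_id   INTEGER;
    DECLARE v_name CHAR(40);
    DECLARE cur INSENSITIVE CURSOR FOR
        SELECT ID, CompanyName FROM Customers;
    OPEN cur;                     -- the entire result set is materialized here
    FETCH cur INTO v_id, v_name;  -- rows reflect the state as of OPEN time
    CLOSE cur;
END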

 

Another mechanism is the isolation level used by the SQL statement. ANSI SQL isolation levels lower than SERIALIZABLE have obvious benefits for improving concurrency, but at the risk of incurring anomalies during query execution because of concurrent updates and the interactions between them. Among other semantic effects, table hints permit one to specify semantic changes on a per-query, or even per-table, basis. Ordinarily I'm not in favour of the use of hints in SQL queries, as I have a strong bias toward letting the query optimizer choose the access plan as it sees fit. However, in many instances query or table hints can be extremely useful, particularly in enabling fine-grained control of locking behaviour.

 

SELECT FOR UPDATE

Before we get to table hints, I'd like to mention two concurrency control hints at the statement level that are specified with the FOR UPDATE syntax. The basic FOR UPDATE clause explicitly declares an updateable cursor; however, at isolation levels 0 and 1, long-term locks on the fetched rows are not acquired, and hence those rows are open to modification or deletion by other connections. To verify the subsequent updateability of these rows, there are two options (a sketch follows the list):

  1. Specify FOR UPDATE BY LOCK. This causes the acquisition of an INTENT row lock on each row as it is FETCHed by the application. INTENT locks permit other connections to read the row, but no other connection can acquire an INTENT or WRITE lock on it. INTENT locks are long-term locks that are held until COMMIT/ROLLBACK.
  2. Specify FOR UPDATE BY TIMESTAMP or FOR UPDATE BY VALUES. In this case, SQL Anywhere forces the use of a KEYSET-DRIVEN cursor, as a form of optimistic concurrency control, to enable notification that a particular row has been altered or deleted by another connection.
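A minimal sketch of option 1, using the sample database: each row FETCHed through the cursor acquires an INTENT lock, guaranteeing that this connection can subsequently update it.

SELECT ID, CompanyName
FROM Customers
WHERE State = 'CA'
FOR UPDATE BY LOCK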

 

Table hints

As with other systems, such as Microsoft SQL Server, SQL Anywhere supports table hints using an additional WITH clause. Here is an example, using the demo.db sample database:

SELECT *
 FROM CUSTOMERS WITH ( NOLOCK )

The NOLOCK table hint causes the server to access the Customers table at isolation level 0. Note that table hints apply only to base or global shared temporary tables; hints are ignored if they are used with a view or proxy table. Here is a complete list of the table hints supported by a SQL Anywhere 11.0.1 server:

  • NOLOCK - use isolation level 0 (no READ locking). Compatible with Microsoft SQL Server.
  • READUNCOMMITTED - synonym for NOLOCK.
  • READCOMMITTED - use short-term read locks at isolation level 1.
  • REPEATABLEREAD - use read locks at isolation level 2.
  • SERIALIZABLE - use read locks at isolation level 3.
  • HOLDLOCK - synonym for SERIALIZABLE, also supported by Sybase Adaptive Server Enterprise and Microsoft SQL Server.
  • READPAST. The READPAST table hint is supported for SELECT statements (only) in conjunction with isolation level 1. READPAST avoids blocking during a scan - either an index scan or a table scan - by simply "jumping" over rows that are locked with INTENT or WRITE locks. In this sense, READPAST exhibits unsafe semantics by simply eliminating uncommitted updates from the computation. However, the significant advantage of READPAST is that it is extremely useful for maintaining queues, or key pools, within base tables, while avoiding concurrency conditions due to blocking (see the sketch after this list). READPAST is also supported by Microsoft SQL Server and Sybase Adaptive Server Enterprise.
  • UPDLOCK - apply INTENT locks to each row of the scan.
  • XLOCK - apply WRITE locks to each row of the scan, prohibiting any other connections from accessing the rows, except for transactions at isolation level 0.
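Here is the queue-maintenance pattern from the READPAST item above, as a hedged sketch (the TaskQueue table is hypothetical): at isolation level 1, the scan skips over rows claimed by concurrent transactions, and FOR UPDATE BY LOCK claims the row that is selected.

SELECT FIRST task_id, payload
FROM TaskQueue WITH ( READPAST )
ORDER BY task_id
FOR UPDATE BY LOCK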

Finally, the table hint FASTFIRSTROW causes the SQL Anywhere optimizer to use an optimization goal of FIRST-ROW for the SELECT block containing that table reference. Note that it doesn't matter which table receives the hint; any table reference in a SELECT block accompanied by a FASTFIRSTROW hint changes the goal.

 

OPTION clause

Beginning with SQL Anywhere 10.0.1, data manipulation (DML) statements (SELECT, UPDATE, etc.) support an OPTION clause. A useful ability of the OPTION clause is to permit the application developer to override optimization bypass and force cost-based optimization of the statement. In addition, one can set specific connection options, such as ISOLATION_LEVEL or OPTIMIZATION_GOAL, for this statement alone, avoiding the need to alter these option settings individually with a SET OPTION statement. As an example:

SELECT *
FROM Customers
OPTION ( OPTIMIZATION_LEVEL = 2, ISOLATION_LEVEL = 2 )

At present, the following connection options can be set in a query's OPTION clause:

  • ISOLATION_LEVEL option
  • MAX_QUERY_TASKS option
  • OPTIMIZATION_GOAL option
  • OPTIMIZATION_LEVEL option
  • OPTIMIZATION_WORKLOAD option
  • USER_ESTIMATES option

 

We will be looking at expanding this list to include other concurrency-control-related options in a future SQL Anywhere release.

From the Archives - Customizing SQL Requests With Hints - Part Deux


In this post, originally written by Glenn Paulley and posted to sybase.com in June of 2009, Glenn continues to talk about customizing query semantics via the use of hints.  As he says in his post, hints can be incredibly useful in some situations, but it is important to note that the vast majority of the time, SQL Anywhere query optimization and execution does an incredible job of doing 'the right thing', taking into account all of the context in which it is running in order to provide performant execution of queries and DML.  Using hints should only be done in specific scenarios by experienced users.

 

In an earlier post on SQL request hints, I described techniques to modify concurrency control semantics with a specific SQL statement. In this second post, I'd like to describe some additional hinting capabilities supported by SQL Anywhere with respect to query optimization.

 

Just to be clear once again, I'm not advocating the use of hints. Using hints should be done with considerable care, as index hints override the query optimizer's decision-making logic. They also can become problematic from a maintenance standpoint. My recommendation is that hints should be used only by experienced users as workarounds for specific problems.

 

User-specified Selectivity Estimates

SQL Anywhere has supported self-managing table and column statistics for nearly two decades, and starting with the 8.0.0 release (December 2000) utilizes self-tuning column histograms to estimate column distributions and frequent-value statistics. In addition to histograms, the SQL Anywhere query optimizer utilizes a variety of other tools, including index probes, to estimate the selectivity of a predicate. Accurate selectivity estimates assist the query optimizer in estimating the cardinality of joins and other intermediate results, and hence accurate estimates lead to higher-quality access plans.

 

However, as I wrote last September, there are ample opportunities to conjure SQL queries that are difficult to optimize. One example is a SQL query that contains a predicate containing a function, or some form of additional computation, such as:

 

SELECT COUNT(*)
FROM SalesOrders so JOIN SalesOrderItems soi ON (so.id = soi.id)
WHERE DATEDIFF(day, so.OrderDate, soi.Shipdate) > 0

 

The problem is that the SQL Anywhere query optimizer is unable to decompose and analyze the comparison predicate involving DATEDIFF - and so makes a guess at its selectivity, which can be seen in a graphical plan for the statement:

[Graphical plan: the optimizer's 25% "Guess" selectivity estimate on the DATEDIFF predicate (guess_estimate.jpg)]

 

In the above example, the guess estimate of 25% selectivity isn't really a problem, simply because it does not affect the access plan chosen by the query optimizer: here, because both tables are already cached and resident in the buffer pool, the optimizer has chosen a straightforward nested-loop join, rendering the additional DATEDIFF predicate as residual to the join.

 

There are, of course, situations where a residual predicate such as the one above is problematic, particularly when there are multiple joins. In these cases, the cardinality estimation error that stems from the initial guess, coupled with a set of assumptions about intermediate result set sizes, yields, at the end of the day, an extremely poor access plan due to the physical access path(s) chosen for a particular table.

 

In these cases, one can work around the issue by specifying a hard-coded selectivity estimate directly in the SQL text, which is actually a long-standing feature of SQL Anywhere. To specify a user estimate, one first enables the USER_ESTIMATES connection option, and then in the query's WHERE clause expresses the predicate's selectivity as a percentage directly:

 

SELECT COUNT(*) 
FROM SalesOrders so JOIN SalesOrderItems soi ON (so.id = soi.id)
WHERE ( DATEDIFF(day, so.OrderDate, soi.Shipdate) > 0, 83.50 )

Here is what the graphical plan looks like with this modification:

[Graphical plan: the user-supplied 83.5% selectivity estimate on the DATEDIFF predicate (user_estimate.jpg)]

 

Now, instead of the optimizer expecting 274 (25 percent of 1097) rows as input to the GROUP BY operator (to compute the COUNT(*)), the optimizer believes that the predicate selectivity is 83.5 percent, yielding 916 rows - which is the correct answer.

 

Adjusting predicate selectivity is one way to work around selectivity estimation problems, but specifying user estimates of selectivity can itself be problematic, particularly when a query utilizes a host variable and the column distribution is skewed; in such cases, specifying any hard-coded estimate is a problem.

 

Index Selection Table Hints

In addition to the locking table hints I mentioned previously, SQL Anywhere permits one to specify a hint that names a specific index, or indexes, to be used as the physical access path for a specific table. As with concurrency control hints, table hints on views or proxy tables are silently ignored. Here's an example:

 

SELECT * 
FROM Customers WITH ( INDEX(CustomersKey) )

The access path hints one can specify are:

  • WITH ( INDEX index-name ). This hint overrides the query optimizer's access path selection algorithm. An error is produced if an index of the given name does not exist. Index hints can be used for base tables, temporary tables, and materialized views.
  • WITH ( INDEX index-name1, index-name2, ... ). This form of index hint permits up to four indexes to be specified. If any of the specified indexes cannot be used, or do not exist, an error is returned. Multiple indexes can be exploited by the SQL Anywhere query optimizer for multi-index retrieval.
  • WITH ( NO INDEX ). This hint forces a sequential scan of the table (see the sketches after this list).
  • WITH ( INDEX index-name1, index-name2, ... ) INDEX ONLY { ON | OFF }. With this modification, one can specify the use of index-only retrieval. With INDEX ONLY ON, the query optimizer will attempt to produce an access plan that utilizes index-only retrieval with the specified indexes. If any of the specified indexes cannot be used in satisfying an index-only retrieval, an error is returned (for example, if the named index does not exist, or the indexed attributes alone cannot satisfy the query). One can specify INDEX ONLY OFF to prevent index-only retrieval.
  • FORCE INDEX ( index-name ) is provided for compatibility with MySQL, and has the same semantics as WITH ( INDEX index-name ). FORCE INDEX does not support specifying more than one index.
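Two further sketches of the hints above, reusing the Customers table and the CustomersKey index from the earlier example:

-- Force a sequential scan, bypassing all indexes:
SELECT * FROM Customers WITH ( NO INDEX );

-- MySQL-compatible form, naming a single index:
SELECT * FROM Customers FORCE INDEX ( CustomersKey );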

 

Hints are useful - virtually all systems offer them out of necessity, and they can solve real problems for application developers. The trick is knowing when to use them.

From the Archives: Using regular expressions with SQL Anywhere


In this post, originally written by Glenn Paulley and posted to sybase.com in June of 2009, Glenn discusses how regular expressions can be used within queries in SQL Anywhere.

 

 

SQL Anywhere version 11.0.0 introduced support for search conditions that include regular expression searching. There are two variants of regular expression search predicates that one can use, each with their own semantics: SIMILAR TO and REGEXP.

 

SIMILAR TO

The SIMILAR TO predicate is part of the 2008 ANSI/ISO SQL standard. However, the draft of the next SQL standard, due in the 2011 timeframe, is currently under development - in fact, a WG3 editing meeting is underway in Korea this week - and SIMILAR TO will likely be eliminated from subsequent versions of the SQL standard, as its functionality is being replaced by the LIKE_REGEX predicate (see below).

 

The syntax of the SIMILAR TO predicate is straightforward:

 

expression [ NOT ] SIMILAR TO pattern [ ESCAPE escape-expression ]

but as usual the devil is in the details. For starters, here's an example, using the SQL Anywhere demo database:

 

SELECT *
FROM Customers
WHERE PostalCode NOT SIMILAR TO '([0-9]{5})|([0-9]{5}-[0-9]{4})|([A-Z][0-9][A-Z][[:whitespace:]]{1}[0-9][A-Z][0-9])'

which finds all those addresses with invalidly-formatted postal codes (either US or Canadian); the accepted codes have the formats of (a) five numbers (US), (b) five numbers, a dash, and four numbers (US), and (c) the six-character alphanumeric Canadian postal codes with a single embedded blank.

[Result set: Customers rows whose PostalCode fails the SIMILAR TO validation (similarto.png)]

The regular expression patterns supported by SIMILAR TO, however, differ from those supported by regular expression conditions in other software packages (such as Perl). Here is a small, non-exhaustive list of differences (a more exhaustive list is contained in the SQL Anywhere documentation):

  • As with LIKE and REGEXP, SIMILAR TO matches entire values, not portions of values.
  • SIMILAR TO uses "%" (percent) and "_" (underscore) as wildcard characters, in the same way as LIKE. One uses "%" instead of ".*".
  • SIMILAR TO doesn't support a variety of sub-character classes, such as [:ascii:], [:blank:], or [:punct:].
  • Perhaps most importantly, SIMILAR TO uses collation-based comparisons when comparing string values. This can be useful. For example, with SQL Anywhere's default case-insensitive string matching, the pattern [A]{1} is equivalent to [a]{1}, and these equivalences may also apply to accented characters with specific collations. However, a significant drawback is that range patterns don't work properly; the range pattern [A-C] does not, in fact, match only the upper case characters A, B, and C. Rather, in the default case-insensitive collation [A-C] matches any of the characters A, B, b, c and C; it does not match "a" because the character "a" precedes "A" in the collation sequence.

    This means, then, that the example above fails to properly validate Canadian postal codes; the query would accept Canadian postal codes containing lower-case letters. (A sketch demonstrating this range behaviour follows.)
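One can observe this range behaviour directly; a minimal sketch using SQL Anywhere's IF expression, assuming the default case-insensitive collation:

SELECT IF 'b' SIMILAR TO '[A-C]' THEN 1 ELSE 0 ENDIF AS matches_b,   -- 1
       IF 'a' SIMILAR TO '[A-C]' THEN 1 ELSE 0 ENDIF AS matches_a;   -- 0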

 

REGEXP

With the SQL Anywhere 11.0.1 release, the REGEXP predicate supports regular expression patterns in a manner similar to Perl and other UNIX-based tools that support regular expression searching. Once again, the syntax is straightforward:

 

expression [ NOT ] REGEXP pattern [ ESCAPE escape-expression ]

In the SQL Standard, the syntax is virtually identical except that the predicate uses the keyword LIKE_REGEX. Supported patterns are those from the XQuery portion of the standard. In SQL Anywhere, we've adopted pattern syntax from a variety of sources, primarily Perl. REGEXP does not use collation-based matching; matching is based on code point values in the database character set. For example, the comparison X REGEXP '[A-C]', for the single character X, is equivalent to CAST(X AS BINARY) >= CAST('A' AS BINARY) AND CAST(X AS BINARY) <= CAST('C' AS BINARY). REGEXP supports the common meta-characters and sub-classes familiar to programmers, and also supports special escaped characters such as "\s" for a space, or "\r" for carriage return, and look-ahead and look-behind assertions. Here is the same example for validating postal codes, but this time using REGEXP:

 

SELECT *
FROM Customers
WHERE PostalCode NOT REGEXP '([0-9]{5})|([0-9]{5}-[0-9]{4})|([A-Z][0-9][A-Z]\s[0-9][A-Z][0-9])'

Finally, note that the SQL Anywhere query optimizer will automatically optimize REGEXP and SIMILAR TO predicates - as it does for LIKE predicates - to be used as sargable predicates for index scans, depending on the specific pattern.

OpenSSL Heartbleed and SQL Anywhere


SAP takes the security of its products very seriously.  The recent OpenSSL vulnerability known as Heartbleed does impact some users of SQL Anywhere.

Here are the details:

 

Affected Components

  • SQL Anywhere Server – If you use TLS (Transport Layer Security) communications and/or HTTPS web services they are vulnerable, though only to the networks that can access the server.  Note that calling external web services over HTTPS from the database server is also affected.
  • MobiLink Server – If you use TLS and/or HTTPS communications they are vulnerable, though only to the networks that can access the MobiLink server.
  • Relay Server Outbound Enabler

 

Affected Versions - note that all platforms are impacted by this issue.

  • SQL Anywhere 12.0.1 ebf 3994-4098
  • SQL Anywhere 16.0 ebf 1690-1880

 

Current Workaround

  • To avoid being exposed due to this problem, you can revert to an ebf/SP prior to the ones listed above, or to the GA release.
  • Regenerate any certificates that you were using.
  • Change any passwords/keys associated with SQLA web service calls or TLS authentication.

 

Resolution

  • Download and apply SQL Anywhere 12.0.1 ebf 4099 or newer and/or SQL Anywhere 16.0 ebf 1881 or newer when it becomes available. A further announcement will be made when the patch is available.
  • Regenerate any certificates that you were using.
  • Change any passwords/keys associated with SQLA web service calls or TLS authentication.

 

 

In addition, here is the text of the latest response (as of this posting) from the SAP security team, released earlier today on service marketplace (http://service.sap.com/securitynotes):

 

Deficiencies in releases of OpenSSL libraries

SAP takes any security-related report very seriously. We will notify our customers appropriately as relevant new information on this topic becomes available.

 

 

SAP has received information about security deficiencies in some releases of OpenSSL libraries, used in a number of software products of different vendors. These deficiencies are referred to under the name of the “Heartbleed” vulnerability (CVE-2014-0160, see http://heartbleed.com). SAP security teams are in the process of investigating if products are possibly affected by the reported vulnerability.  At the current state of investigations we have no indications that SAP NetWeaver and SAP HANA are affected.

 

We take the opportunity to remind you to increase the security of your SAP systems by installing the available security patches. For information on SAP’s security notes and patches, please go to the SAP Security Notes page on the SAP Service Marketplace extranet at https://service.sap.com/securitynotes.


If there are any further questions, please don't hesitate to contact SAP support.


From the Archives: Loading Data in More Flexible Ways - Part Un


In this post, originally written by Glenn Paulley and posted to sybase.com in July of 2009, Glenn discusses using the OPENSTRING clause in a SELECT statement to load data into SQL Anywhere.

 

Invariably one needs to transfer the contents of a flat file into, or, sometimes, out of, a SQL Anywhere server. Prior to Version 11, importing a flat file into the server could be done in the following ways (without resorting to writing code to do so):

  • using the LOAD TABLE statement;
  • using the INPUT statement from within the DBISQL tool;
  • loading using the xp_read_file() procedure; or
  • establishing a proxy table to the flat file using Remote Data Access services.

 

Each of the above solutions has trade-offs. The first two options require the creation of a table in which to load the data. LOAD TABLE is faster than INPUT, but suffers from two disadvantages: first, the file must be directly accessible from the server machine, and second, the individual rows being inserted are not recorded in the transaction log, complicating recovery. The third option also requires that the file be local to the server machine; unfortunately, xp_read_file() creates an opaque value that (still) requires parsing. The fourth option, a proxy table, is even more cumbersome to use, requiring the possible creation of an external server, an EXTERNLOGIN object, etc. (A sketch of the LOAD TABLE approach follows.)
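As a sketch of the LOAD TABLE approach (the file path and schema are illustrative):

CREATE TABLE BlogStats (
    stat_weekday  CHAR(10),
    stat_date     CHAR(30),
    page_loads    INT
);

-- The file must be readable from the server machine, and the inserted rows
-- are not recorded in the transaction log.
LOAD TABLE BlogStats
FROM 'c:\stats\summary.csv'
DELIMITED BY ',' SKIP 1;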

 

With Version 11, SQL Anywhere handles interactions with flat files more flexibly. In this post, I'll describe the functionality provided by OPENSTRING, and I'll follow this with a subsequent post on additional options.

 

OPENSTRING

To illustrate these new features, I wanted to upload into the server the CSV output file generated by StatCounter for my blog. Here are the first few lines of the file (I opened a StatCounter account on 20 January of this year):

 

Day,Date,Page Loads,Unique Visitors,First Time Visitors,Returning Visitors
Tuesday,20th January 2009,"83","48","46","2"
Wednesday,21st January 2009,"163","108","102","6"
Thursday,22nd January 2009,"127","105","99","6"
Friday,23rd January 2009,"126","91","87","4"
Saturday,24th January 2009,"42","37","35","2"
Sunday,25th January 2009,"52","38","36","2"
Monday,26th January 2009,"171","133","119","14"
Tuesday,27th January 2009,"157","110","101","9"

Rather than use a proxy table, LOAD TABLE, or INPUT INTO, I decided to avoid creating a table at all, and simply refer to the flat file directly in my SELECT statement using OPENSTRING.

 

OPENSTRING is a table expression whose input parameter can be either a flat file or a (string) variable. When specified, the server parses the file input and constructs rows of a virtual table matching the schema that one specifies in a clause of the OPENSTRING expression. Here is the SQL grammar for an OPENSTRING table expression.

<openstring-expression> ::= OPENSTRING ( { FILE | VALUE } <string-expression> )
        WITH ( <rowset-schema> ) [ OPTION ( <scan-option> ... ) ]

<rowset-schema> ::= <column-schema-list>
        | TABLE [owner.]table-name [ ( column-list ) ]

<column-schema-list> ::= { <column-name> <user-or-base-type> | FILLER( ) } [ , ... ]

<column-list> ::= { <column-name> | FILLER( ) } [ , ... ]

<scan-option> ::= BYTE ORDER MARK { ON | OFF }
        | COMMENTS INTRODUCED BY <comment-prefix>
        | DELIMITED BY <string>
        | ENCODING <encoding>
        | ESCAPE CHARACTER <character>
        | ESCAPES { ON | OFF }
        | FORMAT { TEXT | BCP }
        | HEXADECIMAL { ON | OFF }
        | QUOTE <string>
        | QUOTES { ON | OFF }
        | ROW DELIMITED BY <string>
        | SKIP integer
        | STRIP { ON | OFF | LTRIM | RTRIM | BOTH }

Note that the options for OPENSTRING match those available for LOAD TABLE. In my case, the Excel-CSV file uses comma-delimited fields, and the first line contains the attribute names, which must be ignored when processing the actual values. Here is a SQL statement that creates a result set from this input flat file directly:

SELECT stat_weekday,
       CAST( ( REGEXP_SUBSTR( str_stat_date, '[0-9]+(?=(st|nd|th|rd|ST|ND|TH|RD)\s.*)' )
               || REGEXP_SUBSTR( str_stat_date, '(?<=[0-9]+(st|nd|th|rd|ST|ND|TH|RD)).*' ) ) AS DATE ) AS stat_date,
       page_loads, unique_visitors, first_time_visitors, returning_visitors
FROM OPENSTRING( VALUE READ_CLIENT_FILE( 'c:\gpaulley\blog\summary-6July2009.csv' ) )
     WITH( stat_weekday char(10), str_stat_date char(30), page_loads int,
           unique_visitors int, first_time_visitors int, returning_visitors int )
     OPTION( SKIP 1 DELIMITED BY ',' ) AS summary

 

Some points to make about the above statement:

  • The parameter to OPENSTRING is either VALUE or FILE. If FILE is specified, the flat file must be local to the server machine; in this case, the file was on my own computer. Version 11 supports the READ_CLIENT_FILE function that uses SQL Anywhere's CMDSEQ wire protocol to fetch a file's contents from the client machine. READ_CLIENT_FILE creates an internal string that OPENSTRING subsequently parses to create the rows.
  • The WITH clause specifies the schema of the file. Conversions to server data types from the strings in the file are performed automatically. However, because the CSV file generated by StatCounter contained ordinal date values (eg. '21st January 2009') the date values are parsed as strings.
  • To generate a modified string that SQL Anywhere can convert to a DATE, I used the REGEXP_SUBSTR function, another new feature of Version 11 that accompanies SQL Anywhere's regular expression support. Here, the first usage of REGEXP_SUBSTR returns the truncated, numeric, day of the month, using a positive lookahead zero-width assertion. The second instance of REGEXP_SUBSTR, which is similar, uses a positive look-behind zero-width assertion to return the rest of the string. When concatenated, the two functions convert '21st January 2009' to '21 January 2009' and the server can handle that conversion automatically, with an appropriate choice ('DMY') of the DATE_ORDER connection option.
  • Using READ_CLIENT_FILE requires two things:
    • The database must be enabled for client file access by enabling the ALLOW_READ_CLIENT_FILE option. This option can be set only by a user with DBA authority.
    • A user invoking READ_CLIENT_FILE must have the READCLIENTFILE authority.

Here is a screen shot of the query and its result set:

[Screen shot: the OPENSTRING query above and its result set (regexp_substr.png)]

 

OPENSTRING is a table expression that can appear in any DML statement - including INSERT and MERGE - and can reference any string variable in the query's scope. Moreover, one can utilize OPENSTRING in a view. Once the view definition has been established, one can then create INSTEAD OF triggers so that procedures or applications using the flat file from the view can seamlessly issue update DML statements against the view.
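A hedged sketch of that view idea, reusing the schema from the example above (with FILE, the path is resolved on the server machine):

CREATE VIEW BlogSummary AS
    SELECT *
    FROM OPENSTRING( FILE 'c:\gpaulley\blog\summary-6July2009.csv' )
         WITH( stat_weekday char(10), str_stat_date char(30), page_loads int,
               unique_visitors int, first_time_visitors int, returning_visitors int )
         OPTION( SKIP 1 DELIMITED BY ',' ) AS summary;

-- Applications can now query the flat file as if it were a table:
SELECT * FROM BlogSummary;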

From the Archives: Using the BIT VARYING data type


In this post, originally written by Glenn Paulley and posted to sybase.com in July of 2009, Glenn discusses using the VARBIT datatype in SQL Anywhere.

 

 

The ANSI/ISO SQL Standard eliminated the BIT and BIT VARYING data types with the formal adoption of SQL:2003; the last standard to support them, including the BIT_LENGTH() function, was SQL:1999. The SQL:2003 standard retained the BOOLEAN type to hold the truth values TRUE and FALSE.


The functionality provided by bit string arrays can be useful in a number of instances, and despite the deprecation of the BIT VARYING type from the SQL:2003 standard, SQL Anywhere introduced support for the BIT VARYING type in Versions 10 and up. The following type declarations are supported for bit string arrays:

  • BIT VARYING [ ( length ) ]
  • LONG BIT VARYING

with VARBIT as an additional shorthand for BIT VARYING. If length is unspecified, it defaults to 1. A LONG BIT VARYING column constitutes a BLOB with a maximum length of 2GB.

 

In SQL Anywhere, the single-valued BIT type can be used as a synonym for the SQL Standard's BOOLEAN type.

 

BIT VARYING Scalar and Aggregate Functions

The BIT VARYING and LONG BIT VARYING types can be manipulated using the following scalar functions:

  • BIT_LENGTH() - returns the length of the bit string. Note that this function does not have the same semantics as the BIT_LENGTH() function from SQL:1992, which would return the number of bits of a character string. Hence the query
    SELECT BIT_LENGTH( '01101011' ); 
    returns the value 8 (the string in quotes is interpreted as a binary string), rather than 64 (using SQL:1992 semantics for BIT_LENGTH(), where the string of 0's and 1's is interpreted as a character string).
  • BIT_SUBSTR( bit-expression [, start [, length ] ] ) - returns a substring of the bit array.
  • COUNT_SET_BITS( bit-expression ) - returns the number of '1' bits in the bit array.
  • GET_BIT( bit-expression, position ) - returns a BIT value of the bit at the specific position in the array.
  • SET_BIT( [ bit-expression, ] bit-position [, value ] ) - sets the value of the bit at the given position to value. The default value is '0'. SET_BIT() returns a LONG VARBIT expression containing the modified bit string. If bit-expression is unspecified, the bit string defaults to a string of '0' bits of length bit-position.


and by the following aggregate functions:

  • BIT_AND( expression ) is an aggregate function that performs a bit-wise AND of successive bit array values from multiple rows. For example, the query
    SELECT BIT_AND( CAST(row_value AS VARBIT) ) 
    FROM dbo.sa_split_list('0001,0111,0100,0011') 
    returns the bit array '0000' since a bit-wise AND of the four values yields a bit string of all '0's.
  • BIT_OR() and BIT_XOR() are similar to BIT_AND(), performing bit-wise OR and XOR operations respectively.
  • The SET_BITS( integer-expression ) aggregate function returns a VARBIT array with bit positions set to '1' corresponding to the integer values of the expression in each row in the group. As an example, the following statements return a bit array with the 2nd, 5th, and 10th bits set to 1 (or 0100100001):
    CREATE TABLE T( x INTEGER ); 
    INSERT INTO T values( 2 ); 
    INSERT INTO T values( 5 ); 
    INSERT INTO T values(10 ); 
    SELECT SET_BITS( x ) FROM T;

 

Type conversions to BIT VARYING

With conversions from other types to the BIT VARYING type, SQL Anywhere tries as much as possible to perform intuitive conversions. A fairly complete description is available in the SQL Anywhere documentation, but here are a few examples:

  • INTEGER to BIT VARYING: When converting an integer to a bit array, the length of the bit array is the number of bits in the integer type, and the bit array's value is the integer's binary representation. The most significant bit of the integer becomes the first bit of the array.
    SELECT CAST( CAST( 8 AS TINYINT ) AS VARBIT ) 
    returns a VARBIT(8) containing '00001000'.
  • BINARY to BIT VARYING: When converting a binary type of length n to a bit array, the length of the array is n * 8 bits. The first byte of the binary value becomes the first 8 bits of the bit array, with the byte's most significant bit becoming the first bit in the array. The second byte of the binary value becomes the next 8 bits of the array, and so on:
    SELECT CAST( 0x8181 AS VARBIT ) 
    returns a VARBIT(16) containing '1000000110000001'.
  • CHAR or VARCHAR to BIT VARYING: when converting a character data type of length n to a bit array, the length of the array is n bits. Each character must be either '0' or '1' and the corresponding bit of the array is assigned the value 0 or 1.
    SELECT CAST( '001100' AS VARBIT )
    returns a VARBIT(6) containing '001100'.
  • BIT VARYING to INTEGER: when converting a bit array to an integer data type, the bit array's binary value is interpreted according to the storage format of the integer type, using the most significant bit first.
    SELECT CAST( CAST( '11000010' AS VARBIT ) AS INTEGER )
    returns the integer value 194 (11000010 in binary = 0xC2 = 194).

 

The sa_get_bits() system procedure

The dual of the SET_BITS() aggregate function is the system stored procedure sa_get_bits(), which generates a row for each bit in a bit array and which, by default, generates rows only for those bit positions that are '1'. Here's an example that generates a row for each bit position in the input expression, regardless of its value:

 

bit_varying.png
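In SQL form, a call looks roughly like this (a sketch; only the required first argument is shown, so the default behaviour of returning just the '1' bits applies - the optional argument controlling this is not shown):

SELECT * FROM sa_get_bits( CAST( '0100100001' AS VARBIT ) );
-- by default, returns one row per set bit: positions 2, 5, and 10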

SAP Sybase SQL Anywhere 12.0.1 manuals fixed to work with tablet/phone PDF readers like Aldiko


Many PDF readers for smart phones (Android/iPhone) and tablets manage PDF files based solely on the Title and Author fields in the PDF file. While this is fine for your average book, it is not all that helpful with manuals, which tend to have abbreviated or no data in the title/author fields. In the case of the manuals for Sybase IQ, I’m unable to load the manuals for, say, v11.0 and v12.0.1 as they have the same Title/Author data.

How to fix? Easy. Go get Calibre. Drop the PDF files on to the running Calibre. Edit them by hitting the E key.

In my case, I edited the “Title”, “Author”, “Tags”, “Publisher” and “Languages”:

Calibre

Calibre doesn’t modify the PDF files themselves so I will need to export the files to a custom directory. In Calibre nomenclature, this is “Saving”. Highlight all the titles you want to export and hit “S” twice. Why twice? No idea. Choose the directory.

SQL Anywhere 12.0.1 (r) Server – Database Administration – SAP, Inc_

You can now copy the exported PDF files to your phone, tablet, whatever without fear of the v12.0.1 version of the P&T Guide being rejected by Aldiko because the v12.0 version is already added.

Here are the SQL Anywhere v12.0.1 manuals that I’ve ‘fixed’ to work with Aldiko. They are identical to the PDFs on sybooks with the exception of the PDF fields I mentioned previously.


No copyright infringement is intended. SAP/Sybase, please feel free to take these and host them.

Files at Jason L. Froebe - Tech tips and How Tos for Fellow Techies SAP Sybase SQL Anywhere 12.0.1 manuals fixed to work with t…

From the Archives: Loading Data in More Flexible Ways - part deux


In this post, originally written by Glenn Paulley and posted to sybase.com in July of 2009, Glenn discusses using the LOAD TABLE statement to load data into SQL Anywhere when the data resides on the client.

 

 

In my last post on this topic, I described the ability of SQL Anywhere to reference a CSV file directly from within a query using the READ_CLIENT_FILE() function. In this post, I want to describe similar extensions to the LOAD TABLE statement that are supported in Version 11 and up.

 

Loading Client-Resident Files With LOAD TABLE

The ability of the SQL Anywhere server to call back to a client to retrieve data with READ_CLIENT_FILE() is also available with LOAD TABLE. With previous SQL Anywhere releases, LOAD TABLE could only access files directly accessible from the server machine. Now, with the appropriate authorization (READCLIENTFILE authority) and enablement (the ALLOW_READ_CLIENT_FILE database option), one can load a file from a client machine directly into a base or temporary table on the server. Following part un, where I described processing a CSV file generated from my StatCounter account, here's an example loading the CSV file into a permanent table using LOAD TABLE:

 

CREATE TABLE visitor_summary (
    dayoftheweek        CHAR(9) NOT NULL,
    record_date         CHAR(30) NOT NULL,
    page_loads          INT NOT NULL,
    unique_visitors     INT NOT NULL,
    first_time_visitors INT NOT NULL,
    returning_visitors  INT NOT NULL
);
LOAD TABLE visitor_summary USING CLIENT FILE 'c:\gpaulley\blog\Summary-6July2009.csv'
    DELIMITED BY ',' SKIP 1

 

USING CLIENT FILE does not materialize the contents of the file on the server. Hence the client file can be of arbitrary size.

 

In Version 11, LOAD TABLE supports more than the loading of files. The syntax for LOAD TABLE includes the USING VALUE clause, enabling one to load data into a table from any expression of CHAR, NCHAR, BINARY, LONG VARCHAR, LONG NVARCHAR, or LONG BINARY type, in a manner identical to OPENSTRING. Hence the above LOAD TABLE statement could be written as

 

LOAD TABLE visitor_summary USING VALUE READ_CLIENT_FILE('c:\gpaulley\blog\Summary-6July2009.csv' ) DELIMITED BY ',' SKIP 1

 

Loading Data From a Table Column

In Version 11, the LOAD TABLE statement has explicit syntax for loading data from a column in another table, where the table column contains one or more "rows" as a BLOB or CLOB (each hence limited to 2GB in size). Here's an example:

 

BEGIN
    DECLARE summary_data LONG VARCHAR;
    DECLARE LOCAL TEMPORARY TABLE summary_temp (
        summary_line INT NOT NULL PRIMARY KEY,
        summary_contents LONG VARCHAR NOT NULL NO INDEX
    ) ON COMMIT PRESERVE ROWS;
    SET summary_data = xp_read_file( 'c:\gpaulley\blog\Summary-6July2009.csv' );
    INSERT INTO summary_temp VALUES ( 1, summary_data );
    LOAD TABLE visitor_summary USING COLUMN summary_contents FROM summary_temp
        ORDER BY summary_line SKIP 1 DELIMITED BY ',' WITH ROW LOGGING;
END

 

The syntax is

LOAD TABLE ... USING COLUMN column_name FROM table_name ORDER BY key [ loading options ]

which causes the load to process the values from "table_name.column_name". The ORDER BY clause is not optional; one must specify a total ordering of the rows in "table_name" by referencing columns that cover a primary key, unique index, or unique constraint of "table_name". The LOAD processes all of the rows of "table_name" in this order. If we modify the above example to insert multiple rows into the temporary table (ie. duplicating the INSERT statement with different key values):

 

BEGIN
    DECLARE summary_data LONG VARCHAR;
    DECLARE LOCAL TEMPORARY TABLE summary_temp (
        summary_line INT NOT NULL PRIMARY KEY,
        summary_contents LONG VARCHAR NOT NULL NO INDEX
    ) ON COMMIT PRESERVE ROWS;
    SET summary_data = xp_read_file( 'c:\gpaulley\blog\Summary-6July2009.csv' );
    INSERT INTO summary_temp VALUES ( 1, summary_data );
    INSERT INTO summary_temp VALUES ( 2, summary_data );
    INSERT INTO summary_temp VALUES ( 3, summary_data );
    LOAD TABLE visitor_summary USING COLUMN summary_contents FROM summary_temp
        ORDER BY summary_line SKIP 1 DELIMITED BY ',' WITH ROW LOGGING;
END

then three duplicate copies of each row of StatCounter summary data will be loaded into "visitor_summary".

 

 

Loading Data and the Transaction Log

One of the main performance benefits of using LOAD TABLE over INSERT statements is better execution speed. One way execution speed is improved is that triggers on the table being loaded do not fire. A second speedup technique is that, by default, the contents of the data being loaded are not written to the transaction log; only the text of the LOAD TABLE statement itself is written to the log. This has several critical implications:

  • If the database is being mirrored as part of a high-availability system, the newly-loaded data cannot be sent to the mirroring server.
  • Similarly, rows loaded using LOAD TABLE are problematic for log-based synchronization (Mobilink or SQL Remote) since the rows themselves do not appear in the transaction log.
  • If recovery of the database is required and the LOAD TABLE statement must be replayed, the original file that was loaded must be available for the server to replay the LOAD TABLE statement from the transaction log. If the file is unavailable, recovery will fail. If the file is different from the original, it is possible for the database to become logically corrupt.



    SQL Anywhere Version 11 offers additional mechanisms to circumvent these issues with LOAD TABLE. The LOAD TABLE statement supports a WITH LOGGING clause to explicitly specify how the statement is to be logged. The possible options are:

    • WITH FILE NAME LOGGING clause. This clause matches the server's default behaviour when loading server-resident files, which is to cause only the LOAD TABLE statement to be recorded in the transaction log. This level of logging cannot be used when loading from an expression or a client file. When you do not specify a logging level in the LOAD TABLE statement, WITH FILE NAME LOGGING is the default level when specifying:

      LOAD TABLE ... FROM filename-expression 
      LOAD TABLE ... USING FILE filename-expression 
    • WITH ROW LOGGING clause. The WITH ROW LOGGING clause causes each row that is loaded to be recorded in the transaction log as an INSERT statement. This level of logging is recommended for databases involved in synchronization, and is supported in database mirroring. However, when loading large amounts of data, this logging type can impact performance, and results in a much longer transaction log.

      This level is also ideal for databases where the table being loaded into contains non-deterministic values, such as computed columns, or CURRENT TIMESTAMP defaults.

    • WITH CONTENT LOGGING clause. The WITH CONTENT LOGGING clause causes the database server to chunk together the content of the rows that are being loaded. These chunks can be reconstituted into rows later, for example during recovery from the transaction log. When loading large amounts of data, this logging type has a lower impact on performance compared to logging each individual row, and offers increased data protection. Nonetheless, using WITH CONTENT LOGGING does result in a longer transaction log. This level of logging is recommended for databases involved in mirroring, or where it is desirable to not maintain the original data files for later recovery.


      The WITH CONTENT LOGGING clause cannot be used if the database is involved in synchronization.


      When you do not specify a logging level in the LOAD TABLE statement, WITH CONTENT LOGGING is the default level when specifying:

      LOAD TABLE ... USING CLIENT FILE client-filename-expression
      LOAD TABLE ... USING VALUE value-expression
      LOAD TABLE ... USING COLUMN column-expression
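Putting it together, an explicit logging clause is simply appended to the statement; for example, reusing the client-file load from earlier (a sketch):

LOAD TABLE visitor_summary USING CLIENT FILE 'c:\gpaulley\blog\Summary-6July2009.csv'
    DELIMITED BY ',' SKIP 1
    WITH ROW LOGGING;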

Connecting to SQL Anywhere using JDBC


I originally posted this a few years ago to help people configure their Java applications to use the iAnywhere JDBC driver, which was replaced by the SQL Anywhere JDBC driver in SQL Anywhere version 12.  I am reposting it here with a few minor updates, since I still refer to it occasionally and I think it is still useful.

 

I have heard from customers that connecting to SQL Anywhere over JDBC can be difficult at times.  In my investigations of this, I have found that this is almost always due to confusion over the classname to use to register the JDBC driver, and the URLs to use to actually connect to the database.  In stepping back, I can see how people might easily get confused based on the history of the JDBC driver.  Here is my attempt to clarify things by following the history of the driver, starting with SQL Anywhere version 9.

 

Before I go into detail on the history of the SQL Anywhere JDBC driver, here is a table which summarizes, for each version, the jar file to include in the classpath, the driver classname, and a sample connection URL.

SQL Anywhere Version | JDBC jar file to include in classpath | Driver classname | Connection URL
9.0.2 | %ASA90%\java\jodbc.jar | ianywhere.ml.jdbcodbc.IDriver | jdbc:odbc:Driver=Adaptive Server Anywhere 9.0;UID=DBA;PWD=sql;eng=demo
10.0.0 | %SQLANY10%\java\jodbc.jar | ianywhere.ml.jdbcodbc.jdbc3.IDriver | jdbc:odbc:Driver=SQL Anywhere 10 Demo;UID=DBA;PWD=sql;eng=demo
10.0.1 | %SQLANY10%\java\jodbc.jar | ianywhere.ml.jdbcodbc.jdbc3.IDriver | jdbc:ianywhere:Driver=SQL Anywhere 10;DSN=SQL Anywhere 10 Sample
11.0.0 | %SQLANY11%\java\jodbc.jar | ianywhere.ml.jdbcodbc.jdbc3.IDriver | jdbc:ianywhere:Driver=SQL Anywhere 11;DSN=SQL Anywhere 11 Sample
11.0.1 | %SQLANY11%\java\sajdbc.jar | sybase.jdbc.sqlanywhere.IDriver | jdbc:sqlanywhere:uid=DBA;pwd=sql;eng=demo
12.0.0 | %SQLANY12%\java\sajdbc4.jar | no longer required for JDBC 4.0 | jdbc:sqlanywhere:uid=DBA;pwd=sql;eng=demo
16.0 | %SQLANY16%\java\sajdbc4.jar | no longer required for JDBC 4.0 | jdbc:sqlanywhere:uid=DBA;pwd=sql;eng=demo

 

  1. Adaptive Server Anywhere version 9.0 (aka SQL Anywhere 9.0)
    In version 9, SQL Anywhere supported JDBC 2.0 using an iAnywhere generic JDBC-ODBC bridge driver (similar to but different from the Sun JDBC/ODBC driver).  The jar file is jodbc.jar, and resides in the %ASA90%\java directory.  To use the iAnywhere JDBC driver, you need to include the jar in your classpath.  Then, you need to register it in your java app using the following code:

     

    DriverManager.registerDriver(    (Driver)Class.forName( "ianywhere.ml.jdbcodbc.IDriver" ).newInstance() );

     

    Since the iAnywhere JDBC/ODBC driver is a bridge driver, to connect to your SQL Anywhere database, you need to specify a "DRIVER=" parameter along with the rest of your connect string.  For example:

     

    Connection con = DriverManager.getConnection(    "jdbc:odbc:Driver=Adaptive Server Anywhere 9.0;UID=DBA;PWD=sql;eng=demo" );

     

    or, you could use an ODBC data source like this:

     

    Connection con = DriverManager.getConnection(    "jdbc:odbc:DSN=Adaptive Server Anywhere 9.0 Sample" );

     

  2. SQL Anywhere 10.0.0
    In version 10, we added support for JDBC 3.0.  To use the version 10 iAnywhere JDBC/ODBC bridge driver, you need to again include %SQLANY10%\java\jodbc.jar in your classpath.  However, the class name for driver registration is slightly different:

     

    DriverManager.registerDriver(    (Driver)Class.forName( "ianywhere.ml.jdbcodbc.jdbc3.IDriver" ).newInstance() );
    Once registered, the connection URL was the same as in version 9, above.

     

  3. SQL Anywhere 10.0.1
    After version 10 was released, we noticed that in some customer issues involving JDBC, the iAnywhere driver was not always being loaded when it was supposed to be, particularly when the Sun JDBC/ODBC bridge driver was present.  It turns out that our use of "jdbc:odbc" in the connection URL was not sufficient to guarantee that the iAnywhere driver would be used during a connection.  If the Sun bridge were present, it could be picked up and used instead, which led to all sorts of unexpected behaviour. To resolve this problem, the 10.0.1 maintenance release introduced a new URL header for the iAnywhere driver, "jdbc:ianywhere".  From this point forward, the classname used to register the driver was the same as with v10, but the correct URL to use when connecting to the database was as follows:

     

    Connection con = DriverManager.getConnection(    "jdbc:ianywhere:Driver=SQL Anywhere 10;DSN=SQL Anywhere 10 Sample" );
    The "jdbc:ianywhere" portion of the connection string was actually back-ported to a 9.0.1 ebf, so if you are running one of the later 9.0.1 or 9.0.2 ebfs, the above connection URL will work for you as well.

     

  4. SQL Anywhere version 11.0.0
    In SQL Anywhere version 11, there was no change in classname for the driver or URL for the connection string, but we did update to a newer version of the JDK. This meant we had to drop the JDBC 2.0 driver, because JDK 1.4 and newer no longer supported it.  To make things easier for our customers, we kept the JDBC 2.0 class names in the version 10 JDBC 3.0 jar.  They simply pointed to the JDBC 3.0 equivalents.

     

  5. SQL Anywhere 11.0.1
    In SQL Anywhere version 11.0.1, a new SQL Anywhere JDBC driver was introduced.  No longer a generic iAnywhere JDBC driver, it is a JDBC driver specific to SQL Anywhere.  This was done to make it easier (ie. less confusing) for people to use JDBC with SQL Anywhere.  With the new driver, there is no need to install ODBC on the system.  This wasn't a problem for Windows, but our Linux and Unix customers often had problems with this.  As an added bonus, the performance of the driver was improved slightly because we no longer have to go through the ODBC driver manager. This change involved adding 2 new files to the SQL Anywhere installation:  sajdbc.jar and dbjdbc11.dll. To use the new driver, you need to include %SQLANY11%\java\sajdbc.jar in your classpath.  Then, the driver registration is as follows:

     

    DriverManager.registerDriver(   (Driver) Class.forName( "sybase.jdbc.sqlanywhere.IDriver" ).newInstance() );
    Then, to connect, you use the following URL:

     

    Connection con = DriverManager.getConnection(    "jdbc:sqlanywhere:uid=DBA;pwd=sql;eng=demo" );
  6. SQL Anywhere 12/16
    SQL Anywhere 12 deprecated the use of the iAnywhere JDBC/ODBC bridge driver in favor of the new SQL Anywhere driver.  In addition, SQL Anywhere 12 supports JDBC 4.0 (which requires JDK 1.6 or newer). To continue to use JDBC 3.0, users do not have to make any changes from previous versions.  However, to use the JDBC 4.0 support, the new driver name is "sybase.jdbc4.sqlanywhere.IDriver", and %SQLANY12%\java\sajdbc4.jar must be in your classpath. There is no longer any need to call DriverManager.registerDriver(...) to register the driver before using it.  Sun has implemented automatic driver registration, so applications just need to make sure that sajdbc4.jar is in the classpath (and dbjdbc12.dll is in the path), and use the "jdbc:sqlanywhere" URL header to connect.  So, to connect with SQL Anywhere 12 and JDBC 4.0, all you need is something like the following line of code:

     

    Connection con = DriverManager.getConnection(    "jdbc:sqlanywhere:uid=DBA;pwd=sql;eng=demo" );

 

That concludes our history lesson. Confused yet?

From the Archives: Invoking the SQL Flagger in SQL Anywhere


In this post, originally written by Glenn Paulley and posted to sybase.com in August of 2009, Glenn talks about validation of SQL statements against the ANSI/ISO SQL Standard from within SQL Anywhere.  Note also that the flagger has been enhanced to support SQL:2008 in more recent versions of SQL Anywhere.

 

The flagging of SQL statements is defined in the ANSI/ISO SQL:2008 standard as language features F812 ("basic" flagging) and F813 ("extended" flagging). Flagging is the notion of identifying non-conformance of specific SQL constructions with respect to the ANSI/ISO SQL standard. Here's a simple example using the sample DEMO database with SQL Anywhere:

 

SELECT SQLFLAGGER( 'SQL:2003/Core',
    'SELECT TOP 10 * FROM Customers
     WHERE State IN (''NJ'', ''NC'') AND Country LIKE ''%USA%''
     ORDER BY Surname' );

 

Flagging in SQL Anywhere

SQL Anywhere offers several ways to invoke the SQL Flagger to check a SQL statement, or a batch of SQL statements. They include:

  • the SQLFLAGGER() function;
  • the SQL_FLAGGER_ERROR_LEVEL and SQL_FLAGGER_WARNING_LEVEL connection options;
  • the SA_ANSI_STANDARD_PACKAGES system procedure; and
  • flagging capabilities in the SQL preprocessor (SQLPP).

 

The SQLFLAGGER function

The SQLFLAGGER() function analyzes a single SQL statement, or batch, passed as a string argument, for compliance with a given SQL standard. The statement or batch is parsed, but not executed. The function returns a LONG VARCHAR containing any error messages output by the flagger. The first parameter is the standard/implementation to which the SQL statement is to be compared; SQL Anywhere supports compliance comparisons with SQL:2003 (Core/Package), SQL:1999 (Core/Package), SQL:1992 (Full, Intermediate, Entry) and Ultralite.

 

At this time, it is unknown if the forthcoming Innsbruck release of SQL Anywhere will offer Flagging support for the current SQL:2008 standard.

 

 

The SQL_FLAGGER_ERROR_LEVEL and SQL_FLAGGER_WARNING_LEVEL connection options

The SQL_FLAGGER_ERROR_LEVEL and SQL_FLAGGER_WARNING_LEVEL connection options invoke the SQL Flagger for any statement prepared or executed for the connection. If the statement does not comply with the option setting, which is a specific ANSI standard or UltraLite, the statement either terminates with an error (SQLSTATE 0AW03), or returns a warning (SQLSTATE 01W07), depending upon the option setting. If the statement complies, statement execution proceeds normally.
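For example, a sketch (the option accepts standard designations such as those listed above for SQLFLAGGER; the exact spelling follows the documentation):

SET TEMPORARY OPTION SQL_FLAGGER_ERROR_LEVEL = 'SQL:2003/Core';
SELECT TOP 10 * FROM Customers ORDER BY Surname;
-- fails with SQLSTATE 0AW03, since TOP is not Core SQL:2003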

 

 

The SA_ANSI_STANDARD_PACKAGES system procedure

The SA_ANSI_STANDARD_PACKAGES() system procedure analyzes a statement, or batch, for the use of optional packages from the ANSI/ISO SQL:2003 or SQL:1999 international standards. The procedure takes two parameters: the first is a string that identifies the desired standard, and the second is the SQL statement to be analyzed. The result of the procedure is a list of the optional SQL standard packages utilized by the statement. Here is an example:

sa_ansi_packages.png
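In SQL form, the call in the example looks roughly like this (a sketch; the spelling of the standard name and the analyzed statement are assumptions):

CALL SA_ANSI_STANDARD_PACKAGES( 'SQL:2003',
     'SELECT TOP 10 * FROM Customers ORDER BY Surname' );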

 

Flagging in the SQL preprocessor

The SQL preprocessor (SQLPP) has the ability to flag static SQL statements in an embedded SQL application at compile time. This feature can be especially useful when developing an UltraLite application, to verify SQL statements for UltraLite compatibility. Using the flagger with SQLPP simply involves setting additional command line switches when invoking SQLPP.

From the Archives: LIKE predicate semantics with blank-padded databases


In this post, originally written by Glenn Paulley and posted to sybase.com in August of 2009, Glenn talks about blank padded character strings and how they behave with respect to the LIKE predicate.

 

In the SQL:2008 standard, fixed-length character string values are blank padded. Blank padding occurs during storage of a fixed-length character string value when its original size is less than the declared width of the column. Blank padding also occurs when fixed-length character strings are compared using any string comparison predicate. For storage, here is the relevant quote from Section 9.2 (Store assignment), General rule 2(b)(iii), that defines the behaviour for storing value V in column T:

If the declared type of T is fixed-length character string with length in characters L and the length in characters M of V is less than L, then the first M characters of T are set to V and the last L - M characters of T are set to spaces.

SQL Anywhere, however, does not blank-pad fixed-length character strings. In a SQL Anywhere database, every string is stored as if the column were a VARCHAR type. This means that all blanks in a string value (trailing or otherwise) are treated as significant characters; hence the value 'a ' (the character 'a' followed by a blank) is not equivalent to the single-character string 'a'. Inequality comparisons also treat a blank as any other character in the collation.

 

SQL Anywhere offers the ability to mimic ANSI SQL character-string comparison semantics with the "blank padding" option, which can be specified with either the dbinit utility or the CREATE DATABASE statement. With the blank-padding option enabled, trailing blanks in a string are ignored when being compared. Ignoring trailing blanks has equivalent semantics to blank-padding for equality and inequality ("!=") operations. However, this behaviour is not identical to blank-padding semantics for other comparison operators such as less than ("<").
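For example, a sketch using the CREATE DATABASE statement (file name hypothetical; clause placement per the documentation):

CREATE DATABASE 'padded.db' BLANK PADDING ON;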

 

LIKE Semantics With and Without Blank Padding

The semantics of a LIKE pattern in SQL Anywhere do not change if the database is blank-padded, because matching the expression to the pattern involves a character-by-character (or code point by code point, in the case of UTF8 databases) comparison in a left-to-right fashion. No additional blank padding (or truncation) is performed on the value of either expression or pattern during the computation. Therefore, the expression "a1" matches the pattern "a1", but not the patterns "a1 " ("a1" with a trailing blank) or "a1_". These semantic differences occur whenever the expression or the pattern contains trailing spaces and, as we shall see, they also illustrate differences among other relational DBMS products, in virtually all cases due to the retention of legacy behaviour.

 

LIKE examples

To illustrate - my thanks to John Smirnios for the following analysis - we define a table T consisting of four string columns as follows:

 

CREATE TABLE T( a CHAR(1), b CHAR(2), c CHAR(3), d VARCHAR(10) );

and into table T we insert two rows, the first row with a single character 'a' in each column, and the second row with the value 'a ' (an 'a' followed by a blank) in each column, as follows:

 

INSERT INTO T VALUES( 'a',  'a',  'a',  'a' ); 
INSERT INTO T VALUES( 'a ', 'a ', 'a ', 'a ' );
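Each test case is then a simple LIKE query; for example, test case 8 from the table below reads:

SELECT * FROM T WHERE d LIKE 'a ';
-- on SQL Anywhere, only row 2 (the value 'a ' with the trailing blank) is returned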

 

A test of 12 specific test cases across a variety of database systems yielded the following results:

 

Rows returned by "Column LIKE Pattern"

Test case | Column | Pattern | SQL Anywhere | Oracle | DB2 | ASE | SQL Server | IQ
1  | a | 'a'  | 1,2  | 1,2  | 1,2  | 1,2  | 1,2  | 1,2
2  | b | 'a'  | 1    | none | none | 1,2  | 1,2  | none
3  | c | 'a'  | 1    | none | none | 1,2  | 1,2  | none
4  | d | 'a'  | 1    | 1    | 1    | 1,2  | 1,2  | 1,2
5  | a | 'a ' | none | none | none | 1,2  | none | none
6  | b | 'a ' | 2    | 1,2  | 1,2  | 1,2  | 1,2  | 1,2
7  | c | 'a ' | 2    | none | none | 1,2  | 1,2  | none
8  | d | 'a ' | 2    | 2    | 2    | 1,2  | none | 1,2
9  | a | 'a_' | none | none | none | none | none | none
10 | b | 'a_' | 2    | 1,2  | 1,2  | 1,2  | 1,2  | 1,2
11 | c | 'a_' | 2    | none | none | 1,2  | 1,2  | none
12 | d | 'a_' | 2    | 2    | 2    | none | none | 2

 

 

Additional notes:

  1. Sybase ASE and Microsoft SQL Server 2005 always strip all trailing blanks from the end of VARCHAR values. In these two systems, it is impossible to store a blank at the end of a VARCHAR string.
  2. In Sybase IQ, the predicate "d LIKE 'a[ ]'" returns row 2, even though the predicate "d LIKE 'a '" returns both rows 1 and 2.
  3. In Sybase ASE, "d LIKE 'a[ ]'" returns no rows even though the predicate "d LIKE 'a '" returns both rows 1 and 2.

 

Discussion

SQL Anywhere treats all strings as VARCHAR, even in a blank-padded database. For VARCHAR strings, SQL Anywhere's behaviour matches DB2 and Oracle and the ANSI/ISO SQL standard (test cases 4, 8, and 12).

 

DB2 and Oracle have identical semantics. Fixed-width character columns are effectively always padded to their maximum length and the string's length is always equal to the maximum width of the column. The end of the string must match the end of the pattern. VARCHAR fields retain any trailing blanks that were inserted by the user; blanks are never added to or stripped from VARCHAR fields.

 

Sybase ASE appears to strip trailing blanks from the pattern string, but it does not strip 'equivalent to blank' expressions (see note 3). However, ASE will retain a single trailing blank in the case of a pattern ending in '%' followed by one or more blanks; this specific exception is documented in the ASE SQL User's Guide. ASE also effectively strips trailing blanks from the match value and then re-pads CHAR columns with enough blanks to match the pattern (but not enough to exceed the width of the column). For VARCHAR match values, blanks are pre-stripped (see note 1) and blanks are never added to allow a match to occur. A pattern ending with an equivalent-to-blank ('[ ]') will never match a VARCHAR string.

 

Microsoft SQL Server 2005 does not strip trailing blanks from the pattern. However, like ASE, SQL Server appears to strip trailing blanks from the match value and then re-pad CHAR columns with enough blanks to match the pattern but not enough to exceed the width of the column. Blanks are never appended to a VARCHAR to allow a match to occur.

 

In another post, I'll attempt to outline the differences in semantics with trailing blanks and empty strings with client-server protocols.

 

My sincere thanks to my colleague John Smirnios for his thorough analysis of LIKE semantics with SQL Anywhere and these other database management systems.


From the Archives: What Exactly Is Thread Deadlock?


In this post, originally written by Glenn Paulley and posted to sybase.com in May of 2009, Glenn talks about thread deadlock in SQL Anywhere, and how application design can cause the problem.  Note that more recent versions (16 and later) of SQL Anywhere dynamically adjust the multiprogramming level, so there is no longer a need to set the -gn server option in most situations.  This can also mask the poor application design that Glenn talks about below, but it cannot correct it.



Thread deadlock is a specific error (SQLCODE -307, SQLSTATE '40W06') that a SQL Anywhere server will return for a specific request. In this post I want to document why and how thread deadlock can occur, and the mechanisms that can be used to diagnose the problem.

 

SQL Anywhere's threading architecture

Like other database management systems, SQL Anywhere implements its own threading architecture rather than rely solely on the threading model of the underlying operating system. Because SQL Anywhere supports a wide variety of OS and hardware platforms - Windows, Linux, Windows CE, Sun OS, AIX, HP-UX, and Mac OS/X to name a few - SQL Anywhere will utilize "lightweight" threads (often called fibers) on the operating systems (ie. Windows, Linux) that support them, and regular OS threads on those OS platforms that do not.

 

Moreover, in SQL Anywhere the server does not dedicate a thread (fiber) to a specific connection. Rather, a fixed-size pool of server threads is assigned dynamically to tasks as they enter the server for execution. Often a task is an SQL statement from an application or stored procedure, but there are many different types of tasks that a thread can service. Once a task is scheduled on a thread (fiber), that thread is assigned to process that task until the task completes or is cancelled.

 

By default, SQL Anywhere creates 20 threads when the server is started (3 on Windows CE). This default can be changed by using the -gn command line switch. In effect, the number of threads determines the server's multiprogramming level - the maximum number of tasks that can be active at any one time. Server threads are independent of the number of connections made to any database on that server. Hence a given thread (fiber) can first service a task for one database, and subsequently service a task for a connection to a different database.

 

 

Thread deadlock - the condition

Threads in the SQL Anywhere server service tasks, which ordinarily are database requests such as PREPARE, DESCRIBE, OPEN, FETCH. Often these tasks can be serviced very quickly; sometimes they take considerably longer, such as OPENing an INSENSITIVE cursor over a large result set. At any one point, the thread servicing that task may be executing a query access plan operator, marshalling result expressions into output buffers, waiting for an I/O operation to complete, or it may be blocked on a shared resource: for example, a schema lock or a row lock.

 

Given a multiprogramming level of n, thread deadlock is the situation where n-1 threads (fibers) are servicing active tasks but are blocked, and the nth thread (fiber), also servicing an active task, is about to block. The server must prevent all threads (fibers) from being blocked since this would result in a "hung" engine - no threads would be available to perform any work since all are blocked, no new connections could be handled and all new tasks would be queued.

 

This situation differs from "true" deadlock in the sense that in "true" deadlock two or more threads complete a cycle of dependencies such that none of the threads (fibers) can continue. With thread deadlock, however, it is possible for completely unrelated SQL requests to be blocked, each tying up a server thread (fiber), so that if the nth thread attempts to block the SQL request will receive the -307 error. Recall that the set of threads (fibers) in the server service all SQL requests, even for those connections connected to different databases - so thread deadlock can occur due to the combined workload of each of the databases.

 

Busy servers that service tens or hundreds of connections may experience thread deadlock in cases where many requests are long-running, either due to the size of the database or due to blocking. In this case, an appropriate remedy is to increase the server's multiprogramming level by restarting the server with a higher value for the -gn command line switch.

 

All too often, however, application systems can experience thread deadlock because of excessive or unintentional contention due to application design. In these cases, scaling the application to larger and larger datasets or numbers of connections exacerbates the problem. Moreover, increasing the multiprogramming level to higher values rarely provides relief.

 

 

How to incur thread deadlock

To illustrate how to (easily) obtain instances of thread deadlock, I'll use a simple multi-client example where each client periodically inserts a row of "sensor data". The "sensor data" will be stored in the following table:

 

 

CREATE TABLE sensor_data (
    sensor_id     BIGINT NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,
    sensor_number INTEGER,
    sensor_data   VARCHAR(50) NULL
)


In addition, each time a client inserts a row of sensor data, the client will update a summary record that contains the total number of inserted records for that sensor. The summary data table is as follows:

 

CREATE TABLE summary_data (
    sensor_number INTEGER PRIMARY KEY,
    sensor_count  BIGINT
)

 

The logic for each client connection is embodied in the following stored procedure:

CREATE OR REPLACE PROCEDURE INSERT_SENSOR_DATA()
BEGIN
    DECLARE sensor_ident INTEGER;
    SET sensor_ident = MOD( 1000.0 * RAND(), 10 );
    INSERT INTO sensor_data VALUES( DEFAULT, sensor_ident, 'This is a test.' );
    IF EXISTS ( SELECT * FROM summary_data
                WHERE summary_data.sensor_number = sensor_ident ) THEN
        UPDATE summary_data SET sensor_count = sensor_count + 1
            WHERE summary_data.sensor_number = sensor_ident;
    ELSE
        INSERT INTO summary_data VALUES ( sensor_ident, 1 );
    END IF;
END

The logic in the above procedure is straightforward. The first step is to insert the row of sensor data into the sensor_data table. The second step is to modify the summary table. A complication is determining whether a summary row for the sensor in question already exists; if so, the count for that row is incremented by the UPDATE statement, otherwise a new row is inserted.

 

Warning: while the above code is straightforward, it is also wrong. The procedure's logic as written contains a race condition and will cause frequent deadlocks and/or incorrect results, depending on the isolation level being used. These details are not important for the thread deadlock case I'm trying to illustrate.

 

Setup for this example also requires creating the tables utilized by the TRANTEST utility to track request response times, by executing the trantabs.sql script in the samples/trantest directory. Setup also requires the following:

 

TRUNCATE TABLE summary_data;
TRUNCATE TABLE sensor_data;
SET OPTION "DBA".LOG_DEADLOCKS = "ON";
CALL SA_SERVER_OPTION( 'RememberLastStatement', 'Yes' );


We use the performance analysis utility TRANTEST from the SQL Anywhere samples to execute the procedure above from multiple client connections simultaneously. Here's the TRANTEST command line:

 

TRANTEST -a ESQL -c "uid=dba;pwd=sql" -f insert_sensor_data.sql -i 2 -k 5 -l 15 -m 0 -n 25 -o results.txt -w 0

In summary, TRANTEST will create 25 ESQL connections that will continuously call the script in "insert_sensor_data.sql" with zero think time at isolation level 2 for a total elapsed time of 15 seconds, issuing a COMMIT every 5 transactions. The "insert_sensor_data.sql" file contains the single line

 

CALL INSERT_SENSOR_DATA()

I chose 25 clients because I'm running an 11.0.1 server with the default multiprogramming level of 20.

 

 

Problem determination of thread deadlock

There are two ways one can determine if thread deadlock has occurred, and the set of connections and SQL requests that were involved. The first is using the built-in diagnostic procedure sa_report_deadlocks(), which is enabled via the LOG_DEADLOCKS option as documented above. Here is a portion of the result after executing the above example with TRANTEST:

 

sa_report_deadlocks.png


The DBISQL window above illustrates thread deadlock, where the first row of the result is the "victim" (the CALL statement was executing on the last non-blocked thread). The rows following indicate the status of the other connections; sure enough, each of these is blocked while executing the INSERT_SENSOR_DATA() procedure. The rows returned by sa_report_deadlocks() detail both the table (object 3358, the summary_data table, from the SYSOBJECTS catalog table) and the row identifier of the row in summary_data causing the block.

 

The reason behind the contention is straightforward: because the procedure attempts to both read (in the IF EXISTS check) and modify (in the subsequent UPDATE) rows in the summary_data table, multiple clients will block on each other. With more clients than available threads, batched COMMITs, and zero think time, thread deadlock is inevitable. Increasing -gn to a value higher than the number of clients will prevent occurrences of thread deadlock, but won't solve the underlying problem, which is serialization of the execution of the INSERT_SENSOR_DATA() procedure.

 

A second mechanism to discover the existence of thread deadlock is through SQL Anywhere's Application Profiling capabilities, available through Sybase Central. Starting the Application Profiling wizard, followed by executing TRANTEST, yields a tracing database that documents the execution of each SQL statement issued by any connection. Here is the summary page for the test, as displayed by Sybase Central:

 

thread_deadlock_summary.png

 

Note the summary times for the UPDATE summary_data statement: a total time of 2910 milliseconds for 9300-odd statement invocations, but a maximum time of 213 milliseconds - a sure sign of excessive blocking. If one switches to the Details pane, the occurrences of thread deadlock become obvious:

 

deadlock_detail.png

 

From this detailed view, one can analyze the concurrently-executing statements at the point of each occurrence of thread deadlock to determine what each thread in the server was executing at that time, which will, of course, again point to the badly-written INSERT_SENSOR_DATA() stored procedure - and the UPDATE statement on the summary_data table in particular.

From the Archives: Controlling service levels using the PRIORITY option


In this post, originally written by Glenn Paulley and posted to sybase.com in September of 2009, Glenn discusses using the PRIORITY option in SQL Anywhere.

 

 

SQL Anywhere version 11 deprecated the BACKGROUND_PRIORITY connection option in favour of a new connection option, PRIORITY.

 

The PRIORITY connection option establishes the processing priority of any SQL request for this connection. The default is "Normal"; other potential values are Critical, High, Above Normal, Below Normal, Low, and Background. When SQL requests are queued for service, the server will process the queue in priority order. Setting the priority option to different values for different connections (or users) permits the categorization of service levels across the entire server's workload.

 

While individual users can set their own PRIORITY setting, they cannot set their connection's PRIORITY to be greater than the value of the MAX_PRIORITY option. The default setting for the MAX_PRIORITY option is also "Normal"; its value can be altered only by a user with DBA privileges. Altering the setting is straightforward via the SET OPTION statement:

 

SET EXISTING OPTION PRIORITY = 'Low'

As with other server, database, and connection-level options, the values of PRIORITY and MAX_PRIORITY can be queried through various means, including the sa_conn_properties() procedure and the CONNECTION_PROPERTY() function, as follows:

 

priority.png
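In SQL, the queries in the screen shot look roughly like this (a sketch; the option names are passed as property names):

SELECT CONNECTION_PROPERTY( 'PRIORITY' )     AS current_priority,
       CONNECTION_PROPERTY( 'MAX_PRIORITY' ) AS max_priority;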

From the Archives: Mechanics of options in SQL Anywhere


In this post, originally written by Glenn Paulley and posted to sybase.com in September of 2009, Glenn discusses how options are used in SQL Anywhere, including the ability to create and set your own custom options, and the ability to monitor and prevent users from changing options.



In SQL Anywhere, server, database, and connection-level options provide application control of various behaviours that can affect server operation and/or application-visible semantics. "Options" as implemented in SQL Anywhere are not part of the ISO SQL Standard, though the concept of support for "global variables" has been discussed within the SQL standards process. SQL Anywhere options implement "global variables" in conjunction with separate mechanisms to update and retrieve their values.

 

SQL Anywhere server options are persistently stored in the system catalog and can be viewed using the SYSOPTION view:

sysoption.png

 

Server-defined options are typed: some are binary settings (ON vs. OFF), some are integers, some are specific character string settings, and some are free-form strings. The "magic" of options is that each system option has a default value - the PUBLIC setting - which for many options can be overridden, either permanently or temporarily, by a user or even a specific connection. In the figure above, note that each option and its setting belong to a specific user; in the DEMO database, user_id "2" is the user PUBLIC. Specific option settings for other users are also saved in the catalog with their respective user_ids. These settings "override" the PUBLIC setting for that user; if a user setting does not exist, the PUBLIC setting is used as a default.

 

For example, by default the PUBLIC setting for the BLOCKING option is ON, meaning that SQL requests will block when conflicting on a row lock. If one desires that SQL statements should fail with an error, rather than block, one can set the BLOCKING option to OFF. Changing the option setting is done through the SET OPTION statement as follows:

 

SET OPTION BLOCKING = 'OFF';

which changes the BLOCKING option setting for this user and this modification is made to the database catalog. As explained in the SQL Anywhere help, there are timing dependencies to some of the options:

Changes to option settings take place at different times, depending on the option. Changing a global option such as recovery_time takes place the next time the database is started. Generally, only options that affect the current connection take place immediately. You can change option settings in the middle of a transaction, for example. One exception to this is that changing options when a cursor is open can lead to unreliable results. For example, changing date_format may not change the format for the next row when a cursor is opened. Depending on the way the cursor is being retrieved, it may take several rows before the change works its way to the user.

Changing a connection option, such as BLOCKING, typically affects new connections that connect with that userid, and of course the connection that initiated the change, but it does not affect other existing connections. If desired, one can set the option temporarily for the current connection only; this setting is retained in memory and is not made persistent to the database catalog. Changing an option temporarily is done using the TEMPORARY keyword:

 

SET TEMPORARY OPTION BLOCKING = 'OFF';

Since this temporary setting is not made persistent, one must query its value through other means. In DBISQL, one can use DBISQL's SET command to display the current values (temporary or otherwise) of all of the server-defined options for this connection. In the figure below, note that the value of the BLOCKING option is now set to OFF:

isql_set_command.png

 

If not using DBISQL, one can query these values using the system procedure sa_conn_properties(), which, in addition to returning option settings for all server-defined connection options, also returns a variety of connection-related counters that can be used for diagnostic purposes:

 

sa_conn_properties.png

 

Connection properties - including options - can be queried individually using the CONNECTION_PROPERTY() function. Note that SQL Anywhere classifies server-defined options into three categories: connection, database, and server. These properties are queried using the CONNECTION_PROPERTY(), DB_PROPERTY(), and PROPERTY() functions respectively. For options, these property functions return their current value, even if set temporarily. Temporarily setting a PUBLIC option, ie.

 

SET TEMPORARY OPTION PUBLIC.BLOCKING = 'OFF'

alters its value until the database is shutdown, and all new connections will inherit this setting (unless it is overridden for that user). When the database is restarted, the persistent value from the SYSOPTION view will once again be used as the default.

 

User-defined options

Options in SQL Anywhere are a generic mechanism; while system-defined options are by far more common, any application can establish a user-defined option to store "global variables". To establish a user-defined option, a user with DBA authority must issue a SET OPTION statement to set the value of the option for the PUBLIC user.

For example, the following statement creates the user-defined option "foo":

 

SET OPTION PUBLIC.FOO = 'ON';

and option "foo" is now stored persistently in the catalog. Once an option has a PUBLIC setting, we're good to go. If, as user DBA, we decide to set our own value for "foo", we can override the PUBLIC setting with:

 

SET OPTION FOO = 'OFF';

and now both values are visible in the SYSOPTION view (userid "1" is DBA):

querying_sysoption.png

 

User-defined option values are VARCHAR strings - they are not typed in the same manner as system-defined options. The maximum length of a user-defined option depends on the database page size; for 4K pages (the default), the maximum length is 3468 bytes. Unfortunately, user-defined options are not returned by sa_conn_properties() nor can they be queried via the connection_property() function. For most interfaces, the only means to query their value is to execute an SQL query against the SYSOPTION catalog view. However, if your application uses embedded SQL, one can utilize embedded SQL's GET OPTION statement to retrieve the option value for any option, including user-defined options.
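For example, a sketch of retrieving the value of the user-defined option "foo" from the catalog (column names as shown in the SYSOPTION view above):

SELECT user_id, "option", setting
  FROM SYSOPTION
 WHERE "option" = 'foo';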

 

Restricting (or reporting on) option settings

Finally, I'd like to point out a feature first added to the 10.0.1 SQL Anywhere release, intended to aid in migration from one SQL Anywhere version to another. That feature is the ability to add "watches" to detect and deny (or report upon) alterations to option settings. This is accomplished through two parameters to the sa_server_option() system procedure: OptionWatchList and OptionWatchAction. Specifying OptionWatchList permits one to specify which options (server- or user-defined) to monitor. OptionWatchAction indicates what action is to be performed: either MESSAGE, which causes a message to appear in the server's console window, or ERROR, which will cause the SET OPTION statement to return an error rather than modify the option value.

 

As an example, suppose after creating the user-defined option "foo" we want to prohibit any changes to its value. We first set up the watch as follows:

 

CALL sa_server_option( 'OptionWatchList', 'foo' );

and then specify the desired action:

 

CALL sa_server_option( 'OptionWatchAction', 'ERROR' );

With this watch in place, any attempt to modify the option results in an error:

optionwatchaction.png

From the Archives: Benchmarks and measurement bias


In this post, originally written by Glenn Paulley and posted to sybase.com in March of 2009, Glenn talks about the perils of simplistic benchmarks and focuses specifically on measurement bias.



 

In the past few weeks I've witnessed a number of published performance analyses, both with and without SQL Anywhere. By and large these "benchmarks" have been exceedingly simplistic, which is unsurprising since a simple benchmark requires significantly less development effort than a complex one.

 

Performance analyses I see frequently, for example, involve (simply) inserting a number of rows into a table as quickly as possible. Knowing that value is a Good Thing (TM) - and in the lab we have specific tests to determine this value (and others). However, for performance analysis of database applications, the two points I would make are:

  1. All too often such simplistic tests are not very representative of application behaviour, and so are relatively meaningless; and
  2. Fine-grained tests of simple operations can be subject to a wide variety of performance factors, where even minute differences in efficiency can skew the test results significantly.

 

With respect to the latter, here is a quote from Chapter 2 of Raj Jain's performance analysis book [1] entitled "Common Mistakes and How to Avoid Them":

It is important to understand the randomness of various systems and workload parameters that affect the performance. Some of these parameters are better understood than others. For example, an analyst may know the distribution for page references in a computer system but have no idea of the distribution of disk references. In such a case, a common mistake would be to use the page reference distribution as a factor but ignore disk reference distribution even though the disk may be the bottleneck and may have more influence on performance than the page references. The choice of factors should be based on their relevance and not on the analyst's knowledge of the factors.

The impact of implicit or explicit bias on experimental setups cannot be overstated. Indeed, a recent paper [2] illustrates that simple compiler benchmarks, such as the SPEC CPU2006 benchmark suite, are not diverse enough to avoid the problems of what the authors call "measurement bias". Here is the paper's abstract:

This paper presents a surprising result: changing a seemingly innocuous aspect of an experimental setup can cause a systems researcher to draw wrong conclusions from an experiment. What appears to be an innocuous aspect in the experimental setup may in fact introduce a significant bias in an evaluation. This phenomenon is called measurement bias in the natural and social sciences. Our results demonstrate that measurement bias is significant and commonplace in computer systems evaluation. By significant, we mean that measurement bias can lead to a performance analysis that either over-states an effect or even yields an incorrect conclusion. By commonplace we mean that measurement bias occurs in all architectures that we tried (Pentium 4, Core 2, and m5 O3CPU), both compilers that we tried (gcc and Intel's C compiler), and most of the SPEC CPU2006 C programs. Thus, we cannot ignore measurement bias. Nevertheless, in a literature survey of 133 recent papers from ASPLOS, PACT, PLDI, and CGO, we determined that none of the papers with experimental results adequately consider measurement bias.

Mytkowicz et al.'s results show that the measurement bias caused by environmental factors in the test, such as

(a) the amount of memory needed for environment variables on the machine, which can affect the alignment of the program stack, and

(b) the link order of the compiled objects within the final executable, which can impact the cache-line alignment of "hot" loops, trumps the performance factor to be measured, which in their case is the effectiveness of O3 compiler optimizations.

Their results show that modifying the experimental setup can itself yield a performance speedup of 0.8 to 1.1 - that is, their test can experience a 20% slowdown, or a 10% speedup, depending on these unmeasured factors.

 

The authors offer three suggestions for avoiding or detecting measurement bias, which in my view are equally applicable to benchmarks of database applications. They are:

  • Utilizing a larger benchmark suite, sufficiently diverse to factor out measurement bias.
  • Generating a large number of experimental setups, varying parameters known to cause measurement bias, and analyzing the results using statistical methods; and
  • Using causal analysis (intervene, measure, confirm) to gain confidence that the conclusions being drawn are valid, even in the face of measurement bias.

 

My thanks to colleague Nathan Auch for bringing [2] to my attention.

 

[1] Raj Jain (1991). The Art of Computer Systems Performance Evaluation, John Wiley and Sons, New York. ISBN 0-471-50336-3.

[2] Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney (March 2009). Producing Wrong Data Without Doing Anything Obviously Wrong! In Proceedings, 14th International Conference on Architectural Support for Programming Languages and Operating Systems, Washington, DC, pp. 265-276.

From the Archives: Factors to consider for utilizing materialized views


In this post, originally written by Glenn Paulley and posted to sybase.com in September of 2009, Glenn talks about the two different types of materialized views and the questions to think about when considering using materialized views.



Starting with Version 10, SQL Anywhere supports deferred-maintenance materialized views; Version 11 introduced support for immediately-maintained materialized views. The major differences between the two are:

 

  • With deferred-maintenance materialized views, the query optimizer may answer queries utilizing one or more materialized views that contain stale data. The "staleness" of any view, and whether or not the view can be used in query answering, is entirely under DBA control. Deferred-maintenance views thus permit one to trade off data accuracy against both the performance gains offered by the materialized view and the update maintenance cost for that view.
  • Conversely, immediately-maintained materialized views are updated within the same transaction as updates to the base tables over which the view is defined. Immediately-maintained views offer a derived, up-to-the-minute copy of the view's underlying base tables, at the expense of requiring view maintenance with each update operation.

 

In summary, deferred-maintenance views permit the amortization of maintenance costs for the materialized view. In contrast, immediately-maintained materialized views require each update transaction to incur the overhead of view maintenance, which may result in contention between concurrently-executing transactions.
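
As a concrete illustration, here is a minimal sketch of both flavours in SQL Anywhere; the Orders table, its columns, and the view name are hypothetical, and the full set of restrictions on immediate views (including the unique-index requirement shown below) is spelled out in the SQL Anywhere documentation:

-- A materialized view is deferred-maintenance by default; it must be
-- initialized, and subsequently refreshed, on demand.
CREATE MATERIALIZED VIEW SalesByCustomer AS
    SELECT cust_id,
           SUM( amount ) AS total_amount,
           COUNT( * )    AS order_count
      FROM Orders
     GROUP BY cust_id;

REFRESH MATERIALIZED VIEW SalesByCustomer;

-- Convert the view to immediate maintenance, so that it is updated within
-- the same transaction as any update to Orders. SQL Anywhere requires a
-- unique index over non-nullable view columns (here assuming cust_id is
-- declared NOT NULL in Orders) before the conversion.
CREATE UNIQUE INDEX SalesByCustomer_cust ON SalesByCustomer( cust_id );
ALTER MATERIALIZED VIEW SalesByCustomer IMMEDIATE REFRESH;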

 

The deferred-or-immediate decision is but one of the factors a database administrator must weigh when deciding whether or not to use a materialized view - in the literature this question is known as the view selection problem. There are several considerations beyond the deferred-or-immediate maintenance tradeoff. Here is a checklist of questions to consider when deciding upon the utility of a materialized view:

  • What is the set of queries that can benefit from creation of materialized views?

     

    Answering this question involves an analysis of the query workload for the system, including detailed consideration of both the definition and frequency of individual queries. A good starting point is the set of frequently executed and expensive queries, particularly those with critical response time requirements. SQL Anywhere's Application Profiling capability, contained in the Sybase Central SQL Anywhere plug-in, is an excellent way to capture your application's workload and determine the "heavy hitters" contained within it.

     

    Materialized views that benefit multiple queries represent the most significant gains, because the storage and maintenance costs for the view are constant while the benefits of the materialized view increase with usage. Also, remember that a single query can make use of multiple materialized views, and that splitting a complex materialized view into multiple views may permit the optimizer to utilize materialized views to assist a larger set of queries. If considering a materialized view involving aggregation (GROUP BY), it is often better to materialize basic aggregate functions that permit wider applicability across queries; for example, AVG() can be obtained from a combination of SUM() and COUNT(*). SQL Anywhere's query optimizer is intelligent enough to utilize SUM() and COUNT() from a materialized view when the original query contains AVG() (see the first sketch following this checklist).

     

  • Does the potential improvement in query performance outweigh the storage and maintenance costs of materialized views?

     

    One must trade off the potential improvements in query performance against the space requirements for materialized views - and their indexes - and the maintenance costs for the view. Here, one must be aware of the update patterns of application requests; materialized views over heavily-updated base tables may have unacceptable maintenance costs, for two reasons: the cost of the updates to the materialized views themselves, and the increase in lock contention amongst update transactions from concurrent updates to the table (or index) containing the materialized view. This latter problem is difficult to assess without proper capacity planning.

     

    DBAs often fail to realize that materialized views can be indexed, just like any base table. Indexes are particularly useful when the application query contains additional joins to tables that are not included in the view; if indexes exist, the optimizer has more physical operator choices - particularly indexed nested-loop join - that can result in significant speed improvements (see the second sketch following this checklist).

     

  • Can the same query be allowed to return different results if the optimizer chooses to utilize stale data from a materialized view in one case, and to process the underlying (and up-to-date) base tables in another?

     

  • Can stored data for materialized views be allowed to become stale?
  • How stale can the data become before it is unacceptable?

     

    These latter questions pertain to the tradeoffs between immediate and deferred-maintenance materialized views. As described above, deferred maintenance permits one to amortize view maintenance across multiple update transactions, at the expense of data staleness. Whether or not your application can benefit from deferred-maintenance views is primarily a business question, not a systems one (the final sketch following this checklist shows how that staleness can be bounded).
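
Returning to the first checklist item: the sketch below (reusing the hypothetical SalesByCustomer view from earlier) shows why materializing the basic aggregates widens applicability. Neither query mentions the view, yet both can in principle be answered from it, since AVG() is derivable from SUM() and COUNT() - assuming, for that rewriting, that amount is non-nullable:

-- Answerable directly from the view's total_amount and order_count columns:
SELECT cust_id, SUM( amount ), COUNT( * )
  FROM Orders
 GROUP BY cust_id;

-- Also answerable from the view, because
-- AVG( amount ) = SUM( amount ) / COUNT( amount ):
SELECT cust_id, AVG( amount )
  FROM Orders
 GROUP BY cust_id;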

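For the second checklist item, indexing a materialized view is ordinary DDL - again a sketch with hypothetical names. Once the index exists, a query that filters on total_amount, or joins the view's data to other tables, gives the optimizer additional access paths such as indexed nested-loop join:

CREATE INDEX SalesByCustomer_total
    ON SalesByCustomer( total_amount );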
 
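Finally, for the staleness questions: SQL Anywhere exposes this tradeoff through the materialized_view_optimization database option, which controls how stale a deferred view's data may be before the optimizer stops considering the view; the one-hour bound below is purely illustrative:

-- Allow the optimizer to use deferred views whose data is at most one hour
-- old; 'Fresh' restricts matching to up-to-date views, and 'Disabled'
-- prevents materialized view matching altogether.
SET OPTION PUBLIC.materialized_view_optimization = '60 minutes';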

In a later post, I'll present some examples of using materialized views. My thanks to colleague Anil Goel for providing much of the detail in this article.
