Welcome to weblogs.com.pk Sign in | Join | Help

Executing Queries on Dataset

This post describes and compares different options to execute queries on dataset.


But first of all, let me explain why it’s required. As Ado.Net has disconnected approach, its quite common that you retrieve the data in dataset from database layer (that might have served it from its cache to save database hit)


Suppose we need to execute few queries on the data residing in the dataset. May be while coding for some report, you need to sum the values in particular column.


Consider the following SQL statement that is being used to populate the dataset. This will give you an idea about the sample data, this post is talking about.


select createddate, cost, duration from calllogs where userid=@UserId


We now need the sum of cost and duration that needs to show in the report. Here are the possible solutions.


1- Execute the query directly to data source


We can have the result by issueing the queries directly to the database. For example;


Select sum(cost) ‘costsum’, sum(duration) ‘durationsum’ from calllogs where userid=@UserId



·         We need access to the data source.

·         If a web developer is coding some user interface (data grid for instance) and he has no knowledge of backend data source, all he has access is to the data layer, that might be maintained by other team member.

·         Having separate queries/procedures for such tasks increase maintainability.


2- Use DataTable.Compute()


As our particular scenario is simple, and few simple queries are already implemented in the DataTable class (DataSet has the collection of DataTable) We can use something like;


object r=this.ds.Tables[0].Compute("Sum(Cost)","");



·         Very few functions are supported (SUM, AVG, COUNT etc) and there might a need of having different query, for example grouping etc.


3- Use GetEnumerator() of Rows collection


As we have the data in the object form, we can get the enumerator and can processes each row in the datatable. As we scan each row, we can implement complex queries. We will be doing something shown below;


System.Collections.IEnumerator _enum=this.ds.Tables[0].Rows.GetEnumerator();

sum+=(decimal) ((System.Data.DataRow) _enum.Current)["Cost"];



·         Lengthy code, In case of implementing some special group function, the code can get complex and harder to read.

·         For each query, we need to write specialized code, and chances are that such code is not useful being its nature.


4- Use XPath


XPath is XML based approach, a latest standard for querying object graph, and as XML is extensively used in Microsoft.NET and DataSet’s XML support is very strong, as it stores the disconnected data internally in XML form. Here is the code that we will be coding using this approach:


System.Xml.XmlDataDocument xDDoc=new System.Xml.XmlDataDocument(this.ds);
System.Xml.XPath.XPathNavigator xpNav=xDDoc.CreateNavigator();
System.Xml.XPath.XPathExpression xE=xpNav.Compile("sum(//cost/text())");
object r=xpNav.Evaluate(xE);



·         Very Slow


Performance Comparison


As we have number of options available, I decided to compare the performances of each approach. I coded the data layers for one my real world application and the tests showed that:


·         Database querying is the fastest

·         DataTable queries are 11 times slower to direct Database querying approach.

·         Enumeration approach is 8 times slower to direct Database querying

·         XPath is 40 times slower to database querying.


Enumeration approach is best suited if you need to execute multiple queries. As using a single iterator and in single go, you can calculate multiple results. And you might out perform direct database querying that might take longer due to multiple database hits.



(Reproduced, originally written back in 2003)

Published Thursday, September 23, 2004 3:43 PM by khurram
Filed under:


No Comments

New Comments to this post are disabled