KnowDotNet Visual Organizer

Efficiently Using ADO.NET - Part I

Using the Sort

by William Ryan
Print this Article Discuss in Forums

If you've read my other articles on fitlering data with ADO.NET 's DataView object, you know that ADO.NET gives you some really powerful filtering capabilities.  If you haven't, I'd suggest you read up on Sorting or Filtering first.

I'm not going to spend much time convincing you why you should use a DataView as opposed to requerying the database but let me sum up the argument. When you use DataAdapter.Fill to populate a DataSet or a DataTable you have effectively created an In-Memory copy of the data your query returned.  You can also create a DataTable or DataSet from straight code without querying a backend DB but there's absolutely No difference whatsoever between a DataTable/DataSet you whip up from code and one that's built for you when you call the DataAdapter's Fill method.  (I've been in many arguments with people about this fact because they claim they've observed different behavior in Datatables/Datasets they create with code and ones pulled back from a database.  In Part II of this article, I'll prove my contention once and for  all).  However, if you create a DataTable/DataSet via a query, the performance implications are much different because you are consuming network and database resources.  So inefficient practices in this regard are much more costly.  And if you have the Data cached locally, and you also have the ability to Sort and filter it locally, there is NO reason to go back to the database just to change the sort order or filter.  The only reason you'd ever want to go back to the database for this type of operation is if the data has changed and you need the new data and you need it sorted.  However, this isn't what I'm discussing.  I'm discussing specifically making a trip to the DB just to change a sort or filter.

So let's walk through a typical example of how an ADO.NET newbie might approach a situation and compare it to how the same goal could be achieved by using the DataView correctly.

Let's say that we have a ComboBox control that is populate with an employees empolyee number, first name and lastname so each record looks something like:

1 - Ryan, William
2 - Bush, George W.
3 - Cheney, Richard

To support this, we have a DataTable called dtEmployees which has three fields, EmployeeNum, Last_Name, First_Name, Hire_Date  and Department.  We could take the typical newbie approach to doing this and constructing a SQL Statement that looked something like:

SELECT CAST(EmployeeNum AS VARCHAR) + ' - ' + Last_Name
    + ', ' + First_Name FROM tblEmployees


Then we could just iterate through the table and add each record to the ComboBox.  While this approach works, it has a very glaring problem and that's that it won't bend very much.  Why?  Well, let's say that you built the query and that EmployeeNum is the primary key.  As such, the default sort order will be on the number 1.  Well,  it's not hard to imagine that different people in different departments view different things and even if they view the same things, they view them differently.  So someone in HR wants to be able to view everything sorted by Hire_Date.  You can rewrite your original query to include an Order by Hire_Date clause, but not everyone wants this ordering.  So now you have to add a Radio button or Context Menu and give them the option to sort by Hire_Date.  If you are using the concatenated field method above, you now have to make a trip back to the database each time someone wants to change the sort order.  This may work fine for two users, but do you really want to put extra stress on your database and network just to change the sort order? Moreoever, if you make a trip back to the db, you'll need another connection object, another command object, another dataadapter and another datatable/dataset.  With this in mind, every time the user changes the sort order or wants something filtered, you are creating another four objects unnecessarily.  So if the user sorted 7 times on average and tried filtering on say two conditions, a total of 36 objects would be created to accomplish what the same thing that could be done with 4!  The really ugly part is that this scenario assumes only one user which is rarely the user base for an enterprise app.  Remember this the next time someone tries telling you that running back to the db to change sort order doesn't 'matter' b/c you have a 100mb network and a powerful server.

Now that we know what's wrong with the 'old' way, how do we do things correctly?  In this scenario, the first thing we are going to do is build an Expression Column for our data table.  (I'm not going to walk through the declaration of the DataAdapter and fill, for the time being, let's assume we have a DataTable called dtEmployees that's been filled with all of the data from out Employees table in the database).  The expression column is going to be of Type String.  Its name will be 'FullName' and it's Expression is going to be the same as the formula referenced above in the SQL statement [EmployeeNum + ' - ' + Last_Name + ', ' + First_Name]. Then we'll set the DataSource property to dtEmployees, the ValueMember property to EmployeeNum and the DisplayMember property to 'FullName' which is the name of the expression column.  When we reference the ValueMember property of the combobox, the EmployeeNum which corresponds to that particular record will be returned.  We can reference the EmployeeNum, Last_Name, First_Name combination by referencing the SelectedText property.  Hence, if the user clicked on the first item in the combobox, the
SelectedValue would be 1 but the SelectedItem would be "1 - Ryan, William" without the quotes.

Here's the actual code used to accomplish this:

DataColumn dcFullName = new DataColumn();
//dvEmployees was declared as a Property of the class
dvEmployees = dtEmployees.DefaultView;
dcFullName.DataType = System.Type.GetType("System.String");
dcFullName.ColumnName = "FullName";
dcFullName.Expression = "EmployeeNum + ' - ' + Last_Name + ', ' + First_Name";
dtEmployees.Columns.Add(dcFullName);
cboEmployeeInfo.DataSource = dtEmployees;
cboEmployeeInfo.DisplayMember = "FullName";
cboEmployeeInfo.ValueMember = "EmployeeNum";


At this point you may be asking, what does DataBinding and Expression Columns have to do with Filtering data and saving trips to the db.  The answer is 'A lot'.   This is the necessary ground work to allow us to build a flexible solution to solve the problem mentioned at the beginning of the article.  I mentioned that we have 5 possible fields that different people may want to sort on. So the next step is to provide a UI interface for the user to specify sort order.  There are many ways to do this but I chose Radio buttons for simplicity.  Anyway, I named each radio button opt + Field Name.  I wired in an event handler for the CheckedChanged event for each button.  Each time the CheckedState changes, I check whether or not the control is now checked.  If it is, I set the RowFilter to that field.

private void optHireDate_CheckedChanged(object sender, System.EventArgs e)
{
  
if(optHireDate.Checked){dvEmployees.Sort = "Hire_Date";}
}

private void optLastName_CheckedChanged(object sender, System.EventArgs e)
{
  
if(optLastName.Checked){dvEmployees.Sort = "Last_Name";}
}

private void optFirstName_CheckedChanged(object sender, System.EventArgs e)
{
  
if(optFirstName.Checked){dvEmployees.Sort = "First_Name";}
}

private void optFullName_CheckedChanged(object sender, System.EventArgs e)
{
  
if(optFullName.Checked){dvEmployees.Sort = "Full_Name";}
}

private void optHireDate_CheckedChanged(object sender, System.EventArgs e)
{
  
if(optHireDate.Checked){dvEmployees.Sort = "Hire_Date";}
}


This is really the simplest way to accomplish this task.  I could put in some additionally functionality to specify the direction of the sort.  Let's say that I have a CheckBox with the text "Descending" and the assumption is that the default sort direction will be ascending.  In this case, I'd want to modify my code a little bit so I could use a if/else structure for my sort direction.(Use the same logic for each RadioButton):

private void optLastName_CheckedChanged(object sender, System.EventArgs e)
{
      //If the radio button isn't checked, no need to go any further
  
if(!optLastName.Checked) {return;}
      
//It's checked, so now let's see if they specified descending
  
if(chkDescending.Checked)
   {
       dvEmployees.Sort = "Last_Name DESC";
   }
  
else
  {
       dvEmployees.Sort = "Last_Name ASC";
   }
}


As you can see, to change the sort order it's as easy as adding an "ASC" or "DESC" after the column name.  Compare this to making a whole trip to the database just to change the sort order and it's efficiency and elegance become quite clear.

Another thing that can be done is specifying multiple conditions for the sort.  Using the Radio buttons scenario that I've been using doesn't lend itself well to this approach because there's no logical ordinality among most controls, and you can only select one radio button at a time.  But even if you used CheckBoxes, you'd still be in the same boat because of the ordinality...if you checked FirstName and LastName, which one would come first?  There are multiple solution you could come up with to solve this problem but the goal here isn't to discuss UI design or algorithms.  The point is that you can sort on multiple fields simply by putting a comma in between them.  For instance, if we wanted to sort by Hire_Date, then Department, then last_Name, all we'd need to do is set the sort like this:

dvEmployees.Sort = "Hire_Date, Department, Last_Name";


And of course you can specify a direction if you so desired

dvEmployees.Sort = "Hire_Date, Department, Last_Name DESC";


The ultimate point that I'm trying to make is that using ADO.NET correctly makes data manipulation faster, simpler, and more efficient.  If we used an expression column instead of server side SQL Concatenation, we could easily sort on every single field in our datatable, in either ascending or descending order and we could do this without ever making a trip back to the database.  In a single user environment, you probably won't realize the superiority of this approach, but as things grow, be prepared for sluggish performance and connection timeouts if you insist on making unnecessary trips to the database.

Enough about sorting, let's spend some time on Filtering.  Now, a discussion on advanced filtering can get pretty in depth so I'm going to address that in Part II of this article.  But I'd like to start the discussion by talking about RowStateFiltering.  

As you know, a DataTable is a cached representation of your query results and a Dataview is simply a Virtual representation of that table. DataViews give us a lot of functionality but much of its functionality is misunderstood.  One of the more important features is RowState. Once you pull over your data into a DataSet/DataTable, you can essentially do four things to a given row:

1)  Modify existing records
2)  Delete existing records
3)  Insert new records
4)  Remove existing records


These are pretty much self explanatory but two things probably need some clarification.  When I say "Modify" many newbies have pointed out that deleting a record thereby modifies it so these are redundant.  However, this is a semantical difference.  Modify in the sense of RowState has a very precise meaning and that is "changing existing data without deleting it or removing it".  With that in mind, changing the value of FirstName is a row from "William" to "Adam" would cause the row to be marked as Modified, however, deleting the row wouldn't.  This may seem trite but there's a good reason for this.  When you build a DataAdapter, you need to specify a SELECT command at a minimum to get it to work.  If you use a implementation of the CommandBuilder object like a SqlCommandBuilder the DataAdapter's Update, Insert and Delete Commands will be generated for you by the CommandBuilder.  It actually infers this from the select command (assuming you have a primary key, if you don't neither the commandbuilder or the DataAdapter Configuration Wizard will be able to automatically generate commands for you).  So, when you call the DataAdapter's Update Method, it walks through the rows one at a  time and if a row has RowState Modified, then it will call the Update command mapping the fields of the Datarow to the parameters it has built. If a row has  a RowState of Deleted, then it will call the Delete Command mapping the parameters the same way the Update command does.  If a row is marked as Inserted, it will do the same thing with the Insert Command.  (What actually appears in the Where clause of the command depend on the object you are using and the type of concurrency you specify but that's a different issue I'll discuss shortly.)  However, if you call Remove on a row, it no longer exists in the datatable.  It has no  rowstate and as far as the DataTable is concerned, it doesn't exist.  So, you can call remove, change a bunch of data, then call update.  When you look at the Database, you'll notice that the data from the row you 'Removed' will still be there.  However, if your code was correct, the rows you 'Deleted' will no longer be in the database after you call DataAdapter.Update.  This is defintely worth taking note of because scarcely a week goes by where I don't see a  newsgroup post from Someone who 'Removed' some rows from a DataTable and then checked the DataSet.HasChanges property which returns false.  If on other changes were made, this will always be the case because Remove doesn't change the rowstate and rowstate is what the .HasChanges method looks to.

At this point you might be saying "So What?"  Well, there are many situations where it's important to know what's changed.  Before you delete a bunch of records, you may want to pop up a window listing all of the deleted rows and asking the user  to confirm that they want to in fact delete these rows and give them a chance to undo any changes.  Moreoever, there are many situations where a user may make a change and shortly thereafter realize they made  a mistake or have some other reason to undo what they've done.  Without looking to RowState, this would be almost impossible to do.  To examine the state of any given row, you can use the RowStateFilter.  The RowStateFilter property is of Type DataViewRowState.  Since this is an enumeration, you have the support of Intellisense in case you can't remember the values.  A summary is provided below:

DataRowState

Name                                    Value
Added                                      4
CurrentRows                                22
Deleted                                    8
ModifiedCurrent                            16
ModifiedOriginal                           32
None                                       0
OriginaRows                                42
Unchanged                                 2


These are pretty much self explanatory, but a few of them may need some explanation.  Added will show you any rows that have been added.  Rows marked as added will use the DataAdapter.InsertCommand to add the new rows.  CurrentRows will show you Unchanged, Original, and New rows.  Deleted will show you only Rows marked as Deleted.  Note that if you call the Remove Method on a Row, its RowState will NOT be Deleted.  ModifiedCurrent are the rows that have been modified.  ModifiedOriginal is essentially the same, it will show the original rows in their current state, which will show the modified rows if they've been modified.  None will simply place no filter on the rows.  OriginalRows will show you all of the original rows, not any that have been added.  Unchanged will show you all of the rows that haven't been modified or deleted.

So, if you had a grid with some options for the user to see different views of the data, and you wanted to see only the rows that have been added (the ones that will use the DataAdapter.InsertCommand to update the database), this would get you there:

dvEmployees.RowStateFilter = DataViewRowState.Added;


If you wanted to show only the rows that the user has modified you'd use similar logic:

dvEmployees.RowStateFilter = DataViewRowState.Deleted;


Now that you have a collection of rows that have been deleted, if you want to undo any of the chanes, you simply reference the row and then call the .RejectChanges method like this:

dtEmployees.Rows[0].RejectChanges();

Anyway, you have 8 choices that you can offer your users to view a portion of your data based on the state of the given row.  Although I didn't mention is specifically, the Rows that you have returned after setting the filter may appear in more than one scenario.  If you set your RowStateFilter to ModifiedCurrent and then set it to OriginalRows, you'd have many of the same rows, so all of these aren't mutually exlcusive.  I'd encourage you to create a DataView on one of the DataTables you use at work, change some of the values in a grid or other control, and then change the RowStateFilter so you can get a better feel for how these work.  If you don't use this method, the alternatives to find out rowstate are pretty ugly.  You could create your own collection and when you make a change to your datatable, you could add it to the collection along with what you did.  This is cumbersome, would take a fair amount of code and wouldn't give you any functionality you don't have already built in.  You could requery the database filling another datatable and do a comparison from there, but this would be very error prone, particularly in an enterprise situation because you can't assume that the values in the database are the 'real' original values you had because someone could have changed them since you fired your original Select statement.  You could use the DataSet.GetChanges, but this gives you all of the changes so you'd have to walk through it manually to determine the different states.  This would make displaying the data in a UI more cumbersome and it would certainly complicate your update logic.  Wouldn't it be easier to deal with a set of rows all of which will be used with a specific Delete/Insert/Update statement instead of walking through them and trying to figure it out on your own?

Finally to wrap things up, I want to cover one more topic.  Each of the topics I've discussed are setting the groundwork for some more advanced techniques that I'll be addressed in Part II and Part III or this article.

Let's say that we are dealing with a DataTable and you want to loop through the rows and do something.  Basically, you have two ways to do this.  The first is using the numeric index which is the faster way to enumerate it albeit less readable:

for(int i = 0; i < dtEmployees.Rows.Count-1; i++)<BR> {
//Do Whatever Here, if you use this methodology
//you'll have to reference the row and the column and it's
//Highly recommended that you use the Numeric Index for the column
//While the Row index method is also less readable, it's faster.  The foreach
//loop is much more clear in my opinion, but the perfomance hit is negligible
//when you are referencing columns, this performance hit isn't small and the larger
//the table, the more pronounced the difference.  You can use an Enumeration
//Where you map the Column Index to an enumerated name so you get the best of both
//worlds.
}


The next way is using the foreach loop

foreach(DataRow dr in dtEmployees.Rows)
{
   //You can refernce the row
   // like this dr[0] (which will reference column 0
}


Like I mentioned, the Index based reference is faster for both the row and the column.  I personally prefer the foreach syntax just because its' more clear and you eliminate the possibility of an IndexOutOfRangeException (although this isn't very hard to prevent).  However, when referencing columns, I'd HIGHLY encourage you to use either the Index of the Column, or as Bill Vaughn suggests in his superb book.  If you use an Enumeration, you'll get the best of both worlds, the robustness and speed of the Index based lookup, and the readability of the String based reference.  The reason for this is that if you use the name based reference, at each pass of the loop, .NET will have to resolve the name each time you reference a column.  So if you had four columns and 1,00 rows, that's 4,000 lookups that will need resolved vs 0 if you use the enumeration or an index.  On small datatables the performance benefit may be negligible, but on larger sets, particularly on resource challenged machines, the difference can be quite profound.  Even more important is the robustness.  Often, you'll use an Alias for a columnname so that it's formatting is more user friendly.  Now, if you use Index based references, you'd have to physically change the column position in order to break your query.  Practically speaking, this doesn't happen very often because a)  The position of the column doesn't matter from a logic point of view (in fact, it's one of the rules of relational database theory, the location of columns and rows should have no bearing on the operation of the db) b) You'll have to lock your table in order to do this which isn't desirable.  c)  You'll fundamentally change your database by switching columns b/c you'll have to delete the column and then move it somewhere else which isn't very practical.  On the other hand, changing an Alias is a pretty common thing to do.  SQL Server and Oracle don't like spaces in names like First_Name or Last_Name, but what computers like and what humans like are often different and humans tend to prefer names like First Name or Last Name.  If you reference the column name directly, you'll break the query just by taking out the underscore or changing the alias. As a matter of fact, adding an alias where one didn't exist before will break the query too.  This is a lot of downside just to keep a methodolgy that's statistically inclined to break easily.

With that said, enumerating a DataView is much different. There isn't a Rows collection so you can't walk through it to enumarate your dataview.  And since your dataview will have fewer rows than the table its based on in most situations where a filter is applied, you can't use the datatable to walkt through the DataView's rows.  So how do you do it?  The DataView has a really great method known as .GetEnumerator() which returns an IEnumerator object.  Now, in the same vein as the DataReader's Read method.  As you'd expect, while MoveNext evaluates to true, the enumerator will move to the next item in its list.  To reference the current row, you use the IEnumerator.Currrent property.  Since this is simply an interface, it's not going to natively return a DataRowView so we have to cast it.  Other than that, there's not much to it.


IEnumerator viewCounter = dvEmployees.GetEnumerator();
DataRowView drv;
while(viewCounter.MoveNext)
{
    drv = (DataRowView) viewCounter.Current;
}


So to sum things up, I've walked you through using the Sort method so you can sort your data without making a trip back to the database.  I've walked you through specifying the direction and using multiple sort criteria.  From there, we moved through using filtering on RowState so the we can view just about everything in regard to the state of our locally cached data.  Since we've used the DataView object to accomplish all of this, I ended up walking you through using the GetEnumerator method so you could walk through the filtered rows without having to access the underlying datatable.

In the next two articles, we're going to walk through advanced Searching and Filtering and how to set up your datatables to support this functionality.

Writing Add-Ins for Visual Studio .NET
Writing Add-ins for Visual Studio .NET
by Les Smith
Apress Publishing