Chapman's Coding Corridor: LINQ


Sunday, December 9, 2007

Linq to Sql: The Good, The Bad & The Bottom Line

I promised my take on Linq to Sql a few days ago. I have spent some time over the past couple of days playing with Linq to Sql connected to the AdventureWorks SQL Server sample database.

I have a lot of experience working with NHibernate, so you will see some comparisons throughout the post.

Overview

Most people who are likely to read this post probably already know what Linq to Sql is. For those who don't, Linq to Sql (and really Linq in general) has been one of the most talked-about and hyped features of Visual Studio 2008 and the .NET 3.5 framework, at least once we found out Linq to Entities wasn't going to ship with Visual Studio 2008.

Linq to Sql is actually a big shift for Microsoft: it is Microsoft's first production-quality Object Relational Mapper, or O/RM for short. Microsoft has tried in the past with products such as ObjectSpaces, but this is the first such tool to actually ship as a finished product. O/RM tools exist to address the object-relational impedance mismatch: most applications these days are developed in object-oriented programming languages, yet the data they operate on is typically stored in a relational database. The impedance mismatch describes the difficulty of moving data between objects and relations, and vice versa, given the many fundamental differences between data stored in a relation and data stored in our objects.

Traditionally Microsoft has endorsed using DataSets to solve this problem. A DataSet is essentially a relational structure living inside your object-oriented programming language, which lets you work with your application's data as relational data. The problem with this? You fail to take advantage of object-oriented application design and the benefits it brings. These programs typically have little testability and a significant amount of duplication. As such, many O/RM tools became popular (although far less so than if Microsoft had endorsed them), such as NHibernate, LLBLGen Pro, Vanatec OpenAccess, Wilson ORMapper, EntitySpaces, eXpress Persistent Objects and many others (apologies to any I didn't list).

Note that Linq to Sql isn't necessarily a direct competitor to NHibernate or the other O/RM tools for the .NET framework listed above; that role belongs to Linq to Entities (AKA the ADO.NET Entity Framework). Linq to Sql is more of an introduction to the O/RM world.

The Good

The Linq query language itself

The Linq query language is just awesome. It really is a joy to work with. It can quickly become a pain because it is complex, but that is when you realize just how powerful it is. I have never seen a query language that is quite so rich: basic queries are very simple to write and understand, yet it also supports very complex queries.

Plus, the queries are strongly typed, so there is much less to worry about when refactoring your business objects: the compiler now checks your queries. Note that even with stored procedures, if you change a column in a table referenced by a stored procedure, nothing will tell you that you just broke the stored procedure. Likewise, query strings stored in your applications won't tell you when you change a property or column name either.
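To make the compile-time checking concrete, here is a minimal sketch. It runs as LINQ to Objects over an in-memory list so it needs no database; the Product class and its values are made up for illustration and are not the designer-generated AdventureWorks entity. A Linq to Sql query against a DataContext is checked by the compiler in exactly the same way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a mapped business object (not generated code).
public class Product
{
    public string Name { get; set; }
    public decimal ListPrice { get; set; }
}

public static class StronglyTypedQueryDemo
{
    // Returns the names of products priced above minPrice, sorted by name.
    public static List<string> ExpensiveProductNames(IEnumerable<Product> products, decimal minPrice)
    {
        // The compiler checks every member access in this query. Rename
        // ListPrice and this stops compiling, whereas a query string such as
        // "from Product p where p.ListPrice > :min" only fails at runtime.
        var query = from p in products
                    where p.ListPrice > minPrice
                    orderby p.Name
                    select p.Name;
        return query.ToList();
    }

    public static List<Product> SampleProducts()
    {
        return new List<Product>
        {
            new Product { Name = "Road Bike", ListPrice = 1200m },
            new Product { Name = "Water Bottle", ListPrice = 5m },
            new Product { Name = "Helmet", ListPrice = 35m }
        };
    }

    public static void Main()
    {
        foreach (string name in ExpensiveProductNames(SampleProducts(), 20m))
        {
            Console.WriteLine(name);
        }
    }
}
```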

For fun, see the following blog post: Taking LINQ to Objects to Extremes: A fully LINQified RayTracer. This is not something you would actually do, but it does help reinforce just how powerful Linq really is.

Better Naming Conventions Than NHibernate

While working with Linq to Sql I felt that the methods on the context were easier to understand and more intuitive than their NHibernate equivalents. For example, when you want to save your changes to the database, NHibernate says Flush whereas Linq to Sql uses SubmitChanges. The bigger advantages, though, are Linq to Sql's InsertOnSubmit versus NHibernate's Save, and Attach versus NHibernate's Update or Lock methods.

I can't tell you how many times I've explained how the Save, Update and Lock functionality in NHibernate works. Most people seem to think they need to call these methods to cause a database operation to take place. They assume Save means execute an insert NOW, and Update means execute an update NOW! Then they call Flush for good measure because someone told them to. The Linq to Sql names make it clearer that this is not quite what is going on: nothing is sent to the database until SubmitChanges is called.

Simple to Get Started

It didn't take me very long to get up and going with Linq to Sql. While I'm not the biggest fan of the Object Relational Designer, it sure is easy to use and makes building basic object graphs fast. Someone who is not familiar with O/RM tools should be able to have objects mapped to database tables in a matter of minutes, which could work very well for simple RAD applications. This process really couldn't be much simpler.

Superior Optimistic Concurrency Support

My apologies to any O/RM tools out there with concurrency support as good as Linq to Sql's; I just know I prefer the flexibility offered by Linq to Sql over NHibernate's. That said, NHibernate's concurrency has always worked fine for me; it's just nice to have additional options.

First, when a ChangeConflictException is thrown, the context's ChangeConflicts collection tells you a ton about the failure, such as the entities and the members involved, and lets your code resolve the conflicts and recover. Linq to Sql will also let you choose whether to collect all change conflicts or fail as soon as the first conflict is found (ConflictMode.ContinueOnConflict versus ConflictMode.FailOnFirstConflict when calling SubmitChanges). These are features that, to my knowledge, NHibernate does not support.

Plus, and this is basic, Linq to Sql has native support for SQL Server timestamp columns. This ensures you know about every update to a row, even one made outside the scope of Linq to Sql. For some reason NHibernate still does not support this type of column; instead it rolls its own version column.

Resolving stale data with RefreshMode (KeepCurrentValues, KeepChanges or OverwriteCurrentValues) gives you several options when re-syncing your objects with the database. Again, I just like the options.

Superior Stored Procedure Support

If you have a wealth of stored procedures, rest assured they are easy to use from Linq to Sql. Just drag (I do feel dirty using that word) the stored procedure from the server explorer to the methods list in the object relational designer and you will see a new method on your associated context which directly calls that stored procedure. To your code it looks the same as any other method.

Note that it is also possible to have Linq to Sql perform its CRUD operations through stored procedures. This is also a relatively simple process.

The Bad

Very Basic Object Support

This is actually the killer here. Linq to Sql is a very basic O/RM and does not support many of the object oriented concepts sophisticated applications are likely to use. Just a few of the missing features are:

Only Single Table Inheritance (one discriminator-based table per hierarchy)
No Value based objects (i.e. NHibernate Components)
No Timespan support (A huge problem for the Logistics field I work in)
Collections limited to EntitySet (which isn't even a real Set)

Where is the Dictionary support at least?

No SaveOrUpdate Equivalent

This pushes persistence knowledge down to a lower level: all code that associates an object with a context must know whether the object already exists in the database. That just adds extra checks to your code that should not be necessary. Checking yourself whether an object already has a primary key can feel a bit dirty; it seems like logic that doesn't belong in the application itself.

GUI based Drag & Drop

Yes, I know you can use a separate mapping file, much like you can with NHibernate, but this isn't realistic. If you don't use the designer, you don't get the code generation. If you don't get the code generation, you are responsible for writing all of the many hooks in your objects that Linq to Sql needs. Folks, these objects are quite dirty. At least with NHibernate your objects are completely persistence ignorant (aka POCO, aka Plain Old CLR Object), meaning they look clean and are usable for more than just NHibernate. Therefore using anything besides the designer isn't very feasible.
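To give a feel for what those hooks look like, here is a simplified, hand-written sketch of the change-notification plumbing the designer generates for each mapped property, next to a plain POCO. The designer-generated classes do implement INotifyPropertyChanging and INotifyPropertyChanged, but the real generated code also carries [Table]/[Column] mapping attributes, partial On...Changing/On...Changed methods and EntitySet/EntityRef fields, which I've left out so the sketch compiles without System.Data.Linq. The class names are made up for illustration.

```csharp
using System;
using System.ComponentModel;

// Persistence-ignorant (POCO) version: nothing but the data.
public class PocoCustomer
{
    public string Name { get; set; }
}

// Simplified sketch of the plumbing every generated property carries.
public class HookedCustomer : INotifyPropertyChanging, INotifyPropertyChanged
{
    private string _name;

    public event PropertyChangingEventHandler PropertyChanging;
    public event PropertyChangedEventHandler PropertyChanged;

    public string Name
    {
        get { return _name; }
        set
        {
            if (_name != value)
            {
                // Tells a change tracker (such as a DataContext) before the value changes.
                SendPropertyChanging("Name");
                _name = value;
                SendPropertyChanged("Name");
            }
        }
    }

    protected virtual void SendPropertyChanging(string propertyName)
    {
        PropertyChangingEventHandler handler = PropertyChanging;
        if (handler != null)
            handler(this, new PropertyChangingEventArgs(propertyName));
    }

    protected virtual void SendPropertyChanged(string propertyName)
    {
        PropertyChangedEventHandler handler = PropertyChanged;
        if (handler != null)
            handler(this, new PropertyChangedEventArgs(propertyName));
    }
}

public static class HooksDemo
{
    public static int CountNotifications()
    {
        int notifications = 0;
        HookedCustomer customer = new HookedCustomer();
        customer.PropertyChanging += delegate { notifications++; };
        customer.PropertyChanged += delegate { notifications++; };
        customer.Name = "Contoso";  // raises both events
        customer.Name = "Contoso";  // same value: no events
        return notifications;
    }

    public static void Main()
    {
        Console.WriteLine(CountNotifications());
    }
}
```

Multiply that by every property on every entity and you can see why writing it by hand isn't appealing.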

The big problem here though is that your entire object graph needs to live in one diagram and the code behind these objects winds up in a single code file by default. This just isn't acceptable for applications of any size. Diagrams which contain 20-30 objects would be a major pain here, let alone applications that have hundreds. For large applications this just wouldn't fly.

Relationships Aren't Interface Based

All of the associations to related objects are handled with EntitySet and EntityRef. With NHibernate, by contrast, you get an ISet for collections and a plain reference of the type you expect. This forces Linq to Sql types onto your objects, which in my opinion makes them harder to unit test. I also don't like the persistence-based dependencies on my objects.

Transaction API is Goofy

For whatever reason, you need to handle all explicit transactions outside of the Linq to Sql context. You have to create the transaction and then commit it outside the context, supplying it to the context (through its Transaction property) while it is in use. Linq to Sql implicitly uses a transaction for every call to SubmitChanges, but you would think it would be possible to begin new transactions via the context, and then commit or roll them back through the context as well.

The Bottom Line

Really, I have only given a brief overview of Linq to Sql here. The important question I ask myself is, "Would I use this framework?" Well, it's a difficult question. If I were writing a small application that I knew would not grow into a large one, with an object model simple enough for the limited object support, yes, I would use it. I could get up and going very fast, and I enjoy working with the context interfaces.

However, if I were working on a larger application (it really doesn't take much to be too large for what I would do with Linq to Sql), or one I thought had the potential to change and grow over time, I would skip Linq to Sql and reach for my trusty NHibernate.

So really, I would only use it for a very small subset of the problems I try to solve.

All of that being said, I think Linq to Sql is very important to the .NET development community. Microsoft has historically pretended that O/RM tools didn't exist, and that doing development any way other than with their DataSets or repetitive patterns was crazy. Now that Microsoft has a framework to endorse, exposure to such technologies in the .NET development community should expand greatly. I think overall this is a good thing, and it will result in better developers.

My only concern with this introduction is that people may get the idea that O/RM tools are nice and get you up and going fast, but fall flat on their face once you try to do anything advanced, leaving you to resort to the same tools you used all along. This was actually a very common opinion among people I talked to about NHibernate a few years ago. They had heard of others using O/RM tools (not NHibernate specifically) and that those tools just don't handle advanced things; they are only good for simple things.

I hope Linq to Sql exposes developers to O/RM and makes them curious about other tools such as NHibernate when Linq to Sql is too simple for what they need, instead of lumping all O/RM tools together as too simple and idealistic.

I'm actually excited about the potential of the .NET development community now that more people will be exposed to O/RM. Long live O/RM tools, you have been lifesavers for me!

--John Chapman

Saturday, December 8, 2007

C# Type Inference But Still Strongly Typed

I have spent considerable time today reviewing Linq, and more specifically Linq to Sql. I'm currently working on a blog post where I'll go into the details of what I think the pros and cons of Linq to Sql are, as well as my overall opinion. In case you couldn't guess, I'll be using NHibernate for my comparisons; after all, it is what I'm familiar with.

While reviewing some things I ran into the following compile-time error. It was very simple for me to resolve, but I wonder if it will cause developers to fall into traps, especially developers who have some experience with dynamically typed languages such as JavaScript.

Take a look at the following code I wrote:


AdventureWorksDataContext context = 
    new AdventureWorksDataContext();

var orders = from po in context.PurchaseOrderHeaders
           select po;

if (chkUseDate.Checked)
{
  orders = from po in orders
           where po.OrderDate > dtOrderFrom.Value
           select po;
}

orders = from po in orders
       orderby po.OrderDate ascending
       select new
       {
           po.PurchaseOrderID,
           po.RevisionNumber,
           po.OrderDate,
           po.ShipDate
       };

Does anyone see what is wrong with the code above and why it failed to compile?

The compile-time error was:

Cannot implicitly convert type 'System.Linq.IQueryable<AnonymousType#1>' to 'System.Linq.IQueryable<BLL.PurchaseOrderHeader>'. An explicit conversion exists (are you missing a cast?)

After seeing that, I immediately realized my mistake: I had tried to assign a query that returns anonymous type objects to a variable whose type was inferred from a query returning PurchaseOrderHeader objects. You can't just change a variable to be of another type in C# 3.0; that's what strong typing means, and I should know better.

But honestly, with the whole var keyword, I wasn't really thinking about it. It was a minor slip-up, but I wonder how many developers will fall into that trap. Some developers may have seen the var keyword before in JavaScript, where a variable can hold a value of any type, and may use it the way I just did.

That being said, I have been enjoying my time with Linq today. I should have a post up within the next few days with more details.

P.S. If you're wondering what is going on with the three-step Linq queries above, that's how you write dynamic queries in Linq: simply reference the previously defined query in your new Linq query to further restrict the query you are building. Keep in mind that writing a Linq query doesn't perform any operations. The query only executes when you enumerate over its values or call a method that has to produce results, such as ToList(), ToArray(), ToDictionary() or Count().
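Here is a small sketch of that composition and deferred execution. It uses LINQ to Objects over an in-memory iterator instead of a database, and the numbers and the read counter are made up for illustration. The counter shows that building up the query reads nothing; the source is only touched when ToArray() runs. With Linq to Sql the same rule applies, except that enumeration is the moment the SQL is generated and sent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how many items have actually been read from the source.
    private static int _reads;

    private static IEnumerable<int> Source()
    {
        foreach (int n in new[] { 5, 12, 8, 20, 3 })
        {
            _reads++;
            yield return n;
        }
    }

    public static int[] Run(bool restrict, out int readsBeforeEnumeration)
    {
        _reads = 0;
        IEnumerable<int> query = from n in Source() select n;

        if (restrict)
        {
            // Build on the previous query instead of writing a new one.
            query = from n in query where n > 6 select n;
        }

        query = from n in query orderby n select n;

        // Nothing has executed yet; composing queries only describes them.
        readsBeforeEnumeration = _reads;

        // ToArray() forces the whole query to run.
        return query.ToArray();
    }

    public static void Main()
    {
        int readsBefore;
        int[] result = Run(true, out readsBefore);
        Console.WriteLine("Reads before ToArray(): " + readsBefore);
        Console.WriteLine(string.Join(", ", result.Select(n => n.ToString()).ToArray()));
    }
}
```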

If you're curious how to resolve the issue above, you just need to declare a new variable for the last query to hold the new type: var results = <Linq expression> would work just fine.
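For completeness, here is a sketch of the corrected query. So it runs without AdventureWorks, it queries an in-memory list through a hypothetical stand-in PurchaseOrderHeader class (the property types are my assumptions), and a nullable orderFrom parameter takes the place of the chkUseDate and dtOrderFrom controls. Against a DataContext the variables would be IQueryable rather than IEnumerable, but the fix is identical: the projection gets its own variable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the generated PurchaseOrderHeader class.
public class PurchaseOrderHeader
{
    public int PurchaseOrderID { get; set; }
    public byte RevisionNumber { get; set; }
    public DateTime OrderDate { get; set; }
    public DateTime? ShipDate { get; set; }
}

public static class AnonymousProjectionFix
{
    // orderFrom plays the role of chkUseDate/dtOrderFrom: null means "no date filter".
    public static List<int> OrderIds(IEnumerable<PurchaseOrderHeader> headers, DateTime? orderFrom)
    {
        // Inferred as IEnumerable<PurchaseOrderHeader> here.
        var orders = from po in headers
                     select po;

        if (orderFrom.HasValue)
        {
            // Same element type, so reassigning orders is fine.
            orders = from po in orders
                     where po.OrderDate > orderFrom.Value
                     select po;
        }

        // The projection changes the element type, so it gets its own variable.
        var results = from po in orders
                      orderby po.OrderDate ascending
                      select new
                      {
                          po.PurchaseOrderID,
                          po.RevisionNumber,
                          po.OrderDate,
                          po.ShipDate
                      };

        return results.Select(r => r.PurchaseOrderID).ToList();
    }

    public static List<PurchaseOrderHeader> SampleHeaders()
    {
        return new List<PurchaseOrderHeader>
        {
            new PurchaseOrderHeader { PurchaseOrderID = 1, RevisionNumber = 1, OrderDate = new DateTime(2007, 12, 3) },
            new PurchaseOrderHeader { PurchaseOrderID = 2, RevisionNumber = 1, OrderDate = new DateTime(2007, 11, 20) },
            new PurchaseOrderHeader { PurchaseOrderID = 3, RevisionNumber = 2, OrderDate = new DateTime(2007, 12, 1),
                                      ShipDate = new DateTime(2007, 12, 5) }
        };
    }

    public static void Main()
    {
        foreach (int id in OrderIds(SampleHeaders(), new DateTime(2007, 11, 25)))
        {
            Console.WriteLine(id);
        }
    }
}
```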

--John Chapman

Sunday, October 28, 2007

.NET 3.5: The Good Stuff

With my last two postings "Partial Methods. What The?" and "C# 3.0 Extension Methods? A Good Idea?" taking shots at new features in .NET 3.5, I wanted to make a new post where I look at some of the new features I'm actually excited about. Even though this blog is titled "Chapman's Constant Complaining", I'm not negative about everything! There is a lot of good on the horizon.

1. Anonymous Types

Anonymous types are a tool in .NET 3.5 that allows you to specify types based on the properties they contain rather than on a class declaration. The type has no name, which means you cannot construct a new instance of it with the usual new className() mechanism.

Basically, anonymous types free the developer from having to define a new class for every one-off purpose in the application. I'm an NHibernate addict. I think it's an absolutely fabulous tool, and if you're reading this and you've never tried it, go download it right now. You can find it at www.nhibernate.org. Anyway, with NHibernate I find myself creating classes all the time that contain just the data I want returned from a query. You create a constructor that takes just the fields you want and then reference this type in your HQL query. This is very helpful when tuning an HQL query for a complex search page where you need to collect display data from many objects, or even aggregates of child data.

In theory, anonymous types will free developers from this time-consuming task. NHibernate, much like LINQ, would be able to construct a new anonymous type for us containing just the properties we asked the HQL query to return. Note that this is actually the main use of anonymous types in .NET 3.5: supporting arbitrary data being returned from LINQ expressions.
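As a sketch of what that looks like in LINQ, here is a search-style projection that pulls one field from the parent object plus aggregates of its child lines into an anonymous type, where I'd previously have written a dedicated result class with a matching constructor. The Order and OrderLine classes and the sample values are made up for illustration, and this runs as LINQ to Objects rather than against a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical domain classes, for illustration only.
public class OrderLine
{
    public int Quantity { get; set; }
    public decimal UnitPrice { get; set; }
}

public class Order
{
    public string CustomerName { get; set; }
    public List<OrderLine> Lines { get; set; }
}

public static class SearchProjectionDemo
{
    // Returns customer name -> order total for orders that have at least one line.
    public static Dictionary<string, decimal> OrderTotals(IEnumerable<Order> orders)
    {
        // The anonymous type replaces a hand-written "search result" class.
        var rows = from o in orders
                   select new
                   {
                       o.CustomerName,
                       LineCount = o.Lines.Count,
                       Total = o.Lines.Sum(l => l.Quantity * l.UnitPrice)
                   };

        // Anonymous types can't be named outside this method, so convert
        // the rows before handing them back.
        return rows.Where(r => r.LineCount > 0)
                   .ToDictionary(r => r.CustomerName, r => r.Total);
    }

    public static List<Order> SampleOrders()
    {
        return new List<Order>
        {
            new Order { CustomerName = "Alice", Lines = new List<OrderLine>
                {
                    new OrderLine { Quantity = 2, UnitPrice = 10.50m },
                    new OrderLine { Quantity = 1, UnitPrice = 10m }
                } },
            new Order { CustomerName = "Bob", Lines = new List<OrderLine>
                {
                    new OrderLine { Quantity = 3, UnitPrice = 4m }
                } },
            new Order { CustomerName = "Carol", Lines = new List<OrderLine>() }
        };
    }

    public static void Main()
    {
        foreach (KeyValuePair<string, decimal> pair in OrderTotals(SampleOrders()))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

The catch, as the comment notes, is that the anonymous type can't cross a method boundary by name, so a named class is still the way to go when results have to be returned from a repository method.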

Anonymous types go beyond just queries though. How many times have you needed a simple two or three property class to perform calculations within a method? I've created one off "info" classes in the past. I don't know why I call them info, I just do. We now have the ability to define that "info" class as an anonymous type and no longer worry about the actual class declaration.
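Here is a minimal sketch of an anonymous type standing in for one of those one-off "info" classes inside a single method. The shipment legs, cities and hours are made-up sample data.

```csharp
using System;
using System.Linq;

public static class InfoClassDemo
{
    // Finds the longest shipment leg and the total hours across all legs.
    public static string LongestLeg()
    {
        // An implicitly typed array of anonymous objects replaces a "LegInfo" class.
        // All elements share one anonymous type because their property names,
        // types and order match.
        var legs = new[]
        {
            new { From = "Chicago", To = "Denver", Hours = 16 },
            new { From = "Denver", To = "Phoenix", Hours = 13 },
            new { From = "Phoenix", To = "Dallas", Hours = 15 }
        };

        var longest = legs.OrderByDescending(l => l.Hours).First();
        int totalHours = legs.Sum(l => l.Hours);

        return longest.From + " -> " + longest.To + " (" + longest.Hours + " of " + totalHours + " hours)";
    }

    public static void Main()
    {
        Console.WriteLine(LongestLeg());  // Chicago -> Denver (16 of 44 hours)
    }
}
```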

2. LINQ (Language Integrated Query)

This is the big daddy of .NET 3.5. Who hasn't heard of LINQ? The way people talk about it, it sounds like it will solve every problem ever created by a developer. I actually think LINQ is way over-hyped. Over-hyped or not, it's still a very cool new feature that will change how we develop applications with .NET going forward.

First, I want to cover why I think LINQ is over-hyped. LINQ is being talked about as if there has never been anything similar. Most of the examples provided query a database using LINQ to SQL, which is really just a very simplified OR/M tool. What some people don't realize is that tools like NHibernate, LLBLGen Pro, Entity Spaces and others have been around for a long time, offering better OR/M support and very sophisticated query mechanisms.

Now, LINQ still deserves its credit here because it actually takes what we've learned from our OR/M tools a bit further. I'm most excited about having strong typing on my queries written in C#. I hate that the only query tools available to me are based on strings. If an object's property is renamed, it is often very difficult to spot queries that were not updated until runtime. Having this check performed at compile time is a huge advantage.

Secondly, I think it helps people look at their C# objects in a different manner. When explaining OR/M tools to developers who haven't used one before, I try to explain that the OR/M is really a synchronization tool, not a database persistence tool. Think of the database as just your extended memory, perhaps a second-level memory store. As far as you are concerned there is no difference between objects in memory and objects in a database. LINQ helps to reinforce this thinking in that you can write the same queries against your objects in memory as you do against your objects in the database. I think this may cause some issues for some people at first, but eventually it will be very beneficial.
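A small sketch of "same query, different store": the query text below is identical in both methods. The first runs as LINQ to Objects over a list in memory; the second takes an IQueryable, so the compiler turns the query into an expression tree for a query provider to interpret. Here that provider is just the in-memory one you get from AsQueryable(); with LINQ to SQL the IQueryable would come from the DataContext and the tree would become SQL. The Employee class and names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class for illustration.
public class Employee
{
    public string Name { get; set; }
    public string Department { get; set; }
}

public static class SameQueryDemo
{
    // IEnumerable: compiled into delegates that run over objects in memory.
    public static List<string> Engineers(IEnumerable<Employee> source)
    {
        return (from e in source
                where e.Department == "Engineering"
                orderby e.Name
                select e.Name).ToList();
    }

    // IQueryable: the same query text becomes an expression tree for a provider.
    public static List<string> Engineers(IQueryable<Employee> source)
    {
        return (from e in source
                where e.Department == "Engineering"
                orderby e.Name
                select e.Name).ToList();
    }

    public static List<Employee> SampleEmployees()
    {
        return new List<Employee>
        {
            new Employee { Name = "Carl", Department = "Engineering" },
            new Employee { Name = "Beth", Department = "Sales" },
            new Employee { Name = "Ann", Department = "Engineering" }
        };
    }

    public static void Main()
    {
        List<Employee> staff = SampleEmployees();
        Console.WriteLine(string.Join(", ", Engineers(staff).ToArray()));
        Console.WriteLine(string.Join(", ", Engineers(staff.AsQueryable()).ToArray()));
    }
}
```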

3. Object Initializers

This is sort of a minor one. Object initializers seem like purely syntactic sugar that could be skipped, but they are what make features like the above-mentioned anonymous types and LINQ possible. Object initializers also let us construct and populate our objects without worrying about which constructor overloads the class's author defined for us, as long as there is an accessible constructor and the properties are settable. Overall I think they make the code a bit cleaner and easier to read.
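A minimal sketch, assuming a made-up Shipment class whose author only provided the default constructor: the initializer sets whichever properties we need, and the longhand method shows roughly what the compiler does with it. The last lines show the same syntax, minus the class name, creating an anonymous type.

```csharp
using System;

// Hypothetical class whose author only provided the default constructor.
public class Shipment
{
    public string Origin { get; set; }
    public string Destination { get; set; }
    public int WeightKg { get; set; }
}

public static class ObjectInitializerDemo
{
    public static Shipment WithInitializer()
    {
        // No matching constructor overload required.
        return new Shipment { Origin = "Detroit", Destination = "Toledo", WeightKg = 1200 };
    }

    public static Shipment Longhand()
    {
        // Roughly what the initializer expands to: construct into a temporary,
        // then assign each property.
        Shipment temp = new Shipment();
        temp.Origin = "Detroit";
        temp.Destination = "Toledo";
        temp.WeightKg = 1200;
        return temp;
    }

    public static void Main()
    {
        Shipment a = WithInitializer();
        Shipment b = Longhand();
        Console.WriteLine(a.Origin + " -> " + a.Destination + ", " + a.WeightKg + " kg");
        Console.WriteLine(a.Destination == b.Destination && a.WeightKg == b.WeightKg);

        // Same syntax without a class name: the compiler creates an anonymous type.
        var summary = new { a.Origin, a.WeightKg };
        Console.WriteLine(summary.Origin);
    }
}
```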

4. Type Inference (The C# var keyword)

This could be argued as a minor point as well, except that the above-mentioned anonymous types and LINQ depend on it. I think it is convenient to be able to declare a variable with the var keyword, which works only when the declaration includes an initializer.

That being said, it is a feature I don't see myself using a whole lot. I would rather write out the full type when it is known, which makes the code a bit easier to read in my opinion. If you are using an IDE like Visual Studio 2008 it should not make much of a difference, but is it really that hard to write int i = 1 instead of var i = 1? I know we're really concerned about long generic type names, but a little extra typing never hurt anyone.
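To show that var is still strong typing, here is a small sketch. The generic helper StaticTypeOf is my own illustration, not a framework method: typeof(T) reports the compile-time type the compiler inferred for the argument, which demonstrates that a var declaration and an explicit declaration end up with exactly the same static type.

```csharp
using System;
using System.Collections.Generic;

public static class VarDemo
{
    // Hypothetical helper: typeof(T) is the compile-time type inferred for value.
    public static Type StaticTypeOf<T>(T value)
    {
        return typeof(T);
    }

    public static void Main()
    {
        Dictionary<string, List<int>> explicitLookup = new Dictionary<string, List<int>>();
        var inferredLookup = new Dictionary<string, List<int>>();
        var i = 1;
        // i = "one";  // compile error CS0029: i is statically typed as int

        Console.WriteLine(StaticTypeOf(i));  // System.Int32
        Console.WriteLine(StaticTypeOf(inferredLookup) == StaticTypeOf(explicitLookup));  // True
    }
}
```

Unlike JavaScript's var, the variable's type is fixed at compile time; var only saves you from spelling it out.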

So while I think this is a very useful feature, I'm a little concerned it may be taken to the extreme in some code, where every variable is declared as var. Plus, people already make the mistake of constructing new objects only to throw them away on the next line. I hope var doesn't make that problem worse. Who hasn't seen the following code?


ArrayList list = new ArrayList();
list = RunQuery();

5. Lots of Other Stuff

There are actually many other cool features built into .NET 3.5 for us, like built-in APIs for RSS 2.0 and Atom as well as an improved garbage collector. Overall I'm excited about .NET 3.5. Now if only we could do something about those extension methods and partial methods!

--John Chapman
