Sunday, December 9, 2007

Linq to Sql: The Good, The Bad & The Bottom Line

I promised my take on Linq to Sql a few days ago. I have spent some time over the past couple of days playing with Linq to Sql connected to the AdventureWorks SQL Server sample database.

I have a lot of experience working with NHibernate so you may see some comparisons throughout the post.

Overview

Most everyone who is likely to read this post probably knows what Linq to Sql is. For those who don't, Linq to Sql (and really Linq in general) has been one of the most talked about and hyped features of Visual Studio 2008 and the .NET 3.5 framework, especially once we found out Linq to Entities wasn't going to ship with Visual Studio 2008.

Linq to Sql is actually a big shift for Microsoft. It is Microsoft's first production quality Object Relational Mapper, or O/RM for short. They may have tried in the past with products such as ObjectSpaces, but this is the first to be released as a completed tool. O/RM tools exist to try to solve the object-relational impedance mismatch: most applications are developed in object oriented programming languages these days, yet the data they operate on is typically stored in a relational database. The difficulty of moving data between objects and relations, in both directions, is what is described as the impedance mismatch. There are obviously many fundamental differences between data stored in a relation and data stored in our objects.

Traditionally Microsoft has endorsed using DataSets to solve this problem. DataSets are essentially a relation based object in your object oriented programming language. Essentially it would allow you to work with your data in your application as relational data. The problem with this? You fail to take advantage of object oriented application design and the advantages it brings to you. Typically these programs have little testability and a significant amount of duplication. As such many O/RM tools became popular (although far less so than if Microsoft had endorsed them) such as NHibernate, LLBLGen Pro, Vanatec OpenAccess, Wilson ORMapper, EntitySpaces, eXpress Persistent Objects and many others (apologies to any I didn't list).

Note that Linq to Sql isn't necessarily a direct competitor to NHibernate or the other O/RM tools for the .NET framework listed above; that role belongs to Linq to Entities (AKA the ADO.NET Entity Framework). Linq to Sql is more of an introduction to the O/RM world.

The Good

  • The Linq query language itself
The Linq query language is just awesome. It really is a joy when you start to work with it. It can quickly become a pain because it is complex, but then it makes you realize just how powerful it is. I have never seen a query language that is quite so rich. Basic queries are very simple to write and understand, yet it also provides functionality for very complex queries.

Plus, the queries are strongly typed, so there is much less to worry about when refactoring your business objects: compile-time checks are now available for your queries. Note that even with stored procedures, if you change a column in a table referenced by a stored procedure, nothing will inform you that you just broke that stored procedure. Likewise, queries stored as strings in your application will not inform you if you change a property name or column either.

For fun see the following blog post: Taking LINQ to Objects to Extremes: A fully LINQified RayTracer. This is not something you would actually do, but it does help reinforce just how powerful Linq really is.
  • Better Naming Conventions Than NHibernate
While working with Linq to Sql I felt that the methods on the context were easy to understand and more intuitive than the NHibernate equivalents. For example, when you want to save your changes to your database NHibernate says Flush whereas Linq to Sql uses SubmitChanges. But the big advantages are Linq to Sql's InsertOnSubmit vs NHibernate's Save as well as Attach versus NHibernate's Update or Lock methods.

I can't tell you how many times I've explained how the Save, Update and Lock functionality for NHibernate works. Most people seem to think that they need to call these methods to cause a database operation to take place. They assume Save = Execute Insert NOW, and Update means execute an update NOW! Then they use Flush for good measure because someone told them to. The Linq to Sql naming convention makes it clearer that that is not quite what is going on.
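The deferred behaviour those names hint at can be shown with a small toy. This is only a sketch, written in Python so it runs without a database; the class and its methods are invented for illustration and are not the Linq to Sql or NHibernate API. The point is simply that registering an object does nothing to the store until the changes are submitted.

```python
class PendingChangesContext:
    """Toy unit of work: records intent now, writes only on submit (hypothetical)."""

    def __init__(self):
        self.database = []          # stands in for a database table
        self._pending_inserts = []  # queued until submit_changes() is called
        self.statements_executed = 0

    def insert_on_submit(self, entity):
        # Only registers the entity; nothing is written yet.
        self._pending_inserts.append(entity)

    def submit_changes(self):
        # All queued work is pushed to the "database" in one step.
        for entity in self._pending_inserts:
            self.database.append(entity)
            self.statements_executed += 1
        self._pending_inserts.clear()


context = PendingChangesContext()
context.insert_on_submit({"name": "Road Bike"})
context.insert_on_submit({"name": "Helmet"})
rows_before_submit = len(context.database)  # nothing has been written yet

context.submit_changes()
rows_after_submit = len(context.database)   # both queued rows are now written
```

A name like "insert on submit" tells the reader when the write happens, which is exactly the confusion a bare Save invites.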
  • Simple to Get Started
It didn't take me very long to get up and going with Linq to Sql. While I'm not the biggest fan of the Object Relational Designer, it sure is easy to use and fast to build basic object graphs. Someone who is not familiar with O/RM tools should be able to have objects mapped to database tables in a matter of minutes. This could work very well for simple RAD applications. This process really couldn't be much simpler.
  • Superior Optimistic Concurrency Support
My apologies to any O/RM tools out there that have concurrency support as good as Linq to Sql's; I just know I prefer the flexibility offered by Linq to Sql over NHibernate's. That being said, NHibernate's concurrency has always worked fine for me. It's just nice to have additional options.

First, when a ChangeConflictException is thrown it includes a ton of information, such as the entity and the columns involved, and allows your code to recover from it. Linq to Sql will also let you configure whether you want to collect all change conflicts or fail as soon as the first conflict is found. These are features which, to my knowledge, NHibernate does not support.

Plus, this is basic, but Linq to Sql has native support for SQL Server timestamp columns. This ensures that you know of all updates, even if they occur outside the scope of Linq to Sql. For some reason NHibernate still does not support this type of column. Instead it rolls its own version column.

Resolving stale data with RefreshMode allows for many options when re-syncing your objects with the database. Again, I just like the options.
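For readers new to optimistic concurrency, here is a minimal sketch of the underlying idea, using Python and an in-memory SQLite table so it runs anywhere. It is not how Linq to Sql implements it: the table, the `ChangeConflict` exception and the `update_price` helper are all hypothetical names, and an integer version column stands in for a SQL Server timestamp column. The update only succeeds if the row still carries the version that was originally read.

```python
import sqlite3


class ChangeConflict(Exception):
    """Raised when the row changed since it was read (hypothetical name)."""


def update_price(conn, product_id, new_price, expected_version):
    # The WHERE clause includes the version we originally read, so a
    # concurrent change makes the statement match zero rows.
    cursor = conn.execute(
        "UPDATE product SET price = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_price, product_id, expected_version),
    )
    if cursor.rowcount == 0:
        raise ChangeConflict(f"product {product_id} was modified by someone else")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, price REAL, version INTEGER)")
conn.execute("INSERT INTO product VALUES (1, 10.0, 1)")

# Two users load the same row while it is at version 1.
version_seen_by_a = version_seen_by_b = 1

update_price(conn, 1, 12.0, version_seen_by_a)  # succeeds and bumps the version

try:
    update_price(conn, 1, 15.0, version_seen_by_b)  # stale version
    conflict_detected = False
except ChangeConflict:
    conflict_detected = True
    # One possible resolution: re-read the current version and apply our change on top.
    current_version = conn.execute(
        "SELECT version FROM product WHERE id = 1"
    ).fetchone()[0]
    update_price(conn, 1, 15.0, current_version)

final_price, final_version = conn.execute(
    "SELECT price, version FROM product WHERE id = 1"
).fetchone()
```

The retry in the `except` branch is only one policy (keep my change); overwriting local values with the database's, or merging the two, are the other choices a refresh mode lets you pick between.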
  • Superior Stored Procedure Support
If you have a wealth of stored procedures, rest assured they are easy to use from Linq to Sql. Just drag (I do feel dirty using that word) the stored procedure from the server explorer to the methods list in the object relational designer and you will see a new method on your associated context which directly calls that stored procedure. To your code it looks the same as any other method.

Note it is also possible to write your Linq to Sql CRUD through stored procedures. This is also a relatively simple process.

The Bad
  • Very Basic Object Support
This is actually the killer here. Linq to Sql is a very basic O/RM and does not support many of the object oriented concepts sophisticated applications are likely to use. Just a few of the missing features are:
    • No Inheritance beyond single-table mapping (one table with a discriminator column)
    • No Value based objects (i.e. NHibernate Components)
    • No TimeSpan support (A huge problem for the Logistics field I work in)
    • Collections limited to EntitySet (which isn't even a real Set)
      • Where is the Dictionary support at least?
  • No SaveOrUpdate Equivalent
This forces more persistence knowledge to a lower level, requiring that all code which associates an object with a context must know whether it already exists in the database or not. This basically just adds extra checks in your code which should not be necessary. It can seem a bit dirty to check yourself whether an object already has a primary key; it is logic which doesn't belong within the application itself.
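The kind of check being described looks roughly like the sketch below. It is written in Python with made-up names (`ToyContext`, `Customer`, `save_or_update`) and assumes that a missing primary key means the object has never been persisted; it is not the Linq to Sql API.

```python
class Customer:
    def __init__(self, name, customer_id=None):
        self.customer_id = customer_id  # None is assumed to mean "never persisted"
        self.name = name


class ToyContext:
    """Hypothetical stand-in for a context; it only records which call was made."""

    def __init__(self):
        self.calls = []

    def insert_on_submit(self, entity):
        self.calls.append(("insert", entity.name))

    def attach(self, entity):
        self.calls.append(("attach", entity.name))


def save_or_update(context, entity):
    # The application has to decide between insert and attach itself.
    if entity.customer_id is None:
        context.insert_on_submit(entity)
    else:
        context.attach(entity)


context = ToyContext()
save_or_update(context, Customer("New customer"))
save_or_update(context, Customer("Existing customer", customer_id=42))
```

With a built-in SaveOrUpdate, the branch inside `save_or_update` would live in the O/RM instead of in application code.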
  • GUI based Drag & Drop
Yes, I know you can use a separate mapping file, much like you can with NHibernate, but this isn't realistic. If you don't use the designer, you don't get the code generation. If you don't get the code generation, you are responsible for writing all of the many hooks in your objects that Linq to Sql needs. Folks, these objects are quite dirty. At least with NHibernate your objects are completely persistence ignorant (aka POCO aka Plain Old CLR Object), meaning they look clean and are usable for more than just NHibernate. Therefore using anything besides the designer isn't very feasible.

The big problem here though is that your entire object graph needs to live in one diagram and the code behind these objects winds up in a single code file by default. This just isn't acceptable for applications of any size. Diagrams which contain 20-30 objects would be a major pain here, let alone applications that have hundreds. For large applications this just wouldn't fly.
  • Relationships Aren't Interface Based
All of the associations to related objects are handled with EntitySet and EntityRef, whereas with NHibernate you have ISet and just the object type you expect. This basically forces the Linq to Sql references onto your objects, which in my opinion makes them harder to unit test. I also don't like having persistence based dependencies on my objects.
  • Transaction API is Goofy
For whatever reason you need to handle all explicit transactions outside of the Linq to Sql context. You have to create the transaction and then commit it outside the context, while supplying the transaction to the context while it is in use. Linq to Sql implicitly uses transactions for all calls to SubmitChanges, but you would think it would be possible to begin new transactions via the context, and then commit or roll them back through the context as well.

The Bottom Line

Really, I have only touched on a brief overview of Linq to Sql here. The important question I ask myself is, "Would I use this framework?". Well, it's a bit of a difficult question. If I were writing a small application which I knew would not grow into a large one, and my object model would be simple enough for the limited object support, yes I would use it. I could get up and going very fast, and I enjoy working with the context interfaces.

However, if I was working on a larger application (really doesn't take much to be too large for what I would do with Linq to Sql), or one which I thought had potential to adjust and grow over time, I would skip Linq to Sql and look for my trusty NHibernate.

So really, it would only be used for a very small subset of problems out there that I would try to solve.

All of that being said, I think Linq to Sql is very important to the .NET development community. Microsoft has historically tried to pretend that O/RM tools didn't exist and that doing any development without their DataSets or repetitive patterns was crazy. Now that Microsoft has a framework to endorse, it should greatly expand the exposure to such technologies in the .NET development community. I think overall this is a good thing, and will result in overall superior developers.

My only concern with this introduction is that people may get the idea that O/RM tools are nice, and get you up and going fast but fall flat on their face once you try to do anything advanced and then you need to resort to the same tools you used all along. This was actually a very common opinion by people I talked to about NHibernate a few years ago. They had heard of others using O/RM tools (not NHibernate specifically) and how they just don't handle advanced things, they are only good for simple things.

With Linq to Sql I hope developers become exposed to O/RM and become curious about other tools such as NHibernate when Linq to Sql is too simple for what they need instead of grouping all O/RM tools together as being too simple and idealistic.

I'm actually excited about the potential of the .NET development community now that more people will be exposed to O/RM. Long live O/RM tools, you have been lifesavers for me!

--John Chapman

4 comments:

Anonymous said...

good post; speaking as a o/r m vendor, I share most of the opinion about what Linq to SQL is good for (educate the market that o/r m is the way to go) but also a risk to hit the wall quickly and not to look further for pref. o/r m tools working in big real world projects.

Anonymous said...

with my 20+ years programming with databases, I have had a good look at LINQ to SQL. Whilst a quick look suggests an improvement, once you look under the hood the performance and the concurrency issues are too problematic for me to use commercially at the moment. Within days of giving it a shot, I have gone back to datareaders and stored procedures, because they do exactly what I need. On a larger project you would end up with so many classes just for the LINQ to SQL that it would become unmanageable. Maybe version 2 will be better.

Anonymous said...

Actually, you don't have to use the visual designer to get code generation. If you use SQL Metal (a command line tool included with VS 2008 that I believe the LINQ to SQL visual designer uses underneath) you can generate all your classes and mapping from just a database connection. Using SQL Metal you can also tweak how the classes are generated (in case your database tables/columns aren't named very well). You can even generate POCO classes and the mapping file separately. You can then modify the mapping without modifying your entity classes.

At my work we are currently trying to decide between LINQ to SQL and nHibernate. I'm working on the LINQ to SQL research, so I thought I'd share a little bit that I've found out.

Using SQL Metal is much more flexible than using the visual designer.

Anonymous said...

Hi guys,

This is probably unfortunate for NHibernate but I have spent a lot of time trying to fix DLs where contractors have learnt NHibernate while building them. As a result I too have learnt loads about the product, mostly what is bad though. Everything from untested DLs with horrible HQL strings to custom query providers with overlapping sessions in multi threaded environments (session != threadsafe). Is there not possibly a remote argument for NHibernate allowing too much? I think people neglect to acknowledge things for both their simplicity and bleeding obvious flaws ... isn't it a hallmark of underdesign which Agile/SCRUM advocates? Every time I see an NHibernate implementation I cringe at seeing what developers have done with it ...
