LINQ in .NET returning duplicate rows from a working SQL view [SOLVED]

How to correct duplicate row data in Entity Framework and LINQ for .NET

Twice in the past couple of months I’ve gotten tripped up by the same data issue when using LINQ to Entities in the .NET framework. I create a simple view in my database which performs some aggregation to return a resultset containing just two columns. Querying the view returns the expected results. Performing a LINQ query on the view returns incorrect and confusing results.

The Problem

For a little background, the view is performing a JOIN on another table, then grouping the results by an ID and summing another column to create and ID,SUM pair of columns. It looks something like the following:

SELECT b.Id, COUNT(DISTINCT a.UserId) AS users FROM table1 AS a INNER JOIN table2 AS b ON a.Id = b.Id GROUP BY b.Id, a.OtherId HAVING (COUNT(DISTINCT a.UserId) > 1)

Executing this query in SQL returns results such as:

In LINQ, I make a simple call to the same view:

var results = dataContainer.MyViewName.ToList();

The result set that is returned from that call looks like this:

As you can see, the correct number of records for each group is returned but the second column value is the same for each row.

The Cause

The reason this is happening is that the Entity Framework has attempted to identify a unique primary key on the view when you added the view object to your .edmx file. Since no primary key is set on a view, it simply took the first column in the table. The problem here should now be obvious, the first column is clearly not unique.

Since EF thinks the first column is unique, it does some internal data building on the result set to build out the entity object by pairing the rest of the data columns to its matching record based on this “primary” key value. The end result is that the same record is duplicated for every data row since the same value is always assigned to the identical primary key.

The Solution

To solve this problem there are two straightforward options.

The first is that you can simply add a column to your view select statement which is unique. You don’t have to use that column in your program but it will allow EF to properly build each object for you. Of course, base on your query and groupings this may not be possible.

The second option is to add a pseudo key to your view. This option should work in basically every case and is the option that we went with. The new column will create an incrementing integer which is basically a pointer to the row number of the results. To do this in MS SQL Server, add the following to the beginning of your column selection:

ROW_NUMBER() OVER( ORDER BY b.Id) as NID

which alters the example view to:

SELECT ROW_NUMBER() OVER( ORDER BY b.Id) as NID, b.Id, COUNT(DISTINCT a.UserId) AS users FROM table1 AS a INNER JOIN table2 AS b ON a.Id = b.Id GROUP BY b.Id, a.OtherId HAVING (COUNT(DISTINCT a.UserId) > 1)

Executing this query in SQL returns results such as:

Now if you drop the view from your data model and re-add it, EF will pick up the real primary key column that is labeled NID and your LINQ statement will perform as expected.

Read more of Matthew Mombrea's ByteStream blog and follow Matt on Twitter (@mombrea) and Google+. For the latest IT news, analysis and how-tos, follow ITworld on Twitter and Facebook.

From CIO: 8 Free Online Courses to Grow Your Tech Skills
Join the discussion
Be the first to comment on this article. Our Commenting Policies