I have a LINQ query I am using to select data from the database, and it is running rather slowly. When I look at the SQL generated, LightSpeed is pretty much doing a 'select *' from the database. The query is as follows (simplified to select less data, but which shows off the issue even more clearly): var query = from p in uow.Products

The select SQL that gets produced is as follows: SELECT

The SQL generated is fine, except that it is selecting ALL the data from ALL the tables involved in the select, which is a ton of data that is not needed at all for this particular query.

Now I suspect the reason it is doing that is because it is actually materializing all the objects involved in the query into the identity map, so the end result is that after I have run this query all the following objects will be in the identity map: All Products in the result set

Now I can understand that this might be efficient if we have a long-running context, and that I might go back and reference those objects at a later date, but 99% of the time that won't be the case at all. Most of the time this query will simply be used to display data in a grid on a web page, and you will page through it or sort/filter it to find useful information. Any of the objects selected by this query (say an order, or a product) will NOT be used in this unit of work at all, but would be used on a totally different web page (i.e. clicking through to the product), which means it still needs to be loaded from the database anyway.

In most other ORMs the generated SQL just selects out the results from the database that match the result set, and the entities involved in the query do NOT end up in the identity map because the LINQ query was a projection to another type. I was not selecting out the actual entities in full in this query; if I did, it would be valid to ensure they are in the identity map. Rather, I just want the ORM to let me perform a LINQ query and run it against the database, giving me back the result set I need, in this case projected into a DTO object that can be serialized over JSON.

I suspect one reason for wanting to do this is that IF for some reason I had loaded a product, or an order, or something else from the result set into the identity map, and had modified it and NOT yet saved it to the database, then materializing the entire entities into the identity map means we will pick up any changes made to those entities correctly in the query. In reality, the performance overhead of doing that is way too high, and I would rather just call SaveChanges() if I had actually done this in my code, to ensure it is in the database for the LINQ query to find when it runs. I found it quite common in Entity Framework to have to persist changes to the database before a subsequent LINQ query would pick up the changes to objects that had been loaded and modified. Not to mention that 99% of the time nothing would have caused any of those entities to be selected into the identity map before this query is run anyway (actually probably more like 99.999% of the time).

Also, it is worth bearing in mind that in many situations NEW objects might be getting added to the unit of work identity map, and then a few lines of code later you may need to run a query that relies on those objects being present in the database (I have those exact situations in our code already and I suspect they are quite common).
And unless I am mistaken, LightSpeed is NOT going to run a LINQ to Objects query against the identity map prior to running the SQL against the database, for performance reasons, so in this situation you would still need to call SaveChanges() anyway to make sure the entities end up in the database where the LINQ query can pick them up.

To me this is an unacceptable performance overhead, and an area where if I wrote the same query in direct ADO.NET it would literally SMOKE the LINQ query, probably by a factor of 100x or more. It is this kind of stuff that ORMs do to keep us at a much higher level than the database that makes me seriously consider just using PetaPoco as a bare-metal mini-ORM. I need an ORM that is as bare metal as you can get when I am running LINQ queries, and that does its neat stuff with the identity map ONLY when I am either finding an object by ID in the database OR running a LINQ query where I am actually selecting a complete object (or set of complete objects) from the database. If I am doing a projection on the result set, that is a clear message to the ORM that I DON'T want it to *** around with objects in the identity map; it should simply pull what is in the database as efficiently as possible.
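For reference, the general shape of the query I am talking about is something like this - the entity and property names below are placeholders rather than my actual model, but the point is that it is a pure projection into a DTO for a grid:

    // DTO that gets serialized over JSON for the grid
    public class ProductGridDto
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public string CategoryName { get; set; }
    }

    // A pure projection - I never touch the Product entities themselves in this
    // unit of work, so I don't want them materialized or tracked in the identity map.
    var query = from p in uow.Products
                select new ProductGridDto
                {
                    Id = p.Id,
                    Name = p.Name,
                    CategoryName = p.Category.Name
                };

    List<ProductGridDto> rows = query.ToList();
|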
|
|
By contrast, this is the SQL generated by Entity Framework: SELECT |
|
|
Ignoring the fact that Entity Framework is ridiculously slow for other reasons, it does produce the SQL I want. |
|
|
Just to illustrate how bad this really is, I ran a performance test against my unit test database, which is quite small. It would be a lot worse on the real database given that it will produce a LOT more data. Here are the results: Lightspeed (0): 745.0427 |
|
|
Here are the results when I run this against a backup of the live database: Lightspeed (0): 2106.1205 |
|
|
As part of the LINQ provider, before we can perform a projection on the server side LightSpeed needs to determine whether it can actually translate what is being asked for into a server-side statement. In the case above we don't have support for translating the projection, so it falls back on selecting out the entities and then running the projection on the client. I'll have a look at the specifics of what support is missing and see if we can add that in - no promises though.

In terms of the performance comparison, you are comparing apples with oranges. ADO.NET is not an ORM, nor does it have to parse a LINQ expression tree. I presume you are running the exact same statement as LightSpeed generates (rather than the one EF generates) in the comparison, so the difference between the numbers you are seeing will represent the time taken to parse the LINQ query (which is not insignificant, so compiled queries would likely help with subsequent runs), then determine what SQL needs to be generated to load the entity and any related entities, then in this particular case materialize the entities, and finally perform the projection client side.

If you are happy to work with a data reader then you can fall back on ADO.NET by using the UnitOfWork.PrepareCommand(IDbCommand) call to attach the command to the same connection that the UnitOfWork has open, and then execute that to return a DataReader - you might want to consider that if it fits with what you are doing.
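Roughly, that fallback looks like this - the SQL and column indexes here are just placeholders, and the concrete command type assumes SQL Server:

    // Fall back to raw ADO.NET, but attach the command to the same open
    // connection the UnitOfWork is already using.
    IDbCommand command = new SqlCommand("SELECT Id, Name FROM Product");
    unitOfWork.PrepareCommand(command);

    using (IDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            int id = reader.GetInt32(0);
            string name = reader.GetString(1);
            // shape the results however you need - no entities are materialized
            // and nothing goes into the identity map
        }
    }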
Jeremy |
|
|
When I debug through the code I can see that it is failing on this line: if (!ContainsOnlyPropertiesOfEntityTypes(Projectors[Projectors.Count - 1])) in SingleQueryPlan.cs, so I assume that is why it is not able to project natively. Which is odd, because the stripped-down query I posted in my original post is pretty simple, so it is not clear to me why it cannot be converted to a native projection query? Clearly fixing that should have a positive effect on the performance.

However it is not entirely clear to me whether these kinds of LINQ queries are bringing in data and keeping track of all the materialized entities in the identity map or not. As I said before, for this kind of query keeping track of the entities in the identity map is just going to add a lot of extra overhead that is not necessary, because we won't be using those entities in the same unit of work anyway (one or two of them might be used in a separate unit of work in a totally different HTTP request). When I debug the code, before I call the query the unit of work has zero entities in it. When I complete the request, it has 58 items in the identity map, and the result of the query in my unit test produces 52 entries in the projected DTO entity. So clearly we have a worst-case scenario where more data than necessary is loaded into memory (and tracked in the identity map) and then the final projection is done in memory.

I know comparing against ADO.NET is an apples-to-oranges comparison, but it is valid because it lets you know the 'goal' performance for an ORM. The goal of an ORM should be to perform the queries as fast as possible and try to approach the speed of ADO.NET. Clearly it will never be possible to match that speed, because the ORM does stuff that is not done in an ADO.NET layer (kind of like comparing C to hand-coded assembler). But it is a good goal to have in mind, because as a developer, if I am faced with a tool that can make me more productive and allow us to write code faster, but it takes a 40x performance hit, we simply cannot use it. If the performance hit is more reasonable, say 2-3x, now we are in the ballpark.

The way I look at it, the ORM strives to do three things:
1. Simplify modelling the database and translating the database tables into object model entities
2. Simplify writing CRUD operations to the database
3. Simplify writing queries against the database by allowing you to query the object space, rather than the physical tables
Although sometimes I am not entirely sure if LINQ syntax actually makes things clearer, or just makes them more confusing! Most important, though, is that if we have to take a 40x performance hit to do those three things, the solution is not going to be very usable. We need it to be faster.

So the way I look at it, there are three things internally that a LINQ provider ORM does where performance is critical:
1. Compiling LINQ queries into SQL statements. Clearly as support for complex LINQ statements and complex model mappings gets added, the complexity of turning LINQ statements into SQL grows and the performance suffers. Compiled LINQ queries with a query cache will probably make a big difference here, and eliminate a lot of the overhead of building the SQL for subsequent queries. LightSpeed is already one of the fastest ORMs I have tested at building the LINQ queries, and I expect it will be really good with compiled LINQ.
It would be nice to get some kind of warning though if something causes a LINQ statement to perform a memory operation, rather than doing it silently. If my LINQ statements cannot be translated entirely to SQL, I don't want them executing. If something needs to run in memory, I want to be the one that controls that, so I know the performance overheads associated with it.

2. Executing the SQL from the LINQ query and materializing the entities. This is something where LightSpeed is pretty fast (much faster than Entity Framework), but I still think there is room for improvement in this area using techniques similar to what PetaPoco does. The goal would be to get the objects out of the database as quickly as possible, and building a compiled entity factory for each LINQ statement would make a lot of sense. Perhaps a compiled entity factory could be compiled and cached alongside the compiled LINQ query in the new compiled query cache, so the second time a q

3. |
|
|
Grr. Not sure what happened there. I guess I hit a wrong key and submitted my post! Anyway, to finish up:

Perhaps a compiled entity factory could be compiled and cached alongside the compiled LINQ query in the new compiled query cache, so the second time a query is called and it is in the cache, both the compiled SQL and a compiled entity factory can be used to execute the SQL quickly and return the entities quickly.

3. Managing entities in the identity map. The primary purpose of the identity map is to keep track of which entities are already loaded, so that you can make changes to them and then simply save the unit of work and those changes will get written back. This is pretty important, and a fundamental part of the way ORMs work. But I think it is a mistake to materialize entities and stick them into the identity map when the RESULT of a query is a projection, and not native entities. In this case, I am really just trying to run a SQL query against the database and get results out, and I want it done as fast as possible. So I want the LINQ compiled into SQL, executed, and then my projected entities returned to me without the identity map being involved. Otherwise we are just adding a whole bunch of extra overhead to manage the identity map when the entities being tracked in it simply won't be touched.

Yes, if I run a query to find all the entities that match a particular condition, and I select real entities out of my query, then I would want them tracked in the identity map, because it is likely I am doing this specifically to make changes to each entity as it is loaded, and I probably want to then submit them back. I can understand that there might be situations where I then run a query against the database in a loop while some of the entities I am querying are already in the identity map, but the extra overhead of making sure the same entity values are returned in my projection is simply too high IMHO. I would rather be required to call SaveChanges() to flush the changes to the database than incur the overhead of the identity map being managed in this case.
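To make the point 2 idea a bit more concrete, here is a rough sketch of what I mean by caching a compiled entity factory alongside the compiled SQL. The DTO (the same placeholder ProductGridDto shape from my first post), the column ordinals, the SQL string, and the cache key are all just placeholders - in a real implementation the factory would be generated from the query's select list:

    using System;
    using System.Collections.Concurrent;
    using System.Data;
    using System.Linq.Expressions;

    // One cache entry per query shape: the generated SQL plus a compiled
    // delegate that turns a result row into the projected type.
    class CompiledQueryEntry
    {
        public string Sql;
        public Func<IDataRecord, ProductGridDto> Materializer;
    }

    static readonly ConcurrentDictionary<string, CompiledQueryEntry> QueryCache =
        new ConcurrentDictionary<string, CompiledQueryEntry>();

    // Build the row-to-DTO delegate once with expression trees, then reuse it.
    static Func<IDataRecord, ProductGridDto> BuildMaterializer()
    {
        ParameterExpression row = Expression.Parameter(typeof(IDataRecord), "row");
        var body = Expression.MemberInit(
            Expression.New(typeof(ProductGridDto)),
            Expression.Bind(typeof(ProductGridDto).GetProperty("Id"),
                Expression.Call(row, "GetInt32", null, Expression.Constant(0))),
            Expression.Bind(typeof(ProductGridDto).GetProperty("Name"),
                Expression.Call(row, "GetString", null, Expression.Constant(1))));
        return Expression.Lambda<Func<IDataRecord, ProductGridDto>>(body, row).Compile();
    }

    static CompiledQueryEntry GetEntry()
    {
        // keyed by something derived from the LINQ expression tree; built once, reused after
        return QueryCache.GetOrAdd("products-grid-query", _ => new CompiledQueryEntry
        {
            Sql = "SELECT Id, Name FROM Product",   // whatever SQL the provider generated
            Materializer = BuildMaterializer()
        });
    }

The second time the same LINQ query runs, both the SQL and the materializer come straight out of the cache, so neither the expression tree parsing nor the per-row reflection cost is paid again.
|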
|
|
In LightSpeed the LINQ provider is a translator between LINQ expressions and our underlying querying API, so it does not translate directly into SQL. You will naturally gain some performance by using the underlying querying API rather than going through LINQ - that said, it can be a lot more natural to express your queries using LINQ, so it's a bit of a trade-off for developer productivity.

In general we have avoided client-side execution where possible, but there has to be a balance struck when it comes to projections. Because LightSpeed is entity centric it is intended to operate over entities, so expressing projections in terms of entities may often mean we need to perform that projection client-side to take into account lazy-loaded collections and the like. Not doing so would cause most queries to fail, which would not be an acceptable outcome.

We are doing some work on upping the profiling in LightSpeed over the next wee while, so the suggestion of raising a warning when a client-side projection occurs is a good one - we could raise this as part of the profiling information, which would allow you to keep track of things. That said, just watching the SQL is also a good way of tracking this, as you are already doing.

In terms of the identity map, yes - any entities which are materialized will be added to the identity map so they can be tracked. From LightSpeed's perspective this work is done within the querying engine (in Mindscape.LightSpeed.dll), which is not aware of the LINQ provider, so while you may have used the query as a projection, an entity was still loaded and so it still needs to be tracked. There is a deliberate disconnect between the LINQ provider and the core querying engine while we maintain .NET 2.0 compatibility.
Hope that provides a bit more info! Jeremy |
|
|
I guess maybe what I need is a really fast LINQ to SQL engine, combined with a really fast object materializer and no identity map. That would get me the performance I need, without losing too many of the benefits of an ORM (relationships would need to be managed directly with foreign keys and entities saved separately). |
|
|
Hi Kendall,

Just to note - you might not want an identity map in this situation, but you'll be throwing away a HUGE amount of performance if you don't have one. Querying every single time you need an object is extremely wasteful. We do agree that in this specific situation we can improve the query, so please don't take this reply as saying we won't look at improving it.

However, performance gains in software development typically do not come from micro-optimisations that save milliseconds at the query level - they come from the overall architecture of the solution. Throwing away the identity map and just issuing queries exactly as you want them will not give you the performance you need. The caching layers on top are where the real performance gains are to be made. Large-scale sites do not handle load by simply improving the query cost - they get rid of the queries altogether, where possible, by caching those objects in things like an identity map and second-level caches.

My concern here is that you're taking a very focused look at the millisecond count on a query and not enough of a look at the overall performance picture. You should be investigating what makes sense for entity caching, page output caching, and how long your unit of work runs for (if it's per request, as recommended for web apps, I'd be alarmed if you demanded not to use the identity map). Those areas are where you will dramatically improve your system performance - not just by shaving a few milliseconds off a query.

As you've seen to date (and this will only improve with compiled queries) - LightSpeed has fast LINQ to SQL translation. It also has crazy fast object materialization. I suspect that if we didn't have the identity map and other caching options, it would only be a matter of time before you saw the benefits of having them and would be wanting support for them also. If you have very special situations where you absolutely must have a query exactly the way you want it and you want to ignore the identity map, then you can use FindBySql or a stored procedure.
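Just to make the identity map point concrete, here's the sort of thing I mean - the Product entity is hypothetical, and the behaviour described is the standard identity map pattern:

    // First call goes to the database; the loaded entity is tracked in the identity map.
    Product product = unitOfWork.FindById<Product>(42);

    // Asking for the same entity again in the same unit of work is answered from
    // the identity map - you get the same instance back without paying for another query.
    Product again = unitOfWork.FindById<Product>(42);

    Debug.Assert(ReferenceEquals(product, again));

John-Daniel Trask |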
|
|
I have thought about this a lot over the weekend, and you hit the nail on the head. This IS about performance, and the arguments both for and against an identity map also revolve around performance. Getting rid of it gains me performance right now, in that my queries run much faster against the DB, at the expense of always running the query when you need an entity. Having an identity map means you gain performance later, when you need to materialize the same entity again, for instance via a FindById() call.

But the key is that with any large-scale application, you are going to get the larger performance gains with caching, period. And you will get the largest gains by caching information at the application level, rather than trying to do it at the ORM level. We already have quite a few layers of caching built into our code at the application level, and especially with web sites, if you identify something that is slow you can work out which components of what is being displayed are invariant across page requests, and build a system to cache those values. But I am not entirely convinced that caching at the ORM level is going to be a win in the long run, when individual queries take 40x as long to execute? We need to get that performance difference down.

However, bear in mind that with all the ORMs I have seen to date, and as I mentioned before, I don't think queries get executed against the identity map entities before they get executed in SQL. From what I have seen, the SQL is always generated and always executed, but the resulting data that is pulled out of the database is matched up against the identity map. I have yet to see a query short-circuit and not actually hit the database. I am not sure if that will even happen with a 2nd-level cache implementation either, at least not in OpenAccess (EF does not have a 2nd-level cache yet). So in that case, there is no caching advantage to having the identity map there if you are running LINQ queries, since they are going to hit the database with SQL anyway. Maybe that is something that is different in LightSpeed, in that the query will be executed against the 2nd-level cache if it is present? I can't see how a query could be completely prevented from hitting the database, unless the 2nd-level cache is doing some aggressive caching on entire tables?

I have not yet played with stored procs and LightSpeed; I assume that when I map a stored proc to LightSpeed and execute it, it will just materialize the results into an entity that maps to the stored proc columns and never hit the identity map? If so, I will make a stored proc to execute this particular query that is causing me problems and see how it stacks up performance-wise. That might be a good compromise - putting the important queries like this into stored procs. But of course this particular query would be a whole lot better if you guys can work out why it is doing the projection in memory? Or is that simply not possible due to the translation from LINQ to the LightSpeed query language? Maybe you need a LINQ to SQL implementation for better performance in the LINQ case, rather than building a query to run against the older query language? |
|
|
It's performing the projection in memory because that is where we determine that the projection cannot be supported, which is indeed the case here: the projection builder does not know how to handle conditional expressions. I will have a look at adding in some support for conditionals and will update you once we have some progress on this.
Jeremy |
|
|
Those are necessary because of the left joins. Originally I just selected the values, but I was getting null exceptions. Is there another way to express this without the conditionals, to project the results of the left joins? I also tried navigation properties, but that crashed as well.
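For reference, the pattern in question is essentially this (entity and property names are placeholders rather than my real model, and I'm reusing the placeholder DTO shape from my earlier sketch) - a group join flattened with DefaultIfEmpty(), with a conditional in the projection because the right-hand side can be null:

    var query = from p in uow.Products
                join c in uow.Categories on p.CategoryId equals c.Id into joined
                from cat in joined.DefaultIfEmpty()   // left outer join
                select new ProductGridDto
                {
                    Id = p.Id,
                    Name = p.Name,
                    // the conditional guards against rows with no matching category
                    CategoryName = cat != null ? cat.Name : null
                };
|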
|
|
Hi Kendall, Just a quick update to let you know we have added support for handling conditionals in server-side projections, and this will be available in the next nightly build.
Jeremy |
|
|
Excellent. I am out of the office until Monday so I will give this a spin then.
|
|
|
I tried the most recent nightly build, but it made no difference to this query. It still cannot be projected natively? |
|
|
Are you able to send over a small repro of your model and the associated query? A query with the same syntax is working correctly here in our test suite so perhaps there is some difference associated with your model that is affecting this.
Jeremy |
|
|
Ivan should already have my model and test database, and the query that can be run against that model was posted at the start of the thread. My model has not really changed at all from what I sent him before, but I can send an updated test case with that specific query in it if you would like? |
|
|
The model we currently have doesn't build because of the circular references. Presumably you have corrected this, so could you send through the updated one you are working off, just so we are working from the same model :)
Thanks! Jeremy |
|