I have a LINQ query I am using to select data from the database, and it is running rather slowly. When I look at the SQL generated, LightSpeed is pretty much doing a 'select *' from the database. The query is as follows (simplified to select less data, but which shows off the issue even more clearly): var query = from p in uow.Products

The select SQL that gets produced is as follows: SELECT

The SQL generated is fine, except that it is selecting ALL the data from ALL the tables involved in the select, which is a ton of data that is not needed at all for this particular query.

Now I suspect the reason it is doing that is because it is actually materializing all the objects involved in the query into the identity map, so the end result is that after I have run this query all the following objects will be in the identity map: All Products in the result set

Now I can understand that this might be efficient if we have a long-running context, and that I might go back and reference those objects at a later date, but 99% of the time that won't be the case at all. Most of the time this query will simply be used to display data in a grid on a web page, and you will page through it or sort/filter it to find useful information. Any of the objects selected by this query (say an order, or a product) will NOT be used in this unit of work at all, but would be used on a totally different web page (i.e. clicking through to the product), which means it still needs to be loaded from the database anyway.

In most other ORMs the generated SQL just selects out the results from the database that match the result set, and the entities involved in the query do NOT end up in the identity map because the LINQ query was a projection to another type. I was not selecting out the actual entities in full in this query; if I did, it would be valid to ensure they are in the identity map. Rather, I just want the ORM to let me perform a LINQ query and run it against the database, giving me back the result set I need, in this case projected into a DTO object that can be serialized over JSON.

I suspect one reason for wanting to do this is that IF for some reason I had loaded a product, or an order, or something else from the result set into the identity map, and had modified it and NOT yet saved it to the database, then materializing the entire entities into the identity map means we will pick up any changes made to those entities correctly in the query. In reality, the performance overhead of doing that is way too high, and I would rather just call SaveChanges() if I had actually done this in my code, to ensure it is in the database for the LINQ query to find when it runs. I found it quite common in Entity Framework to have to persist changes to the database before a subsequent LINQ query would pick up the changes to objects that had been loaded and modified. Not to mention that 99% of the time nothing would have caused any of those entities to be selected into the identity map before this query is run anyway (actually probably more like 99.999% of the time).

Also, it is worth bearing in mind that in many situations NEW objects might be getting added to the unit of work identity map, and then a few lines of code later you may need to run a query that relies on those objects being present in the database (I have those exact situations in our code already and I suspect they are quite common).
And unless I am mistaken, LightSpeed is NOT going to run a LINQ to Objects query against the identity map prior to running the SQL against the database, for performance reasons, so in this situation you would still need to call SaveChanges() anyway to make sure the entities end up in the database where the LINQ query can pick them up.

To me this is an unacceptable performance overhead, and an area where if I wrote the same query in direct ADO.NET it would literally SMOKE the LINQ query, probably by a factor of 100x or more. It is this kind of stuff that ORMs do to keep us at a much higher level than the database that makes me seriously consider just using PetaPoco as a bare-metal mini-ORM. I need an ORM that is as bare metal as you can get when I am running LINQ queries, and that does its neat stuff with the identity map ONLY when I am either finding an object by ID in the database OR running a LINQ query where I am actually selecting a complete object (or set of complete objects) from the database. If I am doing a projection on the result set, that is a clear message to the ORM that I DON'T want it to *** around with objects in the identity map; it should simply pull what is in the database as efficiently as possible.
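For reference, the general shape of the query I am talking about is something like this - the entity and property names below are placeholders rather than my actual model, but the point is that it is a pure projection into a DTO for a grid:

    // DTO that gets serialized over JSON for the grid
    public class ProductGridDto
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public string CategoryName { get; set; }
    }

    // A pure projection - I never touch the Product entities themselves in this
    // unit of work, so I don't want them materialized or tracked in the identity map.
    var query = from p in uow.Products
                select new ProductGridDto
                {
                    Id = p.Id,
                    Name = p.Name,
                    CategoryName = p.Category.Name
                };

    List<ProductGridDto> rows = query.ToList();
|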
|
|
By contrast, this is the SQL generated by Entity Framework: SELECT |
|
|
Ignoring the fact that Entity Framework is ridiculously slow for other reasons, it does produce the SQL I want. |
|
|
Just to illustrate how bad this really is, I ran a performance test against my unit test database, which is quite small. It would be a lot worse on the real database given that it will produce a LOT more data. Here are the results: Lightspeed (0): 745.0427 |
|
|
Here are the results when I run this against a backup of the live database: Lightspeed (0): 2106.1205 |
|
|
As part of the LINQ provider, before we can perform a projection on the server side LightSpeed needs to determine whether it can actually translate what is being asked for into a server-side statement. In the case above we don't have support for translating the projection, so it falls back on selecting out the entities and then running the projection on the client. I'll have a look at the specifics of what support is missing and see if we can add that in - no promises though.

In terms of the performance comparison, you are comparing apples with oranges. ADO.NET is not an ORM, nor does it have to parse a LINQ expression tree. I presume you are running the exact same statement as LightSpeed generates (rather than the one EF generates) in the comparison, so the difference between the numbers you are seeing will represent the time taken to parse the LINQ query (which is not insignificant, so compiled queries would likely help with subsequent runs), then determine what SQL needs to be generated to load the entity and any related entities, then in this particular case materialize the entities, and finally perform the projection client side.

If you are happy to work with a data reader then you can fall back on ADO.NET by using the UnitOfWork.PrepareCommand(IDbCommand) call to attach the command to the same connection that the UnitOfWork has open, and then execute that to return a DataReader - you might want to consider that if it fits with what you are doing.
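Roughly, that fallback looks like this - the SQL and column indexes here are just placeholders, and the concrete command type assumes SQL Server:

    // Fall back to raw ADO.NET, but attach the command to the same open
    // connection the UnitOfWork is already using.
    IDbCommand command = new SqlCommand("SELECT Id, Name FROM Product");
    unitOfWork.PrepareCommand(command);

    using (IDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            int id = reader.GetInt32(0);
            string name = reader.GetString(1);
            // shape the results however you need - no entities are materialized
            // and nothing goes into the identity map
        }
    }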
Jeremy |
|
|
When I debug through the code I can see that it is failing on this line: if (!ContainsOnlyPropertiesOfEntityTypes(Projectors[Projectors.Count - 1])) in SingleQueryPlan.cs, so I assume that is why it is not able to project natively. Which is odd, because the stripped-down query I posted in my original post is pretty simple, so it is not clear to me why it cannot be converted to a native projection query? Clearly fixing that should have a positive effect on the performance.

However it is not entirely clear to me whether these kinds of LINQ queries are bringing in data and keeping track of all the materialized entities in the identity map or not. As I said before, for this kind of query keeping track of the entities in the identity map is just going to add a lot of extra overhead that is not necessary, because we won't be using those entities in the same unit of work anyway (one or two of them might be used in a separate unit of work in a totally different HTTP request). When I debug the code, before I call the query the unit of work has zero entities in it. When I complete the request, it has 58 items in the identity map, and the result of the query in my unit test produces 52 entries in the projected DTO entity. So clearly we have a worst-case scenario where more data than necessary is loaded into memory (and tracked in the identity map) and then the final projection is done in memory.

I know comparing against ADO.NET is an apples-to-oranges comparison, but it is valid because it lets you know the 'goal' performance for an ORM. The goal of an ORM should be to perform the queries as fast as possible and try to approach the speed of ADO.NET. Clearly it will never be possible to match that speed, because the ORM does stuff that is not done in an ADO.NET layer (kind of like comparing C to hand-coded assembler). But it is a good goal to have in mind, because as a developer, if I am faced with a tool that can make me more productive and allow us to write code faster, but it takes a 40x performance hit, we simply cannot use it. If the performance hit is more reasonable, say 2-3x, now we are in the ballpark.

The way I look at it, the ORM strives to do three things:
1. Simplify modelling the database and translating the database tables into object model entities
2. Simplify writing CRUD operations to the database
3. Simplify writing queries against the database by allowing you to query the object space, rather than the physical tables
Although sometimes I am not entirely sure if LINQ syntax actually makes things clearer, or just makes them more confusing! Most important, though, is that if we have to take a 40x performance hit to do those three things, the solution is not going to be very usable. We need it to be faster.

So the way I look at it, there are three things internally that a LINQ provider ORM does where performance is critical:
1. Compiling LINQ queries into SQL statements. Clearly as support for complex LINQ statements and complex model mappings gets added, the complexity of turning LINQ statements into SQL grows and the performance suffers. Compiled LINQ queries with a query cache will probably make a big difference here, and eliminate a lot of the overhead of building the SQL for subsequent queries. LightSpeed is already one of the fastest ORMs I have tested at building the LINQ queries, and I expect it will be really good with compiled LINQ.
It would be nice to get some kind of warning though if something causes a LINQ statement to perform a memory operation, rather than doing it silently. If my LINQ statements cannot be translated entirely to SQL, I don't want them executing. If something needs to run in memory, I want to be the one that controls that, so I know the performance overheads associated with it.

2. Executing the SQL from the LINQ query and materializing the entities. This is something where LightSpeed is pretty fast (much faster than Entity Framework), but I still think there is room for improvement in this area using techniques similar to what PetaPoco does. The goal would be to get the objects out of the database as quickly as possible, and building a compiled entity factory for each LINQ statement would make a lot of sense. Perhaps a compiled entity factory could be compiled and cached alongside the compiled LINQ query in the new compiled query cache, so the second time a q

3. |
|
|
Grr. Not sure what happened there. I guess I hit a wrong key and submitted my post! Anyway, to finish up:

Perhaps a compiled entity factory could be compiled and cached alongside the compiled LINQ query in the new compiled query cache, so the second time a query is called and it is in the cache, both the compiled SQL and a compiled entity factory can be used to execute the SQL quickly and return the entities quickly.

3. Managing entities in the identity map. The primary purpose of the identity map is to keep track of which entities are already loaded, so that you can make changes to them and then simply save the unit of work and those changes will get written back. This is pretty important, and a fundamental part of the way ORMs work. But I think it is a mistake to materialize entities and stick them into the identity map when the RESULT of a query is a projection, and not native entities. In this case, I am really just trying to run a SQL query against the database and get results out, and I want it done as fast as possible. So I want the LINQ compiled into SQL, executed, and then my projected entities returned to me without the identity map being involved. Otherwise we are just adding a whole bunch of extra overhead to manage the identity map when the entities being tracked in it simply won't be touched.

Yes, if I run a query to find all the entities that match a particular condition, and I select real entities out of my query, then I would want them tracked in the identity map, because it is likely I am doing this specifically to make changes to each entity as it is loaded, and I probably want to then submit them back. I can understand that there might be situations where I then run a query against the database in a loop while some of the entities I am querying are already in the identity map, but the extra overhead of making sure the same entity values are returned in my projection is simply too high IMHO. I would rather be required to call SaveChanges() to flush the changes to the database than incur the overhead of the identity map being managed in this case.
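To make the point 2 idea a bit more concrete, here is a rough sketch of what I mean by caching a compiled entity factory alongside the compiled SQL. The DTO (the same placeholder ProductGridDto shape from my first post), the column ordinals, the SQL string, and the cache key are all just placeholders - in a real implementation the factory would be generated from the query's select list:

    using System;
    using System.Collections.Concurrent;
    using System.Data;
    using System.Linq.Expressions;

    // One cache entry per query shape: the generated SQL plus a compiled
    // delegate that turns a result row into the projected type.
    class CompiledQueryEntry
    {
        public string Sql;
        public Func<IDataRecord, ProductGridDto> Materializer;
    }

    static readonly ConcurrentDictionary<string, CompiledQueryEntry> QueryCache =
        new ConcurrentDictionary<string, CompiledQueryEntry>();

    // Build the row-to-DTO delegate once with expression trees, then reuse it.
    static Func<IDataRecord, ProductGridDto> BuildMaterializer()
    {
        ParameterExpression row = Expression.Parameter(typeof(IDataRecord), "row");
        var body = Expression.MemberInit(
            Expression.New(typeof(ProductGridDto)),
            Expression.Bind(typeof(ProductGridDto).GetProperty("Id"),
                Expression.Call(row, "GetInt32", null, Expression.Constant(0))),
            Expression.Bind(typeof(ProductGridDto).GetProperty("Name"),
                Expression.Call(row, "GetString", null, Expression.Constant(1))));
        return Expression.Lambda<Func<IDataRecord, ProductGridDto>>(body, row).Compile();
    }

    static CompiledQueryEntry GetEntry()
    {
        // keyed by something derived from the LINQ expression tree; built once, reused after
        return QueryCache.GetOrAdd("products-grid-query", _ => new CompiledQueryEntry
        {
            Sql = "SELECT Id, Name FROM Product",   // whatever SQL the provider generated
            Materializer = BuildMaterializer()
        });
    }

The second time the same LINQ query runs, both the SQL and the materializer come straight out of the cache, so neither the expression tree parsing nor the per-row reflection cost is paid again.
|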
|
|
In LightSpeed the LINQ provider is a translator between LINQ expressions and our underlying querying API, so it does not translate directly into SQL. You will naturally gain some performance by using the underlying querying API rather than going through LINQ - that said, it can be a lot more natural to express your queries using LINQ, so it's a bit of a trade-off for developer productivity.

In general we have avoided client-side execution where possible, but there has to be a balance struck when it comes to projections. Because LightSpeed is entity centric it is intended to operate over entities, so expressing projections in terms of entities may often mean we need to perform that projection client-side to take into account lazy-loaded collections and the like. Not doing so would cause most queries to fail, which would not be an acceptable outcome.

We are doing some work on upping the profiling in LightSpeed over the next wee while, so the suggestion of raising a warning when a client-side projection occurs is a good one - we could raise this as part of the profiling information, which would allow you to keep track of things. That said, just watching the SQL is also a good way of tracking this, as you are already doing.

In terms of the identity map, yes - any entities which are materialized will be added to the identity map so they can be tracked. From LightSpeed's perspective this work is done within the querying engine (in Mindscape.LightSpeed.dll), which is not aware of the LINQ provider, so while you may have used the query as a projection, an entity was still loaded and so it still needs to be tracked. There is a deliberate disconnect between the LINQ provider and the core querying engine while we maintain .NET 2.0 compatibility.
Hope that provides a bit more info! Jeremy |
|
|
I guess maybe what I need is a really fast LINQ to SQL engine, combined with a really fast object materializer and no identity map. That would get me the performance I need, without losing too many of the benefits of an ORM (relationships would need to be managed directly with foreign keys and entities saved separately). |
|
|
Hi Kendall,

Just to note - you might not want an identity map in this situation, but you'll be throwing away a HUGE amount of performance if you don't have one. Querying every single time you need an object is extremely wasteful. We do agree that in this specific situation we can improve the query, so please don't take this reply as saying we won't look at improving it.

However, performance gains in software development typically do not come from micro-optimisations that save milliseconds at the query level - they come from the overall architecture of the solution. Throwing away the identity map and just issuing queries exactly as you want them will not give you the performance you need. The caching layers on top are where the real performance gains are to be made. Large-scale sites do not handle load by simply improving the query cost - they get rid of the queries altogether, where possible, by caching those objects in things like an identity map and second-level caches.

My concern here is that you're taking a very focused look at the millisecond count on a query and not enough of a look at the overall performance picture. You should be investigating what makes sense for entity caching, page output caching, and how long your unit of work runs for (if it's per request, as recommended for web apps, I'd be alarmed if you demanded not to use the identity map). Those areas are where you will dramatically improve your system performance - not just by shaving a few milliseconds off a query.

As you've seen to date (and this will only improve with compiled queries) - LightSpeed has fast LINQ to SQL translation. It also has crazy fast object materialization. I suspect that if we didn't have the identity map and other caching options, it would only be a matter of time before you saw the benefits of having them and would be wanting support for them also. If you have very special situations where you absolutely must have a query exactly the way you want it and you want to ignore the identity map, then you can use FindBySql or a stored procedure.
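Just to make the identity map point concrete, here's the sort of thing I mean - the Product entity is hypothetical, and the behaviour described is the standard identity map pattern:

    // First call goes to the database; the loaded entity is tracked in the identity map.
    Product product = unitOfWork.FindById<Product>(42);

    // Asking for the same entity again in the same unit of work is answered from
    // the identity map - you get the same instance back without paying for another query.
    Product again = unitOfWork.FindById<Product>(42);

    Debug.Assert(ReferenceEquals(product, again));

John-Daniel Trask |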
|
|
I have thought about this a lot over the weekend, and you hit the nail on the head. This IS about performance, and the arguments both for and against an identity map also revolve around performance. Getting rid of it gains me performance right now, in that my queries run much faster against the DB, at the expense of always running the query when you need an entity. Having an identity map means you gain performance later, when you need to materialize the same entity again, for instance via a FindById() call.

But the key is that with any large-scale application, you are going to get the larger performance gains with caching, period. And you will get the largest gains by caching information at the application level, rather than trying to do it at the ORM level. We already have quite a few layers of caching built into our code at the application level, and especially with web sites, if you identify something that is slow you can work out which components of what is being displayed are invariant across page requests, and build a system to cache those values. But I am not entirely convinced that caching at the ORM level is going to be a win in the long run, when individual queries take 40x as long to execute? We need to get that performance difference down.

However, bear in mind that with all the ORMs I have seen to date, and as I mentioned before, I don't think queries get executed against the identity map entities before they get executed in SQL. From what I have seen, the SQL is always generated and always executed, but the resulting data that is pulled out of the database is matched up against the identity map. I have yet to see a query short-circuit and not actually hit the database. I am not sure if that will even happen with a 2nd-level cache implementation either, at least not in OpenAccess (EF does not have a 2nd-level cache yet). So in that case, there is no caching advantage to having the identity map there if you are running LINQ queries, since they are going to hit the database with SQL anyway. Maybe that is something that is different in LightSpeed, in that the query will be executed against the 2nd-level cache if it is present? I can't see how a query could be completely prevented from hitting the database, unless the 2nd-level cache is doing some aggressive caching on entire tables?

I have not yet played with stored procs and LightSpeed; I assume that when I map a stored proc to LightSpeed and execute it, it will just materialize the results into an entity that maps to the stored proc columns and never hit the identity map? If so, I will make a stored proc to execute this particular query that is causing me problems and see how it stacks up performance-wise. That might be a good compromise - putting the important queries like this into stored procs. But of course this particular query would be a whole lot better if you guys can work out why it is doing the projection in memory? Or is that simply not possible due to the translation from LINQ to the LightSpeed query language? Maybe you need a LINQ to SQL implementation for better performance in the LINQ case, rather than building a query to run against the older query language? |
|
|
It's performing the projection in memory because that is where we determine that the projection cannot be supported, which is indeed the case here: the projection builder does not know how to handle conditional expressions. I will have a look at adding in some support for conditionals and will update you once we have some progress on this.
Jeremy |
|
|
Those are necessary because of the left joins. Originally I just selected the values, but I was getting null exceptions. Is there another way to express this without the conditionals, to project the results of the left joins? I also tried navigation properties, but that crashed as well.
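For reference, the pattern in question is essentially this (entity and property names are placeholders rather than my real model, and I'm reusing the placeholder DTO shape from my earlier sketch) - a group join flattened with DefaultIfEmpty(), with a conditional in the projection because the right-hand side can be null:

    var query = from p in uow.Products
                join c in uow.Categories on p.CategoryId equals c.Id into joined
                from cat in joined.DefaultIfEmpty()   // left outer join
                select new ProductGridDto
                {
                    Id = p.Id,
                    Name = p.Name,
                    // the conditional guards against rows with no matching category
                    CategoryName = cat != null ? cat.Name : null
                };
|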
|
|
Hi Kendall, Just a quick update to let you know we have added support for handling conditionals in server-side projections, and this will be available in the next nightly build.
Jeremy |
|
|
Excellent. I am out of the office until Monday so I will give this a spin then.
|
|
|
I tried the most recent nightly build, but it made no difference to this query. It still cannot be projected natively? |
|
|
Are you able to send over a small repro of your model and the associated query? A query with the same syntax is working correctly here in our test suite so perhaps there is some difference associated with your model that is affecting this.
Jeremy |
|
|
Ivan should already have my model and test database, and the query that can be run against that model was posted at the start of the thread. My model has not really changed at all from what I sent him before, but I can send an updated test case with that specific query in it if you would like? |
|
|
The model we currently have doesn't build because of the circular references. Presumably you have corrected this, so could you send through the updated one you are working off, just so we are working from the same model :)
Thanks! Jeremy |
|