Hi,

We have performance issues with the following pattern, we don't understand why this is very slow in DataObjects and if it's a bug in DataObjects or if we are not using it properly.

The pattern we are trying to understand :

Document doc = Query.All<Document>().Single();
List<Fact> allFacts = doc.Facts.ToList();
foreach (var extract in doc.Extracts)
{
  var myFact = allFacts.Where(f => f.Extract == extract && f.Index == 2).Single();
}

Here, allFacts is a list, so we suppose all entities have already been read from database (even the field [Extract.Id]). The issue is with f.Extract == extract which triggers entity materializations (seen with a profiler) but we think it shouldn't be necessary, as we already have loaded it in doc.Facts.ToList(). Even using Prefetch doesn't solve the issue (doc.Facts.Prefetch(f=>f.Extract).ToList();). you'll see that in the following tests.

Could you please have a look and let us now what is the issue here ? Please note that we would really like to avoid having to use a join or a groupby pattern, as in the last tests, we, developers shouldn't have to worry about this.

Duration of the following tests :

   test1_subqueries_with_list                     [0:10.039] Success
   test2_subqueries_with_prefetch                 [0:10.986] Success
   test3_subqueries_with_prefetch_id              [0:11.227] Success
   test4_subqueries_with_key_compare              [0:10.454] Success
   test5_subqueries_with_id_compare               [0:18.222] Success
   test6_subqueries_with_group_by_with_list       [0:00.722] Success
   test7_subqueries_with_group_by_with_dictionary [0:00.170] Success
   test8_subqueries_with_join                     [0:00.192] Success

using System.Collections.Generic;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Xtensive.Orm;
using Xtensive.Orm.Configuration;

namespace UnitTests
{
  [TestClass]
  public class SubQueriesPerfs
  {
    private static string connectionString = @"sqlserver://localhost\SQL2014/DO50-Tests";

    #region Model
    [HierarchyRoot]
    public class Document : Entity
    {
      [Field, Key]
      public long Id { get; private set; }

      [Field,Association(PairTo = "Document", OnOwnerRemove = OnRemoveAction.Cascade, OnTargetRemove = OnRemoveAction.Clear)]
      public EntitySet<Extract> Extracts { get; set; }

      [Field,Association(PairTo = "Document", OnOwnerRemove = OnRemoveAction.Cascade, OnTargetRemove = OnRemoveAction.Clear)]
      public EntitySet<Fact> Facts { get; set; }

      [Field]
      public string Name { get; set; }
    }

    [HierarchyRoot]
    public class Extract : Entity
    {
      [Field, Key]
      public long Id { get; private set; }

      [Field]
      public Document Document { get; set; }

      [Field,Association(PairTo = "Extract", OnOwnerRemove = OnRemoveAction.Cascade, OnTargetRemove = OnRemoveAction.Clear)]
      public EntitySet<Fact> Facts { get; set; }

      [Field]
      public string Name { get; set; }
    }

    [HierarchyRoot]
    public class Fact : Entity
    {
      [Field, Key]
      public long Id { get; private set; }

      [Field]
      public Document Document { get; set; }

      [Field]
      public Extract Extract { get; set; }

      [Field]
      public int Index { get; set; }
    }

    private static Domain CreateDomain(SessionOptions sessionOptions)
    {
      DomainConfiguration config = new DomainConfiguration(connectionString);
      config.UpgradeMode = DomainUpgradeMode.Recreate;
      config.Types.Register(typeof(Document).Assembly);

      SessionConfiguration defaultSessionConfig = config.Sessions.Default;
      if (defaultSessionConfig == null)
      {
        defaultSessionConfig = new SessionConfiguration(WellKnown.Sessions.Default);
        config.Sessions.Add(defaultSessionConfig);
      }

      defaultSessionConfig.Options = sessionOptions;

      return Domain.Build(config);
    }

    #endregion

    private static Domain domain;
    [ClassInitialize]
    public static void ClassInitialize(TestContext testContext)
    {
      SessionOptions sessionOptions = SessionOptions.ValidateEntities | SessionOptions.AutoActivation | SessionOptions.AutoSaveChanges | SessionOptions.NonTransactionalReads;
      domain = CreateDomain(sessionOptions);

      using (Session session = domain.OpenSession())
      {
        using (TransactionScope ts = session.OpenTransaction())
        {
          Document doc = new Document();

          for (int i = 0; i < 1000;i++)
          {
            Extract extract = new Extract {Document = doc};
            for (int j = 0; j < 10; j++)
            {
              Fact fact = new Fact(){Document = doc, Extract = extract, Index = j};
            }
          }
          ts.Complete();
        }
      }
    }

    [TestMethod]
    public void test1_subqueries_with_list()
    {
      using (Session session = domain.OpenSession())
      {
        using (TransactionScope ts = session.OpenTransaction())
        {
          Document doc = Query.All<Document>().Single();
          List<Fact> allFacts = doc.Facts.ToList();
          foreach (var extract in doc.Extracts)
          {
            var myFact = allFacts.Where(f => f.Extract == extract && f.Index == 2).Single();
          }
        }
      }
    }

    [TestMethod]
    public void test2_subqueries_with_prefetch()
    {
      using (Session session = domain.OpenSession())
      {
        using (TransactionScope ts = session.OpenTransaction())
        {
          Document doc = Query.All<Document>().Single();
          var allFacts = doc.Facts.Prefetch(f=>f.Extract).ToList();
          foreach (var extract in doc.Extracts)
          {
            var myFact = allFacts.Where(f => f.Extract == extract && f.Index == 2).Single();
          }
        }
      }
    }

    [TestMethod]
    public void test3_subqueries_with_prefetch_id()
    {
      using (Session session = domain.OpenSession())
      {
        using (TransactionScope ts = session.OpenTransaction())
        {
          Document doc = Query.All<Document>().Single();
          var allFacts = doc.Facts.Prefetch(f=>f.Extract.Id).ToList();
          foreach (var extract in doc.Extracts)
          {
            var myFact = allFacts.Where(f => f.Extract == extract && f.Index == 2).Single();
          }
        }
      }
    }

    [TestMethod]
    public void test4_subqueries_with_key_compare()
    {
      using (Session session = domain.OpenSession())
      {
        using (TransactionScope ts = session.OpenTransaction())
        {
          Document doc = Query.All<Document>().Single();
          var allFacts = doc.Facts.Prefetch(f=>f.Extract.Id).ToList();
          foreach (var extract in doc.Extracts)
          {
            var extractKey = extract.Key;
            var myFact = allFacts.Where(f => f.Extract.Key == extractKey && f.Index == 2).Single();
          }
        }
      }
    }

    [TestMethod]
    public void test5_subqueries_with_id_compare()
    {
      using (Session session = domain.OpenSession())
      {
        using (TransactionScope ts = session.OpenTransaction())
        {
          Document doc = Query.All<Document>().Single();
          var allFacts = doc.Facts.Prefetch(f=>f.Extract.Id).ToList();
          foreach (var extract in doc.Extracts)
          {
            var myFact = allFacts.Where(f => f.Extract.Id == extract.Id && f.Index == 2).Single();
          }
        }
      }
    }

    [TestMethod]
    public void test6_subqueries_with_group_by_with_list()
    {
      using (Session session = domain.OpenSession())
      {
        using (TransactionScope ts = session.OpenTransaction())
        {
          Document doc = Query.All<Document>().Single();
          var allFacts = doc.Facts.GroupBy(f => f.Extract).Select(f => new {Key = f.Key, Facts = f.Select(a=>a)}).ToList();
          foreach (var extract in doc.Extracts)
          {
            var extractFacts = allFacts.Single(f => f.Key == extract).Facts;
            var myFact = extractFacts.Single(f => f.Index == 2);
          }
        }
      }
    }

    [TestMethod]
    public void test7_subqueries_with_group_by_with_dictionary()
    {
      using (Session session = domain.OpenSession())
      {
        using (TransactionScope ts = session.OpenTransaction())
        {
          Document doc = Query.All<Document>().Single();
          var allFacts = doc.Facts.GroupBy(f => f.Extract).ToDictionary(g=>g.Key.Id, g=>g.ToList());
          foreach (var extract in doc.Extracts)
          {
            var entry = allFacts[extract.Id];
            var myFact = entry.Single(f => f.Index == 2);
          }
        }
      }
    }

    [TestMethod]
    public void test8_subqueries_with_join()
    {
      using (Session session = domain.OpenSession())
      {
        using (TransactionScope ts = session.OpenTransaction())
        {
          Document doc = Query.All<Document>().Single();
          var allFacts = 
             (from e in doc.Extracts
              join f in doc.Facts on e equals f.Extract 
              into facts
            select new {Extract=e, Facts=facts}).ToList();

          foreach (var extract in doc.Extracts)
          {
            var extractFacts = allFacts.Single(f => f.Extract == extract).Facts;
            var myFact = extractFacts.Single(f => f.Index == 2);
          }
        }
      }
    }
  }
}

asked Mar 12 '18 at 10:01

Benoit%20Nesme's gravatar image

Benoit Nesme
43202024

edited Mar 13 '18 at 04:52


2 Answers:

Hello Benoit Nesme,

If you look carefully you will see that in tests from 1 to 5, which are really slow, you use .Where() extension for IEnumerable<fact> instance (List<fact> actually). Filtration using .Where() for Enumerables will give convenient results on small numbers of elements and from some numbers it becomes a nightmare.

You fill the database with 1000 instances of Extract entity and per each Extract entity you create 10 instances of Fact entity. Overall, you have 10 000 Fact entities.

Now lets look at the first test, for example

    Document doc = Query.All<document>().Single();
    List<fact> allFacts = doc.Facts.ToList();// 10 000 items
    foreach (var extract in doc.Extracts) { // 1 000 iterations
      // 10 000 iterations
      var myFact = allFacts.Where(f => f.Extract == extract && f.Index == 2).Single();
    }

This piece of code performs 10 000 000 iterations to find actually 1000 Fact instances! It is enormous waste of time. That's why you have such long test execution time. By the way, only 3 queries executed.

Getting rid of List makes it ~10x faster (at lease on my PC) - it dramatically decreases iterations count but increases queries to database. For example

    Document doc = Query.All<document>().Single();
    foreach (var extract in doc.Extracts) { // 1 000
      // one query instead of 10 000 iterations
      var myFact = doc.Facts
        .Where(f => f.Extract == extract && f.Index == 2).Single();
    } 

But the fastest way is to use queries and let SQL Server perform filtration and give you only data you need. It also allows to save some memory.

answered Mar 14 '18 at 06:47

Alexey%20Kulakov's gravatar image

Alexey Kulakov
77225

Thanks Alexey, Exactly, here I am using ToList on purpose, trying to solve the slowness of many SQL subqueries call we had in our code. However, you're right, I didn't chose properly the number of items, and the .Where here in list is the problem.

I got mislead by the profiler, it showed the time was spent in calling get_Extract() (from your weaving code), then building the key to use the Where.

What I can conclude is that sometimes using .ToList() and .Dictionary() can be much faster than many subqueries, it depends on the number of Extract, Facts and subqueries done in the loop.

(Mar 14 '18 at 10:39) Benoit Nesme Benoit%20Nesme's gravatar image

The tests do not show real processing of data. Depending on query results processing, ways of getting data from database may vary. Of course, 1000 queries in the loop of Extracts is not a good idea because it is a lot of queries and number will grow with new Extracts. The ideal case is getting all needed data by one query and then processing results in foreach body one by one. Sometimes it is hard to do and extra queries appear in favor of convenience.

(Mar 15 '18 at 09:05) Alexey Kulakov Alexey%20Kulakov's gravatar image

Yes you are right. What happens is we spent years without caring much about performances using DO.NET, but we realise now that there is a cost, and some of our code has to be udpated to minimize round trips to database, because we need performances in some scenarios. We do have some portions of code where we are retrieving the same values from the database over and over, in a loop iteration (hidden in successives function calls) ; or two columns of the same table in two queries.

(Mar 15 '18 at 09:16) Benoit Nesme Benoit%20Nesme's gravatar image

Here the benchmarks, showing when it's interesting to preload data in a dictionary instead of executing many subqueries in the loop alt text

answered Mar 14 '18 at 11:09

Benoit%20Nesme's gravatar image

Benoit Nesme
43202024

edited Mar 14 '18 at 11:11

Your answer
Please start posting your answer anonymously - your answer will be saved within the current session and published after you log in or create a new account. Please try to give a substantial answer, for discussions, please use comments and please do remember to vote (after you log in)!
toggle preview

Subscription:

Once you sign in you will be able to subscribe for any updates here

Tags:

×574
×13
×11

Asked: Mar 12 '18 at 10:01

Seen: 2,626 times

Last updated: Mar 15 '18 at 09:16

powered by OSQA