For a unique identifier DO used primary key as the CLUSTERED index. Maybe this is not correct, because identifiers may not require clustering. And when using primary key GUID (random 128 bit value) clustered index is DECREAS performance. I think you should either: 1. Not use clustering of primary key OR 2. Either learn how to manage it, pointing out which index will be clustered.

This thread was imported from our support forum. The original discussion may contain more detailed answer. Original topic by joshua.

asked May 24 '10 at 07:41

Editor's gravatar image


One Answer:

Alex (Xtensive) wrote:

As I remember, non-clustered PK means SQL Server will use some surrogate key internally (like ROWID in Oracle) to identify table rows, and primary index, in fact, turns into regular index mapping PK to ROWID. So in this case any fetch (lookup of a row by PK) requires 2 lookups instead of one (PK -> ROWID -> row); so joins by PK are also noticeably less effecient.

I know about issue with indexing GUIDs - but much better solution is sequential GUID generation:

Now let's return back to case with ROWID (non-clustered PK):


  • If you'd replace it (hidden ROWID) by your own sequential PK, you'd use it everywhere and avoid e.g. issue with joins;

  • Likely, it will be smaller then original non-sequential PK - that's good as well;

  • On contrary, when non-clustered PK is used, you're loosing both these benefits.


  • Most likely, you'll need unique index for your original PK, that will affect on CRUD performance.

So IMHO it's quite desirable to avoid non-clustered PKs - at least from the point of performance. That's why we implemented just this option :twisted:

But I fully agree that e.g. in legacy schemas (created by DBAs, not by DO) it must be supported. But in Legacy upgrade mode DO4 really just uses your schema - so there can be non-clustered PKs. Resume:

  • If you're building model & schema relying on DO4, most likely, you simply don't need non-clustered PKs

  • Otherwise you use Legacy mode, where it is fully supported (= allowed).

otto wrote:

On the other hand, clustered indexes are not handled the same way on each DBMS. For example in PostgreSQL clustering is a separate operation: It is just sorting a table phisically according to an index order. This operation should be performed periodically, because this phisical order is not maintained automatically. So not using clustering for PK here does not have similar consequences. But using it for other indexes may be advantagous.

xumix wrote:

  • If you're building model & schema relying on DO4, most likely, you simply don't need non-clustered PKs Nope. For eg. Master table -> Dependant one. ~100 master records ~100000 dependant So i want clustering on the Master.Id column, not the Id one

Alex (Xtensive) wrote:

Agree about usefulness of clustering over any index, although implementation of this feature in PostgreSQL seems a bit strange to me. The fact that clustering isn't maintained in time, although you can cluster the table as you want means they always keep the data separately from indexes (i.e. in fact, any index is non-clustered), although pages containing actual rows can be organized in any fashion you like.

We'll consider adding support for clustering by arbitrary index, as well as disabling default clustering by PK.

Alex (Xtensive) wrote:

Btw, if physical clustering by arbitrary key is really important (as in your example), it can be achieved by replacing clustered PK as follows: NewPrimaryKey = (ClusteringKey, OldPrimaryKey). On the other hand, this implies you must use new PK instead of the old one, that isn't always convenient.

answered May 25 '10 at 01:29

Editor's gravatar image


Your answer
Please start posting your answer anonymously - your answer will be saved within the current session and published after you log in or create a new account. Please try to give a substantial answer, for discussions, please use comments and please do remember to vote (after you log in)!
toggle preview

powered by OSQA