Best Practices for Join Operations

  • Joins are well optimized in the join engine. They are normally faster than Joins in the Calculation Engine. JOIN should be performed in Analytic/Attribute view instead of Calculation View (though in some cases that’s the only way).
  • CE_JOIN is limited to join only two tables. Cascaded Joins are very expensive and not always lead to right results.
  • Left outer join is normally faster than inner join.
  • Left outer join is almost equally fast as Referential Join.
  • Currently one single join operation is executed by 1 core at maximum (Multiple Join are parallelized across multiple cores).
  • Avoid joins on calculated fields, e.g. concatenating two fields into one and execute a join on it. Materialize the concatenated field (if not done by the engine)
  • Avoid non-equijoin predicates on column table as column engine does not natively support join predicates other than equality condition.

No comments:

Post a Comment