PeterK
Xpert
Status: New Idea

Given a query like this: 

 

SELECT 1
FROM nonVirtual.t1
JOIN virtual.t2 ON t1.id = t2.id
JOIN virtual.t3 ON t2.id = t3.id

will result it 2 separate pushdowns to the virtual schema:

 

SELECT id FROM virtual.t2;
SELECT id FROM virtual.t3;

This is ok for local virtual schemas however it is bad performance for remote virtual schema. It is possible, for example, that the join between t2 and t3 greatly reduces the number of rows and we want that row reduction to happen on the remote side, not on the local side.

So the idea here is for an new virtual schema metadata flag that tells the optimizer whether or not the virtual schema prefers fewer consolidated push downs.

So in this case the virtual schema could indicate that it prefers consolidated pushdowns and the optimizer would send:

 

SELECT t2.id, t3.id FROM virtual.t2 JOIN virtual.t3 ON t2.id = t3.id

Currently we have to tell users to work around this by writing the query as

 

SELECT 1
FROM nonVirtual.t1 (
  SELECT t2.id
  FROM virtual.t2
  JOIN virtual.t3 ON t2.id = t3.id
  ORDER BY FALSE
) ON t1.id = t2.id

however this rewriting is cumbersome and in the case of tools sometimes it is not possible.

1 Comment
drumcircle
Single Node Master

Agreed this would be high value wherever virtual schema is used.  It would seem like this would be a relatively minor tweak.