12-05-2021 12:56 PM - last edited on 16-11-2021 7:39 PM by exa-Chris
I'm using Exasol 7.0.9.
The following query:
select
a,
first_value(a) over (),
last_value(a) over ()
from (values(1), (2), (3), (4)) as t(a)
order by a
Produces:
|A |FIRST_VALUE(T.A) OVER()|LAST_VALUE(T.A) OVER()|
|---|-----------------------|----------------------|
|1 |1 |1 |
|2 |1 |1 |
|3 |1 |1 |
|4 |1 |1 |
I think the LAST_VALUE
function should produce 4, assuming the semantics of empty OVER()
clauses for these functions is ORDER BY <function argument> ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
It would also be more consistent with NTH_VALUE
, which still produces 2 e.g. for NTH_VALUE(a, 2) OVER ()
Solved! Go to Solution.
14-05-2021 4:40 PM
select
a,
first_value(a) over (),
last_value(a) over ()
from (values(1), (2), (3), (4)) as t(a)
order by a
select
a,
(SELECT first_value(a) FROM (values(1), (2), (3), (4)) as t(a) GROUP BY NULL),
(SELECT first_value(a) FROM (values(1), (2), (3), (4)) as t(a) GROUP BY NULL)
from (values(1), (2), (3), (4)) as t(a)
order by a;
OVER()
in the query.However, the bug is not the existence of this optimization, but that it is used while FIRST_VALUE OVER ()
orNTH_VALUE OVER()
are present in the query together with LAST_VALUE OVER()
.LAST_VALUE/FIRST_VALUE
with an empty OVER()
clause.OVER()
with OVER(ORDER BY (SELECT 1))
.OVER(ORDER BY (SELECT 1) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
would be slightly faster in Exaol.ORDER BY
in an OVER
clause without window frame automatically leads to the default window frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
OVER()
clause need to have the same order.ORDER BY <function argument> ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
for OVER()
. In fact, as the computations in Exasol happen in parallel, there is no consistent order between two executions of the same query. You can see that if you execute your query on a larger data set multiple times:CREATE SCHEMA test;
CREATE OR REPLACE TABLE t1(a int);
CREATE OR REPLACE TABLE ten(col int);
INSERT INTO ten VALUES 0,1,2,3,4,5,6,7,8,9;
INSERT INTO t1 select a.col+10*b.col+100*c.col+1000*d.col from ten a,ten b,ten c,ten d;SELECT COUNT(*) FROM t1;
SELECT FIRST_VALUE(a), LAST_VALUE(a) FROM t1 GROUP BY NULL;
select
a,
first_value(a) over (order by (select 1)) f,
nth_value(a, 1) over (order by (select 1)) n1,
nth_value(a, 2) over (order by (select 1)) n2,
nth_value(a, 3) over (order by (select 1)) n3,
nth_value(a, 4) over (order by (select 1)) n4,
last_value(a) over (order by (select 1)) l
from t1
order by a limit 3;
There are several possible results. Here are just two, to illustrate the problem:
Result 1:
|A |F |N1 |N2 |N3 |N4 |L |
|---|---|---|---|---|---|------|
|0 |0 |0 |1 |2 |3 |4163 |
|1 |0 |0 |1 |2 |3 |4163 |
|2 |0 |0 |1 |2 |3 |4163 |
Result 2:
|A |F |N1 |N2 |N3 |N4 |L |
|---|------|------|------|------|------|------|
|0 |9182 |9182 |9183 |9184 |9185 |8227 |
|1 |9182 |9182 |9183 |9184 |9185 |8227 |
|2 |9182 |9182 |9183 |9184 |9185 |8227 |
The results are consistent but not deterministic.
As usual I will add a bug ticket in our internal issue tracker :).
Best wishes
Georg
12-05-2021 1:06 PM
Hi @lukaseder ,
why would you assume the stated semantic ? It feels a bit like indeterministic territory here because if you don´t specify an order ( order by in OVER() clause is always something else compared to order by for the RESULT set ) you don´t get an order ... or am I wrong here ?
Cheers,
Malte
12-05-2021 1:11 PM
just to add on that:
was a bit surprised with the last_value not being taken from the entire set...need to think on that, seems strange....
12-05-2021 1:20 PM - edited 12-05-2021 1:26 PM
https://docs.exasol.com/sql_references/functions/alphabeticallistfunctions/last_value.htm
The passage above the marked one also clears up my surprise with the last_values not considering the entire set...they are actually computed locally...in all honesty I´d have expected otherwise, but here we are.
12-05-2021 1:37 PM - edited 12-05-2021 1:42 PM
My expectation is by intuition here. Of course, it is better to be explicit about ordering, but since it isn't required by the parser (in some RDBMS it is, but the presence of ORDER BY
does not necessarily guarantee determinism), then I'd expect it to be at least consistent with the ordering of FIRST_VALUE
and NTH_VALUE
:
select
a,
first_value(a) over () f,
nth_value(a, 1) over () n1,
nth_value(a, 2) over () n2,
nth_value(a, 3) over () n3,
nth_value(a, 4) over () n4,
last_value(a) over () l
from (values(1), (2), (3), (4)) as t(a)
order by a
Yields:
|A |F |N1 |N2 |N3 |N4 |L |
|---|---|---|---|---|---|---|
|1 |1 |1 |2 |3 |4 |1 |
|2 |1 |1 |2 |3 |4 |1 |
|3 |1 |1 |2 |3 |4 |1 |
|4 |1 |1 |2 |3 |4 |1 |
So, to me, intuitively, either the NTH_VALUE
calculations are inconsistent, or the LAST_VALUE
one is. If the 4th value is 4, and the 5th value would be NULL
, then the 4th value appears to be the "last" value, no?
Also, that's exactly what PostgreSQL is doing 🙂 https://dbfiddle.uk/?rdbms=postgres_13&fiddle=0d2f2ccd0b5139a51ad624550aa4d3fc, and MySQL as well: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=80eb53041ace73c36602caf2024d99da and Oracle as well: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=ab548eebbd709da7d73abb4a3af5a657
12-05-2021 3:48 PM
The only argument I´d see there is that from an EXA perspective:
"If the 4th value is 4, and the 5th value would be NULL
, then the 4th value appears to be the "last" value, no?"
nth_value(a, 4) over () n4
=> this will not (necessarily) yield the 4th value but some value , so comparing this with an equally non-deterministic call to last_value(a) over () seems misleading.
I´m with you on the "intuition" part, but still, implementation trumps intuition 🙂
Your line of thought seems to assume an order where there is none as far as I can see.
@Exa-* : "Also, that's exactly what(...)" => here´s a chance to increase compatibility to other systems 🙂
12-05-2021 5:52 PM
My line of thought assumes that irrespective of explicit or implicit (non-deterministic) order, there is an enumeration going on, just like with ROW_NUMBER
, and each ordinal is assigned to each row exactly once. In fact, I'm even assuming that for any window specification w
, NTH_VALUE(a, n) OVER w
will evaluate the expression a
for the row where ROW_NUMBER() OVER w = n
Or in other words, even if an order is non-deterministic, there's an accidental order, and ordinals can be assigned non-ambiguously per query execution. If this weren't the case explicit, but non-deterministic ordering also wouldn't work, such as ROW_NUMBER() OVER (ORDER BY (SELECT 1))
12-05-2021 6:36 PM
To pick up on your statement: "It would also be more consistent with NTH_VALUE
, which still produces 2 e.g. for NTH_VALUE(a, 2) OVER ()"
I think I did not change the general structure of your sql ( just added a few things left and right ), and nth_value(a,2) over() is by no means guaranteed to deliver "2":
I think the semantics here are more along the lines of "pick a card, any card" , as described here:
https://docs.exasol.com/sql_references/functions/analyticfunctions.htm
"If the analytic function does not contain an order_clause, Exasol does not sort the data within each partition. Without sorted data, the use of some analytic functions can cause non-deterministic results."
I hope I made sense here ? Sorry, in all honesty I did not entirely follow your last post, will have to read that a couple of times and maybe someone from EXA could chime in here 🙂
13-05-2021 10:01 AM
> and nth_value(a,2) over() is by no means guaranteed to deliver "2":
I did not imply that it is guaranteed to deliver "2". That was just the example I made. It produces the value of a at the 2nd position in the window frame, and in my example, the data happened to be ordered already (by accident).
Anyway, I just noticed this difference to almost all other RDBMS that we're testing jOOQ with (a total of 30, 22 of which support LAST_VALUE
). Among the ones that don't simply reject empty OVER ()
clauses in this case (and even in those cases, we generate OVER (ORDER BY (SELECT 1))
or something similar, so the non-deterministic semantics is ultimately the same), Exasol is the only one I've noticed to exhibit this behaviour, so I thought it might be a bug somewhere, and I'd let you know. The bug being that consistency is still possible and desired in the event of nondeterminism. E.g. in your latest screenshot, I can't reconcile how n1-n4 produce all the 4 distinct values, yet FIRST_VALUE
/ LAST_VALUE
are the same (the "last value, this time", i.e. the value of NTH_VALUE(a, 4)
). That just seems inconsistent to me.
13-05-2021 10:44 AM
> "The bug being that consistency is still possible and desired in the event of nondeterminism"
I always thought of nondeterminism as being unable to produce consistency , possible consistency ( i.e. accidental consistency ) isn´t what I´d call consistency.
> "I can't reconcile how n1-n4 produce all the 4 distinct values, yet FIRST_VALUE
/ LAST_VALUE
are the same (the "last value, this time", i.e. the value of NTH_VALUE(a, 4)
). That just seems inconsistent to me."
But it´s only inconsistent as long as we assume that the OVER (ORDER BY (SELECT 1)) yields the same order accross those functions, right ? In any case, I´m with you that all in all FIRST / LAST_VALUE shouldn´t yield the same value if there are more than 1 discrete values.
Just for me to get this ( sorry if I´m getting on your nerves with this ) :
Would you expect FIRST / LAST_VALUE to equal N1 & N8 , the first and last value of "something" ( because of the identical window clause ) or just "at least" different values given the structure of the data set as a whole ?
Someone at EXA told me they are going to post something in this thread tomorrow, looking forward to it 🙂
Then it's time to become part of a unique family! Discover helpful tips and support other Community members with your knowledge.
Sign In\n\t\t\t\t\t\tSorry, unable to complete the action you requested.\n\t\t\t\t\t<\/p>\n\t\t\t\t\n\n\t\t\t\t\n\n\t\t\t\t\n\n\t\t\t\t\n\t\t\t<\/div>\n\n\t\t\t\n\t\t<\/div>"; LITHIUM.AjaxSupport({"ajaxOptionsParam":{"event":"LITHIUM:lightboxRenderComponent","parameters":{"componentParams":"{\n \"triggerSelector\" : {\n \"value\" : \"#loginPageV2_299ff487eaf19d\",\n \"class\" : \"lithium.util.css.CssSelector\"\n }\n}","componentId":"authentication.widget.login-dialog-content"},"trackableEvent":true},"tokenId":"ajax","elementSelector":"#loginPageV2_299ff487eaf19d","action":"lightboxRenderComponent","feedbackSelector":false,"url":"https://community.exasol.com/t5/forums/v5/forumtopicpage.loginpagev2:lightboxrendercomponent?t:ac=board-id/GeneralDiscussions/message-id/1136&t:cp=authentication/contributions/actions","ajaxErrorEventName":"LITHIUM:ajaxError","token":"7OqM7oxWfCAXCMaEvwAEKieqAboib83r5KeHL3Ll44s."}); LITHIUM.Dialog.options['-622891615'] = {"contentContext":"authentication.widget.login-dialog-content","dialogOptions":{"trackable":true,"resizable":true,"autoOpen":false,"minWidth":710,"dialogClass":"lia-content lia-panel-dialog lia-panel-dialog-modeless-advanced","title":"","minHeight":200,"fitInWindow":true,"draggable":true,"maxHeight":600,"width":710,"position":["center","center"],"modal":false,"maxWidth":710},"contentType":"ajax"}; LITHIUM.Dialog({ "closeImageIconURL" : "https://community.exasol.com/skins/images/846A2A0EB0D4BBA1CCA2564AC77177B7/responsive_peak/images/button_dialog_close.svg", "closeEvent" : "LITHIUM:lightboxCloseEvent", "activecastFullscreen" : false, "defaultAriaLabel" : "", "clientId" : "loginPageV2_299ff487eaf19d", "accessibility" : false, "buttonDialogCloseAlt" : "Close", "triggerSelector" : ".lia-panel-dialog-trigger-event-click", "ajaxEvent" : "LITHIUM:lightboxRenderComponent", "dialogContentCssClass" : "lia-panel-dialog-content", "triggerEvent" : "click", "dialogKey" : "dialogKey" }); LITHIUM.Form.resetFieldForFocusFound(); LITHIUM.InformationBox({"updateFeedbackEvent":"LITHIUM:updateAjaxFeedback","componentSelector":"#informationbox","feedbackSelector":".InfoMessage"}); LITHIUM.InformationBox({"updateFeedbackEvent":"LITHIUM:updateAjaxFeedback","componentSelector":"#informationbox_0","feedbackSelector":".InfoMessage"}); LITHIUM.InformationBox({"updateFeedbackEvent":"LITHIUM:updateAjaxFeedback","componentSelector":"#informationbox_1","feedbackSelector":".InfoMessage"}); LITHIUM.InformationBox({"updateFeedbackEvent":"LITHIUM:updateAjaxFeedback","componentSelector":"#informationbox_2","feedbackSelector":".InfoMessage"}); LITHIUM.AjaxFeedback(".lia-inline-ajax-feedback", "LITHIUM:hideAjaxFeedback", ".lia-inline-ajax-feedback-persist"); LITHIUM.Placeholder(); LITHIUM.AutoComplete({"options":{"triggerTextLength":0,"updateInputOnSelect":true,"loadingText":"Searching...","emptyText":"No Matches","successText":"Results:","defaultText":"Enter a search word","disabled":false,"footerContent":[{"scripts":"\n\n;(function($){LITHIUM.Link=function(params){var $doc=$(document);function handler(event){var $link=$(this);var token=$link.data('lia-action-token');if($link.data('lia-ajax')!==true&&token!==undefined){if(event.isPropagationStopped()===false&&event.isImmediatePropagationStopped()===false&&event.isDefaultPrevented()===false){event.stop();var $form=$('