Fully managed cloud. 30-day FREE full featured trial. Start Now
cancel
Showing results for 
Search instead for 
Did you mean: 

Extract the first name before a space.

3_stripes
Contributor

Hello everyone.

First apologies, but I am not sure how to create an example here.

Anyways, I believe my question to be fairly simple.

Let's say I have multiple names:

 

Cristiano Ronaldo

Leonel Messi

Zidane

 

Now, when I am doing the query, I want to have a temporary column that only captures the first name.

So I want to have

Cristiano

Leonel

Zidane.

 

So the code should look something like this:

Select footballer_name, "query to give me first name only" as first_name

from fifa_table

I tried to make use of charindex, but it seems it is not available on Exasol. I tried to use regex and substring but to my avail, I did not manage. 

Help me Obi-wan, you are my only hope.

 

Best regards

1 ACCEPTED SOLUTION

Accepted Solutions

PeterK
Xpert

Hi @3_stripes ,

This should do it: 

 

 

SELECT  REGEXP_REPLACE(c,'\s.*','')
FROM (VALUES 'Cristiano Ronaldo','Leonel Messi','Zidane') t(c)

The force be with you 🙂

Peter

View solution in original post

4 REPLIES 4

PeterK
Xpert

Hi @3_stripes ,

This should do it: 

 

 

SELECT  REGEXP_REPLACE(c,'\s.*','')
FROM (VALUES 'Cristiano Ronaldo','Leonel Messi','Zidane') t(c)

The force be with you 🙂

Peter

mwellbro
Xpert

Hi all,
just because @exa-Aleksandr reminded me a few posts back ( here  ) that regexp* versions of functions might turn out more expensive in terms of performance than alternative solutions I just ran an example to compare something like that for a case similar to the one stated here:

mwellbro_0-1644444842293.png

From bottom to top:
3) create table for the test data, always one first name, blank , one last name ( probably more simplistic than in the fifa_table )
2) first test with the regexp-based approach
1) second test using a substr + instr combination

While CPU usage is roughly the same, the throughput in terms of rows per time is quite something else - wouldn´t have expected that much of a difference to be honest but...if you don´t know, now you know 😉
Was run on a 2node m5.large AWS cluster, using EXA 7.1.4 ( as per my last downloaded exa cloud tools template ).

 

Cheers,
Malte




PeterK
Xpert

Hi @mwellbro 

Good idea to compare performance. 
Interestingly on our setup (3 node & 64cpu/node) they are roughly the same although substr does consistently run a bit faster.

STMT DURATION ROW_COUNT  CPU  SQL_TEXT 
---- -------- ---------- ---- ----------------------------------------------------------------------------------------------------------
1 19.498 1000000000 29.0 create or replace table t1 as select insert(left(hash_md5(v),20),10,0,' ') name from values between 1 and 1e9 t(v)
4 13.566 1000000000 39.6 create or replace table t2 as select regexp_replace(name,'\s.*','') first_name from t1
6 12.481 1000000000 28.3 create or replace table t3 as select substr(name,1,instr(name,' ')-1) as first_name from t1

 The regexp clearly uses more CPU so if the system is CPU-bound it will run slower. I'm not entirely sure what it's bound by on this system... both net and hdd_read were zero and resources was 100% but cpu was ~30%    :shrug:

Cheers

Peter

P.S. It still blows my mind that Exasol can process 1 billion regexp substitutions in 13s. That's almost 100M per second! ‌‌

mwellbro
Xpert

Hi @PeterK ,

I like your test - tried to re-create your setup ( i.e. 3 nodes, albeit 36 CPUs per node in my case ) to see if I can match your timings ( as my personal AWS budget is limited I wanted to do this slightly different then a simple cloudformation and use AWS Spot instances...the problem with that approach will be detailed here .

The 100% resources to ~30% CPU would seem to indicate that there is some limit as to how the processing is distributed (coincidence that it´s roughly a third on a 3 node cluster ? I´ll check that as well once I´ve solved the Spot instance "problem" ) or 
how the units of work can be processed on the given nodes...

Well, anyhow, thanks again for adding to the "knowledge pool" 😊

Cheers,
Malte