下面是BigQuery标准SQL
有太多的方法可以做到这一点,下面是快速(首先想到)一:
#standardSQL
WITH temp AS (
SELECT "john.doe@company-Y.com" AS email_id UNION ALL
SELECT "hello.world@company-X.com" AS email_id
)
SELECT
SPLIT(SPLIT(email_id, '@')[SAFE_OFFSET(0)], '.')[SAFE_OFFSET(0)] firstname,
SPLIT(SPLIT(email_id, '@')[SAFE_OFFSET(0)], '.')[SAFE_OFFSET(1)] lastname,
SPLIT(SPLIT(email_id, '@')[SAFE_OFFSET(1)], '.')[SAFE_OFFSET(0)] company
FROM temp
以结果
Row firstname lastname company
1 john doe company-Y
2 hello world company-X
但真正正确的解决方案将取决于数据的性质和模式,以及明显的个人偏好等。
另一个快速的选择是
WITH temp AS (
SELECT "john.doe@company-Y.com" AS email_id UNION ALL
SELECT "hello.world@company-X.com" AS email_id
)
SELECT
REGEXP_EXTRACT(email_id, r'^(.*?)[.@]') firstname,
REGEXP_EXTRACT(email_id, r'\.(.*?)@') lastname,
REGEXP_EXTRACT(email_id, r'@(.*?)\.') company
FROM temp
结果相同
只需稍微扩展一点—这样您就可以看到改进的方向—例如,如果名称是由bu分隔的,
.
或
-
WITH temp AS (
SELECT "john.doe@company-Y.com" AS email_id UNION ALL
SELECT "hello.world@company-X.com" AS email_id UNION ALL
SELECT "hello-world@company-X.com" AS email_id UNION ALL
SELECT "hello@company-X.com" AS email_id
)
SELECT email_id,
REGEXP_EXTRACT(email_id, r'^(.*?)[-.@]') firstname,
REGEXP_EXTRACT(email_id, r'[-.](.*?)@') lastname,
REGEXP_EXTRACT(email_id, r'@(.*?)\.') company
FROM temp
以结果
Row email_id firstname lastname company
1 john.doe@company-Y.com john doe company-Y
2 hello.world@company-X.com hello world company-X
3 hello-world@company-X.com hello world company-X
4 hello@company-X.com hello null company-X