Hive/Presto/标准SQL实现行转列和列转行_soaring0121的专栏-程序员宅基地

程序员宅基地 · · 2794 次点击 · · 开始浏览    
这是一个创建于 的文章,其中的信息可能已经有所发展或是发生改变。

     假设有以下两个表格,分别为vtable和htable,期望使用SQL实现相互转化,本文将展示如何分别使用标准SQL、Presto、和Hive实现。

vtable
uid key value
101 c1 11
101 c2 12
101 c3 13
102 c1 21
102 c2 22
102 c3 23
htable
uid c1 c2 c3
101 11 12 13
102 21 22 23

一、标准SQL实现

1.列转行(htable => vtable)

SELECT uid,
   max(CASE WHEN key = 'c1' THEN value END) AS c1,
   max(CASE WHEN key = 'c2' THEN value END) AS c2,
   max(CASE WHEN key = 'c3' THEN value END) AS c3
FROM vtable
GROUP BY uid

 2. 行转列(vtable => htable)

SELECT uid, 'c1' AS key, c1 AS value FROM htable
UNION ALL
SELECT uid, 'c2' AS key, c2 AS value FROM htable
UNION ALL
SELECT uid, 'c3' AS key, c3 AS value FROM htable

二、Hive实现

1. 列转行(vtable => htable)

SELECT
  uid,
  kv['c1'] AS c1,
  kv['c2'] AS c2,
  kv['c3'] AS c3
FROM (
  ----一般情况下是用',' ':'分隔符生成{k1:v1,k2:v2}样式的map
  ----这里使用\002,\003作为分割符可以防止key value中的 , :影响map的正确分隔
  SELECT uid,
    str_to_map(concat_ws('\002', collect_list(concat(key, '\003', value))), '\002', '\003') kv
  FROM vtable
  GROUP BY uid
) t

2. 行转列(htable => vtable)

SELECT t1.uid, t2.key, t2.value
FROM htable t1
LATERAL VIEW explode (map(
  'c1', c1,
  'c2', c2,
  'c3', c3
)) t2 as key, value

三、Presto实现

 1. 列转行(vtable => htable)

SELECT
  uid,
  kv['c1'] AS c1,
  kv['c2'] AS c2,
  kv['c3'] AS c3
FROM (
  SELECT uid, map_agg(key, value) kv
  FROM vtable
  GROUP BY uid
) t

2. 行转列(htable => vtable)

SELECT t1.uid, t2.key, t2.value
FROM htable t1
CROSS JOIN unnest (
  array['c1', 'c2', 'c3'],
  array[c1, c2, c3]
) t2 (key, value)

关于PRESTO的UNNEST

UNNEST 用于将 ARRAY 或 MAP 扩展到关系表中。ARRAY中得元素会被放到一个单独的列中,而MAP中的元素会被放到两个列中(key, value)。 UNNEST可以选择性的添加WITH ORDINALITY语句,这样可以在关系表的后面增加一个序号列。

1. 单个数组列:

SELECT student, score
FROM (
  VALUES
    ('LiMing', ARRAY[20, 50]),
    ('WangGang', ARRAY[70,80,90])
) AS x (student, scores)
CROSS JOIN UNNEST(scores) AS t (score);
  student  | score
-----------+---
 LiMing    | 20 
 LiMing    | 50 
 WangGang  | 70 
 WangGang  | 80 
 WangGang  | 90
(5 rows)

2. 多个数组列:

SELECT numbers, animals, n, a
FROM (
  VALUES
    (ARRAY[2, 5], ARRAY['dog', 'cat', 'bird']),
    (ARRAY[7, 8, 9], ARRAY['cow', 'pig'])
) AS x (numbers, animals)
CROSS JOIN UNNEST(numbers, animals) AS t (n, a);
  numbers  |     animals      |  n   |  a
-----------+------------------+------+------
 [2, 5]    | [dog, cat, bird] |    2 | dog
 [2, 5]    | [dog, cat, bird] |    5 | cat
 [2, 5]    | [dog, cat, bird] | NULL | bird
 [7, 8, 9] | [cow, pig]       |    7 | cow
 [7, 8, 9] | [cow, pig]       |    8 | pig
 [7, 8, 9] | [cow, pig]       |    9 | NULL
(6 rows)

3. WITH ORDINALITY 语句:

SELECT numbers, n, a
FROM (
  VALUES
    (ARRAY[2, 5]),
    (ARRAY[7, 8, 9])
) AS x (numbers)
CROSS JOIN UNNEST(numbers) WITH ORDINALITY AS t (n, a);
  numbers  | n | a
-----------+---+---
 [2, 5]    | 2 | 1
 [2, 5]    | 5 | 2
 [7, 8, 9] | 7 | 1
 [7, 8, 9] | 8 | 2
 [7, 8, 9] | 9 | 3
(5 rows)

4. Map+ARRAY:

SELECT
    animals, a, n
FROM (
    VALUES
        (MAP(ARRAY['dog', 'cat', 'bird'], ARRAY[1, 2, 0])),
        (MAP(ARRAY['dog', 'cat'], ARRAY[4, 5]))
) AS x (animals)
CROSS JOIN UNNEST(animals) AS t (a, n);
           animals          |  a   | n
----------------------------+------+---
 {"cat":2,"bird":0,"dog":1} | dog  | 1
 {"cat":2,"bird":0,"dog":1} | cat  | 2
 {"cat":2,"bird":0,"dog":1} | bird | 0
 {"cat":5,"dog":4}          | dog  | 4
 {"cat":5,"dog":4}          | cat  | 5
(5 rows)

 

2794 次点击  
加入收藏 微博
暂无回复
添加一条新回复 (您需要 登录 后才能回复 没有账号 ?)
  • 请尽量让自己的回复能够对别人有帮助
  • 支持 Markdown 格式, **粗体**、~~删除线~~、`单行代码`
  • 支持 @ 本站用户;支持表情(输入 : 提示),见 Emoji cheat sheet
  • 图片支持拖拽、截图粘贴等方式上传