SELECT count(1) as cnt, toDate(toStartOfDay(toDateTime(event_ms))) as date, age FROM event_analytics WHERE event = “view_picture_book_home_page” AND city = “beijing” AND event_ms >= 1613923200000 AND event_ms <= 1614528000000 GROUP BY (date, age);
SELECT sum(ret[1]) AS original, sum(ret[2]) AS next_day_ret, sum(ret[3]) AS seven_day_ret FROM (SELECT user_id, retention( event = “register_picture_book” AND toDate(event_ms) = toDate(‘2021-03-01’), event = “view_picture_book” AND toDate(event_ms) = toDate(‘2021-03-02’), event = “view_picture_book” AND toDate(event_ms) = toDate(‘2021-03-08’) ) as ret FROM event_analytics WHERE event_ms >= 1614528000000 AND event_ms <= 1615132800000 GROUP BY user_id);
SELECT array( sumIf(count, level >= 1), sumIf(count, level >= 2) ) AS funnel_uv, FROM ( SELECT level, count() AS count FROM ( SELECT uid, windowFunnel(172800000)( event_ms, event = “view_picture_book” AND event_ms >= 1613923200000 AND event_ms <= 1614009600000, event = “buy_picture_book”) AS level FROM event_analytics WHERE event_ms >= 1613923200000 AND event_ms <= 1614182400000 GROUP BY uid ) GROUP BY level )
我们只需首先找到城市为北京的用户人群(用 bitmap 表示),然后找到性别为女的用户人群,然后对它们进行 AND 操作即可。具体查询如下:
1 2 3 4 5 6 7 8 9 10 11 12
WITH ( SELECT groupBitmapMergeState(users) FROM user_tag_bitmap_all WHERE p_day = ‘2021_06_01’ AND tag = ‘city’ AND tag_item = ‘beijing’ ) AS user_group_1, ( SELECT groupBitmapMergeState(users) FROM user_tag_bitmap_all WHERE p_day = ‘2021_06_01’ AND tag = ‘gender’ AND tag_item = ‘female’ ) AS user_group_2 SELECT bitmapToArray(bitmapAnd(user_group_1, user_group_2))
其中,groupBitmapMergeState 函数对通过 WHERE 筛除得到的任意个数的 bitmap (users) 进行 AND 操作,而 bitmapAnd 只能对两个 bitmap 进行 AND 操作。
WITH ( SELECT groupBitmapMergeState(users) FROM user_tag_bitmap_all WHERE p_day = ‘2021_06_01’ AND tag = ‘city’ AND tag_item = ‘beijing’ ) AS user_group_1, ( SELECT groupBitmapMergeState(users) FROM user_tag_bitmap_all WHERE p_day = ‘2021_06_01’ AND tag = ‘gender’ AND tag_item = ‘female’ ) AS user_group_2, ( SELECT bitmapAnd(user_group_1, user_group_2) ) AS filter_users SELECT bitmapCardinality(bitmapAnd(filter_users, group_by_users)) AS count, tag, tag_item FROM (SELECT groupBitmapMergeState(users) AS group_by_users, tag, tag_item FROM user_tag_bitmap_all WHERE tag = “device_type” GROUP BY (tag, tag_item));
doris on es 实现,列存,本地优先扫描,不排序(es 本身会排序,节省了很多计算开销),性能解决了,并且可以直接写sql了,不用写dsl了,扩展性,用户可以直接写sql了。中台必要的能力
更新场景,doris 更新不好,提升查询性能,需要做很多预计算,预计算需要有规则的。分析场景,维度跟指标要定义好,可以通过维度可以把指标预计算出来,画像场景,主键或者叫维度,更新大量是指标,查的时候要根据指标去查,某个年龄,金额大于xxx的,某个地域是xxx的人群是多少个,属性来查询。doris需要索引,排序,查询速度非常快,因为会预聚合,更新的结果应不应该在结果集中,性能会掉的很厉害,但是更新满足不了,
es 通过docid就可以更新了,性能一般,但是好歹是支持的。并且不影响读取。
19/12/28 16:17:25 ERROR tool.ImportTool: Error during import: --merge-key or --append is required when using --incremental lastmodified and the output directory exists.
注意:可以看到--merge-key or --append is required when using --incremental lastmodified意思是,这种基于时间导入模式,需要指定--merge-key或者--append参数,表示根据时间戳导入,数据是直接在末尾追加(append)还是合并(merge),这里使用merge方式,根据id合并: