我有一个JSON数据库更改日志,输出
wal2json
{"xid":1190,"timestamp":"2018-07-19 17:18:02.905354+02","change":[
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}},
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}]}
...
每个顶层条目(
xid
)是一个事务,每个项目
change
要导入到具有有限功能集的OLAP系统,我需要明确说明顺序。所以我需要加一个
sn
而且,每个更改都必须是顶级条目—OLAP不能在一个条目中迭代子条目。
{"xid":1190, "sn":1, "kind":"update", "data":{"id":401,"name":"Update AA","age":20} }
{"xid":1190, "sn":2, "kind":"update", "data":{"id":401,"name":"Update BB","age":20} }
{"xid":1191, "sn":1, "kind":"insert", "data":{"id":625,"name":"Inserted","age":20} }
{"xid":1191, "sn":2, "kind":"delete", "data":{"id":625} }
(原因是OLAP在导入过程中转换数据的能力有限,并且没有将顺序作为参数。)
所以,我用
jq
:
function transformJsonDataStructure {
## First let's reformat it to XML, then transform using XPATH, then back to JSON.
## Example input:
# {"xid":1074,"timestamp":"2018-07-18 17:49:54.719475+02","change":[
# {"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}},
# {"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}]}
cat "$1" | while read -r LINE ; do
XID=`echo "$LINE" | jq -c '.xid'`;
export SN=0;
#serr "{xid: $XID, changes: $CHANGES}";
echo "$LINE" | jq -c '.change[]' | while read -r CHANGE ; do
SN=$((SN+=1))
KIND=`echo "$CHANGE" | jq -c --raw-output .kind`;
TABLE=`echo "$CHANGE" | jq -c --raw-output .table`;
DEST_FILE="$TARGET_PATH-$TABLE.json";
case "$KIND" in
update|insert)
MAP=$(convertTwoArraysToMap "$(echo "$CHANGE" | jq -c ".columnnames")" "$(echo "$CHANGE" | jq -c ".columnvalues")") ;;
delete)
MAP=$(convertTwoArraysToMap "$(echo "$CHANGE" | jq -c ".oldkeys.keynames")" "$(echo "$CHANGE" | jq -c ".oldkeys.keyvalues")") ;;
esac
#echo "{\"xid\":$XID, \"table\":\"$TABLE\", \"kind\":\"$KIND\", \"data\":$MAP }" >> "$DEST_FILE"; ;;
echo "{\"xid\":$XID, \"sn\":$SN, \"kind\":\"$KIND\", \"data\":$MAP }" | tee --append "$DEST_FILE";
done;
done;
return;
}
问题是性能。我在打电话
每次输入几次。这是相当慢的,大约1000倍慢于没有转换。
如何仅使用一个过程就可以执行上述转换?
(
jq公司
不是必须的,其他工具也可以使用,但应该在CentOS包中。我想避免为此编写额外的工具。
从
man jq
似乎它可以一次性处理整个文件(每行JSON条目)。我可以用XSLT来做,但我不能把我的头绕过去
jq公司
. 尤其是迭代过程
改变
columnnames
和
columnvalues
-
map
或
map_values
可以使用。
-
from_entries
和
with_entries
功能,但无法使其工作。
任何
jq公司
师傅四处指点?