如果允许的话,我想再发布一个后续答案——我的原因是目前在Neo4j中重建代谢途径的研究很少,下面将对此进行完整的总结
StackOverflow标题/主题,“在Neo4j中创建代谢途径”。
就像我的
糖酵解
上面的路径,我在Neo4j中重新创建
TCA公司
(
柠檬酸循环
|
克雷布循环
)途径:
[TCA循环图像源:
https://metabolicpathways.stanford.edu/]
在创建TCA路径图的过程中出现的一个问题是,其中一个节点(酶,“乌头酸酶”)被使用了两次,因此在创建图的过程中
MERGE
合并了公共节点
aconitase
作为单个实体,导致此布局,
。。。不是这个,
我对该问题的解决方案是使用节点属性创建“TCA图”,以临时区别标记受影响的源节点和目标节点(稍后在正确创建图后删除这些标记)。
我还添加了
:Metabolism
标签,以便我可以选择各个路径(
:Glycolysis
|
:TCA
)或完整的代谢网络(
:新陈代谢
),根据需要。
最后,我需要连接这两条路径(
:糖酵解
|
:TCA
)通过其公共节点,
pyruvate
,我可以通过APOC程序完成(这里,附在我的
glycolysis.cql
(Cypher)脚本。
这是我的CSV数据文件*。cql密码脚本、脚本执行和结果图。
糖酵解。csv:
source,relation,target
α-D-glucose,substrate_of,hexokinase
hexokinase,yields,glucose 6-phosphate
glucose 6-phosphate,substrate_of,glucose-6-phosphatase
glucose-6-phosphatase,yields,α-D-glucose
glucose 6-phosphate,substrate_of,phosphoglucose isomerase
phosphoglucose isomerase,yields,fructose 6-phosphate
fructose 6-phosphate,substrate_of,phosphofructokinase
phosphofructokinase,yields,"fructose 1,6-bisphosphate"
"fructose 1,6-bisphosphate",substrate_of,"fructose-bisphosphate aldolase, class I"
"fructose-bisphosphate aldolase, class I",yields,D-glyceraldehyde 3-phosphate
D-glyceraldehyde 3-phosphate,substrate_of,glyceraldehyde-3-phosphate dehydrogenase
D-glyceraldehyde 3-phosphate,substrate_of,triosephosphate isomerase (TIM)
triosephosphate isomerase (TIM),yields,dihydroxyacetone phosphate
glyceraldehyde-3-phosphate dehydrogenase,yields,"1,3-bisphosphoglycerate"
"1,3-bisphosphoglycerate",substrate_of,phosphoglycerate kinase
phosphoglycerate kinase,yields,3-phosphoglycerate
3-phosphoglycerate,substrate_of,phosphoglycerate mutase
phosphoglycerate mutase,yields,2-phosphoglycerate
2-phosphoglycerate,substrate_of,enolase
enolase,yields,phosphoenolpyruvate
phosphoenolpyruvate,substrate_of,pyruvate kinase
pyruvate kinase,yields,pyruvate
tca。csv:
source,relation,target,tag1,tag2
pyruvate,substrate_of,pyruvate dehydrogenase,,
pyruvate dehydrogenase,yields,acetyl CoA,,
acetyl CoA,substrate_of,citrate synthase,,
oxaloacetate,substrate_of,citrate synthase,,
citrate synthase,yields,citrate,,
citrate,substrate_of,aconitase,,1
aconitase,yields,cis-aconitate,1,
cis-aconitate,substrate_of,aconitase,,2
aconitase,yields,isocitrate,2,
isocitrate,substrate_of,isocitrate dehydrogenase,,
isocitrate dehydrogenase,yields,α-ketoglutarate,,
α-ketoglutarate,substrate_of,α-ketoglutarate dehydrogenase,,
α-ketoglutarate dehydrogenase,yields,succinyl-CoA,,
succinyl-CoA,substrate_of,succinyl-CoA synthetase,,
succinyl-CoA synthetase,yields,succinate,,
succinate,substrate_of,succinate dehydrogenase,,
succinate dehydrogenase,yields,fumarate,,
fumarate,substrate_of,fumarase,,
fumarase,yields,S-malate,,
S-malate,substrate_of,malate dehydrogenase,,
malate dehydrogenase,yields,oxaloacetate,,
“tsv.csv”中的“tag1”和“tag”2用于在通过“tca.cql”脚本创建源节点和目标节点时唯一地使用它们:
tca。cql:
// CREATE INDICES:
CREATE INDEX ON :Metabolism(name);
CREATE INDEX ON :TCA(name);
// CREATE GRAPH:
// USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/tca.csv" AS row
MERGE (s:Metabolism:TCA {name: row.source, tag:COALESCE(row.tag1, '')})
MERGE (t:Metabolism:TCA {name: row.target, tag:COALESCE(row.tag2, '')})
WITH s, t, row
CALL apoc.merge.relationship(s, row.relation, {}, {}, t) YIELD rel
REMOVE s.tag, t.tag
RETURN COUNT(*);
糖酵解。cql:
// CREATE INDICES:
CREATE INDEX ON :Metabolism(name);
CREATE INDEX ON :Glycolysis(name);
// CREATE GRAPH:
//USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS FROM "file:/mnt/Vancouver/Programming/data/metabolism/glycolysis.csv" AS row
MERGE (s:Metabolism:Glycolysis {name: row.source})
MERGE (t:Metabolism:Glycolysis {name: row.target})
WITH s, t, row
CALL apoc.merge.relationship(s, row.relation, {}, {}, t) YIELD rel
RETURN COUNT(*);
// MERGE COMMON NODE (GLYCOLYSIS: PYRUVATE; TCA: PYRUVATE):
// As presented, run "tca.cql" first, then "glycolysis.cql"
MATCH (g:Glycolysis), (t:TCA) WHERE g.name = t.name
CALL apoc.refactor.mergeNodes([g,t]) YIELD node
RETURN node;
脚本执行:
$ cat tca.cql | cypher-shell -u *** -p ***
COUNT(*)
21
$ cat glycolysis.cql | cypher-shell -u *** -p ***
COUNT(*)
22
node
(:Metabolism:TCA:Glycolysis {name: "pyruvate"})
$
Neo4j图(
:新陈代谢
视图):