代码之家 › 专栏 › 技术社区 › Hassan Mokdad

MongoDB在预定义大小的行集合上聚合

aggregate mongodb-query group-by mongodb

Hassan Mokdad · 技术社区 · 6 年前

一个较小的例子是下表,我想计算每4个连续行的平均评级:

所以我的不安应该是这样的:

[{"ItemName":"Item1","Rating":4},
{"ItemName":"Item2","Rating":4},
{"ItemName":"Item2","Rating":4},
{"ItemName":"Item3","Rating":2},
{"ItemName":"Item4","Rating":5},
{"ItemName":"Item5","Rating":4},
{"ItemName":"Item6","Rating":2},
{"ItemName":"Item7","Rating":4},
{"ItemName":"Item8","Rating":1},
{"ItemName":"Item9","Rating":4},
{"ItemName":"Item10","Rating":3},
{"ItemName":"Item11","Rating":2},
{"ItemName":"Item12","Rating":2}]

1 回复 | 直到 6 年前

Alex Blex 6 年前

没有简单的方法。您需要将整个集合分组到可能需要的数组中 allowDiskUse 对于具有巨大性能影响的大型数据集。

db.collection.aggregate([
    // count all documents
    { $group: {
        _id: null,
        cnt: { $sum: 1},
        docs: { $push: "$$ROOT" }
    } },
    // add _batch field to group documents by
    { $project: {
        _id: 0,
        docs: { $map: { 
            // add a sequential number to each
            input: { $zip: {
                inputs: [ "$docs",  { $range: [ 0, "$cnt" ] } ]
            } }, 
            as: "doc", 
            in: { $mergeObjects: [ 
                { $arrayElemAt: [ "$$doc", 0 ] }, 
                // split it in batches by 4 based on the sequential number
                { _batch: { $cond: [ 
                    { $eq: [ { $arrayElemAt: [  "$$doc", 1 ] }, 0 ] }, 
                    1, 
                    { $ceil: { $divide: [ { $arrayElemAt: [  "$$doc", 1 ] }, 4 ] } } 
                ] } }
            ] } 
        } }
    } },    
    { $unwind: "$docs" },
    { $replaceRoot: { newRoot: "$docs" } },
    // ensure original order, only if you need ItemRange as a string
    { $sort: { _id: 1 } },
    // calculate averages per batch
    { $group: { 
        _id: "$_batch",
        start: { $first: "$ItemName" }, // only if you need ItemRange as a string
        end: { $last: "$ItemName" }, // only if you need ItemRange as a string
        RatingAvg: {$avg: "$Rating"} 
    } },
    // only if you need them in order
    { $sort: { _id: 1 } },
    // calculate ItemRange, only if you need ItemRange as a string
    { $project: {
        _id: 0,
        ItemRange: { $concat: [ "$start", "-", "$end" ] },
        RatingAvg: 1 
    } },
])

不确定实际用例,因为删除第一个文档时,所有平均值都会改变。

不管怎样,如果您不需要格式为“FirstName LastName”的ItemRange,而是可以使用批号,那么您可以跳过2个last in memory排序,这将提高性能。

推荐文章

John Smith · Java按值对映射进行分组,其中值为列表

2 年前

The Great · 拆分并存储数据帧,但名称基于特定列中的唯一值

3 年前

Programming Noob · AttributeError:“SeriesGroupBy”对象没有属性“tolist”

3 年前

Flo · 分组依据中的SQL大小写

3 年前

user13663655 · 我可以使用条件分组并返回源类型值吗

3 年前

pinkiBet · 使用MAX和UNION分组,还是加入?

3 年前

Shubham · 如何在R中查找数据表的两个多类别列之间的差异/setdiff()

3 年前

user7052482 · 将来自联合的SQL结果分组(SQL Server)

7 年前

lugger1 · 如何获取Postgres中时间间隔的平均值

7 年前

JustCallMeGary · 如何使用R中的一个因子操作两个变量[重复]

7 年前