MongoDB

Aggregation Framework

Derek Harmel

@derekharmel

What's it for?

Aggregation, computation, and data-reshaping

Why not Map-reduce?

  • Simpler
  • Faster
  • Better document access

How does it work?

  1. Specify a collection to pass through pipeline
  2. Declare a pipeline of operations
  3. Collection passes through pipeline
  4. Results returned as a document

The Command


db.runCommand({
  aggregate: '<collection name>',
  pipeline: [
    { $<operation>: ...},
    { $<operation>: ...},
    { $<operation>: ...}
]})

db.<collection name>.aggregate(
  { $<operation>: ...},
  { $<operation>: ...},
  { $<operation>: ...}
)
            

The Pipeline

Think Unix pipes

cat logfile | grep FATAL | cut -f2-4 | sort -k -n

$match

Usual MongoDB find syntax

*nix's grep

SQL's WHERE

Use early in the pipeline if possible

$project

*nix's awk

SQL's SELECT

Reshape the document

Perform computations

$unwind

SQL's JOIN (kinda)

Creates N documents for an array of length N

Used often with $group operator

$group

SQL's GROUP BY

Groups documents by matching field(s)

Performs computations and array operations

$sort

*nix's sort

SQL's ORDER BY

Use early in the pipeline if possible

$limit and $skip

*nix's head

SQL's LIMIT & OFFSET

$out (future)

Outputs data to a collection

Located anywhere within a pipeline

jira.mongodb.org/browse/SERVER-3253

$???

New operations easy to add

Request at jira.mongodb.org

Computed Expressions

Comparison Operators

$cmp $eq $gt $gte $lt $lte $ne

Accept an array of two values, most return a boolean


$gt: ['$IQ', '$bowlingScore'] // => true

// only because I'm a terrible bowler
          

Arithmetic Operators

$add $divide $mod $multiply $substract

Most accept an array of two values, return a value

$add & $multiply accept N values


$add: ['$hoursOfXbox', '$hoursOfTv', '$hoursOfReddit'] // => 24

// completely fictional, I swear
          

String Operators

$strcasecmp $substr $toLower $toUpper $add

Date Operators

$dayOfMonth $dayOfWeek $dayofYear $hour $minute $month $second $week $year

Extract values from a Date object

Multi-Expressions

$cond similar to a ternary operator

$ifNull similar to NVL, IFNULL, COALESCE

Examples

// good presenters
db.presentations.aggregate(
  { $match: {rating: {$gt: 4}} },
  { $project: {presenter: 1} }
).result // =>

[{_id: 2, presenter: 'Ryan McGeary'},
 {_id: 3, presenter: 'Nathen Harvey'}]
// presenters sorted by presenter score
db.presentations.aggregate(
  { $project: {
      presenter: 1,
      score: { $multiply: ['$attendees', '$rating'] }
  } },
  { $sort: {score: -1} }
).result // =>

[{_id: 3, presenter: "Nathen Harvey", score: 200},
 {_id: 1, presenter: "Derek Harmel",  score: 128},
 {_id: 2, presenter: "Ryan McGeary",  score: 115}]
// attendee stats by day for great talks
db.presentations.aggregate(
  { $match: {rating: {$gte: 4.5}} },
  { $group: {
      _id: '$date',
      total: {$sum: '$attendees'},
      avg:   {$avg: '$attendees'}
  } }
).result // =>

[{_id: ISODate("2012-08-15T04:00:00Z"), total: 63, avg: 31.5}]
// number of talks by topic
db.presentations.aggregate(
  { $unwind: '$topics' },
  { $group: {
      _id: '$topics',
      total: {$sum: 1},
  } }
).result // =>

[{_id: "kitten mittens", total: 1},
 {_id: "badassery",      total: 1},
 {_id: "devops",         total: 1},
 {_id: "mongodb",        total: 2}]
// total attendees by topic by day
db.presentations.aggregate(
  { $project: {
      attendees: 1,
      topics:    1,
      year:      {$year:       '$date'},
      month:     {$month:      '$date'},
      day:       {$dayOfMonth: '$date'}
  } },
  { $unwind: '$topics' },
  { $group: {
      _id: {year: 1, month: 1, day: 1, topics: 1},
      total: {$sum: '$attendees'}
    }
  }
).result

[{_id: {year: 2012, month: 8, day: 15, topics: "mongodb"},
  total: 55}]
 {_id: {year: 2012, month: 8, day: 15, topics: "devops"},
  total: 40},
 {_id: {year: 2012, month: 8, day: 15, topics: "kitten mittens"},
  total: 23},
 {_id: {year: 2012, month: 8, day: 15, topics: "badassery"}
  total: 32}]

Resources

Thanks!

Derek Harmel

Freelance Ruby on Rails Developer

@derekharmel