A deep dive into cursor-based pagination in MongoDB
Exploring cursor-based pagination in MongoDB and the tricky parts.
One of the improvements we had to make to Engage as we continue to scale is changing the way we paginate some result sets from offset-based pagination to cursor-based pagination. This Mixmax article is a good introduction but in this piece, I explore more details and talk about navigating some tricky parts of cursor-based pagination.
Outline
- What is offset-based pagination?
- What is cursor-based pagination?
- Why cursor-based pagination?
- Challenges of cursor-based pagination
- Implementing offset-based pagination
- Getting total
- Implementing cursor-based pagination
- Sorting and paginating with a non-unique field
- There is an existing $or operator
- Ascending and descending order sorting
- All together now
- A final note on indexing
Let’s start by defining the two types of pagination and why cursor-based pagination.
What is offset-based pagination?
Offset-based pagination lets us paginate by skipping a number of records (the offset). If we have a table with 100 items and want to show 20 items per page (this is called the limit), our first result page will be 1 to 20. To show the next page, i.e 21 to 40, we skip the first 20 items. This is what that will look like in SQL
select * from items limit 20 offset 20;
And in MongoDB:
db.items.find({}, { limit: 20, skip: 20 })
What is cursor-based pagination?
Instead of skipping results to move across pages, cursor-based pagination lets us use a record as a pointer (this is called the cursor) so we can get items after or before that record. Using our 100 items example, our first result page is 1 to 20. To get the next page, we use item 20 (the last item of the page) as our pointer (i.e, the cursor) and get items after it.
select * from items where id > 20 limit 20;
db.items.find({ id: { $gt: 20 } }, { limit: 20 })
But this is putting it in a simple way. We are making the assumption that our items
have an id
field that is 1 to 100 so we can get items with ids greater than or less than the pointer id (20). This probably won’t be the case in a real-world scenario. But it doesn’t matter. As long as the field is immutable, sortable and unique, we are fine. IDs and date fields are good candidates for this. Even though MongoDB IDs are not auto-increment IDs, you can use them because they are [vaguely] based on timestamp.
// Where 507f191e810c19729de860ea is the _id
// of last item in previous page
db.items.find({
_id: { $gt: new ObjectId('507f191e810c19729de860ea') }
}, {
limit: 20
})
Why cursor-based pagination?
Scalability
The challenge with offset-based pagination is that the database needs to scan skipped records before getting to the requested data. This means that on page 5 of our example, the database will still scan items 1 to 80 before returning results for 80 to 100. Scanning 80 items may look small but as the offset increases, the number of records to scan increases.
With cursor-based pagination, the database can directly jump to the “position” of the needed result set. Note that to achieve this desired advantage though, the cursor field has to be indexed. IDs are indexed by default so that is fine. If you are using another field, a date field, for example, ensure it is indexed.
Guarantees accurate results
Another challenge with offset-based pagination is that if there are changes to the dataset (a data addition or removal) during pagination, it may cause wrong results to be shown.
Let’s look at another example with a smaller dataset of 1 to 10 items at 5 items per page.
If between when I was on page 1 and navigating to page 2, items 4 and 5, were removed, page 2 will only show me 8 to 10 instead of 6 to 10
This is because I specified an offset of 5 (as I should) and that tells the database to skip 5 items – 1, 2, 3, 6, 7.
The same thing happens if new items are added. This doesn’t happen in cursor-based pagination as the exact pointer to the last or previous item is specified.
Challenges of cursor-based pagination
Cursor-based pagination is not without its challenges. With offset-based pagination, you can list out the number of pages and allow people easily jump across the pages. In fact, the URL can also be directly modified to jump to a particular page.
You can’t do this with cursor-based pagination. This makes it difficult to have an idea of the current position of a result set in the entire dataset. Are we close to the end of the dataset or just at the beginning? Or at the middle?
Implementing offset-based pagination
Getting data for offset-based pagination is simple. This is one of its pros.
const data = await db.collection('items').find({}, {
limit: itemsPerPage,
skip: offset
}).toArray()
Considering how the pagination on the client or frontend will be built, we will need to return more than just data
. We would at least need to send:
- The
offset
that returned the current data so we know our current page and if there is a next or previous page. - The
total
number of items so we know if there is a next page. Ifoffset
+ count of current items is less than the total number of items, then there is a next page - The
limit
used. The currentoffset
minus limit will be used to get data for the previous page. This is also useful if we need to show the total number of pages (i.e,total
divided bylimit
).
Putting it together, we will have something like this:
getPaginatedResult = async (query) => {
query = query || {}
// If a limit is specified, use it. Else default to 20
let limit = +query.limit || 20
if (limit < 1) { limit = 20 }
if (limit > 50) { limit = 50 }
// If an offset is specified, use it. Else default to 0
// i.e no offset -> first page
let skip = +query.offset || 0
if (skip < 1) { skip = 0 }
const data = await db.collection('items').find({}, {
limit, // short for limit: limit
skip // short for skip: skip
}).toArray()
const total = await db.collection('items').countDocuments({})
return {
offset: skip,
limit,
total,
data
}
}
On the frontend or client, we can then do this:
<div v-if="result.offset > 0">
<a :href="'?offset=' + (result.offset - result.limit) ">Previous</a>
</div>
<div v-if="(result.offset + result.data.length) < result.total">
<a :href="'?offset=' + (result.offset + result.limit) ">Next</a>
</div>
A form of offset pagination uses page
for navigation instead of offset
. Behind the scene, it’s the same thing.
getPaginatedResult = async (query) => {
// ...
let page = +query.page || 1
if (page < 1) { page = 1 }
// Convert page to offset
const offset = limit * page
// ...
return {
page,
limit,
total,
data
}
}
<div v-if="result.page > 1">
<a :href="'?page=' + (result.page - 1) ">Previous</a>
</div>
<div v-if="(result.page * result.limit) < result.total">
<a :href="'?page=' + (result.page + 1) ">Next</a>
</div>
Getting total
Let’s digress a bit to take a look at getting total
. Here is how to count data in a MongoDB collection.
const total = await db.collection('items').countDocuments({ query })
This statement counts all data that matches the query in the collection [by running an aggregation]. It looks simple, but on a collection with a lot of data, this will cause performance issues.
There is a count alternative that uses the collection's metadata for count but has some limitations. For one, you cannot use it with a query. It has to be for the whole collection. It is also not always accurate.
To avoid the performance issue with countDocument
, one trick is to limit the count to a number that is not too large. (That number will depend on your database’s performance, so you will need to experiment to know what won’t choke your DB).
let total = await db.collection('items').countDocuments({}, { limit: 10000 })
if (total === 10000) {
total = -1
}
<div>Showing {{ result.offset + 1 }} to {{ result.offset + result.data.length }} of {{ result.total == -1 ? 'many' : result.total }}</div>
<div class="flex justify-between">
<div v-if="result.offset > 0">
<a :href="'?offset=' + (result.offset - result.limit) ">Previous</a>
</div>
<div v-if="result.total === -1 || (result.offset + result.data.length) < result.total">
<a :href="'?offset=' + (result.offset + limit) ">Next</a>
</div>
</div>
Implementing cursor-based pagination
const data = await db.collection('items').find({}, {
sort: { _id: -1 },
limit
}).toArray()
It can be this simple. The frontend can figure out what next cursor to pass to get to the next page or previous page.
<a :href="'?prev_cursor=' + result.data[0]._id ">Previous</a>
<a :href="'?next_cursor=' + result.data[result.data.length - 1]._id ">Next</a>
But we can do better. We can help the frontend know if there is a previous page or if there is a next page. There are a couple of ways to do this. One common pattern is returning a previous_cursor
parameter if there is a previous page and returning a next_cursor
parameter if there is a next page.
Let’s do that.
getPaginatedResult = async (query) => {
query = query || {}
let limit = +query.limit || 20
if (limit < 1) { limit = 20 }
if (limit > 50) { limit = 50 }
const data = await db.collection('items').find({}, {
sort: { _id: -1 },
limit
}).toArray()
let hasNext, hasPrev, lastItem, firstItem
if (data.length) {
lastItem = data[data.length - 1]._id
firstItem = data[0]._id
// If there is an item with id less than last item (remember, sort is in desc _id), there is a next page
const q = { _id: { $lt: lastItem } }
const r = await db.collection('items').findOne(q)
if (r) {
hasNext = true
}
// Short form:
// hasNext = !!await db.collection('items').findOne(q)
// If there is an item with id greater than first item (remember, sort is in desc _id), there is a previous page
q._id = {
$gt: firstItem
}
hasPrev = !!await db.collection('items').findOne(q)
}
const response = {
data
}
if (hasNext) {
response.next_cursor = `${lastItem}`
}
if (hasPrev) {
response.previous_cursor = `${firstItem}`
}
return response
}
You can add total
to response
if you want to. If you are dealing with a large data set, you probably don’t need to unless showing the total number of results is important.
On the frontend:
<div v-if="result.previous_cursor">
<a :href="'?prev_cursor=' + result.previous_cursor ">Previous</a>
</div>
<div v-if="result.next_cursor">
<a :href="'?next_cursor=' + result.next_cursor ">Next</a>
</div>
Shaping up well but getPaginatedResult
has no support for the cursor parameters yet. Let’s add it.
getPaginatedResult = async (query) => {
// ...
const q = {}
const sort = { _id: -1 }
if (query.previous_cursor) {
q._id = { $gt: new ObjectId(query.previous_cursor) }
sort._id = 1
} else if (query.next_cursor) {
q._id = { $lt: new ObjectId(query.next_cursor) }
}
const data = await db.collection('items').find(q, {
sort,
limit
}).toArray()
if (query.previous_cursor) data.reverse()
// ...
}
Note that if it is previous_cursor
, we sort _id
in ascending order and reverse the result.
Sorting and paginating with a non-unique field
Sometimes you want to allow users sort the results by a specific field that is not the id field. For example, “last used date” or “event count”.
Let’s take a look at an example where we sort by last_used
.
getPaginatedResult = async (query) => {
// ...
const sort = {
last_used: -1
}
// ...
const data = await db.collection('items').find(q, {
sort,
limit
}).toArray()
if (query.previous_cursor) data.reverse()
// ...
if (hasNext) {
response.next_cursor = new Date(data[data.length - 1].last_used).toISOString()
}
if (hasPrev) {
response.previous_cursor = new Date(data[0].last_used).toISOString()
}
// ...
}
(I converted the date to ISO format so that there are no spaces. Makes it easier to work with at the frontend/client-side).
There is a small issue here. If last_used
is not unique, our results will be inaccurate. If for example the value of our cursor is 22/6/6
and there are 2 records with that value, our pagination query will omit records with that value.
To solve this, we use two fields as our cursor – in this case _id
and last_used
.
getPaginatedResult = async (query) => {
// ...
if (hasNext) {
response.next_cursor = data[data.length - 1]._id + '_' + new Date(data[data.length - 1].last_used).toISOString()
}
if (hasPrev) {
response.previous_cursor = data[0]._id + '_' + new Date(data[0].last_used).toISOString()
}
// ...
}
This also means we need to sort and query by both fields.
getPaginatedResult = async (query) => {
// ...
if (query.previous_cursor) {
const [cursorId, cursorDate] = query.previous_cursor.split('_')
q.$or = [
{ last_used: { $lt: new Date(cursorDate) } },
{
last_used: new Date(cursorDate),
_id: { $lt: new ObjectId(cursorId) }
}
]
} else if (query.next_cursor) {
const [cursorId, cursorDate] = query.next_cursor.split('_')
q.$or = [
{ last_used: { $gt: new Date(cursorDate) } },
{
last_used: new Date(cursorDate),
_id: { $gt: new ObjectId(cursorId) }
}
]
}
const sort = {
last_used: -1,
_id: -1
}
// ...
const data = await db.collection('items').find(q, {
sort,
limit
}).toArray()
if (query.previous_cursor) data.reverse()
// ...
}
There is an existing $or operator
We have seen that paginating on two fields requires adding an $or operator to our query. What happens if the query already has an $or operator? For example, imagine we want to paginate this:
const q = {
$or: [
{ 'owner': user },
{ 'added_by': user }
]
}
const data = await db.collection('items').find(q).toArray()
What we do in this case is to use $and to combine the existing $or and the $or for our pagination.
const queryOr = [
{ 'owner': user },
{ 'added_by': user }
]
const paginationOr = [
{ last_used: { $gt: new Date(cursorDate) } },
{
last_used: new Date(cursorDate),
_id: { $gt: new ObjectId(cursorId) }
}
]
const q = {
$and: [
{ $or: queryOr },
{ $or: paginationOr }
]
}
const data = await db.collection('items').find(q).toArray()
Ascending and descending order sorting
We have been sorting in one direction all along. Can we allow users sort the result by a custom field in both descending and ascending order? Why not?
getPaginatedResult = async (query) => {
// ...
const order = {
query: 'desc',
key: '',
sort_order: -1
}
if (query.order === 'asc') {
order.query = 'asc'
}
let cursor
if (query.previous_cursor) {
cursor = query.previous_cursor
if (order.query === 'desc') {
order.key = '$gt'
order.sort_order = 1
} else {
order.key = '$lt'
order.sort_order = -1
}
} else if (query.next_cursor) {
cursor = query.next_cursor
if (order.query === 'desc') {
order.key = '$lt'
order.sort_order = -1
} else {
order.key = '$gt'
order.sort_order = 1
}
}
if (cursor) {
const [cursorId, cursorDate] = cursor.split('_')
q.$or = [
{ last_used: { [order.key]: new Date(cursorDate) } },
{
last_used: new Date(cursorDate),
_id: { [order.key]: new ObjectId(cursorId) }
}
]
}
const sort = {
last_used: order.sort_order,
_id: order.sort_order
}
// ...
const data = await db.collection('items').find(q, {
sort,
limit
}).toArray()
if (query.previous_cursor) data.reverse()
const response = {
data,
order: order.query
}
// ...
}
<div v-if="result.previous_cursor">
<a :href="'?order=' + result.order + '&prev_cursor=' + result.previous_cursor ">Previous</a>
</div>
<div v-if="result.next_cursor">
<a :href="'?order=' + result.order + '&next_cursor=' + result.next_cursor ">Next</a>
</div>
All together now
Multiple field sorting in both directions? Let's do it.
getPaginatedResult = async (query) => {
const q = {}
query = query || {}
let limit = +query.limit || 20
if (limit < 1) { limit = 20 }
if (limit > 50) { limit = 50 }
// What field is our cursor?
let cursorField = '_id'
const allowedFields = ['_id', 'last_used', 'pageviews']
if (query.field && allowedFields.includes(query.field)) {
cursorField = query.field
}
const order = {
query: 'desc',
key: '',
sort_order: -1
}
if (query.order === 'asc') {
order.query = 'asc'
}
let cursor
if (query.previous_cursor) {
cursor = query.previous_cursor
if (order.query === 'desc') {
order.key = '$gt'
order.sort_order = 1
} else {
order.key = '$lt'
order.sort_order = -1
}
} else if (query.next_cursor) {
cursor = query.next_cursor
if (order.query === 'desc') {
order.key = '$lt'
order.sort_order = -1
} else {
order.key = '$gt'
order.sort_order = 1
}
}
const sort = {}
if (cursor) {
let [cursorId, cursorParam] = cursor.split('_')
cursorId = new ObjectId(cursorId)
if (cursorParam) {
if (cursorField === 'last_used') {
cursorParam = new Date(cursorParam)
} else if (cursorField === 'pageviews') {
cursorParam = +cursorParam
}
q.$or = [
{ cursorField: { [order.key]: cursorParam } },
{
cursorField: cursorParam,
_id: { [order.key]: cursorId }
}
]
sort[cursorField] = order.sort_order
} else {
q._id = { [order.key]: cursorId }
}
sort._id = order.sort_order
}
const data = await db.collection('items').find(q, {
sort,
limit
}).toArray()
if (query.previous_cursor) data.reverse()
let hasNext, hasPrev
if (data.length) {
// Has next?
order.key = (order.query === 'desc') ? '$lt' : '$gt'
cursorParam = data[data.length - 1][cursorField]
if (cursorField === '_id') {
q._id = { [order.key]: new ObjectId(cursorParam) }
} else {
if (cursorField === 'last_used') {
cursorParam = new Date(cursorParam)
} else if (cursorField === 'pageviews') {
cursorParam = +cursorParam
}
q.$or = [
{ cursorField: { [order.key]: cursorParam } },
{
cursorField: cursorParam,
_id: { [order.key]: new ObjectId(data[data.length - 1]._id) }
}
]
}
hasNext = !!await dbo.db().collection('items').findOne(q)
// Has previous page
order.key = (order.query === 'desc') ? '$lt' : '$gt'
cursorParam = data[0][cursorField]
if (cursorField === '_id') {
q._id = { [order.key]: new ObjectId(cursorParam) }
} else {
if (cursorField === 'last_used') {
cursorParam = new Date(cursorParam)
} else if (cursorField === 'pageviews') {
cursorParam = +cursorParam
}
q.$or = [
{ cursorField: { [order.key]: cursorParam } },
{
cursorField: cursorParam,
_id: { [order.key]: new ObjectId(data[0]._id) }
}
]
}
hasPrev = !!await dbo.db().collection('items').findOne(q)
}
const response = {
data,
order: order.query
}
if (hasNext) {
response.next_cursor = data[data.length - 1]._id
if (cursorField === '_id') {
response.next_cursor += '_' + data[data.length - 1][cursorField]
}
}
if (hasPrev) {
response.next_cursor = data[0]._id
if (cursorField === '_id') {
response.next_cursor += '_' + data[0][cursorField]
}
}
return response
}
A final note on indexing
Always ensure your cursor fields are indexed. This is because you are using them for sorting and filtering. If the cursor is _id
, no extra action is needed; ids are indexed by default.
db.collection('items').createIndex('last_used')
If your cursor is multiple fields like with our last_used
example, you need to create a compound index based on both fields.
db.collection('items').createIndex({ last_login: -1, _id: -1 })
Both fields are indexed in descending order to support sorting both fields in that order. However, the index will also support sorting both fields in ascending order. In order words, it's the same as:
db.collection('items').createIndex({ last_login: 1, _id: 1 })