maildir-sync.c revision d8b77aef97e89f1ccc5cbdaef77be9052279e35f
/* Copyright (C) 2004 Timo Sirainen */

Here's a description of how we handle Maildir synchronization and
its problems:

We want to be as efficient as we can. The most efficient way to
check if changes have occurred is to stat() the new/ and cur/
directories and the uidlist file - if their mtimes haven't changed,
there are no changes and we don't need to do anything.
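
A minimal sketch of that mtime check, not the code used in this file. The struct maildir_mtimes type and both helper names are hypothetical; the sketch only assumes the maildir contains new/, cur/ and uidlist as described above, and treats a missing entry as mtime 0.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

// Hypothetical record of the mtimes seen at the previous check.
struct maildir_mtimes {
	time_t new_mtime;
	time_t cur_mtime;
	time_t uidlist_mtime;
};

// Return the mtime of <maildir>/<name>, or 0 if stat() fails.
static time_t entry_mtime(const char *maildir, const char *name)
{
	char path[4096];
	struct stat st;

	snprintf(path, sizeof(path), "%s/%s", maildir, name);
	return stat(path, &st) == 0 ? st.st_mtime : 0;
}

// Return true if any of the three mtimes differs from the recorded
// ones, and remember the new values for the next call.
static bool maildir_mtimes_changed(const char *maildir,
				   struct maildir_mtimes *last)
{
	struct maildir_mtimes now;
	bool changed;

	now.new_mtime = entry_mtime(maildir, "new");
	now.cur_mtime = entry_mtime(maildir, "cur");
	now.uidlist_mtime = entry_mtime(maildir, "uidlist");

	changed = now.new_mtime != last->new_mtime ||
		now.cur_mtime != last->cur_mtime ||
		now.uidlist_mtime != last->uidlist_mtime;
	*last = now;
	return changed;
}
```

A caller would keep one struct maildir_mtimes per open mailbox and skip all further work whenever maildir_mtimes_changed() returns false. The problems listed next are the cases where this check alone is not enough.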

Problem 1: Multiple changes can happen within a single second -
nothing guarantees that once we have synced it, someone else didn't
just then make a modification. Such modifications wouldn't get
noticed until a new modification occurred later.

Problem 2: Syncing the cur/ directory is much more costly than syncing
new/. Moving mails from new/ to cur/ will always change the mtime of
cur/, causing us to sync it as well.

Problem 3: We may not be able to move mail from new/ to cur/
because we're out of quota, or simply because we're accessing a
read-only mailbox.

MAILDIR_SYNC_SECS
-----------------

Several checks below use MAILDIR_SYNC_SECS, which should be the maximum
clock drift between all computers accessing the maildir (e.g. via
NFS), rounded up to the next second. Our default is 1 second, since
everyone should be using NTP.

Note that setting it to 0 works only if there's only one computer
accessing the maildir. It's practically impossible to make two
clocks _exactly_ synchronized.

It might be possible to use only the file server's clock by looking at
the atime field, but I don't know how well that would actually work.

cur directory
-------------

We have a dirty_cur_time variable which is set to the cur/ directory's
mtime when it's >= time() - MAILDIR_SYNC_SECS and we _think_ we have
synchronized the directory.

When dirty_cur_time is non-zero, we don't synchronize the cur/
directory until
  a) cur/'s mtime changes
  b) opening a mail fails with ENOENT
  c) time() > dirty_cur_time + MAILDIR_SYNC_SECS

This allows us to modify the maildir multiple times without having
to sync it at every change. The sync will eventually be done to
make sure we didn't miss any external changes.

The dirty_cur_time is set when:
  - we change message flags
  - we expunge messages
  - we move mail from new/ to cur/
  - we sync the cur/ directory and its mtime is >= time() - MAILDIR_SYNC_SECS

It's unset when we do the final syncing, i.e. when the mtime is
older than time() - MAILDIR_SYNC_SECS.
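
The rules above can be sketched as a small state machine. This is one reading of the description, not the actual implementation: struct cur_sync_state and the three function names are hypothetical, the clock and cur/'s mtime are passed in as parameters so the logic can be checked in isolation, and cur_changed_by_us() assumes we re-stat cur/ after our own change and record the mtime we caused.

```c
#include <stdbool.h>
#include <time.h>

#define MAILDIR_SYNC_SECS 1

// Hypothetical per-mailbox state.
struct cur_sync_state {
	time_t last_cur_mtime;	// cur/ mtime we have already accounted for
	time_t dirty_cur_time;	// 0 when cur/ is believed to be fully synced
};

// Decide whether cur/ has to be scanned now.
static bool cur_needs_sync(const struct cur_sync_state *st, time_t cur_mtime,
			   time_t now, bool open_failed_enoent)
{
	if (cur_mtime != st->last_cur_mtime)
		return true;	// a) cur/'s mtime changed
	if (open_failed_enoent)
		return true;	// b) a mail we expected is gone
	if (st->dirty_cur_time != 0 &&
	    now > st->dirty_cur_time + MAILDIR_SYNC_SECS)
		return true;	// c) the dirty window has passed
	return false;
}

// Record the result of a finished scan of cur/. If the mtime is still
// recent we may have missed a change within the same second, so the
// directory stays dirty; otherwise this was the final sync.
static void cur_sync_finished(struct cur_sync_state *st, time_t cur_mtime,
			      time_t now)
{
	st->last_cur_mtime = cur_mtime;
	st->dirty_cur_time =
		cur_mtime >= now - MAILDIR_SYNC_SECS ? cur_mtime : 0;
}

// Record a change we made ourselves (flag change, expunge, move from
// new/). cur_mtime is cur/'s mtime as seen right after our change.
static void cur_changed_by_us(struct cur_sync_state *st, time_t cur_mtime)
{
	st->last_cur_mtime = cur_mtime;
	st->dirty_cur_time = cur_mtime;
}
```

With MAILDIR_SYNC_SECS at 1, a directory synced at mtime 1000 is left alone at time 1001 and rescanned at 1002, which is the "eventual" sync mentioned above.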

new directory
-------------

If new/'s mtime is >= time() - MAILDIR_SYNC_SECS, always synchronize
it. A dirty_cur_time-like feature might save us a few syncs, but
that might break a client which saves a mail in one connection and
tries to fetch it in another one. The new/ directory is almost always
empty, so syncing it should be very fast anyway. Actually this can
still happen if we sync only the new/ dir while another client is also
moving mails from it to cur/ - it takes us a while to see them.
That's pretty unlikely to happen however, and the only way to fix it
would be to always synchronize cur/ after new/.
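
As a sketch, the rule for new/ needs no dirty state at all. The function name and its parameters are hypothetical; the mtimes and the current time are passed in so the decision can be checked without touching the filesystem.

```c
#include <stdbool.h>
#include <time.h>

#define MAILDIR_SYNC_SECS 1

// Decide whether new/ has to be scanned. A changed mtime always
// triggers a scan; an unchanged but recent mtime does too, because a
// delivery may have happened within the same second as our last scan.
static bool new_needs_sync(time_t new_mtime, time_t last_new_mtime,
			   time_t now)
{
	if (new_mtime != last_new_mtime)
		return true;
	return new_mtime >= now - MAILDIR_SYNC_SECS;
}
```

The cost of the extra scans is small as long as new/ stays nearly empty, which is the assumption made above.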

Normally we move all mails from new/ to cur/ whenever we sync it. If
that's not possible for some reason, we mark the mail with a "probably
exists in new/ directory" flag.

If rename() still fails because of ENOSPC or EDQUOT, we still save
the flag changes in the index with the dirty flag on. When moving the
mail to the cur/ directory, or when we notice it has already been moved
there, we apply the flag changes to the filename, rename it and remove
the dirty flag. If there are dirty flags, this should be tried every
time after expunge or when closing the mailbox.

uidlist
-------

This file contains UID <-> filename mappings. It's updated only when
new mail arrives, so it may contain filenames that have already been
deleted. Updating is done by getting the uidlist.lock file, writing the
whole uidlist into it and rename()ing it over the old uidlist. This
means there's no need to lock the file for reading.

Whenever uidlist is rewritten, its mtime must be larger than the old
one's. Use utime() before rename() if needed. Note that inode checking
wouldn't have been sufficient as inode numbers can be reused.
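
A rough sketch of that rewrite sequence using plain POSIX calls. It is not the uidlist code itself: uidlist_rewrite() is a hypothetical name, the lock is assumed to be taken with open(O_CREAT | O_EXCL) with no waiting or stale-lock handling, and a short write is simply treated as an error.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <utime.h>

// Replace <dir>/uidlist with the given contents. Returns 0 on success,
// -1 on failure (including when someone else holds uidlist.lock).
static int uidlist_rewrite(const char *dir, const char *contents)
{
	char path[4096], lock_path[4096];
	struct stat old_st, new_st;
	struct utimbuf ut;
	size_t len = strlen(contents);
	int fd;

	snprintf(path, sizeof(path), "%s/uidlist", dir);
	snprintf(lock_path, sizeof(lock_path), "%s/uidlist.lock", dir);

	// O_EXCL makes the open fail if the lock file already exists.
	fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd == -1)
		return -1;

	if (write(fd, contents, len) != (ssize_t)len ||
	    fstat(fd, &new_st) < 0) {
		close(fd);
		goto fail;
	}
	if (close(fd) < 0)
		goto fail;

	// Readers detect a rewrite by mtime, so force it to be strictly
	// larger than the old file's if both fall into the same second.
	if (stat(path, &old_st) == 0 && new_st.st_mtime <= old_st.st_mtime) {
		ut.actime = new_st.st_atime;
		ut.modtime = old_st.st_mtime + 1;
		if (utime(lock_path, &ut) < 0)
			goto fail;
	}

	// rename() replaces the old uidlist atomically, so readers see
	// either the old file or the new one, never a partial write.
	if (rename(lock_path, path) < 0)
		goto fail;
	return 0;

fail:
	unlink(lock_path);
	return -1;
}
```

Because the lock file is the new uidlist, the final rename() both publishes the contents and releases the lock.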

This file is usually read the first time you need to know the filename
for a given UID. After that it's not re-read unless new mails arrive
that we don't know about.

broken clients
--------------

Originally the middle identifier in a Maildir filename was specified
only as <process id>_<delivery counter>. That however created a
problem with randomized PIDs, which made it possible for the same
PID to be reused within one second.

So if within one second a mail was delivered, the MUA moved it to cur/
and another mail was delivered by a new process using the same PID as
the first one, we likely ended up overwriting the first mail when
the second mail was moved over it.

Nowadays everyone should be giving a bit more specific identifier,
for example by including microseconds in it, which Dovecot does.
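
To illustrate, here is a sketch of a base name that includes microseconds. The exact layout (M for microseconds, P for PID, Q for a per-process delivery counter) follows a common Maildir convention and is an assumption here, not necessarily the format Dovecot writes; both function names are hypothetical.

```c
#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

// Format "<seconds>.M<microseconds>P<pid>Q<counter>.<hostname>" into
// buf. Returns what snprintf() returns: the length the name needs.
static int maildir_format_basename(char *buf, size_t size, long secs,
				   long usecs, long pid, unsigned int counter,
				   const char *hostname)
{
	return snprintf(buf, size, "%ld.M%ldP%ldQ%u.%s",
			secs, usecs, pid, counter, hostname);
}

// Build a name for a delivery happening right now. The per-process
// counter keeps names distinct even within the same microsecond.
static int maildir_new_basename(char *buf, size_t size, const char *hostname)
{
	static unsigned int counter = 0;
	struct timeval tv;

	if (gettimeofday(&tv, NULL) < 0)
		return -1;
	return maildir_format_basename(buf, size, (long)tv.tv_sec,
				       (long)tv.tv_usec, (long)getpid(),
				       ++counter, hostname);
}
```

Two processes that happen to get the same PID within one second would now also have to deliver in the same microsecond to collide.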

There's a simple way to prevent this from happening in some cases:
Don't move the mail from new/ to cur/ if its mtime is >= time() -
MAILDIR_SYNC_SECS. The second delivery's link() call then fails
because the file is already in new/, and it will then use a
different filename. There are a few problems with this however:
  - it requires an extra stat() call, which is unneeded extra I/O
  - another MUA might still move the mail to cur/
  - if the first file's flags are modified by either Dovecot or another
    MUA, it's moved to cur/ (you _could_ just do the dirty-flagging
    but that'd be ugly)

Because this is useful only for very few people and it requires
extra I/O, I decided not to implement this. It should however be
quite easy to do, since we need to be able to deal with files in new/
anyway.

It's also possible to never accidentally overwrite a mail by using
link() + unlink() rather than rename(). This however isn't a very
good idea as it introduces potential race conditions when multiple
clients are accessing the mailbox:

Trying to move the same mail from new/ to cur/ at the same time:
  a) Client 1 uses a slightly different filename than client 2,
     for example one sets the read flag on but the other doesn't.
     You have the same mail duplicated now.
  b) Client 3 sees the mail between client 1's and 2's link() calls
     and changes its flag. You have the same mail duplicated now.

And it gets worse when they're unlink()ing in the cur/ directory:
  c) Client 1 changes a mail's flag and client 2 changes it back
     between 1's link() and unlink(). The mail is now expunged.
  d) If you try to deal with the duplicates by unlink()ing another
     one of them, you might end up unlinking both of them.

So, what should we do then if we notice a duplicate? First of all,
it might not be a duplicate at all; readdir() might have just
returned it twice because it was just renamed. What we should do is
create a completely new base name for it and rename() it to that.
If the call fails with ENOENT, it only means that it wasn't a
duplicate after all.
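
The duplicate handling just described could look roughly like this. maildir_fix_duplicate() is a hypothetical helper, and the caller is assumed to supply a freshly generated, unused base name as new_fname.

```c
#include <errno.h>
#include <stdio.h>

// Give a suspected duplicate a completely new base name.
// Returns 1 if the file existed and was renamed (a real duplicate),
// 0 if rename() failed with ENOENT (readdir() had only returned a
// stale name, so there was no duplicate), and -1 on any other error.
static int maildir_fix_duplicate(const char *dir, const char *old_fname,
				 const char *new_fname)
{
	char old_path[4096], new_path[4096];

	snprintf(old_path, sizeof(old_path), "%s/%s", dir, old_fname);
	snprintf(new_path, sizeof(new_path), "%s/%s", dir, new_fname);

	if (rename(old_path, new_path) == 0)
		return 1;
	return errno == ENOENT ? 0 : -1;
}
```

Nothing is ever unlink()ed here, so the worst outcome is a mail kept under a new name rather than a mail lost.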

	struct maildir_uidlist_sync_ctx *uidlist_sync_ctx;
static int maildir_expunge(struct index_mailbox *ibox, const char *path,
static int maildir_sync_flags(struct index_mailbox *ibox, const char *path,
	struct maildir_index_sync_context *ctx = context;
	(void)maildir_filename_get_flags(path, &flags, keywords);
	mail_index_sync_flags_apply(&ctx->sync_rec, &flags8, keywords);
	newpath = maildir_filename_set_flags(path, flags8, keywords);
		if (mail_index_sync_set_dirty(ctx->sync_ctx, ctx->seq) < 0)
static int maildir_sync_record(struct index_mailbox *ibox,
	struct mail_index_sync_rec *sync_rec = &ctx->sync_rec;
		for (seq = sync_rec->seq1; seq <= sync_rec->seq2; seq++) {
			if (mail_index_lookup_uid(view, seq, &uid) < 0)
			if (maildir_file_do(ibox, uid, maildir_expunge,
		for (; ctx->seq <= sync_rec->seq2; ctx->seq++) {
			if (mail_index_lookup_uid(view, ctx->seq, &uid) < 0)
			if (maildir_file_do(ibox, uid, maildir_sync_flags,
int maildir_sync_last_commit(struct index_mailbox *ibox)
if (ret > 0) {
if (ret == 0) {
return ret;
static struct maildir_sync_context *
return ctx;
const char *old_fname)
int ret = 0;
t_push();
t_pop();
return ret;
const char *dir;
flags = 0;
if (move_new) {
} else if (new_dir) {
if (ret <= 0) {
if (ret < 0)
const char *filename;
int ret;
seq = 0;
seq++;
goto __again;
INDEX_KEYWORDS_BYTE_COUNT) != 0) {
if (ret < 0)
if (ret == 0) {
return ret;
if (cur_changed) {
if (ret == 0)
return ret;
int ret;
return ret;
int ret;
return ret;
int ret;
if (ret < 0)