Some email deliveries delayed

Posted at 2021-09-16T03:16:00-07:00 by Scott

Impact

Some users may have only just received emails that were sent 3–7 days ago. Because only messages without a Message-ID header were affected, most of these are probably spam, but some may be legitimate.

Timeline

  • The underlying issue was introduced by a code change on Sep 10
  • Staff noticed the problem on Sep 15

Technical details

A code change was introduced to log each message's Message-ID (a header that uniquely identifies an email) at delivery time. However, if a message had no Message-ID, its delivery failed. Almost all messages carry a Message-ID; typically only low-effort spam lacks one.

Normally this kind of oversight (a “null” Message-ID) would be caught when the code was compiled. However, the third-party library that parses Message-IDs is written in Java, whose type system does not properly account for nulls, and it was called from Kotlin code. Kotlin does account for nulls, but it cannot check values coming from Java at compile time; instead, it raises an error at runtime if a null appears where one is not allowed.
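The snippet below is a minimal sketch of this failure mode, not our actual delivery code. It uses `java.util.Properties` as a hypothetical stand-in for the third-party Java header parser: `getProperty` is a Java method, so Kotlin sees its return value as a platform type (`String!`) and cannot flag a possible null at compile time. The helper names `headersOf`, `messageIdStrict` and `messageIdSafe` are illustrative only.

```kotlin
import java.util.Properties

// Hypothetical stand-in for parsed mail headers. Properties.getProperty is a Java
// method, so Kotlin treats its String return value as a platform type (String!).
fun headersOf(vararg pairs: Pair<String, String>): Properties =
    Properties().apply { pairs.forEach { (k, v) -> setProperty(k, v) } }

// Mirrors the bug: declaring the result as a non-null String compiles fine, but
// Kotlin inserts a runtime null check that throws (a NullPointerException in
// Kotlin 1.4+) when the header is missing.
fun messageIdStrict(headers: Properties): String {
    val id: String = headers.getProperty("Message-ID")
    return id
}

// Safer variant: treat the Java value as nullable and fall back to a placeholder,
// so a missing header only affects what gets logged, not whether mail is delivered.
fun messageIdSafe(headers: Properties): String =
    headers.getProperty("Message-ID") ?: "<no Message-ID>"

fun main() {
    val withId = headersOf("Message-ID" to "<abc123@example.com>")
    val withoutId = headersOf("Subject" to "cheap pills")

    println(messageIdSafe(withId))    // <abc123@example.com>
    println(messageIdSafe(withoutId)) // <no Message-ID>

    try {
        messageIdStrict(withoutId)
    } catch (e: NullPointerException) {
        println("strict lookup failed: ${e.message}")
    }
}
```

The design point is that logging is a side concern: a missing identifier should degrade the log line, never the delivery itself.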

Such a trivial error would normally be spotted and fixed immediately, since it shows up clearly in the error logs. However, staff were relying on a convenient error-logging mechanism that the same code change had broken: a cyclic dependency injection error, which only appeared at runtime and only affected the utility server that runs support code, prevented that server from starting.
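The post does not name the dependency injection framework involved, so the following is a deliberately tiny, hypothetical container (not any real library's API) that shows why such a cycle only surfaces at runtime: registering two bindings that depend on each other succeeds, and nothing fails until something actually asks for one of them. The class names `Mailer` and `ErrorReporter` are illustrative assumptions, not the components that were actually involved.

```kotlin
import kotlin.reflect.KClass

// Hypothetical components that depend on each other.
class ErrorReporter(val mailer: Mailer)
class Mailer(val reporter: ErrorReporter)

// A toy DI container: each binding is a factory that may resolve other bindings.
// Cycles are only discovered when resolve() is called, i.e. at runtime.
class Container {
    private val factories = mutableMapOf<KClass<*>, (Container) -> Any>()
    private val resolving = mutableListOf<KClass<*>>() // current resolution path

    fun <T : Any> bind(type: KClass<T>, factory: (Container) -> T) {
        factories[type] = factory
    }

    @Suppress("UNCHECKED_CAST")
    fun <T : Any> resolve(type: KClass<T>): T {
        if (type in resolving) {
            // Report the cycle starting from the first time this type was requested.
            val cycle = (resolving.dropWhile { it != type } + type)
                .joinToString(" -> ") { it.simpleName ?: "?" }
            throw IllegalStateException("Dependency cycle: $cycle")
        }
        val factory = factories[type]
            ?: throw IllegalStateException("No binding for ${type.simpleName}")
        resolving.add(type)
        try {
            return factory(this) as T
        } finally {
            resolving.removeAt(resolving.size - 1)
        }
    }
}

fun main() {
    val container = Container()
    // Registration succeeds even though the bindings form a cycle.
    container.bind(ErrorReporter::class) { ErrorReporter(it.resolve(Mailer::class)) }
    container.bind(Mailer::class) { Mailer(it.resolve(ErrorReporter::class)) }

    // The problem only appears when the server starts up and resolves its components.
    try {
        container.resolve(Mailer::class)
    } catch (e: IllegalStateException) {
        println(e.message) // Dependency cycle: Mailer -> ErrorReporter -> Mailer
    }
}
```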

Remediation

  • The immediate cause (a null/non-null mixup) was both trivial and hard to avoid in the programming language we use, and there is currently little we can do to prevent the same mistake in the future. As we rely less on Java code, which does not track nullability, the risk of recurrence should go down.
  • We’ve added alarms for the utility server that notify staff when it is experiencing problems.
  • We’ve also added tests for this specific cyclic dependency injection error, so that it can no longer stop the utility server from starting.
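How these tests are written isn't described above; one hedged way to build such a check is to express the wiring as a plain graph (here a hypothetical map from each component name to the components it depends on) and fail if a depth-first search finds a cycle. This assumes the framework's bindings can be exported into such a map, which is an assumption rather than a known feature of whatever framework is in use.

```kotlin
// Returns one dependency cycle as a list of names (first == last), or null if acyclic.
fun findCycle(graph: Map<String, List<String>>): List<String>? {
    val visiting = mutableSetOf<String>() // nodes on the current DFS path
    val done = mutableSetOf<String>()     // nodes fully explored and known cycle-free
    val path = mutableListOf<String>()    // current DFS path, in order

    fun dfs(node: String): List<String>? {
        if (node in done) return null
        if (node in visiting) {
            // Cycle found: slice the current path from the first occurrence of node.
            return path.dropWhile { it != node } + node
        }
        visiting.add(node)
        path.add(node)
        for (dep in graph[node].orEmpty()) {
            dfs(dep)?.let { return it }
        }
        visiting.remove(node)
        path.removeAt(path.lastIndex)
        done.add(node)
        return null
    }

    for (node in graph.keys) {
        dfs(node)?.let { return it }
    }
    return null
}

fun main() {
    // Hypothetical wiring of the utility server's components.
    val wiring = mapOf(
        "Mailer" to listOf("ErrorReporter"),
        "ErrorReporter" to listOf("Mailer"),
        "Scheduler" to listOf("Mailer")
    )
    println(findCycle(wiring)) // [Mailer, ErrorReporter, Mailer]
}
```

Run as part of the test suite, a check like this fails the build before deployment instead of leaving the server unable to start in production.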