
Email Delivery Degradation

Aug 17 at 06:09am CEST

Aug 18 at 02:37pm CEST

Post Mortem

On Aug 17, ImprovMX's email forwarding service was degraded. An unexpected update to a server OS package on AWS disrupted our system's ability to automatically scale instances as demand increased.

Resolution and Recovery:

Our dedicated team responded quickly with a two-step plan:

  1. Quick Fix: We redeployed our servers with a targeted fix for the issue caused by the unexpected update, which allowed us to resume service quickly.

  2. Final Resolution: To keep such an incident from recurring, we revamped our deployment strategy. Instead of deploying new server instances on a fresh OS, we now deploy them with everything pre-installed. This eliminates nearly all deployment issues and also speeds up deployment, so auto-scaling responds much faster whenever demand on our service increases.

Moving Forward:

We have learned from this incident and made significant improvements to our systems. We are more robust and resilient than ever before, and we remain committed to offering seamless service.

Transparency and Trust:

At ImprovMX, we believe in honest communication and accountability. We understand the importance of trust, and we assure you that we have taken all necessary steps to prevent such an occurrence in the future.

Thank you for your continued support and trust in ImprovMX.

Aug 17 at 10:56am CEST

Email forwarding is now resuming. Because a very large number of sending servers are retrying delivery at once, we are forwarding more slowly than usual, but we are actively monitoring our system to ensure things get back to normal levels.

We'll be sharing the post mortem here when it is ready.

Aug 17 at 10:25am CEST

A fix has been implemented and we are monitoring the results.

Aug 17 at 09:20am CEST

The issue has been identified and we are working on fixing it.

Aug 17 at 06:09am CEST

We are experiencing a temporary degradation of email delivery.

Our team is investigating to identify and resolve the cause. This is being treated with the highest priority, and we're committed to restoring normal performance as quickly as possible.

UPDATE: The issue has been resolved, and the post mortem is now available.