Email Delivery Degradation

Resolved
Aug 18, 2023 at 12:37pm UTC

Post Mortem

On August 17, 2023, ImprovMX's email forwarding service experienced degraded delivery. An unexpected update to a server OS package on AWS broke our system's ability to automatically scale instances as demand increased.

Resolution and Recovery:

Our dedicated team responded quickly with a two-step plan:

Quick Fix: We redeployed our servers with a targeted fix to the issue caused by the unexpected update, enabling us to resume our service quickly.
Final Resolution: To keep such an incident from recurring, we revamped our deployment strategy. Instead of deploying new server instances on a fresh OS, we now deploy instances that come pre-installed. This approach eliminates nearly all deployment issues and also speeds up deployment, so auto-scaling is much faster whenever demand for our service increases.

Moving Forward:

We have learned from this incident and made significant improvements to our systems. We are more robust and resilient than ever before, and we remain committed to offering seamless service.

Transparency and Trust:

At ImprovMX, we believe in honest communication and accountability. We understand the importance of trust, and we assure you that we have taken all necessary steps to prevent such an occurrence in the future.

Thank you for your continued support and trust in ImprovMX.

Updated
Aug 17, 2023 at 8:56am UTC

Email forwarding is now resuming. Because a large number of servers are retrying delivery, we are forwarding more slowly than usual, but we are actively monitoring our system to ensure things return to normal levels.

We'll be sharing the post mortem here when it is ready.

Updated
Aug 17, 2023 at 8:25am UTC

A fix has been implemented and we are monitoring the results.

Updated
Aug 17, 2023 at 7:20am UTC

The issue has been identified and we are working on fixing it.

Created
Aug 17, 2023 at 4:09am UTC

We are experiencing a temporary degradation of email delivery.

Our team is currently conducting a detailed investigation to identify and resolve this matter. Please be assured that this situation is being treated with the highest priority, and we're committed to restoring optimal performance as quickly as possible.

UPDATE: The issue has been resolved, and the post mortem is now available.