Service interruption DRO1

We have resolved the incident. We apologize for any inconvenience this caused.

We use Ceph as our underlying storage platform. Ceph is designed to store each piece of data three times, on three different servers. In short, if a device fails, it is removed from the cluster and Ceph stores the data it held elsewhere. Normally, this has no impact on our services.

In this case, an NVMe drive failed, and Ceph recovered its data quickly. However, the node in which the drive was physically installed became unresponsive. Ceph should normally have removed this node from the cluster as well, but for reasons we have not yet determined, it did not.

We put a lot of effort into service availability and designed the system to prevent issues like this one. We will investigate why Ceph did not behave as expected so that we can continue to offer stable services.

For now, all services are back in normal operation.

Thank you for your patience.
