We implemented a fix and are currently monitoring the result.
Identified
We have made some changes and are currently rebooting the host system to apply them.
Additionally, the NVMe cache will be disabled to test whether this issue is caused by the caching method we are currently using.
Investigating
I/O operations on the HDD pool are once again not possible. We are currently investigating the issue and working to get the HDD pool operational again.
Monitoring
A routine RAID check triggered a bug that locked up read/write operations.
We have restored normal operation, but with reduced HDD pool speed in order to prevent another incident. Full speed will be restored as soon as we have found a solution.
To prevent data loss, all servers will be stopped while this maintenance is carried out.
In progress
16 April, 2025 at 6:05 PM
We are going to migrate all servers onto another host system, which we deployed today. Because the new host system is an Intel Xeon with sufficient resources, it won't be necessary to stop any servers, so we don't expect any significant outages from the maintenance. Servers may be rebooted, which shouldn't take longer than a few minutes.
Planned
16 April, 2025 at 12:00 PM
The issue does not appear to be permanently resolved and recurs from time to time, causing downtime.
We are planning maintenance for 17 April, 2025, during which all VMs will be temporarily migrated onto another host system so that we can reconfigure the storage layout of the Intel Xeon HDD pool.
All affected customers with Storage Servers will be awarded 5 days of runtime, and all customers with Intel Xeon servers will be awarded 2 days of runtime. To receive the compensation, please open a ticket after the maintenance has finished. Tickets opened before then will be put on hold and processed once we have finished the migration.
We are extremely sorry for the inconvenience caused by these incidents. We hope you can keep your trust in us once we have finished this maintenance, which will solve the problems permanently and let us deliver the quality and reliability we always strive to achieve.