Rivery US Console Outage
Incident Report for Rivery
Postmortem

The SingleStore Helios® service uses Kubernetes infrastructure to host database clusters. As part of the Managed Service, SingleStore continually modernizes its networking infrastructure to keep up with current versions of Amazon EKS.

On 28-Aug-2024, SingleStore detected an incompatibility in the networking configuration of a small number of its cells. The incompatibility affected support for customers with a large number of IP addresses/endpoints, in cells whose network setup still ran on older network infrastructure. The impact was limited to 10 of SingleStore's 65 cells, all running on AWS. The Rivery US console uses services hosted on one of the impacted cells, which caused network connectivity issues.

SingleStore has restored network connectivity for the affected clusters and is taking steps to standardize the networking infrastructure in these cells to improve resilience and minimize connectivity issues in the future.

The Rivery engineering team is also exploring failover solutions to minimize disruption if an underlying system component experiences an outage in the future.

Posted Aug 29, 2024 - 13:57 UTC

Resolved
We are marking this issue as resolved as the SingleStore team has confirmed that our regions are fully operational and we have seen the US console activity return to normal.

Additionally, we have cancelled any river runs that were in progress when the SingleStore outage began to ensure that the next run of your river will proceed as expected. The next scheduled river run will resume from the last successful run, ensuring that no data is missed. Please note that if you are using the “Append Only” loading method, you may notice duplicates in your target system.

Thank you again for your patience as we worked to resolve this issue. If you need any assistance from our team, please open up a ticket in the console via Help --> Contact Support and our support team will be happy to help you.
Posted Aug 28, 2024 - 21:52 UTC
Update
We are continuing to monitor for any further issues.
Posted Aug 28, 2024 - 20:14 UTC
Monitoring
Service has been restored to the US Rivery platform (console.rivery.io), and rivers have been running as expected since 19:25 UTC. We are continuing to monitor the SingleStore Status Page, which still indicates a service disruption (see: https://status.singlestore.com/).

The SingleStore incident started at 17:05 UTC and ended at 19:25 UTC on 28-Aug-2024. Because of the outage, river runs that were in progress when the incident began may or may not have completed successfully.

To ensure all data is sent to your target system, we are cancelling any river runs that were in progress. The next scheduled river run will pick up all data since the last successful run.

Note that you may see duplicates in your target system if you are using the “Append Only” loading method.

We will provide additional details about this incident once we have more information from the SingleStore team.
Thank you for your patience as we work to resolve this issue. If you need any assistance from our team, please open a ticket in the console via Help --> Contact Support and our support team will be happy to help you.
Posted Aug 28, 2024 - 20:14 UTC
Identified
The Rivery US platform is experiencing issues due to an incident affecting one of our underlying system components (see https://status.singlestore.com).

Rivers are likely to fail during this period.

We are in contact with SingleStore and are actively working to restore all services.
Posted Aug 28, 2024 - 18:17 UTC
Investigating
We are currently investigating an issue with the Activities page in the US Rivery platform (console.rivery.io) returning the error "Something went wrong, please try again later." We will update as soon as we have more information.
Posted Aug 28, 2024 - 17:43 UTC
This incident affected: Rivery Console (console.rivery.io).