Fix rollback disk snapshots on instance snapshot failure #12949

Open
sureshanaparti wants to merge 1 commit into apache:4.22 from shapeblue:fix-rollback-disk-snapshots-on-vm-snapshot-failure

Conversation

@sureshanaparti sureshanaparti commented Apr 2, 2026

Description

This PR fixes the rollback of disk snapshots when an instance snapshot fails.

Fixes #12927

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@sureshanaparti

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@codecov

codecov bot commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 27.27273% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.60%. Comparing base (4708121) to head (bd1ca29).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../storage/vmsnapshot/StorageVMSnapshotStrategy.java | 27.27% | 8 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff            @@
##               4.22   #12949   +/-   ##
=========================================
  Coverage     17.60%   17.60%           
+ Complexity    15677    15676    -1     
=========================================
  Files          5918     5918           
  Lines        531681   531686    +5     
  Branches      65005    65006    +1     
=========================================
+ Hits          93623    93624    +1     
- Misses       427498   427502    +4     
  Partials      10560    10560           

| Flag | Coverage Δ |
|---|---|
| uitests | 3.70% <ø> (ø) |
| unittests | 18.68% <27.27%> (+<0.01%) ⬆️ |

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17339

Copilot AI left a comment

Pull request overview

Fixes cleanup/rollback of per-volume disk snapshots when an instance (VM) snapshot operation fails, preventing orphaned snapshots and incorrect snapshot resource counts (issue #12927).

Changes:

  • Track created disk snapshots for rollback using a Map<volumeId, SnapshotInfo> to ensure snapshots are rolled back even if creation fails mid-way.
  • Harden rollback logic with null checks and improve rollback logging.
  • Update the KVM VM snapshot strategy unit test to match the new createDiskSnapshot method signature.
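
As an illustration of the tracking change listed above, here is a minimal, self-contained sketch. It is not the code in StorageVMSnapshotStrategy: `DiskSnapshot`, `DiskSnapshotBackend`, and `snapshotAllVolumes` are hypothetical stand-ins, and catching the exception to return `false` is a simplification made for the sketch. It only shows why registering each snapshot under its volume id before the risky call leaves something to roll back when that call fails part-way.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SnapshotInfo; only the volume id is modelled.
final class DiskSnapshot {
    final long volumeId;
    DiskSnapshot(long volumeId) { this.volumeId = volumeId; }
}

// Hypothetical stand-in for the storage-side operations.
interface DiskSnapshotBackend {
    DiskSnapshot take(DiskSnapshot pending); // may throw, or return null, on failure
    void delete(DiskSnapshot snapshot);      // rolls back one disk snapshot
}

final class InstanceSnapshotSketch {
    /** Returns true if every disk snapshot succeeded; otherwise rolls back and returns false. */
    static boolean snapshotAllVolumes(List<Long> volumeIds, DiskSnapshotBackend backend) {
        Map<Long, DiskSnapshot> forRollback = new LinkedHashMap<>();
        boolean result = false;
        try {
            for (long volumeId : volumeIds) {
                DiskSnapshot pending = new DiskSnapshot(volumeId);
                // Register before the risky call so a throw still leaves a rollback entry.
                forRollback.put(volumeId, pending);
                DiskSnapshot taken = backend.take(pending);
                if (taken == null) {
                    throw new IllegalStateException("Failed to create snapshot");
                }
                // Same key, so this replaces the pending entry instead of adding a second one.
                forRollback.put(volumeId, taken);
            }
            result = true;
        } catch (RuntimeException e) {
            // Simplification: the sketch reports failure through the return value.
        }
        if (!result) {
            for (DiskSnapshot snapshot : forRollback.values()) {
                if (snapshot != null) {
                    backend.delete(snapshot);
                }
            }
        }
        return result;
    }
}
```

With a backend that fails on the second of three volumes, the rollback loop in this sketch deletes the first and second snapshots and never touches the third, because the third was never registered.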

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| engine/storage/snapshot/src/main/java/org/apache/cloudstack/storage/vmsnapshot/StorageVMSnapshotStrategy.java | Track snapshots for rollback via volume-id map; ensure rollback attempts occur on failures; add null-safety/logging improvements. |
| engine/storage/snapshot/src/test/java/org/apache/cloudstack/storage/vmsnapshot/VMSnapshotStrategyKVMTest.java | Adjust unit test to pass the new rollback map parameter to createDiskSnapshot. |

snapshotInfo = snapshotStrategy.takeSnapshot(snapshotInfo);
if (snapshotInfo == null) {
    throw new CloudRuntimeException("Failed to create snapshot");
} else {
Copilot AI Apr 2, 2026

volumeToSnapshotInfoMapForRollback.put(vol.getId(), snapshotInfo) is performed twice (before and after takeSnapshot). If the intent is to ensure rollback coverage when takeSnapshot throws, keep the pre-takeSnapshot put, but drop the else block and only overwrite the map entry if takeSnapshot returns a different SnapshotInfo instance. This removes redundant writes and makes the intent clearer.

Suggested change
- } else {
+ }
+ if (snapshotInfo != volumeToSnapshotInfoMapForRollback.get(vol.getId())) {
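
For context, the registration step with this suggestion applied might look roughly like the sketch below. It is a generic, self-contained approximation rather than the PR's code: `takeAndTrack` is a hypothetical helper, the type parameter `S` stands in for SnapshotInfo, and the `UnaryOperator` stands in for `snapshotStrategy.takeSnapshot`.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

final class TrackSketch {
    // Hypothetical helper: registers the pending snapshot, takes it, and
    // rewrites the map entry only when a different instance comes back.
    static <S> S takeAndTrack(long volumeId, S pending,
                              UnaryOperator<S> takeSnapshot,
                              Map<Long, S> rollbackMap) {
        // Pre-take put: covers the case where takeSnapshot throws.
        rollbackMap.put(volumeId, pending);
        S taken = takeSnapshot.apply(pending);
        if (taken == null) {
            throw new IllegalStateException("Failed to create snapshot");
        }
        // Identity comparison, as in the suggested change: skip the redundant write.
        if (taken != rollbackMap.get(volumeId)) {
            rollbackMap.put(volumeId, taken);
        }
        return taken;
    }
}
```

In this sketch the map holds exactly one entry per volume on every path: the pending object if the take step throws or returns null, and the returned object otherwise.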

Comment on lines 454 to 466
snapshot = snapshotDao.persist(snapshot);
vol.addPayload(setPayload(vol, snapshot, quiescevm));
SnapshotInfo snapshotInfo = snapshotDataFactory.getSnapshot(snapshot.getId(), vol.getDataStore());
snapshotInfo.addPayload(vol.getpayload());
volumeToSnapshotInfoMapForRollback.put(vol.getId(), snapshotInfo);
SnapshotStrategy snapshotStrategy = storageStrategyFactory.getSnapshotStrategy(snapshotInfo, SnapshotOperation.TAKE);
if (snapshotStrategy == null) {
    throw new CloudRuntimeException("Could not find strategy for snapshot uuid:" + snapshotInfo.getUuid());
}
snapshotInfo = snapshotStrategy.takeSnapshot(snapshotInfo);
if (snapshotInfo == null) {
    throw new CloudRuntimeException("Failed to create snapshot");
} else {
Copilot AI Apr 2, 2026

The new rollback behavior relies on adding the snapshot to the rollback map before snapshotStrategy.takeSnapshot(...) runs so that failures/exceptions still get cleaned up. There’s no unit test covering the failure path (e.g., takeSnapshot throws/returns null) and verifying that rollback deletes the persisted snapshot record. Please add a test for this scenario to prevent regressions of #12927.
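
The shape of such a failure-path check can be sketched in plain Java with hand-written fakes. This is only an outline under stated assumptions: `FakeSnapshotStore` and the simplified `createDiskSnapshot` below are hypothetical and do not match the signatures in StorageVMSnapshotStrategy or VMSnapshotStrategyKVMTest, and a real test would use the project's own test framework and mocks.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical in-memory stand-in for the snapshot DAO; not CloudStack's SnapshotDao.
final class FakeSnapshotStore {
    private final Set<Long> persisted = new HashSet<>();
    private long nextId = 1;

    long persist() {
        long id = nextId++;
        persisted.add(id);
        return id;
    }

    void remove(long id) { persisted.remove(id); }

    int count() { return persisted.size(); }
}

final class RollbackFailurePathSketch {
    // Simplified code under test: persist, register for rollback, take,
    // and remove the persisted record again if the take step fails.
    static boolean createDiskSnapshot(long volumeId, FakeSnapshotStore store,
                                      UnaryOperator<Long> takeSnapshot) {
        Map<Long, Long> rollback = new HashMap<>(); // volume id -> snapshot id
        try {
            long snapshotId = store.persist();
            rollback.put(volumeId, snapshotId); // registered before the risky call
            if (takeSnapshot.apply(snapshotId) == null) {
                throw new IllegalStateException("Failed to create snapshot");
            }
            return true;
        } catch (RuntimeException e) {
            rollback.values().forEach(store::remove);
            return false;
        }
    }
}
```

A test along these lines would drive `createDiskSnapshot` once with a take step that throws and once with one that returns null, then assert that the store holds no persisted record in either case, and that a successful take leaves exactly one.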

@DaanHoogland DaanHoogland left a comment

I may be missing some context, hence my questions.

vol.addPayload(setPayload(vol, snapshot, quiescevm));
SnapshotInfo snapshotInfo = snapshotDataFactory.getSnapshot(snapshot.getId(), vol.getDataStore());
snapshotInfo.addPayload(vol.getpayload());
volumeToSnapshotInfoMapForRollback.put(vol.getId(), snapshotInfo);
So it might be added twice, right? Then why the else at line 466?

  }
  if (!result) {
-     for (SnapshotInfo snapshotInfo : forRollback) {
+     for (SnapshotInfo snapshotInfo : volumeToSnapshotInfoMapForRollback.values()) {
This seems to be the only usage, so why not a HashSet?

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

snapshot resourcecount not updated when deleting a failed VM snapshot

4 participants