Sunday, May 31, 2026

Part 3: Lessons Learned – Upgrading a Solaris Non-Global Zone Using the Wrong Zone Image

 

Introduction

In Part 2 of this series, I described how I upgraded several Solaris Non-Global Zones using an out-of-place upgrade method. Because the zones did not have enough temporary working space for a direct IPS upgrade, I copied the zone filesystems to a larger storage area, performed the package update there, and then synchronized the updated files back to the original zones.

The approach worked, but it also introduced additional manual steps compared to a standard upgrade. As with many administrative tasks, the technical procedure itself was not the biggest challenge. The real challenge was making sure each step was performed against the correct zone and the correct filesystem.

During the upgrade process, I made two mistakes that led to some unexpected troubleshooting. First, I accidentally synchronized files from the wrong upgraded zone image, causing one Non-Global Zone to boot with another zone's hostname, IP address, and listener configuration. Later, while correcting the network settings, I mistakenly executed an IP configuration command in the Global Zone instead of the target Non-Global Zone, which immediately disconnected my SSH session and required recovery through the server's ILOM console.

Fortunately, neither issue resulted in data loss, and both were recoverable. This article documents what happened, how the problems were identified, the recovery steps I followed, and the lessons I took away from the experience.

1. Correcting the Zone Configuration

After the affected zone (test2) booted, it became obvious that its operating system configuration had been copied from the wrong zone image, the one belonging to test1.

The database files, control files, and application data still belonged to test2, but several operating system settings belonged to test1. The hostname was incorrect, the IP address did not match the expected configuration, and the listener configuration was pointing to the wrong environment.

The first step was to verify the network configuration inside the affected zone.

zlogin test2
ipadm show-addr
dladm show-vnic

The output confirmed that the zone was not using the correct network configuration.


1.1. Recreating the Correct IP Configuration

To correct the network settings, I removed the existing IP configuration and recreated it using the proper address assigned to test2.

A quick warning based on my own mistake: before running any ipadm commands, use the zonename command to confirm that you are inside the correct Non-Global Zone. While fixing test2, I accidentally ran similar commands in the Global Zone and immediately lost network connectivity to the server. I was able to recover the system through ILOM, and I cover that experience later in this post.

# Confirm you are inside the correct Non-Global Zone (should print test2, not global)
zonename
ipadm delete-ip net0 2>/dev/null
ipadm create-ip net0
ipadm create-addr -T static -a 10.1.1.128/22 net0/v4
ipadm show-addr
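
The zonename check is easy to skip when typing quickly, so it can help to turn it into a guard that refuses to continue in the wrong zone. The following is a minimal sketch, not part of the original procedure: require_zone is a hypothetical helper name, and the only system command it relies on is zonename.

```shell
# Hypothetical helper: succeed only when the current zone matches the
# expected zone name, so network commands can be chained behind it.
require_zone() {
    expected_zone="$1"
    # zonename prints "global" in the Global Zone, otherwise the zone's name.
    current_zone="$(zonename)" || return 2
    if [ "$current_zone" != "$expected_zone" ]; then
        echo "refusing to continue: in zone '$current_zone', expected '$expected_zone'" >&2
        return 1
    fi
    return 0
}

# Usage (the ipadm command runs only if the guard succeeds):
#   require_zone test2 && ipadm create-ip net0
```

Chaining each ipadm command behind the guard with && means a command typed in the Global Zone fails with a message instead of changing the server's own interface.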

I also found an old disabled address object that no longer belonged to the zone and removed it.

ipadm delete-addr znet0/v4
ipadm delete-ip znet0
ipadm show-addr
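
Leftover address objects such as znet0/v4 can be easier to spot by filtering on state rather than reading the full table. This is a small sketch under the assumption that ipadm show-addr supports parsable output (-p) with selected fields (-o); list_stale_addrs is a hypothetical helper name.

```shell
# Hypothetical helper: print every address object whose state is not "ok"
# (for example "disabled" or "down"), one "addrobj state" pair per line.
list_stale_addrs() {
    # -p gives colon-separated output, -o selects the columns to print.
    ipadm show-addr -p -o addrobj,state | while IFS=: read -r addrobj state; do
        [ "$state" = "ok" ] || echo "$addrobj $state"
    done
}

# Usage:
#   list_stale_addrs
```

Anything the helper prints still needs a manual decision; an address object can be disabled on purpose, so the output is a list of candidates to review, not a list to delete blindly.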

At this point, the zone was reachable using the correct IP address.


1.2. Correcting the Hostname

Although the network was fixed, the zone still identified itself as test1 because the hostname information had been copied from the wrong image.

To correct this, I updated the Solaris identity service configuration.

svccfg -s system/identity:node setprop config/nodename = astring: test2
svccfg -s system/identity:node refresh
svcadm restart system/identity:node

To immediately update the current session, I also changed the runtime hostname.

hostname test2
# Verify:
hostname

The hostname was now correctly reported as test2.
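
Because the hostname lives in two places (the persistent SMF property and the running system), it can be worth checking both. The sketch below assumes svcprop is available to read the config/nodename property set above; check_nodename is a hypothetical helper name.

```shell
# Hypothetical helper: confirm that both the persistent nodename stored in
# SMF and the runtime hostname match the expected value.
check_nodename() {
    expected_name="$1"
    # Persistent value, read from the same property that svccfg set.
    persistent_name="$(svcprop -p config/nodename system/identity:node)" || return 2
    # Runtime value as seen by the current system.
    runtime_name="$(hostname)" || return 2
    if [ "$persistent_name" != "$expected_name" ]; then
        echo "SMF nodename is '$persistent_name', expected '$expected_name'" >&2
        return 1
    fi
    if [ "$runtime_name" != "$expected_name" ]; then
        echo "runtime hostname is '$runtime_name', expected '$expected_name'" >&2
        return 1
    fi
    return 0
}

# Usage:
#   check_nodename test2
```

If only the runtime value is correct, the old name would likely come back after the next zone reboot, which is why the persistent property is checked first.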

1.3. Updating /etc/hosts

The final step was updating the local hostname resolution.

vi /etc/hosts

I replaced the incorrect hostname entries copied from test1 and verified that the correct IP address and hostname for test2 were present.

This ensured that local name resolution, listener configuration, and application services would reference the correct server identity.
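
The manual edit can be double-checked with a short script. This is only a sketch: check_hosts is a hypothetical helper, and it assumes the stale name (test1) should not appear anywhere in this zone's hosts file. If test1 is legitimately listed as a remote host, that part of the check does not apply.

```shell
# Hypothetical helper: verify that a hosts file maps the expected IP to the
# expected hostname and no longer mentions a stale hostname.
# Arguments: hosts-file expected-ip expected-name stale-name
check_hosts() {
    hosts_file="$1"; want_ip="$2"; want_name="$3"; stale_name="$4"
    found=no
    stale=no
    while read -r ip names; do
        # Skip blank lines and comment lines.
        case "$ip" in
            ''|'#'*) continue ;;
        esac
        # Drop any trailing comment from the list of names.
        names="${names%%#*}"
        for n in $names; do
            if [ "$n" = "$stale_name" ]; then stale=yes; fi
            if [ "$ip" = "$want_ip" ] && [ "$n" = "$want_name" ]; then found=yes; fi
        done
    done < "$hosts_file"
    if [ "$stale" = yes ]; then
        echo "$hosts_file still mentions $stale_name" >&2
        return 1
    fi
    if [ "$found" = no ]; then
        echo "$hosts_file has no entry mapping $want_ip to $want_name" >&2
        return 1
    fi
    return 0
}

# Usage:
#   check_hosts /etc/hosts 10.1.1.128 test2 test1
```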


2. A Second Mistake: Accidentally Changing the Global Zone IP

While correcting the network configuration, I made another mistake that caused a much larger problem.

Instead of creating the IP address inside the Non-Global Zone, I accidentally executed the commands in the Global Zone:

ipadm create-ip net0
ipadm create-addr -T static -a 10.1.1.128/22 net0/v4

As soon as the command completed, my SSH session disconnected.

At that moment I realized I had changed the Global Zone network configuration instead of the Non-Global Zone's. Because the server's own address on net0 had been replaced, remote connectivity was lost immediately.

Fortunately, the server's ILOM interface was still accessible.

2.1. Recovering Through ILOM

I connected to the server through the ILOM management interface and opened the system console.

After logging in as root, I inspected the network configuration.

ipadm show-addr
# The incorrect address was visible on the Global Zone interface.
# Remove the mistakenly created address.
ipadm delete-addr net0/v4
# Recreate the original address using the correct production IP.
ipadm create-addr -T static -a 10.1.1.126/22 net0/v4
# Verify the repair:
ipadm show-addr net0/v4

Finally, I tested connectivity by pinging the gateway and confirmed that network access had been restored.

Only after verifying connectivity did I disconnect from the ILOM console and reconnect through SSH.
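
The same "verify before leaving the console" habit can be written down as one check. The sketch below is an illustration, not the exact commands I ran: verify_before_disconnect is a hypothetical helper, the gateway address is passed in as an argument because it is not shown in this post, and it assumes the Solaris ping syntax of a host followed by a timeout in seconds.

```shell
# Hypothetical helper: confirm the address object carries the expected
# address and that the gateway answers before closing the ILOM console.
# Arguments: address-object expected-address/prefix gateway-ip
verify_before_disconnect() {
    addrobj="$1"; expected_addr="$2"; gateway="$3"
    # Parsable output limited to the ADDR column, e.g. 10.1.1.126/22.
    actual_addr="$(ipadm show-addr -p -o addr "$addrobj")" || return 2
    if [ "$actual_addr" != "$expected_addr" ]; then
        echo "$addrobj has '$actual_addr', expected '$expected_addr'" >&2
        return 1
    fi
    # Solaris ping: "ping host timeout" exits 0 when the host is alive.
    if ! ping "$gateway" 5 >/dev/null 2>&1; then
        echo "gateway $gateway is not reachable" >&2
        return 1
    fi
    echo "network looks healthy; safe to reconnect over SSH"
    return 0
}

# Usage (replace <gateway-ip> with the real gateway):
#   verify_before_disconnect net0/v4 10.1.1.126/22 <gateway-ip>
```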

Conclusion

In the end, the upgrade was successful, but during the process I made a couple of mistakes that created additional work. First, I synchronized the files from the wrong upgraded zone image, which caused the zone to come up with the wrong hostname, IP address, and listener configuration. Then, while fixing the network settings, I accidentally ran the IP configuration commands in the Global Zone and lost my connection to the server.

Fortunately, both issues were recoverable and no application data was lost. The biggest lesson I learned from this experience is to always double-check the source and destination paths before running any synchronization command, and to make sure I am working in the correct zone before making network changes. A few seconds of verification can save a lot of troubleshooting time later.
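
For the synchronization step, one possible safeguard is to require that both paths name the zone being worked on. This is a rough sketch under a strong assumption: that the staging copy and the zone root both contain the zone name as a directory component. The paths in the usage line are made-up examples, not the ones from Part 2, and check_sync_pair is a hypothetical helper name.

```shell
# Hypothetical helper: refuse a sync unless both the source and destination
# paths contain the zone name as a full path component.
# Arguments: zone-name source-path destination-path
check_sync_pair() {
    zone="$1"; src="$2"; dst="$3"
    for p in "$src" "$dst"; do
        # Wrapping the path in slashes lets a leading or trailing component
        # match, while "test2" does not match "test21".
        case "/$p/" in
            */"$zone"/*) ;;
            *)
                echo "path '$p' does not contain a '$zone' component" >&2
                return 1
                ;;
        esac
    done
    return 0
}

# Usage (example paths only):
#   check_sync_pair test2 /bigpool/upgrade/test2/root /zones/test2/root && echo "paths match"
```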
