Introduction
In Part 2 of this series, I described how I upgraded several Solaris Non-Global Zones using an out-of-place upgrade method. Because the zones did not have enough temporary working space for a direct IPS upgrade, I copied the zone filesystems to a larger storage area, performed the package update there, and then synchronized the updated files back to the original zones.
The approach worked, but it also introduced additional manual steps compared to a standard upgrade. As with many administrative tasks, the technical procedure itself was not the biggest challenge. The real challenge was making sure each step was performed against the correct zone and the correct filesystem.
During the upgrade process, I made two mistakes that led to some unexpected troubleshooting. First, I accidentally synchronized files from the wrong upgraded zone image, causing one Non-Global Zone to boot with another zone's hostname, IP address, and listener configuration. Later, while correcting the network settings, I mistakenly executed an IP configuration command in the Global Zone instead of the target Non-Global Zone, which immediately disconnected my SSH session and required recovery through the server's ILOM console.
Fortunately, neither issue resulted in data loss, and both were recoverable. This article documents what happened, how the problems were identified, the recovery steps I followed, and the lessons I took away from the experience.
1. Correcting the Zone Configuration
After the zone booted, it became obvious that the operating system
configuration had been copied from the wrong zone image.
The database files, control files, and application data still
belonged to test2, but several operating system settings belonged to test1.
The hostname was incorrect, the IP address did not match the expected
configuration, and the listener configuration was pointing to the wrong
environment.
The first step was to verify the network configuration inside the
affected zone.
zlogin test2
ipadm show-addr
dladm show-vnic
The output confirmed that the zone was not using the correct
network configuration.
1.1. Recreating the Correct IP Configuration
To correct the network settings, I removed the existing IP
configuration and recreated it using the proper address assigned to test2.
A quick warning based on my own mistake: before running any ipadm
commands, use the zonename command to confirm that you are inside the correct
Non-Global Zone. While fixing test2, I accidentally ran similar commands in the
Global Zone and immediately lost network connectivity to the server. I was able
to recover the system through ILOM, and I cover that experience later in this
post.
# double-check that you are inside the correct Non-Global Zone
zonename
ipadm delete-ip net0 2>/dev/null
ipadm create-ip net0
ipadm create-addr -T static -a 10.1.1.128/22 net0/v4
ipadm show-addr
I also found an old disabled address object that no longer belonged
to the zone and removed it.
ipadm delete-addr znet0/v4
ipadm delete-ip znet0
ipadm show-addr
At this point, the zone was reachable using the correct IP address.
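The zonename check described in the warning above is easy to skip when typing commands by hand. One way to make it harder to skip is to wrap it in a small guard, sketched below under the assumption of a POSIX-style shell. The function name require_zone is hypothetical and not part of Solaris; only zonename itself is a real command.

```shell
# Hypothetical guard: succeed only when the current zone matches the expected name.
require_zone() {
    rz_expected="$1"
    rz_current="$(zonename)"
    if [ "$rz_current" != "$rz_expected" ]; then
        echo "Refusing to continue: in zone '$rz_current', expected '$rz_expected'" >&2
        return 1
    fi
    return 0
}

# Usage idea (not executed here): chain each network change behind the guard.
#   require_zone test2 && ipadm create-ip net0
```

Chaining every ipadm command behind the guard with && means a command typed in the Global Zone stops with an error message instead of changing the interface.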
1.2. Correcting the Hostname
Although the network was fixed, the zone still identified itself as test1 because the hostname information had been copied from the wrong image.
To correct this, I updated the Solaris identity service
configuration.
svccfg -s system/identity:node setprop config/nodename = astring: test2
svccfg -s system/identity:node refresh
svcadm restart system/identity:node
To immediately update the current session, I also changed the
runtime hostname.
hostname test2
# Verification:
hostname
The hostname was now correctly reported as test2.
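Because the runtime hostname and the persistent SMF property are set separately, it can help to check both in one step. The sketch below assumes a POSIX-style shell and uses svcprop to read back the config/nodename property that was set above. The function name check_nodename is hypothetical.

```shell
# Hypothetical helper: succeed only if the runtime hostname and the
# persistent SMF nodename property both equal the expected name.
check_nodename() {
    cn_expected="$1"
    cn_runtime="$(hostname)"
    cn_persistent="$(svcprop -p config/nodename system/identity:node)"
    [ "$cn_runtime" = "$cn_expected" ] && [ "$cn_persistent" = "$cn_expected" ]
}

# Usage idea (not executed here):
#   check_nodename test2 && echo "hostname is consistent"
```

If only the runtime value matches, the old name would likely return after the next reboot, which is why the sketch treats a mismatch in either place as a failure.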
1.3. Updating /etc/hosts
The final step was updating the local hostname resolution.
vi /etc/hosts
I replaced the incorrect hostname entries copied from test1 and
verified that the correct IP address and hostname for test2 were present.
This ensured that local name resolution, listener configuration,
and application services would reference the correct server identity.
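The manual check of /etc/hosts can also be scripted. The sketch below assumes a POSIX-style shell and the usual hosts file layout (an IP address followed by one or more names, with # starting a comment). The function name hosts_has_entry is hypothetical.

```shell
# Hypothetical helper: succeed if hosts file $1 maps IP $2 to hostname $3.
hosts_has_entry() {
    he_file="$1"; he_ip="$2"; he_name="$3"
    while read -r ip names || [ -n "$ip" ]; do
        # Skip blank lines and comment lines.
        case "$ip" in ''|'#'*) continue ;; esac
        [ "$ip" = "$he_ip" ] || continue
        # Drop any trailing comment, then compare each name on the line.
        for n in ${names%%#*}; do
            [ "$n" = "$he_name" ] && return 0
        done
    done < "$he_file"
    return 1
}

# Usage idea (not executed here):
#   hosts_has_entry /etc/hosts 10.1.1.128 test2 || echo "hosts entry missing"
```

Running the same check with test1 as the name is a quick way to confirm that no stale entry from the wrong image still points at the zone's address.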
2. A Second Mistake: Accidentally Changing the Global Zone IP
While correcting the network configuration, I made another mistake
that caused a much larger problem.
Instead of creating the IP address inside the Non-Global Zone, I
accidentally executed the commands in the Global Zone:
ipadm create-ip net0
ipadm create-addr -T static -a 10.1.1.128/22 net0/v4
As soon as the command completed, my SSH session disconnected.
At that moment I realized I had overwritten the Global Zone network
configuration instead of modifying the Non-Global Zone. Since the public IP
address of the server had changed, remote connectivity was immediately lost.
Fortunately, the server's ILOM interface was still accessible.
2.1. Recovering Through ILOM
I connected to the server through the ILOM management interface and opened the system console.
After logging in as root, I inspected the network configuration.
ipadm show-addr
# The incorrect address was visible on the Global Zone interface.
# I removed the mistakenly created address.
ipadm delete-addr net0/v4
# Then I recreated the original address using the correct production IP.
ipadm create-addr -T static -a 10.1.1.126/22 net0/v4
# To verify the repair:
ipadm show-addr net0/v4
Finally, I tested connectivity by pinging the gateway and confirmed
that network access had been restored.
Only after verifying connectivity did I disconnect from the ILOM
console and reconnect through SSH.
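Recreating the original address depends on knowing what it was. One precaution, sketched below under the assumption of a POSIX-style shell, is to save the parseable output of ipadm show-addr to a file before making any change, so the previous values are on hand at the console. The function name snapshot_addrs and the default directory /var/tmp are hypothetical choices.

```shell
# Hypothetical helper: write the current address objects to a timestamped
# file and print the file name. $1 is the target directory (default /var/tmp).
snapshot_addrs() {
    sa_dir="${1:-/var/tmp}"
    sa_file="$sa_dir/ipadm-$(zonename)-$(date +%Y%m%d%H%M%S).txt"
    # -p gives colon-separated output; -o selects the fields to record.
    ipadm show-addr -p -o addrobj,type,state,addr > "$sa_file" || return 1
    echo "$sa_file"
}

# Usage idea (not executed here):
#   snapshot_addrs /var/tmp
```

Including the zone name in the file name also leaves a record of which zone the commands were actually run in.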
Conclusion
In the end, the upgrade was successful, but during the process I
made a couple of mistakes that created additional work. First, I synchronized
the files from the wrong upgraded zone image, which caused the zone to come up
with the wrong hostname, IP address, and listener configuration. Then, while
fixing the network settings, I accidentally ran the IP configuration commands
in the Global Zone and lost my connection to the server.
Fortunately, both issues were recoverable and no application data
was lost. The biggest lesson I learned from this experience is to always
double-check the source and destination paths before running any
synchronization command, and to confirm which zone I am working in before
making network changes. A few seconds of verification can save a lot of
troubleshooting time later.
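The path check can be partly automated as well. The sketch below assumes a POSIX-style shell and assumes that the zone name appears as a directory component in both the source and the destination path, which may not match every layout. The function name paths_match_zone and the paths in the usage comment are hypothetical.

```shell
# Hypothetical check: both paths must contain the zone name as a whole
# directory component, otherwise the function fails with a message.
paths_match_zone() {
    pm_zone="$1"; pm_src="$2"; pm_dst="$3"
    for p in "$pm_src" "$pm_dst"; do
        case "/$p/" in
            */"$pm_zone"/*) ;;   # zone name found as a path component
            *) echo "Path '$p' does not mention zone '$pm_zone'" >&2
               return 1 ;;
        esac
    done
    return 0
}

# Usage idea (not executed here, paths are placeholders):
#   paths_match_zone test2 /upgrade/test2/root /zones/test2/root && <sync command>
```

A check like this would have rejected a source path belonging to test1 when the destination was test2, which is the mix-up described in this post.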