Saturday, June 13, 2026

Real-World Oracle OCI PDB Refresh: Solving Clone Performance Bottlenecks Across Peered VCNs

 

Introduction

Recently, I had a task to refresh our UAT environment using a PDB from the production database hosted in Oracle Cloud Infrastructure (OCI).

At first, the activity looked straightforward. The plan was to perform a remote PDB clone from Production to UAT. However, there was one challenge from the beginning: the source and target databases were located in different OCI VCNs.

After establishing connectivity between the environments, I started the cloning process. The clone started successfully, but after approximately one hour, it appeared to stall, and no significant progress was observed.

After some investigation and testing, I identified two important factors that affected the cloning process:

  • The target DBCS was running with only 1 OCPU.
  • The production database had an hourly archive log backup and a delete job running through Commvault.

In this article, I will share the architecture, troubleshooting process, and lessons learned from this refresh activity.

1. Environment Overview

Source Environment

  • Production Oracle Database Cloud Service (DBCS)
  • Production DBCS is hosted in the Production VCN

Target Environment

  • UAT Oracle Database Cloud Service (DBCS)
  • Target DBCS is hosted in a separate Non-Production VCN

Refresh Method

  • Remote PDB Clone using Database Link

2. Challenge #1 – Different OCI VCNs

The first challenge was that the Production and UAT databases were deployed in different OCI VCNs.

Since remote PDB cloning requires communication between the source and target databases, I first needed to establish network connectivity.

To achieve this, I configured Local Peering Gateways (LPGs) between the two VCNs.

The implementation included:

2.1. Creating LPGs in both VCNs

2.1.1. Add LPG to the Non-Production (NPRD) VCN.

Log in to your OCI tenancy, then go to the navigation menu → Networking → Virtual Cloud Networks.

Select the correct compartment and click on the Non-Prod VCN.

Go to Gateways, then click Create Local Peering Gateway.


Select the correct compartment and click Create.

Now, our LPG has been created in the Non-Prod VCN, and its peering status is New - Not connected to a peer.



2.1.2. Add LPG to the Production (PRD) VCN. 

Go to the navigation menu → Networking → Virtual Cloud Networks. Select the Prod VCN, then go to Gateways and click Create Local Peering Gateway. Repeat the same steps as in the Non-Prod VCN.

Now, our LPG has been created in the Prod VCN, and its peering status is New - Not connected to a peer.


2.2. Establishing peering between them

On the Prod LPG, click Establish Peering Connection.


Then select the Non-Prod VCN and choose the Non-Prod LPG as the Unpeered Peer Gateway.

Now, the peering status of both LPGs changes to Peered - Connected to a peer.

2.3. Updating route tables

2.3.1. Modify the route tables on Prod VCN

Go to Navigation Menu → Networking → Virtual Cloud Network. Select the Prod VCN, then go to the Routing tab.


Go to the Route Rules tab, then click the Add Route Rules button.


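Enter the CIDR block of the Non-Prod VCN as the destination, select the LPG created in the Prod VCN (the one peered with Non-Prod) as the target, and click Add Route Rules. A route rule can only point to a gateway in its own VCN.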

2.3.2. Modify the route tables on non-prod VCN

Repeat the steps from 2.3.1 on the Non-Prod VCN. This time, enter the CIDR block of the Prod VCN as the destination and select the LPG created in the Non-Prod VCN as the target, then click Add Route Rules.
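
One prerequisite is worth checking before adding these route rules: local VCN peering requires that the two VCNs' CIDR blocks do not overlap. The snippet below is a minimal sketch of that check using only the Python standard library. The CIDR values are hypothetical placeholders, not the ranges from this environment.

```python
import ipaddress


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks share at least one address."""
    net_a = ipaddress.ip_network(cidr_a, strict=True)
    net_b = ipaddress.ip_network(cidr_b, strict=True)
    return net_a.overlaps(net_b)


# Hypothetical CIDR blocks; replace them with the real VCN ranges.
PROD_VCN_CIDR = "10.0.0.0/16"
NONPROD_VCN_CIDR = "10.1.0.0/16"

if cidrs_overlap(PROD_VCN_CIDR, NONPROD_VCN_CIDR):
    print("CIDR blocks overlap: the VCNs cannot be peered as they are")
else:
    print("CIDR blocks are distinct: the route rules can be added")
```

If the check reports an overlap, the route rules above cannot steer traffic unambiguously, so the addressing has to be sorted out before peering.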

2.4. Verifying security rules

Establishing a Local Peering Gateway (LPG) between the production and non-production VCNs creates the network path, but traffic is still blocked by default by the VCN security rules. To allow the UAT database to initiate the clone, I needed to add an ingress rule to my Prod VCN that lets the UAT side reach the production database listener.
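
Once the rule is in place, a simple TCP test from the UAT host shows whether the listener port is reachable before any database link is created. The following is a minimal sketch, assuming Python is available on the UAT host. The function name and the command-line usage are illustrative, and port 1521 is taken from the connect string used later in this article.

```python
import socket
import sys


def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Usage (hypothetical script name): python check_port.py <prod-listener-host> 1521
if __name__ == "__main__" and len(sys.argv) == 3:
    target_host, target_port = sys.argv[1], int(sys.argv[2])
    state = "reachable" if port_is_reachable(target_host, target_port) else "blocked or down"
    print(f"{target_host}:{target_port} is {state}")
```

A timeout here usually points to a missing route or security rule, while an immediate refusal suggests the network path is open but nothing is listening on that port.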


3. Creating the Database Link

Once the network connectivity was in place, I created a database link from the target CDB to the production database.

3.1. Create the User in the Source (and, optionally, the Destination) CDB

-- Run this on Source (Prod) CDB$ROOT
CREATE USER C##CLONE_USER IDENTIFIED BY "PASSWORD" CONTAINER=ALL;
GRANT CREATE SESSION, SELECT ANY DICTIONARY TO C##CLONE_USER CONTAINER=ALL;
GRANT SYSOPER TO C##CLONE_USER CONTAINER=ALL;

3.2. Create the Database Link in the Destination CDB

To create the DB link, I copied the Long CDB connection string from OCI.
-- Run this on Destination (UAT) CDB$ROOT
-- The password must match the one set for C##CLONE_USER on the source
CREATE DATABASE LINK CLONE_LINK
CONNECT TO C##CLONE_USER IDENTIFIED BY "PASSWORD"
USING '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=*****.oraclevcn.com)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=*****.oraclevcn.com)))';

3.3. Check if the database link is working correctly

-- Run this on Destination (UAT) CDB$ROOT
SELECT name FROM v$database@clone_link;

After testing the database link and confirming successful connectivity, the environment was ready for the refresh operation.

4. First Clone Attempt

The UAT DBCS was configured with only 1 OCPU because it was mainly used for testing purposes.

I started the remote PDB clone operation. Initially, everything looked normal. The clone started successfully, and data transfer began.

However, after approximately one hour, the process appeared to stop making noticeable progress. The session remained active, but the clone was taking much longer than expected.

At this stage, I started investigating possible bottlenecks.

4.1. Clone command

The command used for the clone operation was:

 -- Run this on Destination (UAT) CDB$ROOT
[oracle@uat script]$ cat clone_pdb.sql
-- clone_pdb.sql
SET ECHO ON
SET TIME ON
SET TIMING ON
SET PAGESIZE 0
SET LINESIZE 200
PROMPT *** Starting PDB Clone at:
SELECT TO_CHAR(SYSDATE, 'YYYY-MM-DD HH24:MI:SS') FROM DUAL;

CREATE PLUGGABLE DATABASE UATPDB FROM PRODPDB@clone_link
  REFRESH MODE NONE
  PARALLEL 2
  KEYSTORE IDENTIFIED BY "PasswordForWallet";
  
PROMPT *** Opening PDB UATPDB...
ALTER PLUGGABLE DATABASE UATPDB OPEN;
PROMPT *** Saving PDB State...
ALTER PLUGGABLE DATABASE UATPDB SAVE STATE;
PROMPT *** Clone Completed at:
SELECT TO_CHAR(SYSDATE, 'YYYY-MM-DD HH24:MI:SS') FROM DUAL;
EXIT;

To ensure the clone process continued even after disconnecting from the session, I executed the script in the background using nohup.

[oracle@uat ~]$ nohup sqlplus / as sysdba @/home/oracle/script/clone_pdb.sql > /home/oracle/script/pdb_clone_full.log 2>&1 & 

Explanation of Key Parameters in the script:

  • CREATE PLUGGABLE DATABASE ... FROM ...@clone_link
    Executes the remote PDB clone over the database link.
  • REFRESH MODE NONE
    Indicates a one-time clone (not refreshable).
  • PARALLEL
    Used to speed up the clone. In the first attempt, since the target had only 1 OCPU, I used PARALLEL 2.
  • KEYSTORE IDENTIFIED BY "PasswordForWallet"
    Supplies the keystore (wallet) password of the target CDB. It is required in OCI environments because the databases use TDE.

After creation, the script opens the PDB, saves its state, and logs start/end timestamps for tracking execution time.

5. Investigating the Target Database Resources

One of the first things I reviewed was the compute configuration of the target database.

The UAT database was running on:

  • Shape: VM.Standard3.Flex
  • OCPUs: 1
  • Memory: 16 GB
  • Network Bandwidth: 1 Gbps

The OCI console clearly showed that the VM was limited to approximately 1 Gbps network bandwidth.


Since a remote PDB clone involves transferring database blocks over the network and writing them to storage on the target system, this immediately became a potential bottleneck.
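
A rough calculation shows why the bandwidth cap matters. The sketch below estimates the pure network transfer time for a given database size. The 2 TB size, the 80% link efficiency, and the 8 Gbps figure for the larger shape are all assumptions for illustration, not measurements from this environment.

```python
def transfer_hours(size_gb: float, bandwidth_gbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move size_gb gigabytes over a network link.

    efficiency is the assumed fraction of the nominal bandwidth that is usable.
    """
    if size_gb < 0 or bandwidth_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, and 0 < efficiency <= 1")
    size_gigabits = size_gb * 8  # gigabytes to gigabits
    seconds = size_gigabits / (bandwidth_gbps * efficiency)
    return seconds / 3600


# Hypothetical PDB size; replace it with the real datafile size of the source PDB.
PDB_SIZE_GB = 2000

for gbps in (1, 8):
    print(f"{gbps} Gbps link: about {transfer_hours(PDB_SIZE_GB, gbps):.1f} hours")
```

Under these assumptions, the copy needs roughly 5.6 hours at 1 Gbps and about 0.7 hours at 8 Gbps, before any CPU or storage limits on the target are taken into account.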

To test this theory, I temporarily increased the OCPUs from 1 to 8. Note that changing the shape (increasing OCPUs) will cause the DBCS to reboot.

To do this, I went to the navigation menu → Oracle AI Database → Oracle Base Database Service


In the DB System page, I selected the UAT database. Then I went to the Nodes tab, clicked the Actions button, and clicked Change Shape.


Then I clicked the ... menu under Configure OCPU and selected Update OCPU Count.


I changed the OCPU number to 8 and clicked Update.

As the OCPUs increased, the shape resources scaled accordingly. 

The clone operation was significantly faster, and the overall database performance improved. This was a good reminder that in OCI, increasing OCPUs affects more than CPU capacity. It also increases available network bandwidth and storage performance, which can have a direct impact on large database operations.

6. Investigating Archive Log Management

While reviewing the production environment, I identified a potential risk related to archive log handling.

The production database was protected by Commvault with an hourly archive log backup job that also deletes archive logs after backup, similar to BACKUP ARCHIVELOG ALL DELETE INPUT;

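A hot remote clone copies the datafiles while the source PDB remains open, then applies the redo generated during the copy to bring the new PDB to a consistent point. Oracle therefore needs the archived logs covering that window to finalize the clone. If Commvault deletes these logs during the process, the clone may hang or fail with ORA- errors because the redo for the required SCN range is no longer available for synchronization.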

Since the clone runs for an extended period and includes a final sync phase, this behavior was identified as a potential risk. Increasing the CPU helped confirm this, as the clone progressed further and did not appear stuck.

To eliminate this risk, I temporarily stopped Commvault on production using:

commvault stop

I then verified it using:

commvault status

7. Second Clone Attempt

I ran the second clone attempt after making the following changes compared with the first attempt:

  • Scaling the UAT DBCS from 1 OCPU to 8 OCPUs
  • Temporarily disabling the archive log backup-and-delete job
  • Changing PARALLEL from 2 to 8 in the clone script

I executed the clone operation again. This time, the refresh completed successfully without any issues. The overall performance was significantly better compared to the original attempt.

8. Decreasing the Number of OCPUs Back to 1

After the refresh was completed successfully, the additional resources were no longer required. To avoid unnecessary costs, I scaled the UAT DBCS back from 8 OCPUs to 1 OCPU.

Conclusion

Although this started as a routine PDB refresh, it became a valuable troubleshooting exercise involving OCI networking, infrastructure sizing, and operational processes.

By configuring Local Peering Gateways, scaling the database from 1 to 8 OCPUs, and reviewing archive log management, the refresh was successfully completed.

This experience highlighted that PDB clone performance issues are not always database-related—network, compute, storage, and operational factors can all have an impact.

