Sunday, January 21, 2018

Backup and Recovery : #03


Whole database backup:

The whole backup that is taken when the database is closed (after the database is shut down using the NORMAL, IMMEDIATE, or TRANSACTIONAL options) is called a consistent backup. In such a backup, all the database file headers are consistent with the control file, and when restored completely, the database can be opened without any recovery. When the database is operated in Noarchivelog mode, only a consistent whole database backup is valid for restore and recovery.

When the database is open and operational, the datafile headers are not consistent with the control file unless the database is open in read-only mode. When the database is shut down with the ABORT option, this inconsistency persists. Backups of the database in such a state are termed inconsistent backups. Inconsistent backups need recovery to bring the database into a consistent state. When databases need to be available 7 days a week, 24 hours a day, you have no option but to use an inconsistent backup, and this can be performed only on databases running in Archivelog mode.



Tablespace Backup:
A tablespace backup is a backup of the datafiles that make up a tablespace. Tablespace backups are valid only if the database is in Archivelog mode because redo entries will be required to make the datafiles consistent with the rest of the database. In Noarchivelog mode, you can make tablespace backups only when the tablespace is read-only or offline-normal.

Datafile Backups:
You can make backups of a single datafile if your database is in Archivelog mode.
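One common user-managed approach is to place the tablespace that owns the datafile into backup mode while the file is copied. A minimal sketch follows; the tablespace name and file paths are examples only, and Archivelog mode is assumed:

```sql
-- Sketch of a user-managed online (hot) backup of a datafile
-- (requires Archivelog mode; names and paths are illustrative).
ALTER TABLESPACE users BEGIN BACKUP;

-- Copy the datafile with an operating system utility while the
-- tablespace is in backup mode, e.g. from SQL*Plus on a UNIX host:
HOST cp /u01/oradata/orcl/users01.dbf /backup/users01.dbf

ALTER TABLESPACE users END BACKUP;
```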

Control File Backups:
You can configure RMAN for automatic backups of the control file after a BACKUP or COPY command is issued. The control file can also be backed up through SQL commands.
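Both approaches can be sketched as follows; the destination path is an example:

```sql
-- In RMAN: enable automatic control file backup after BACKUP or COPY.
-- CONFIGURE CONTROLFILE AUTOBACKUP ON;

-- In SQL*Plus: back up the control file to a binary copy, or generate
-- a trace script that can re-create it.
ALTER DATABASE BACKUP CONTROLFILE TO '/backup/control.bkp';
ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
```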

User-Managed Backup and Recovery

User-managed backup and recovery does not use Recovery Manager. Operating system commands are used to make backups of the database files and to restore them in a recovery situation. The recovery commands are issued in a SQL*Plus session.

Querying Views to Obtain Database File Information:
Obtain information about the files of the database by querying the V$DATAFILE, V$CONTROLFILE, V$LOGFILE, and V$TABLESPACE views.
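For example, the following queries list the physical files of the database (column lists abbreviated; run from a privileged SQL*Plus session):

```sql
-- List the datafiles, control files, redo log members, and tablespaces.
SELECT file#, name, status FROM v$datafile;
SELECT name FROM v$controlfile;
SELECT group#, member FROM v$logfile;
SELECT ts#, name FROM v$tablespace;
```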

Backup Method:
A user-managed database backup is an operating system backup of database files while the database is open or closed.

Physical Backup Methods
Operating system backup without archiving is used to recover to the point of the last backup after a media failure.
Operating system backup with archiving is used to recover to the point of failure after a media failure

Consistent Whole Database Backup
A consistent whole database backup, also known as a closed database backup, is a backup of all the datafiles and control files that constitute an Oracle database, taken while the database is closed. It can also include the online redo log files, the parameter file, and the password file. Ensure that the complete pathnames of the files are noted and used appropriately in the backup.

* It is not necessary to include the online redo log files as part of a whole database backup if the database has been shut down cleanly by using the NORMAL, TRANSACTIONAL, or IMMEDIATE option. However, in cases where it is necessary to restore the entire database, the process is simplified if they have been backed up.

Advantages of Making Consistent Whole Database Backups:

A consistent whole database backup is conceptually simple because all you need to do is:

– Shut down the database
– Copy all required files to the backup location
– Open the database

Disadvantages of Making Consistent Whole Database Backups:
  • For business operations where the database must be continuously available, a consistent whole database backup is unacceptable because the database is shut down and unavailable during the backup.
  • The amount of time that the database is unavailable is affected by the size of the database, the number of datafiles, and the speed with which the copy operations on the data files can be performed. If this amount of time exceeds the allowable down time, you must choose another type of backup.
  • The recovery point is only to the last full consistent whole database backup, and lost transactions may have to be entered manually following a recovery operation.

Performing a Consistent Whole Database Backup:

Perform a consistent whole database backup while the Oracle server instance is shut down.

1. Compile an up-to-date listing of all relevant files to back up.
2. Shut down the Oracle instance with the SHUTDOWN NORMAL, SHUTDOWN IMMEDIATE, or SHUTDOWN TRANSACTIONAL command.
3. Back up all datafiles and control files by using an operating system backup utility. You can also include the redo log files, although this is not required. You should also back up the parameter file and the password file.
4. Restart the Oracle instance.
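The steps above can be sketched from a SQL*Plus session connected AS SYSDBA; the file paths below are examples only:

```sql
-- Sketch of a closed (consistent) whole database backup.
SHUTDOWN IMMEDIATE

-- Copy datafiles, control files, redo logs, and the parameter and
-- password files with an operating system utility:
HOST cp /u01/oradata/orcl/*.dbf /backup/orcl/
HOST cp /u01/oradata/orcl/*.ctl /backup/orcl/
HOST cp /u01/oradata/orcl/*.log /backup/orcl/

STARTUP
```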

Backup and Recovery : #02

Instance and Media Recovery Structures



Oracle Instance

An Oracle instance consists of memory areas (mainly the System Global Area [SGA]) and background processes, namely PMON, SMON, DBWn, LGWR, and CKPT. An instance is created during the nomount stage of the database startup, after the parameter file has been read. If any of these processes terminate, the instance shuts down.

Memory Structures


Database buffer cache:
Memory area used to store blocks read from data files. Data is read into the blocks by server processes and written out by DBWn asynchronously.

Shared pool:
Stores parsed versions of SQL statements, PL/SQL procedures, and data dictionary information

Log buffer:
Memory containing before and after image copies of changed data to be written to the redo logs.

Large pool:
An optional area in the SGA that provides large memory allocations for backup and restore operations, I/O server processes, and session memory for the shared server and Oracle XA.

Background Processes


Database writer(DBWn):
Writes dirty buffers from the data buffer cache to the data files. This activity is asynchronous.

Log writer (LGWR):
Writes data from the redo log buffer to the redo log files

System monitor(SMON):
Performs automatic instance recovery. Recovers space in temporary segments when they are no longer in use. Merges contiguous areas of free space depending on parameters that are set.

Process monitor(PMON):
Cleans up the connection/server process dedicated to an abnormally terminated user process. Performs rollback and releases the resources held by the failed process.

Checkpoint (CKPT):
Synchronizes the headers of the data files and control files with the current redo log and checkpoint numbers.

Archiver (ARCn)(optional):
A process that automatically copies redo logs that have been marked for archiving.

The User Process:
The user process is created when a user starts a tool such as SQL*Plus, Oracle Forms Developer, Oracle Reports Developer, or Oracle Enterprise Manager. This process might be on the client or the server, and provides an interface for the user to enter commands that interact with the database.

The Server Process
The server process accepts commands from the user process and performs steps to complete user requests. If the database is not in a shared server configuration, a server process is created on the machine containing the instance when a valid connection is established.

Oracle Database
An Oracle database consists of the following physical files:

Datafiles(Binary):
Physical storage of data. At least one file is required per database. This file stores the system tablespace.

Redo logs file (Binary):
Contain before and after image copies of changed data, for recovery purposes. At least two groups are required

Control files(Binary):
Record the status of the database, the physical structure, and RMAN metadata.

Parameter file (Text):
Store parameters required for instance startup

Server parameter file(Binary):
Store persistent parameters required for instance startup

Password file(optional)(Binary):
Store information on users who can start, stop, and recover the database

Archive logs(optional)(Binary):
Physical copies of the online redo log files. Created when the database is set in Archivelog mode. Used in recovery.


Large Pool


The large pool is used to allocate sequential I/O buffers from shared memory for I/O slaves and for Oracle backup and restore operations.
Recovery Manager (RMAN) uses the large pool for backup and restore when you set the DBWR_IO_SLAVES or BACKUP_TAPE_IO_SLAVES parameters to simulate asynchronous I/O.

Sizing the Large Pool:

If LARGE_POOL_SIZE is set, then Oracle attempts to get memory from the large pool.
If the LARGE_POOL_SIZE initialization parameter is not set, then the Oracle server attempts to allocate shared memory buffers from the shared pool in the SGA.

DBWR_IO_SLAVES: This parameter specifies the number of I/O slaves used by the DBWn process. The DBWn process and its slaves always write to disk. By default, the value is 0 and I/O slaves are not used.

BACKUP_TAPE_IO_SLAVES: Specifies whether I/O slaves are used by Recovery Manager to back up, copy, or restore data to tape.

To determine how the large pool is being used, query V$SGASTAT.
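For example:

```sql
-- Check whether a large pool is configured and how its memory is used.
SELECT pool, name, bytes
FROM   v$sgastat
WHERE  pool = 'large pool';
```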


Database Buffer Cache, DBWn and Datafiles:



Function of the Database Buffer Cache:

The database buffer cache is an area in the SGA that is used to store the most recently used data blocks.
The server process reads tables, indexes, and rollback segments from the data files into the buffer cache where it makes changes to data blocks when required.
The Oracle server uses a least recently used (LRU) algorithm to determine which buffers can be overwritten to accommodate new blocks in the buffer cache.

Function of the DBWn Background Process:

The database writer process (DBWn) writes the dirty buffers from the database buffer cache to the data files. It ensures that sufficient numbers of free buffers— buffers that can be overwritten when server processes need to read in blocks from the data files—are available in the database buffer cache.

The database writer regularly synchronizes the database buffer cache and the data files; this synchronization is the checkpoint event, which is triggered in various situations.

Although one database writer process is adequate for most systems, you can configure additional processes to improve write performance if your system modifies data heavily.

Data Files
Data files store both system and user data on a disk. This data may be committed or uncommitted.

Data Files Containing Only Committed Data:
This is normal for a closed database, except when a failure has occurred or the "shutdown abort" option has been used. If the instance is shut down using the normal, immediate, or transactional option, the data files contain only committed data. This is because all uncommitted data is rolled back, and a checkpoint is issued to force all committed data to disk.

Data Files Containing Uncommitted Data:
While an instance is running, datafiles can contain uncommitted data. This happens when data has been changed but not committed (the changed data is now in the cache), and more space is needed in the cache, forcing the uncommitted data off to disk. Only when all users eventually commit will the data files contain only committed data. In the event of failure, the redo logs and rollback segments are used during subsequent recovery to synchronize the datafiles.

Configuring Tablespaces:
Tablespaces contain one or more datafiles. It is important that tablespaces are created carefully to provide a flexible and manageable backup and recovery strategy


System: Backup and recovery is more flexible if system and user data is contained in different tablespaces.
Temporary: If the tablespace containing temporary segments (used in sort, and so on) is lost, it can be re-created, rather than recovered.
Undo: The procedures for backing up undo tablespaces are exactly the same as for backing up any other read/write tablespace. Because the automatic undo tablespace is so important for recovery and for read consistency, you should back it up frequently.
Read-only data: Backup time can be reduced because a tablespace must be backed up only when the tablespace is made read-only.
Highly volatile data: This tablespace should be backed up more frequently, also reducing recovery time.
Index data: Tablespaces to store only index segments should be created. These tablespaces can often be re-created instead of recovered.
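The read-only case above can be sketched as follows; the tablespace name is illustrative:

```sql
-- After a tablespace is made read-only, a single final backup of its
-- datafiles suffices until it is made read/write again.
ALTER TABLESPACE history READ ONLY;
-- ... back up its datafiles once with an OS utility or RMAN ...
ALTER TABLESPACE history READ WRITE;
```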


Redo Log Buffer, LGWR and Redo Log Files:



Function of the Redo Log Buffer
The redo log buffer is a circular buffer that holds information about changes made to the database. This information is stored in redo entries.
Redo entries contain the information necessary to reconstruct, or redo, changes made to the database by INSERT, UPDATE, DELETE, CREATE, ALTER, or DROP operations. Redo entries are used for database recovery, if necessary.
Redo entries are copied by Oracle server processes from the user’s memory space to the redo log buffer.

Function of the LGWR Background Process
The log writer (LGWR) writes redo entries from the redo log buffer to the redo log files as follows:
  • When the redo log buffer is one-third full
  • When a timeout occurs (every three seconds)
  • When there is 1 MB of redo
  • Before DBWn writes modified blocks in the database buffer cache to the data files 
  • When a transaction commits


Redo Log Files:

Redo log files store all changes made to the database. If the database is to be recovered to a point in time when it was operational, redo logs are used to ensure that all committed transactions are committed to disk, and all uncommitted transactions are rolled back. The important points relating to redo log files are as follows:
LGWR writes to redo log files in a circular fashion. This behavior results in all members of a log file group being overwritten.
Although it is mandatory to have at least two log groups to support the cyclic nature, in most cases, you would need more than two redo log groups.

Redo Log Switches
  • A log switch occurs when LGWR stops writing to one redo log group and begins writing to another, normally because the current group has been filled.
  • At a log switch, the current redo log group is assigned a log sequence number that identifies the information stored in that redo log group and is also used for synchronization.
  • A DBA can force a log switch by using the ALTER SYSTEM SWITCH LOGFILE command.
  • A checkpoint occurs automatically at a log switch.
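A forced switch and its effect can be observed as follows:

```sql
-- Force a log switch (a checkpoint also occurs at the switch),
-- then inspect the log groups and their sequence numbers.
ALTER SYSTEM SWITCH LOGFILE;

SELECT group#, sequence#, status
FROM   v$log;
```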


CKPT Process

Database checkpoints ensure that all modified database buffers are written to the database files.
The datafile headers are then marked current, and the checkpoint sequence number is recorded in the control file. Checkpoints synchronize the buffer cache by writing to disk all buffers whose corresponding redo entries were part of the log file being checkpointed.

Checkpoint Process (CKPT) Features
  • The CKPT process is always enabled.
  • The CKPT process updates file headers at checkpoint completion.
  • More frequent checkpoints reduce the time needed for recovering from instance failure at the possible expense of performance.

Synchronization

  • At each checkpoint, the checkpoint number is updated in every database file header and in the control file.
  • The checkpoint number acts as a synchronization marker for redo, control, and data files. If they have the same checkpoint number, the database is considered to be in a consistent state.
  • Information in the control file is used to confirm that all files are at the same checkpoint number during database startup. Any inconsistency between the checkpoint numbers in the various file headers results in a failure, and the database cannot be opened. Recovery is required.
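The synchronization markers can be inspected directly; in a consistent database the values match:

```sql
-- Compare the checkpoint number recorded in the control file with the
-- checkpoint numbers recorded in the datafile headers.
SELECT checkpoint_change# FROM v$database;
SELECT file#, checkpoint_change# FROM v$datafile_header;
```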

Saturday, January 20, 2018

Backup and Recovery : #01

Prepare Backup and Recovery Strategy


Whatever backup strategy you choose, it is important to obtain agreement from all appropriate levels of management.
  • Business requirements
  • Operational requirements
  • Technical considerations
  • Management concurrence
Business Impact:
==================

You should understand the impact that down time has on the business. Management must quantify the cost of down time and the loss of data and compare this with the cost of reducing down time and minimizing data loss.

MTTR: Database availability is a key issue for a DBA. In the event of a failure, the DBA should strive to reduce the Mean-Time-To-Recover (MTTR). This strategy ensures that the database is unavailable for the shortest possible amount of time. By anticipating the types of failures that can occur and by using effective recovery strategies, the DBA can ultimately reduce the MTTR.

MTBF: Protecting the database against various types of failures is also a key DBA task. To do this, a DBA must increase the Mean-Time-Between-Failures (MTBF). The DBA must understand the backup and recovery structures within an Oracle database environment and configure the database so that failures do not often occur.

Evolutionary Process: A backup and recovery strategy evolves as business, operational, and technical requirements change. It is important that both the DBA and appropriate management review the validity of a backup and recovery strategy on a regular basis.


Operational Requirements:
=================================
24-Hour Operations: Some databases must be available 24 hours a day, 7 days a week, for continuous operation. Proper database configuration is necessary to support these operational requirements because they directly affect the technical aspects of the database environment.


Technical Considerations:
=============================

  • Resources: hardware, software, manpower, and time 
  • Physical image copies of the operating system files
  • Logical copies of the objects in the database
  • Transaction volume which affects desired frequency of backups


Here are some questions to consider when selecting a backup strategy:

How much data do you have?
Do you have the machine power and capacity to support backups?
Is the data easily recreated?
Can you reload the data into the database from a flat file?
Does the database configuration support resiliency to different types of failures?


Disaster Recovery Issues:
==============================

Data is so important that you must ensure resiliency even in the event of a complete system failure. Natural disasters and other issues can affect the availability of your data and must be considered when creating a disaster recovery plan. Here are some questions to consider when selecting a backup and recovery strategy:

What will happen to your business in the event of a serious disaster such as:
– Flood, fire, earthquake, or hurricane
– Malfunction of storage hardware or software
If your database server fails, will your business be able to operate during the hours, days, or even weeks it might take to get a new hardware system?
Do you store backups at an off-site location?



Solutions:

Off-site backups
Data Guard, which protects critical data by automating the creation, management, and monitoring aspects of a standby database environment.


Loss of Key Personnel

In terms of key personnel, consider the following questions:
How will a loss of personnel affect your business?
If your DBA leaves the company or is unable to work, will you be able to continue to run the database system?
Who will handle a recovery situation if the DBA is unavailable?

Introduction of Backup & Recovery


Why we need backup & recovery ?

  • Protect the database from failure
  • Minimize data loss
  • Increase the mean time between failures
  • Decrease the mean time to recover


One of a database administrator’s (DBA) major responsibilities is to ensure that the database is available for use.
To protect the data from the various types of failures that can occur, the DBA must back up the database regularly. Without a current backup, it is impossible for the DBA to get the database up and running if there is a file loss, without losing data.

Categories of Failures:
=======================================

Different types of failures may occur in an Oracle database environment. These include:

  • Statement failure
  • User process failure
  • User error
  • Instance failure
  • Media failure
  • Network failure


Each type of failure requires a varying level of involvement by the DBA to recover effectively from the situation. In some cases, recovery depends on the type of backup strategy that has been implemented.



Statement Failure:
=======================================

Statement failure occurs when there is a logical failure in the handling of a statement in an Oracle program. Types of statement failures include:

  • A logical error occurs in the application.
  • The user attempts to enter invalid data into the table, perhaps violating integrity constraints.
  • The user attempts an operation with insufficient privileges, such as an insert on a table using only SELECT privileges.
  • The user attempts to create a table but exceeds the user’s allotted quota limit.
  • The user attempts an INSERT or UPDATE on a table, causing an extent to be allocated, but insufficient free space is available in the tablespace.




Resolutions for Statement Failures
=============================================

DBA intervention after statement failures will vary in degree, depending on the type of failure, and may include the following:

  • Fix the application so that logical flow is correct. Depending on your environment this may be an application developer task rather than a DBA task.
  • Modify the SQL statement and reissue it. This may also be an application developer task rather than a DBA task.
  • Provide the necessary database privileges for the user to complete the statement successfully.
  • Issue the ALTER USER command to change the quota limit.
  • Add file space to the tablespace. Technically, the DBA should make sure this does not happen; however, in some cases it may be necessary to add file space. A DBA can also use the RESIZE and AUTOEXTEND options for data files.
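The quota and space fixes above can be sketched as follows; the user, tablespace, and file names are illustrative:

```sql
-- Raise a user's quota on a tablespace.
ALTER USER scott QUOTA 100M ON users;

-- Add file space: resize an existing datafile, allow it to autoextend,
-- or add a new datafile to the tablespace.
ALTER DATABASE DATAFILE '/u01/oradata/orcl/users01.dbf' RESIZE 500M;
ALTER DATABASE DATAFILE '/u01/oradata/orcl/users01.dbf'
      AUTOEXTEND ON NEXT 10M MAXSIZE 2G;
ALTER TABLESPACE users
      ADD DATAFILE '/u01/oradata/orcl/users02.dbf' SIZE 100M;
```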


When a statement failure is encountered, it is likely that the Oracle server or the operating system will return an error code and a message. The failed SQL statement is automatically rolled back, then control is returned to the user program. The application developer or DBA can use the Oracle error codes to diagnose and help resolve the failure.


Oracle provides for suspending, and later resuming, the execution of large database operations in the event of space allocation failures. This enables an administrator to take corrective action instead of the Oracle database server returning an error to the user. After the error condition is corrected, the suspended operation automatically resumes. This feature is called resumable space allocation, and the statements that are affected are called resumable statements.
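A session opts in to this behavior as follows; the timeout value is an example:

```sql
-- Enable resumable space allocation for the current session: statements
-- that hit a space error are suspended for up to the timeout (in seconds)
-- instead of failing immediately.
ALTER SESSION ENABLE RESUMABLE TIMEOUT 3600;
```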



User Process Failures
===============================

A user’s process may fail for a number of reasons; however, the more common causes include:

• The user performed an abnormal disconnect in the session. For example, a user issues a [Ctrl] + [Break] in SQL*Plus while connected to a database in a client-server configuration.
• The user’s session was abnormally terminated. One possible scenario is the user rebooted the client while connected to a database in a client-server configuration.
• The user’s program raised an address exception which terminated the session. This is common if the application does not properly handle exceptions when they are raised.

Resolution of User Process Failures
================================

The DBA will rarely need to take action to resolve user process errors. The user process cannot continue to work, although the Oracle server and other user processes will continue to function.

PMON Background Process


  • The PMON background process is usually sufficient for cleaning up after an abnormally terminated user process.
  • When the PMON process detects an abnormally terminated server process, it rolls back the transaction of the abnormally terminated process, and releases any resources and locks it has acquired.



User Errors:
=============

DBA intervention is usually required to recover from user errors.

Common Types of User Errors:
  • The user accidentally drops or truncates a table.
  • The user deletes all rows in a table.
  • The user commits data, but discovers an error in the committed data.



A key issue in any database and application environment is to make sure that users are properly trained and are aware of database availability and integrity implications.
A DBA should understand the types of applications and business operations that may result in loss of data from user errors and how to implement recovery measures for those situations. Some recovery situations may be quite extensive, such as restoring the database to a point-in time just prior to the error, exporting the lost data, and then importing that data back into the database from which it was lost.
  • Recover from a valid backup.
  • Import the table from an export file.
  • Use LogMiner to determine the time of error.
  • Recover with a point-in-time recovery.
  • Use LogMiner to perform object-level recovery.
  • Use FlashBack to view and repair historical data
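The Flashback option in the list above can be sketched as follows; the table name and timestamp are illustrative, and the exact Flashback features available vary by Oracle version:

```sql
-- View historical data as of a point in time, without changing anything.
SELECT * FROM emp AS OF TIMESTAMP
  TO_TIMESTAMP('2018-01-20 09:00:00', 'YYYY-MM-DD HH24:MI:SS');

-- Repair a user error by flashing the table back to that point in time
-- (the table must have row movement enabled).
ALTER TABLE emp ENABLE ROW MOVEMENT;
FLASHBACK TABLE emp TO TIMESTAMP
  TO_TIMESTAMP('2018-01-20 09:00:00', 'YYYY-MM-DD HH24:MI:SS');
```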



Instance Failure:
================

An instance failure may occur for numerous reasons:
  • A power outage occurs that causes the server to become unavailable.
  • The server becomes unavailable due to hardware problems such as a CPU failure, memory corruption, or an operating system crash.
  • One of the Oracle server background processes (DBWn, LGWR, PMON, SMON, CKPT) experiences a failure.




To recover from instance failure, the DBA:


  • Starts the instance by using the “startup” command. The Oracle server will automatically recover, performing both the roll forward and rollback phases.
  • Investigates the cause of failure by reading the instance alert.log file and any other trace files that were generated during the instance failure.
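Both steps can be sketched from a SQL*Plus session; the parameter shown locates the trace directory on older releases and is given as an example:

```sql
-- Restart the instance; roll forward and rollback happen automatically.
CONNECT / AS SYSDBA
STARTUP

-- Then locate and read the alert.log and trace files for the cause.
SHOW PARAMETER background_dump_dest
```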


Instance Recovery
=====================

Instance recovery restores a database to its transaction-consistent state just prior to instance failure. The Oracle server automatically performs instance recovery when the database is opened if it is necessary.
No recovery action needs to be performed by you. All required redo information is read by SMON. To recover from this type of failure, simply start the database.

After the database has opened, notify users that any data that they did not commit must be re-entered


There may be a time delay between starting the database and the “Database opened” notification—this is the roll forward phase that takes place while the database is mounted.

– SMON performs the roll forward process by applying changes recorded in the online redo log files from the last checkpoint.
– Rolling forward recovers data that has not been recorded in the database files, but has been recorded in the online redo log, including the contents of rollback segments.

Rollback can occur while the database is open, because either SMON or a server process can perform the rollback operation. This allows the database to be available for users more quickly.



Media Failures
=================

Media failure involves a physical problem when reading from or writing to a file that is necessary for the database to operate. Media failure is the most serious type of failure because it usually requires DBA intervention.
Common Types of Media Related Problems
  • The disk drive that held one of the database files experienced a head crash.
  • There is a physical problem reading from or writing to the files needed for normal database operation.
  • A file was accidentally erased.


Resolutions for Media Failures
================================================

A tested recovery strategy is the key component to resolving media failure problems. The ability of the DBA to minimize down time and data loss as a result of media failure depends on the type of backups that are available. A recovery strategy, therefore, depends on the following:

  • The backup method you choose and which files are affected.
  • The Archivelog mode of operation of the database. If archiving is used, you can apply archived redo log files to recover committed data since the last backup.