ORACLE DATABASE PROBLEM AND SOLUTIONS

Saturday, January 13, 2018

ARCN (Archiver Process)

The archiver process (ARCn) copies redo log files to a designated storage device after a log switch has occurred. ARCn processes are present only when the database is in ARCHIVELOG mode, and automatic archiving is enabled ARCn Responsible for copying online redo log to archival storage before being reused. Runs only when database is in Archivelog Mode.When a Redo Log File fills up, Oracle switches to the next Redo Log File.

If all Redo Log Files fill up, then Oracle switches back to the first one and uses them in a round-robin fashion by overwriting ones that have already been used.
Overwritten Redo Log Files have information that, once overwritten, is lost forever.

ARCHIVELOG Mode:

If ARCn is in what is termed ARCHIVELOG mode, then as the Redo Log Files fill up, they are individually written to Archived Redo Log Files.
LGWR does not overwrite a Redo Log File until archiving has completed.
Committed data is not lost forever and can be recovered in the event of a disk failure.
Only the contents of the SGA will be lost if an Instance fails.

NOARCHIVELOG Mode:

The Redo Log Files are overwritten and not archived.
Recovery can only be made to the last full backup of the database files.
All committed transactions after the last full backup are lost, and you can see that this could cost the firm a lot of $$$.

When running in ARCHIVELOG mode, the DBA is responsible to ensure that the Archived Redo Log Files do not consume all available disk space! Usually after two complete backups are made, any Archived Redo Log Files for prior backups are deleted.

RECO (Recoverer Process)

The Recoverer Process (RECO) is used to resolve failures of distributed transactions in a distributed database.
Consider a database that is distributed on two servers – one in India and one in Chicago.
Further, the database may be distributed on servers of two different operating systems, e.g. LINUX and Windows.
The RECO process of a node automatically connects to other databases involved in an in-doubt distributed transaction.
When RECO reestablishes a connection between the databases, it automatically resolves all in-doubt transactions, removing from each database's pending transaction table any rows that correspond to the resolved transactions.
Recoverer is responsible for recovering failed distributed transactions in a distributed database.
If the RECO process fails to connect with a remote server, RECO automatically tries to connect again after a timed interval. However, RECO waits an increasing amount of time (growing exponentially) before it attempts another connection. The RECO process is present only if the instance permits distributed transactions. The number of concurrent distributed transactions is not limited.

CKPT (checkpoint)

A checkpoint is a data structure that defines a system change number (SCN) in the redo thread of a database. Checkpoints are recorded in the control file and in each data file header. They are a crucial element of recovery.

When a checkpoint occurs, Oracle Database must update the headers of all data files to record the details of the checkpoint. This is done by the CKPT process. The CKPT process does not write blocks to disk; DBWn always performs that work. The SCNs recorded in the file headers guarantee that all changes made to database blocks prior to that SCN have been written to disk.

A checkpoint identifies a point in time with regard to the Redo Log Files where instance recovery is to begin should it be necessary.
A checkpoint is taken at a minimum, once every three seconds.
An event called a checkpoint occurs when the Oracle background process DBWn writes all the modified database buffers in the SGA, including both committed and uncommitted data, to the data files

How works CKPT:

Think of a checkpoint record as a starting point for recovery. DBWn will have completed writing all buffers from the Database Buffer Cache to disk prior to the checkpoint, thus those records will not require recovery.
This does the following:

Ensures modified data blocks in memory are regularly written to disk – CKPT can call the DBWn process in order to ensure this and does so when writing a checkpoint record.
Reduces Instance Recovery time by minimizing the amount of work needed for recovery since only Redo Log File entries processed since the last checkpoint require recovery.
Causes all committed data to be written to datafiles during database shutdown
When checkpointing occurs ,its generate SCN (System Change Number) number and its recorded every datafiles header and control files .

If a Redo Log File fills up and a switch is made to a new Redo Log File , the CKPT process also writes checkpoint information into the headers of the datafiles.
Checkpoint information written to control files includes the system change number (the SCN is a number stored in the control file and in the headers of the database files that are used to ensure that all files in the system are synchronized), location of which Redo Log File is to be used for recovery, and other information.
CKPT does not write data blocks or redo blocks to disk – it calls DBWn and LGWR as necessary.

LGWR (Log Writer)

LGWR writes information from redo log buffers to redo log files .

Oracle database keeps record of changes made to data. Every time user performs a DML, DDL or DCL operation, its redo entries are also created. These redo entries contain commands to rebuild or redo the changes. These entries are stored in Redo Log buffer.
Log writer process (LGWR) writes these redo entries to redo log files. Redo log buffer works in circular fashion. It means that it overwrites old entries. But before overwriting, old entries must be copies to redo log files. Usually Log writer process (LGWR) is fast enough to mange these issues. Log writer process (LGWR) writes redo entries after certain amount of time to ensure that free space is available for new redo entries.
The Log Writer (LGWR) writes contents from the Redo Log Buffer to the Redo Log File that is in use. These are sequential writes since the Redo Log Files record database modifications based on the actual time that the modification takes place.
LGWR actually writes before the DBWn writes and only confirms that a COMMIT operation has succeeded when the Redo Log Buffer contents are successfully written to disk.

LGWR can also call the DBWn to write contents of the Database Buffer Cache to disk.

How LGWR works:

The redo log buffer is a circular buffer. When LGWR writes redo entries from the redo log
buffer to a redo log file, server processes can then copy new entries over the entries in the redo
log buffer that have been written to disk. LGWR normally writes fast enough to ensure that
space is always available in the buffer for new entries, even when access to the redo log is
heavy. LGWR writes one contiguous portion of the buffer to disk.
LGWR writes:
• When a user process commits a transaction
• When the redo log buffer is one-third full
• Before a DBWn process writes modified buffers to disk (if necessary)
• Every three seconds

Before DBWn can write a modified buffer, all redo records that are associated with the changes to the buffer must be written to disk (the write-ahead protocol). If DBWn finds that some redo records have not been written, it signals LGWR to write the redo records to disk and waits for LGWR to complete writing the redo log buffer before it can write out the data buffers. LGWR writes to the current log group. If one of the files in the group is damaged or unavailable, LGWR continues writing to other files in the group and logs an error in the LGWR trace file and in the system alert log. If all files in a group are damaged, or if the group is unavailable because it has not been archived, LGWR cannot continue to function.
When a user issues a COMMIT statement, LGWR puts a commit record in the redo log buffer and writes it to disk immediately, along with the transaction’s redo entries. The corresponding changes to data blocks are deferred until it is more efficient to write them. This is called a fast commit mechanism. The atomic write of the redo entry containing the transaction’s commit record is the single event that determines whether the transaction has committed. Oracle Database returns a success code to the committing transaction, although the data buffers have not yet been written to disk. If more buffer space is needed, LGWR sometimes writes redo log entries before a transaction is committed. These entries become permanent only if the transaction is later committed. When a user commits a transaction, the transaction is assigned a system change number (SCN), which Oracle Database records along with the transaction’s redo entries in the redo log. SCNs are recorded in the redo log so that recovery operations can be synchronized in Real Application Clusters and distributed databases.

In times of high activity, LGWR can write to the redo log file by using group commits. For example, suppose that a user commits a transaction. LGWR must write the transaction’s redo entries to disk. As this happens, other users issue COMMIT statements. However, LGWR cannot write to the redo log file to commit these transactions until it has completed its previous write operation. After the first transaction’s entries are written to the redo log file, the entire list of redo entries of waiting transactions (not yet committed) can be written to disk in one operation, requiring less I/O than do transaction entries handled individually. Therefore, Oracle Database minimizes disk I/O and maximizes performance of LGWR. If requests to commit continue at a high rate, every write (by LGWR) from the redo log buffer can contain multiple commit records

Saturday, January 13, 2018

ARCN (Archiver Process)

RECO (Recoverer Process)

CKPT (checkpoint)

LGWR (Log Writer)

step‑by‑step guide to check CPU sockets, cores, threads, and multithreading (Hyper‑Threading/SMT)