Tuesday, August 14, 2012

What is a process and thread?

Hi all,

After a long gap I am writing my professional blog. I recently read an interesting article on threads and processes which I thought would be worth sharing with you. I really like the example given at the end, which explains what a process and a thread are in layman's terms.


Processes and threads


When we are managing a system and particular user activity, our focus is commonly a process. It is at the process level that we normally monitor user activity. However, the operating system doesn't schedule processes to run anymore; the operating system schedules threads to run. For some people the difference is too subtle to care about; for others it is earth-shattering. From a programmer's perspective, the idea of a threaded application is mind-bogglingly different from our traditional view of how applications run.

Traditionally, an application will have a startup process that creates a series of individual processes to manage the various tasks that the application will undertake. Individual processes have some knowledge of the overall resources of the application only if the startup process opened all necessary files and allocated all the necessary shared memory segments and any other shared resources. Individual processes communicate with each other using some form of inter-process communication (IPC) mechanism, e.g., shared memory, semaphores, or message queues. Each process has its own address space, its own scheduling priority, its own little universe.

The problem with this idea is that creating a process in its own little universe is an expensive series of routines for the operating system to undertake. There's a whole new address space to create and manage; there's a whole new set of memory-related structures to create and manage. We then need to locate and execute the code that constitutes this unique, individual program. All the information relating to the shared objects set up by the startup process needs to be copied to each individual process when it is created. As you can see, there's lots of work to do in order to create new processes. Then we consider why an application creates multiple processes in the first place.



An application is made up of multiple processes because an application has several individual tasks to accomplish in order to get the job done; there's reading and writing to the database, synchronizing checkpoint files, updating the GUI, and a myriad of other things to do in order to get the job done. At this point, we ask ourselves a question: do all these component tasks interface with similar objects, e.g., open files, chunks of data stored in memory, and so on? The answer commonly is YES! Wouldn't it be helpful if we could create some form of pseudo-process whereby the operating system doesn't have as much work to do in order to carve out an entire new process? And whereby the operating system could create a distinct entity that performed an individual task but was in some way linked to all the other related tasks? An entity that shared the same address space as all other subtasks, allowing it access to all the same data structures as the main get-the-job-done task. An entity that could be scheduled independently of other tasks (we can refresh the GUI while a write to the database is happening), as long as it didn't need any other resources. An entity that allows an application to have parallel, independent tasks running concurrently. If we could create such an entity, surely the overall throughput of the application would be improved? The answer is that with careful programming such an entity does exist, and that entity is a thread.

Multithreaded applications are more natural than distinct, separate processes that are individual, standalone, and need to use expensive means of communication in order to synchronize their activities. The application can be considered the overall task of getting the job done, with individual threads being considered individual subtasks. Multithreaded applications gain concurrency among independent threads by subdividing the overall task into smaller, manageable jobs that can be performed independently of each other. A single-threaded application must do one task and then wait for some external event to occur before proceeding with the same or the next task in a sequence. Multithreaded applications offer parallelism if we can segregate individual tasks to work on separate parts of the problem, all while sharing the same underlying address space created by the initial process. Sharing an address space gives access to all the data structures created by the initial process without having to copy all the structural information as we have to between individual processes. If we are utilizing a 64-bit address space, it is highly unlikely that an individual thread will run out of space to create its own independent data structures, should it need them.

It sounds remarkable that we survived without threads. I wouldn't go so far as to say it's remarkable that we survived, but it can be remarkable how much overall throughput improves when a single-threaded application is transformed into a multi-threaded application. This in itself is a non-trivial task. Large portions of the application will need to be rewritten and possibly redesigned in order to transform the program logic from a single thread of execution into distinct and separate branches of execution. Do we have distinct and separate tasks within the application that can run concurrently with other independent tasks? Do these tasks ever update the same data items?
A consequence of multiple threads sharing the same address space is that synchronizing activities between individual threads becomes a crucial activity. There is a possibility that individual threads are working on the same block of process-private data, making changes independently of each other. This is not possible where individual processes have their own independent private data segments. Multithreaded applications need to exhibit a property known as thread safety. This means that functions within an application can be run concurrently and any updates to shared data objects are synchronized. One common technique that threads use to synchronize their activities is a form of simple locking. When one thread is going to update a data item, it needs exclusive access to that data item. The locking strategy is known as locking a mutex. Mutex stands for MUTual EXclusion. A mutex is a simple binary lock. Being binary, the lock is either open or closed. If it is open, the data item can be locked and then updated by the thread. If another thread comes along to update the data item, it will find the mutex closed (locked). That thread will need to wait until the mutex is unlocked (open), at which point it knows it has exclusive access to the data item. As you can see, even this simple explanation is getting quite involved. Rewriting a single-threaded application to be multi-threaded needs lots of experience and detailed knowledge of the pitfalls of multithreaded programming. If you are interested in taking this further, I strongly suggest that you get your hands on the excellent book Threadtime: The Multithreaded Programming Guide by Scott J. Norton and Mark D. Dipasquale.



One useful thing about having a multithreaded kernel is that you don't need to use this feature if you don't want to. You can simply take your existing single-threaded applications and run them directly on a multi-threaded kernel. Each process will simply consist of a single thread. It might not be making the best use of the parallel features of the underlying architecture, but at least you don't need to hire a team of mutex-wielding programmers.



The application may consist of a single process, which is the visible face of the application. As administrators, we can still manage the visible application. Internally, the single process will create a new thread for each individual task that it needs to perform. Because of the thread model used, each user-level thread corresponds to a kernel thread; because the kernel can see these individual threads, it can schedule these individual tasks independently of each other (a thread visible to the kernel is known as a bound thread). This offers internal concurrency in the application, with individual tasks doing their own thing as quickly as they can, being scheduled by the kernel as often as they want to run. Tasks that are interrelated need to synchronize themselves using some form of primitive inter-task locking strategy such as the mutexes mentioned above. This is the job of application programmers, not administrators. The application programmer needs to understand the importance of the use of signals; we send signals to processes. Does that signal get sent to all threads? The answer is "it depends." A common solution used by application programmers is to create a signal-handling thread. This thread receives the signal while all other threads mask signals. The signal-handling thread can then coordinate sending signals to individual threads (using system calls such as pthread_kill). This is all internal to the process and of little direct concern to us. As far as administering this application goes, we manage the process; we can send the process signals, we can increase its priority, we can STOP it, we can kill it. We are managing the whole set of tasks through the process, while internally each individual thread of execution is being scheduled and managed by the kernel.



A process is a "container" for a whole set of instructions that carry out the overall task of the program. A thread is an independently scheduled subtask within the program. It is an independent flow of control within the process with its own register context, program counter, and thread-local data but sharing the host process's address space, making access to related data structures simpler.



An analogy I often use is a beehive. From the outside, it is a single entity whose purpose is to produce honey. The beehive is the application, and, hence, the beehive can be thought of as the process; it has a job to do. Each individual bee has a unique and distinct role that needs to be performed. Individual bees are individual threads within the process/beehive. Some bees coordinate their activities with miraculous precision, but completely independently of the external world. The end product is produced with amazing efficiency, more effectively than if we subdivided the task of producing honey between independent hives. Imagine the situation: every now and then, the individual hives would meet up to exchange information and make sure the project was still on track, and then they would go back to doing their own little part of the job of making honey. Honey-by-committee wouldn't work. The beehive is the process, and the bees are the threads: amazing internal efficiencies when programmed correctly, but retaining important external simplicity. We as information-gatherers (honey-monsters) will interface with the application/process (beehive) in order to extract information (honey) from the system. There's no point in going to individual bees and trying to extract honey from them; it's the end product that we are interested in, not how we got there.



Thursday, June 3, 2010

HP- Unix - Booting Process


System booting


1) Boot –Rom startup

2) HP-UX startup

PDC checks the memory, CPU, and peripherals, initializes the console, and loads and executes the ISL.

ISL loads the secondary system loader (SSL) called hpux; hpux loads the HP-UX kernel (default is /stand/vmunix).

Kernel starts the swapper process and the init.

Init process reads /etc/inittab and brings up the daemons.

Setting autoboot

From the PDC, go to the configuration menu using the configuration command; here we can set autoboot, autosearch, etc. on or off.

The autoboot command can also be used from the ISL prompt.

We can also use the setboot command from the OS:

#setboot –p -> primary boot device

#setboot –a -> alternate boot device

#setboot –b -> autoboot(on/off)

#setboot –s -> autosearch(on/off)
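A quick example of the same thing with arguments (the hardware path below is only a placeholder; use whatever ioscan reports for your boot disk):

#setboot -p 0/0/2/0.6.0 -> set the primary boot path
#setboot -b on -> turn autoboot on
#setboot -> with no arguments, display the current settings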

A file named AUTO is located in the boot area (LIF area) of the disk. This file contains the default arguments to be supplied to hpux, i.e., if we give the hpux command from the ISL prompt without any arguments, it will take arguments such as the path of the kernel file from the AUTO file.
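To look at or rewrite the AUTO file we can use the LIF utilities against the raw boot disk. A minimal sketch, assuming /dev/rdsk/c0t6d0 is the boot disk (substitute your own device file):

#lifcp /dev/rdsk/c0t6d0:AUTO - -> print the current contents of the AUTO file
#mkboot -a "hpux /stand/vmunix" /dev/rdsk/c0t6d0 -> rewrite the AUTO file with new default boot arguments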

Booting in single user mode

ISL>hpux –is

Booting another kernel

ISL>hpux /stand/vmunix.old



Kernel booting stage

• The kernel first loads the swapper process.

• Then pre_init_rc is loaded (/sbin/pre_init_rc)

• fsck is run on the root file system (by pre_init_rc)

• Then the init(/sbin/init) process is loaded

The swapper is started with the PID 0 and init is started with PID 1



Init process

• It reads the initdefault entry from /etc/inittab

• Then it runs ioinitrc, bcheckrc, rc, and the getty processes.

/sbin/ioinitrc -> checks/initializes I/O devices and maintains consistency between /etc/ioconfig and the kernel data structures.

/sbin/bcheckrc -> runs fsck, activates LVM.

Daemon starting

/sbin/rcN.d -> link files which actually call the startup script for each daemon (e.g., S20cron, ...)

/sbin/init.d -> The startup script for each daemon(/sbin/init.d/cron)

/etc/rc.config.d -> Configuration file for each daemon (/etc/rc.config.d/cron)
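As an example of how these three pieces fit together for cron (the link name and the CRON=1 variable are the usual convention, but may differ slightly on your release):

#grep CRON= /etc/rc.config.d/cron -> CRON=1 means cron will be started at boot
#ll /sbin/rc2.d/S*cron -> the start link, which points to /sbin/init.d/cron
#/sbin/init.d/cron start -> what that link runs when entering run level 2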


Several steps are required to bring an HP-UX system to a fully functional state.


The PDC Chooses a Boot Disk

The Processor Dependent Code (PDC) is the first player in the boot process after the boot process is initiated. The PDC does a self-test on the SPU, then initializes the system console, and checks Stable Storage to determine which disk to boot from. Finally, the PDC loads the Initial System Loader (ISL) utility from the boot area on the chosen boot disk.

ISL Chooses a Kernel to Boot

The ISL consults the AUTO file to determine the pathname of the default kernel, and any options that should be passed to the hpux kernel loader. Finally, the ISL loads and runs the hpux utility from the boot disk.

hpux Loads the Kernel

hpux uses the options and kernel pathname provided by the ISL to find and load the kernel. If the ISL called hpux without any options or arguments, hpux loads the default kernel /stand/vmunix and boots the system to multi-user mode.

vmunix Brings the System to a Fully Functional State

The kernel then scans the hardware, mounts the root file system, and starts the init daemon. The init daemon starts the daemons and services necessary to bring the system up to multi-user mode




Kernel reconfiguration


There are 3 major steps

1) Create or modify /stand/system file

2) Regenerate the kernel.

3) Reboot the system with new kernel

A kernel reconfiguration may be required in the following situations:

1) Adding/removing drivers.

2) Adding or removing subsystems

3) Changing swap/dump device

4) Modifying system parameters

# sysdef -> Analyzes the running system and displays the tunable parameters.

# ioscan -> gives the list of h/w attached to the system. (#ioscan –f ->full list)

# lanscan -> lan configurations

#ioscan –C -> to display a class of device , eg: ioscan –fC disk

# system_prep ->builds a configuration file from a running system.

For creating new configuration file

#cd /stand/build

# /usr/lbin/sysadm/system_prep –s system

Edit this system file to make the appropriate changes.
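For instance, a tunable change inside the system file might look like this (values purely illustrative; lines beginning with * are comments):

* /stand/build/system - add or adjust tunable lines, one per line
maxuprc 200
nproc 1024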

To build a new kernel

#/usr/sbin/mk_kernel –s ./system

Files for any additional modules are expected to be in /stand/system directory.

The newly built kernel will be in the /stand/build directory with the name vmunix_test.

Installing new kernel

#mv /stand/system /stand/system.old

#mv /stand/vmunix /stand/vmunix.old

#mv /stand/build/system /stand/system

#mv /stand/build/vmunix_test /stand/vmunix

Reboot the system

Wednesday, April 14, 2010

3. Logical Volume

Logical Volumes

Some important things about LV's which many won't know
Members of specific volume groups

Can control location of logical volumes

Can view location of logical volumes

Non-typical logical volumes

Logical Volume Control Block

LVID
 
Members of specific volume groups
 

Logical volumes reside in a volume group and exist only in that volume group. Logical volumes are not allowed to be mirrored to other volume groups and cannot be spanned from one volume group to another. This is due in part to the information held about a logical volume by the VGDA. The VGDA can't be expected to track a logical volume in its own volume group and then additionally track that logical volume (and, more importantly, its status) on another volume group.

Can control location of logical volumes

The creation of a logical volume allows many levels of control, from no control to disk-specific control to partition-location control. The main binary which determines how the partitions are allocated to the user's request is /usr/sbin/allocp. The binary takes into account user-specific requirements given at the command line and the default values, such as the intra-policy and the inter-policy. With no specific arguments, such as the disk(s) the logical volume "should" be created on, allocp will just use the available defaults as the blueprint to determine how the logical volume is allocated. If the user gives a list of disks where they would like the logical volume to be allocated, allocp will create the logical volume with those disks as its parameters. If it cannot fulfill these disk requirements, then the "mklv" command will fail. Finally, with map files, the user can tell LVM exactly which physical partitions on the disk the logical volume should be created on. Implicit in this is the ability to control the ORDER in which these physical partitions are allocated. People with their own theories of optimal data access tend to try to control the logical volume formation at the physical partition level.

Can view location of logical volumes

The LVM commands provide many ways of letting the user view the same logical volume. They can usually be deduced after some experience in using LVM. For instance, if I want to look at how a logical volume is laid out on one disk, I simply type "lspv -M hdiskA | grep lvX". Or, I can type "lslv -p lvX". Or, I can use "lslv -m lvX". Or, I can use "lsvg -l vgname | grep lvX". The point is that there is more than one way the logical volume's location can be viewed by the many LVM high-level commands.

Non-typical logical volumes
 
There are a few logical volumes on the system that are not used and accessed in the typical manner. Besides the log logical volume (used by jfs), there are the dump device logical volume, the boot logical volume, and the paging device logical volume. Note that after AIX 4.1, the paging device and the dump device are the same logical volume. This was done as part of an effort to "lighten" the system and it makes sense: not only do you free up disk space formerly used by the dump logical volume (which you hoped never had to be used), but you have also now guaranteed that the dump device is large enough to capture the dump image if you do have a system failure. However, there is a side effect to this change. When the user tries moving the original paging device (hd6), they cannot do so even though they have correctly turned off the paging device and rebooted the machine. The reason is that the dump device has a lock on the logical volume that prevents the removal of the dormant paging device. This is easily fixed by resetting the dump device with the "sysdumpdev" command.
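A sketch of that workaround (assuming the default hd6/primary dump setup):

#sysdumpdev -l -> list the current primary and secondary dump devices
#sysdumpdev -P -p /dev/sysdumpnull -> point the primary dump device elsewhere so hd6 is released
(now the dormant paging device can be moved or removed)
#sysdumpdev -P -p /dev/hd6 -> set the dump device back afterwards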

Logical Volume Control Block

The logical volume control block (lvcb) consists of the first 512 bytes of a logical volume. This area holds important information such as the creation date of the logical volume, information about mirrored copies, and possible mount points in a journaled filesystem.

LVID

The LVID is the soft serial number used to represent the logical volume to the LVM libraries and low level commands. The LVID is created from the VGID of the volume group, a decimal point, and a number which represents the order in which the logical volume was created in the volume group.
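You can see the VGID and the LVID derived from it with the standard listing commands; the identifiers below are made-up values:

#lsvg datavg | grep "VG IDENTIFIER" -> e.g. 00c2b4ee00004c000000012a3bc45678
#lslv lv00 | grep "LV IDENTIFIER" -> e.g. 00c2b4ee00004c000000012a3bc45678.3 (the .3 means the third LV created in that VG)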


Commands related to Logical Volume
******************************
A logical volume may contain one of the following and only one at a time
• Journaled or Enhanced journaled file system (for Ex: /dev/hd4)
• Paging space (/dev/hd6)
• Journal Log (/dev/hd8)
• Boot Logical Volume (/dev/hd5)
• Nothing (raw device)

#smitty lv

To show characteristics of a LV
#lslv lv00
Map of which PV’s contain which PP’s for the LP’s of the LV
#lslv –m lv00
To create a LV
#smitty mklv
To remove a LV
#smitty rmlv
Note: Do not use rmlv to remove JFS or Paging space volumes.
To list all the LV’s defined on the system
#lsvg | lsvg –il
To increasing the size of a LV
#smitty extendlv
Logical volume size
Total LV size=PP size * LP’s assigned to LV * Number of copies of the LV.
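For example (numbers purely illustrative): with a 64 MB PP size, 10 LPs assigned to the LV and 2 copies, the LV consumes 64 * 10 * 2 = 1280 MB of physical disk space while presenting 640 MB of usable space.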

2. Volume Group

Volume Groups


Portability of volume groups

Mix and Match disks

Volume Group Descriptor Area

Volume Group Status Area

Quorum

VGID

A Graphical View of VGDA expansion
 
Portability of volume groups

The one great attribute of LVM is the ability of the user to take a disk or set of disks that make up a volume group to another system and introduce the information created on the first machine onto the second machine. This ability is provided through the Volume Group Descriptor Area (VGDA) and the logical volume control block (lvcb). The design of LVM also allows for accidental duplication of volume group and logical volume names: if the volume group or logical volume names being imported already exist on the new machine, then LVM will generate a distinct volume group or logical volume name.

Mix and Match disks


LVM allows the user to attach disks to a volume group regardless of what type of physical device it truly is or what type of device driver is used to operate the device. Thus, RAID systems, serial dasd drives, and plain SCSI drives can all make up one volume group that may reside across several adapters. The physical location of each drive and the true nature of the drive don't matter to LVM, as long as the disk device drivers follow a certain format required by LVM in order to create logical volumes on those drives.

Volume Group Descriptor Area


The VGDA is an area at the front of each disk which contains information about the volume group, the logical volumes that reside in the volume group, and the disks that make up the volume group. For each disk in a volume group, there exists a VGDA concerning that volume group. This VGDA area is also used in quorum voting. The VGDA contains information about what other disks make up the volume group. This information is what allows the user to specify just one of the disks in the volume group when using the "importvg" command to import a volume group into an AIX system. importvg will go to that disk, read the VGDA, find out what other disks (by PVID) make up the volume group, and automatically import those disks into the system (and its ODM) as well. The information about neighboring disks can sometimes be useful in data recovery. For the logical volumes that exist on a disk, the VGDA gives information about those logical volumes, so any time a change is made to the status of a logical volume (creation, extension, or deletion), the VGDA on that disk and on the others in the volume group must be updated.

Volume Group Status Area


The Volume Group Status Area (VGSA) is composed of 127 bytes, where each bit represents up to 1016 physical partitions that reside on each disk. The bits of the VGSA are used as a quick bit mask to determine which physical partitions, if any, have become stale. This is only important in the case of mirroring, where there exists more than one copy of a physical partition. Stale partitions are flagged by the VGSA. Unlike the VGDA, the VGSAs are specific only to the drives on which they exist; they do not contain information about the status of partitions on other drives in the same volume group. The VGSA is also the bit mask used to determine which physical partitions must undergo data resyncing when mirror copy resolution is performed.

Quorum

Quorum is a sort of "sanity" check that LVM uses to resolve possible data conflicts and prevent data corruption. Quorum is a method by which 51% or more of the quorum votes must be available to a volume group before LVM actions can continue.
Quorum votes are assigned to a disk in a volume group according to how the disk was added to the volume group. When a volume group consists of one disk, there are two VGDAs on that disk; thus, this single-disk volume group has a quorum vote of 2. When another disk is added to the volume group with "extendvg", the new disk gets one VGDA, but the original first disk still retains its two VGDAs. When the volume group has been extended to three disks, the third disk gets the spare VGDA sitting on the first disk, and then each disk has a quorum vote of 1. Every disk after the third disk is automatically given one VGDA, and thus one vote.
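A quick way to see and control this on a live system (the volume group name is just an example; whether a varyoff/varyon or, for rootvg, a bosboot and reboot is needed depends on the VG, so check the man page):

#lsvg datavg | grep -i quorum -> shows how many VGDAs are needed to keep the VG active
#chvg -Qn datavg -> disable quorum checking for datavg
#chvg -Qy datavg -> re-enable quorum checking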

VGID

Just as the PVID is a soft serial number for a disk, the VGID is the soft serial number for the volume group. It is this serial number, not the volume group’s ascii name, which all low level LVM commands reference. Additionally, it is the basis for the LVIDs created on that VGID.

A Graphical View of VGDA Expansion


This shows how the VGDA grows from one disk to another and how information is carried in the VGDA which crossreferences information among the disks in a volume group.




In these examples, you see how the VGDA is used to monitor disks and logical volumes that belong to a volume group. In Case A, the volume group samplevg contains only hdisk0. The example also shows how the VGDA holds the map for the logical volume lv00. There is no VGDA represented on hdisk1 since it is not part of samplevg. However, it may be the case that hdisk1 contains some long-forgotten VGDA from a previous volume group. But since the VGDA for samplevg has no record of hdisk1 belonging to the volume group, hdisk1 is ignored in samplevg's eyes.

Case B is an example of the aftermath of an extendvg of samplevg into hdisk1. A copy of the VGDA from hdisk0 is placed onto hdisk1. But, notice that the VGDA has also been updated on hdisk0 to reflect the inclusion of a new disk into the volume group. Although lv00 does not exist on hdisk1, the VGDA on hdisk1 will have a record of the lv00 mapping sitting on another disk since it’s part of the same volume group.

Case C shows the result of the creation of two logical volumes, lv01 and lv02, onto hdisk1. This is to show how the VGDA sitting on hdisk0 is updated with the activity that is being conducted on another disk in the volume group.


One critical point from this information is that since the VGDA on each disk in a volume group "knows its neighbors' business", this information about neighboring disks can sometimes be used to recover logical volumes residing on the other drives. This usually works if the VGDA on a disk is missing or corrupted. However, it does not help in cases where the disk is truly dead or where the volume group is made up of only one disk.


Volume group related commands
********************************

 #smitty vg
Listing volume group characteristics
#lsvg rootvg
List the volume groups in the system
#lsvg
List of all volume groups that are currently active (varied on)
#lsvg –o
Information about all of the PV’s with in the volume group
#lsvg –p rootvg
Information about the entire LV’s with in the volume group
#lsvg –l rootvg
To list the physical volume status within a volume group
#lsvg –p rootvg
To create a volume group
#smitty mkvg or mkvg –s 2 –t 2 –y newvg hdisk1
To remove a physical volume from a VG, if it is the last PV, the VG will be removed.
#reducevg –d rootvg hdisk1
To add a new PV to an existing VG
#extendvg –f rootvg hdisk1
To change the startup characteristics of a VG
#smitty chvg
To activate VG (make it available for use)
#varyonvg datavg
To deactivate a VG (make it unavailable for use)
#varyoffvg datavg

Unlocking a Volume Group
A volume group can become locked when an LVM command terminates abnormally, due to a system crash while an LVM operation was being performed on the system.
To unlock the datavg volume group
#chvg –u datavg

Logical Track Group Size (LTG)
• Flexible LTG size for better disk I/O performance.

The logical track group (LTG) size corresponds to the maximum allowed transfer size for disk I/O. Using a larger LTG size lets the volume group take advantage of these larger transfer sizes and achieve better disk I/O performance.
Previously the only supported LTG size was 128 KB; the accepted values are now 128, 256, 512, and 1024 KB.

To find the LTG size
#/usr/sbin/lquerypv –M hdisk0
256
To set the LTG size
#mkvg –L or chvg –L (the –L flag takes the LTG size in KB)
To change the LTG size, the volume group must be varied on, the logical volumes must be closed, and file systems must be unmounted.
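A small example, assuming the –L flag is available at your AIX level and datavg is the volume group:

#chvg -L 256 datavg -> set the LTG size of datavg to 256 KB
#lsvg datavg | grep -i ltg -> verify the new LTG size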

Hot Spare
A hot spare is a disk or group of disks used to replace a failing disk. LVM marks a physical volume as missing due to write failures, and then starts the migration of data to the hot spare disk.
Minimum hot spare requirements are:
• Spares are allocated and used by volume group
• Logical volumes must be mirrored
• All logical partitions on hot spare disks must be unallocated
• Hot spare disks must have at least equal capacity to the smallest disk already in the volume group.
Hot spare policy and synchronization policy are applied using chpv and chvg commands.
Examples:
It marks hdisk1 as a hot spare disk
#chpv –hy hdisk1
The following command sets an automatic migration which uses the smallest hot spare that is large enough to replace the failing disk, and automatically tries to synchronize stale partitions
#chvg –hy –sy testvg
How to set up hot sparing
Step1: Decide which volume groups with mirrored logical volumes require high availability.
Step2: Decide how many hot spare disks are required and how large the hot spare disks must be, based on the existing disks in the volume group.
Step3: Add the hot spares to the volume groups which they are to protect by using extendvg command.
Step4: Decide which hot spare policy will be most effective for your volume groups.
Step5: Designate the selected disks as hot spares by using chpv command.
Step6: Decide which synchronization policy meets the business needs and set the policy by using chvg command.

Importing and exporting a volume group

If you have a volume group on one or more removable disks that you want to access on another system, you must first export the volume group from the current system using the exportvg command. This removes any information about the volume group from the system. To export a volume group it must be inactive.
To access an exported volume group on a system, it must be imported to the system using the importvg command. Do not attempt to import a rootvg.

Note: To remove the system definition of a volume group from the ODM database, the volume group needs to be exported using the exportvg command. This command will not remove any user data in the volume group; it only removes its definition from the ODM database. Similarly, when a volume group is moved, the target system needs to add the definition of the new volume group. This can be achieved by importing the volume group using the importvg command, which will add an entry to the ODM database.

To export a volume group
#exportvg datavg or smitty exportvg
To import a volume group
#importvg –y datavg hdiskN or smitty importvg (hdiskN is one of the disks that belong to the volume group)
Note: It is also possible that some logical volume names may also conflict with those already on the system. The importvg command will automatically reassign these with system default names.
Steps:
#lspv
#varyoffvg datavg
#exportvg datavg
#importvg –y datavg hdiskN
#varyonvg datavg
#lspv
Note: If you imported a volume group that contains file systems or if you activated the volume group through smitty importvg, it is highly recommended that you run the fsck command before you mount the file systems.
Note: The smitty exportvg command deletes references to file systems in /etc/filesystems, but it leaves the mount points on the system.

Reorganizing a volume group
The reorgvg command is used to reorganize the physical partition allocation for a volume group according to the allocation characteristics of each logical volume. The volume group must be varied on and must have free partitions before you can use the reorgvg command.
Examples:
To reorganize the logical volumes lv03, lv04 and lv07 on VG datavg
#reorgvg datavg lv03 lv04 lv07
To reorganize the partitions located on physical volumes hdisk04 and hdisk06 that belong to the logical volumes lv04 and lv06
#echo “hdisk04 hdisk06” | reorgvg –i datavg lv04 lv06

Synchronizing a Volume group
The syncvg command is used to synchronize logical volume copies that are not current (stale). It synchronizes the physical partitions that are copies of the original physical partition and are not current. The syncvg command can be used with logical volumes, physical volumes, or volume groups. Unless disabled, the copies within a volume group are synchronized automatically when the volume group is activated by the varyonvg command.
Examples:
To synchronize the copies on physical volumes hdisk04 and hdisk05
#syncvg –p hdisk04 hdisk05
To synchronize the copies on volume groups datavg and sapvg
#syncvg –v datavg sapvg







Tuesday, April 13, 2010

1.Physical volume

Here I would like to add some content which will be useful for administrators when troubleshooting issues.

On one system, many disks can make up a volume group. Disks cannot be shared between volume groups. The entire disk is dedicated to being part of one volume group.


Logical volumes reside only on volume groups. A logical volume can span multiple disks within a volume group (but cannot span multiple volume groups): lv00. Or, a logical volume can reside on just one disk: biglv. Finally, a logical volume can have multiple copies of itself: mirrlv.

Physical volumes


Disk Independence


The basic design of the interaction between LVM and the disk device driver has always assured that LVM's use of the disk behaves the same regardless of the type of disk being used in LVM. Thus, a disk drive such as the serial dasd drive (9333 drives) behaves the same in LVM as the standard SCSI drive, although they use different disk and adapter device drivers.

PVID’s and how they configure


When a disk is configured to a system for the first time, it shouldn’t have a PVID if it is a brand new disk. When it is used in a volume group, the user sees a message to the effect of:

Making this disk a physical volume

Which is another way of saying that a PVID is being placed onto the disk. The PVID is an amalgamation of the machine's serial number (from the system's EPROMs) and the date the PVID is being generated. This combination ensures an extremely low chance of two disks being created with the same PVID. Finally, when a system is booted, the disk configurator looks at the PVID sitting on each disk platter and compares it to an entry in the ODM. If the entry is found, then the disk is given the hdiskX name that is associated with the ODM entry for that PVID. If there is no PVID, the configuration routines will automatically assign the next free hdisk name from the pool of "free" hdisk names.

Note that if a hdisk has been removed with the “rmdev -l hdiskX -d” command, then this hdisk name will be available for reuse by a later disk.

One question here....
Why is it an unsupported method to use the "dd" command to copy the entire contents of one disk to another disk?
What we forget when we copy with "dd" is that we are literally copying over the PVID of the first disk onto the platter of the second disk. The extremely rare consequence of this "dd" is that the user may have, in the distant future, two disks attached to the same p-series or RS/6000 with the same PVID. When this occurs, EXTREME confusion will occur.
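If you do end up with a duplicate PVID from such a copy, one sketch of a fix (the disk name is only an example, and the disk must not belong to a volume group):

#chdev -l hdisk2 -a pv=clear -> wipe the copied PVID from the disk
#chdev -l hdisk2 -a pv=yes -> generate a fresh, unique PVID
#lspv -> confirm the new PVID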

Working on physical volumes
***********************
A physical partition (PP) is the basic, or smallest, unit of allocation of disk space. The default PP size is 4 MB; the PP size can be set in powers of 2 up to 1024 MB. The default maximum number of PP's per physical volume is 1016. All PP's within a volume group are the same size and can't be changed dynamically. The maximum number of physical partitions per physical volume can be changed dynamically in multiples of 1016 (1016, 2032, ...).
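A sketch of how that factor works (names and sizes are only examples); raising the factor increases the PP limit per disk but lowers the maximum number of disks in the volume group (32 divided by the factor):

#mkvg -s 64 -t 2 -y datavg hdisk1 -> 64 MB PPs, up to 2032 PPs per disk, at most 16 disks in the VG
#chvg -t 4 datavg -> later raise the factor to 4 (4064 PPs per disk, at most 8 disks)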

The top level menu for Physical Volume

#smitty pv

List all Physical Volumes in system

#lspv

List contents of a Physical Volume

#lspv hdisk0

To lists all the logical volumes on a Physical Volume

#lspv –l hdisk0

Listing Physical partition allocation by Physical Volume region

#lspv –p hdisk0

Listing Physical partition allocation table

#lspv –M hdisk0

Migrating the contents of a Physical Volume
*************************************
The PP's belonging to one or more specified logical volumes can be moved from one physical volume to one or more other physical volumes within a volume group using the migratepv command.

Note: The migratepv command can’t move data between different volume groups.
How to move the data from a failing disk before it is removed for repair or replacement.

Step1: Make sure that the source and destination PV’s are in the same VG.

#lsvg –p rootvg

Step2: If you are planning to move the data to a new disk, such as when you have a failing disk:

Make sure the disk is available

#lsdev –Cc disk

If OK , Make sure it does not belong to another VG

#lspv

Add the new disk to the VG

#extendvg rootvg hdisk1

Step3: Make sure that you have enough room on the target disk for the source that you want to move.

Determine the number of PP’s on the source disk

#lspv hdisk1 | grep “USED PPs”

Determine the number of free PP’s on the Destination disk

#lspv hdisk2 | grep “FREE PPs”

Step4: If you are migrating data from a disk in the rootvg

#lspv –l hdisk1 | grep hd5

If you get the hd5 information

#migratepv –l hd5 hdisk1 hdisk2

Note: The migratepv command is not allowed if the VG is varied on in a concurrent mode.



Next, you will get a message warning you to perform the bosboot command on the destination disk.

#bosboot –ad /dev/hdisk2 -> hdisk2 is the destination disk

#bootlist –m normal hdisk2 -> make the destination disk the boot disk

#mkboot –cd /dev/hdisk1 -> clear the boot record from hdisk1, the source disk

Step5: If you are migrating data from a disk in a VG other than rootvg

#smitty migratepv

Step6: To remove the source disk from the volume group such as when it is failing

#reducevg rootvg hdisk1

#rmdev –dl hdisk1



Examples related to migration:

To move PP’s from hdisk1 to hdisk6 and hdisk7 (all PV’s are in one VG)

#migratepv hdisk1 hdisk6 hdisk7

To move PP’s in LV lv02 from hdisk1 to hdisk6

#migratepv –l lv02 hdisk1 hdisk6

Logical Volume Manager Basics

High Level Commands
• Commands used by SMIT
• Commands used by users in command line or shell scripts
• High levels of error checking

Intermediate Commands
Notes: The Intermediate commands are all C binaries:
getlvcb, getlvname, getlvodm, getvgname, lvgenmajor, lvgenminor, lvrelmajor, lvrelminor, putlvcb, putlvodm, lchangelv,
lcreatelv, ldeletelv, lextendlv, lquerylv, lreducelv, lresynclv, lchangepv, ldeletepv, linstallpv, lquerypv, lresyncpv, lcreatevg,
lqueryvg, lqueryvgs, lvaryonvg, lvaryoffvg, lresynclp, lmigratepp.
The commands are called by the High Level command shell scripts.

Library Calls - API
• Library calls to be used as Application Program Interface
It lets programmers access the LVM layer directly through the library layer provided by /usr/lib/liblvm.a.

LVM Device Driver
• Access same as those for typical IBM device driver
• Checking provided, but assumes user has basic device driver knowledge
• Rarely used. If used wrong, disastrous results!

Notes: The LVM device driver comes in two portions, the pinned portion and the non-pinned portion. The pinned portion is called /usr/lib/drivers/hd_pin and /usr/lib/drivers/hd_pin_bot. Before AIX 4.1, the driver was just called hd_pin and the entire driver was pinned into memory (not pageable). With AIX 4.1, the driver’s true non-pageable portion is in hd_pin_bot and the hd_pin is now pageable. The LVM device driver is either called by the jfs filesystem or the lvm library routines.
When a request is received by the LVM device driver, it calls the disk device driver.

Disk Device Driver
• Access same as those for typical IBM device driver
• Sometimes other vendors use the disk device driver as pseudo driver for their products

Note: The most common disk device driver is the one for SCSI devices,
/usr/lib/drivers/scdisk and /usr/lib/drivers/scdiskpin. The second most common disk device driver is probably the one for the 9333 serial dasd device. This is found in the binaries /usr/lib/drivers/sd and /usr/lib/drivers/sdpin. In both device drivers, the portion called “pin” is that which is pinned into memory and cannot be paged out of memory. When a request from the LVM device driver is received by the disk device driver, the request is packaged and then transferred down to the scsi device drivers.


Friday, April 9, 2010

Files to backup before rebooting an lpar/server

First and most important, check that the alt_disk is available.


How To Boot From Alternate Drive.
**********************************
There are cases when the primary boot device on a Unix server gets corrupted. This procedure enables the system administrator to choose the appropriate alternate boot device and bring the server up. It is used to bring the server up on the alternate drive in case the primary device is corrupted, or for any other problem.

IBM:
Check for altinst_rootvg:

# lspv
hdisk0 000ffa6d6a891be2 rootvg
hdisk1 000ffa6d546b51bc altinst_rootvg
hdisk2 000ffa6da42b24f7 abcvg
hdisk3 000ffa6da42b3377 cdevg
hdisk4 000ffa6da42b365e efgvg
hdisk5 none None
Set the new boot disk.:

# bootlist -m normal hdisk#

Note: hdisk# = Physical Volume from step 1 (hdisk1 in this case).
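To double-check before rebooting (same flags used elsewhere in this post):

# bootlist -m normal -o -> display the normal-mode boot list and confirm the alternate disk is listed first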
Reboot the server to bring the box up on the alternate drive.:

# shutdown -Fr
Bring server to ok prompt

Note: Normally in our environments we add an alt_disk script to crontab and check whether the last alt_disk run was successful or not.

There is obviously a risk involved in rebooting an lpar without a successful alt_disk; worst case, we need to take that risk. :D

The important outputs and files we need to back up before rebooting an lpar:
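All of the commands below write into /var/tmp/upg, so create that directory first (the path is just the convention used here):

mkdir -p /var/tmp/upg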

df -k | tee /var/tmp/upg/df-k.out
lspath | tee /var/tmp/upg/lspath.out
powermt display | tee /var/tmp/upg/pmtdisplay.out
powermt display dev=all | tee /var/tmp/upg/pmtdisplaydev.out
powermt display paths | tee /var/tmp/upg/pmtdisplaypaths.out
/usr/lpp/EMC/Symmetrix/bin/inq.*| tee /var/tmp/upg/inq.out
lspv | tee /var/tmp/upg/lspv.out
lsvg | tee /var/tmp/upg/lsvg.out
#Save volume group info
vgs=`lsvg -o`
for vg in $vgs
do
lsvg $vg | tee /var/tmp/upg/$vg.out
lsvg -l $vg | tee -a /var/tmp/upg/$vg.out
lsvg -p $vg | tee -a /var/tmp/upg/$vg.disk.out
done
lsdev -Cc disk | tee /var/tmp/upg/lsdevdsk.out
lsdev -C | tee /var/tmp/upg/lsdev.out
cp /etc/filesystems /var/tmp/upg/filesystems.B4
netstat -rn | tee /var/tmp/upg/netstat.out
netstat -in | tee /var/tmp/upg/netstati.out
ifconfig -a | tee /var/tmp/upg/ifconfig.out
lscfg -v | tee /var/tmp/upg/lscfg.out
lsattr -El inet0 | tee /var/tmp/upg/inet0.out

Thursday, April 8, 2010

HACMP TL Upgrade procedure

Pre-requisites for both the nodes:

1: Make sure you have a fresh copy of mksysb for both the A and B nodes
2: Make sure you have sufficient space on /tmp and /usr

3: Take the following outputs for both nodes

a: lspv,lsvg,netstat -in,netstat -ir,lsfs

b: clstat,cltopinfo,clRGinfo

c: cluster snapshots

d: HDLM Settings.

4: Create a temporary logical volume on both nodes and copy the TL8, SP6, HACMP, and HACMP service pack packages to it.

Location of the software for TL8, SP6, HACMP, and the HACMP service pack:

/data/OS_install/TL8/5300tl8sp2.tar.gz ( For TL8) --- this is my location on NIM server :)

/data/OS_install/Update53TL8SP6 ( Service pack)

/data/OS_install/HACMP5.4 ( Location for HACMP)

/data/OS_install/HACMP5.4/sps ( HACMP Service Pack )


Implementation Plan :

1: Shut down the cluster services on Node B (the passive node) after informing the respective application team
a: check the output of clstat

b: sync the cluster (smitty hacmp -> Problem Determination -> HACMP Verification, OR cldare -tr)

c: Bring the resource group offline on Node B
d: shutdown the cluster gracefully.

e: lssrc -g cluster should return nothing (the cluster subsystems are stopped) and clRGinfo should show the resource group as offline


2: copy the TL8 software from the NIM backup network to the local filesystem, then unzip and untar it.

smitty update_all

Do preview as “yes” with Commit software updates as “Yes” and Accept License Agreements as “Yes”
If no errors, proceed with installation by changing preview as “no”.
After installation check “oslevel –s”. If the oslevel still shows less than TL8, then probably a few filesets are not in the applied state. Reapply the filesets that are at a lower level (lower-level filesets can be found using the instfix command).
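A hedged example of that check (the ML label below assumes the 5300-08 target level; adjust it for yours):

#oslevel -s
#instfix -icqk 5300-08_AIX_ML | grep ":-:" -> lists any filesets still below the 5300-08 level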

Note: The entire TL, SP6, and HACMP can also be installed in the applied state; after checking with the application team that everything is fine, commit them as well.

bootlist –m normal –o (verify both the disks in the rootvg are shown)


Reboot the server and check the output oslevel -s


3: SP6 installation

Mount the file system /data/OS_install from nim as /mnt using backup network. Omit this if already in mounted state.
a. Check whether fileset rsct.opt.storagerm is installed or not. If not, install base fileset (2.4.5.0) found under /mnt. Verify it using lslpp.
b. Use “smitty update_all” and select the source directory as /mnt/Update53TL8SP6
c. Do preview as “yes” with Commit software updates as “Yes” and Accept License Agreements as “Yes”
d. If no errors, proceed with installation by changing preview as “no”.
e. After installation check “oslevel –s”; it should show 5300-08-06-0918. Also check the version of the rsct.opt.storagerm fileset (it should be 2.4.9.0).
f. bootlist –m normal –o (verify both the disks in the rootvg are shown)
g. Reboot the server.

4: HACMP Upgrade Procedure :

a: Check whether the cluster is running or not; if running, stop it

b: Remove the cluster filesets

smitty remove

c: mv /usr/es/sbin/cluster /usr/es/sbin/cluster.old


d: Install the HACMP Filesets which are copied to the temporary filesystem on Node B

smitty install_latest

Do preview as “yes” with Commit software updates as “Yes” and Accept License Agreements as “Yes”. If no errors, proceed with installation by changing preview as “no”.


e: Install the HACMP Service pack

f: Reboot the server


5: HDLM UPGRADE:

5.2 For HDLM 5.42
a. Back up the HDLM settings:
dlnkmgr view –sys > /var/dlmset1
dlmodmset –o > /var/dlmset2
Execute “dlmmigsts” under the /data/OS_install/HDLM6.1/HDLM_AIX_6101/HS066-59_HDLM_AIX6100sw/hdlmtool directory as below. This backs up the ODM settings:
#./dlmmigsts –b –odm /var/dlmset3 –set /var/dlmset4
b. dlmrmdev –A – this unmounts the file systems and varies off the VGs that are used by HDLM. If there is any error, check and unmount them manually, and then re-execute the command.
c. Remove the HDLM software using smitty.
d. Create the directory /var/DLM
e. echo "X0YAFGHIJKLMX9A89ABCDHIJK0123PQTST2VX0YZ62A6D768" > /var/DLM/dlm.lic_key
f. Copy the license key file hdlm_license under to /var/tmp
g. Go to the directory /mnt/HDLM6.1/HDLM_AIX_6101/HS066-59_HDLM_AIX6100sw and type # installp -aXgd . all. This will install HDLM 6.1. Sometimes it gives an error that “Hitachi.aix.support.rte” is missing; in that case go to the directory /mnt/HDLM6.1/HDLM_AIX_6101/HS066-59_HDLM_AIX6100sw/AIX_ODM/HTC_ODM and install the fileset. Once done, proceed with the HDLM 6.1 installation. There is an HDLM service pack that needs to be installed; the file is found under /data/OS_install/HDLM6.1/HDLM_AIX_6101/HS278-04_HDLM_AIX6101servicepack/DLM06-10-01P_E01_AIX. Go to this directory.
h. Run installp -aXgd . all.

i. Run cfgmgr
j. dlnkmgr view –sys should show service pack version as 6.1.0-01P.
k. Restore the HDLM settings saved as above.
#./dlmmigsts -r -odm /var/dlmset3 -set /var/dlmset4
l. dlnkmgr view –sys to verify the settings (failback, path health check, etc.); they should be the same as in /var/dlmset1.
m. dlnkmgr view –drv should show all devices as hdisk* and not as dlm devices.
n. dlnkmgr view –path shows all the path status.


Repeat the same steps for Node 1; check whether any backup is running on Node 1 first.


Applying the cluster snapshot
After rebooting both nodes, apply the cluster snapshot on the primary node as follows:


On the primary node, copy the "odm" portion of the snapshot from your
home directory to /usr/es/sbin/cluster/snapshots.


Now apply the converted snapshot.

Sync the cluster from the primary node:

cldare -tr

make sure it completes successfully


Start the cluster.

First on primary:
smitty clstart
make sure to start up the cluster information daemon
tail -f /tmp/hacmp.out -> to watch it come up, watch for errors

After first node is up, start cluster on secondary.

Unix operating states


Multi user mode
• Normal machine state
• Users can log in
• File systems are mounted
• Most services and daemons are running

Single user mode

• Required for some admin tasks
• Only root login is enabled
• Non-critical file systems are not mounted
• Non-critical daemons are shut down

Halt state
• Nothing is running

Note

Use the shutdown and reboot commands to properly move between single-user mode, multiuser mode, and the halt state.
The shutdown Command Details
Executing the shutdown command stops system activities in an orderly and consistent manner. The shutdown command performs the following tasks:
Prompts the administrator for a broadcast message to send to all users.
Broadcasts the warning message to all user terminal sessions.
Grants a 60 second (by default) grace period for users to log out.
Kills all user logins.
Shuts down all non-critical processes.
Unmounts all non-critical file systems.

Depending on the option specified, shutdown will either leave the system in single-user mode (no options), the halt state (if -h was specified), or initiate a reboot (if -r was specified).

Common shutdown options:
# shutdown -hy 600 # shutdown to a halt state in 600 seconds.
# "-y" (yes) option prevents shutdown from
# requesting confirmation before proceeding.

# shutdown -ry 600 # reboot in 600 seconds without requesting confirmation.

# shutdown -ry 0 # reboot immediately without requesting confirmation.
Reboot Command Details
The reboot command uses "kill -9" to kill running processes, which takes the system down quickly, but can cause problems for applications and file systems. The shutdown command shuts down applications and processes more gracefully, and thus is the preferred method for halting or rebooting the system from multiuser mode. Reboot may be used if:
The system is already in single-user mode.
You need to bring the system down very quickly.

Common reboot options: (For a complete list of options, see the man page for reboot (1m))

# reboot –h # shutdown to a halt state (only use this from single-user mode).
# reboot # reboot

AIX V5.3 Installation

Installation Methods are listed below
1) CD-ROM
2) Tape
3) Preinstallation (For new system order)
4) Network install Manager (NIM)
The contents of the CD-ROM are packaged in a file system format, thus the installation process from a CD is carried out differently than from tape. The preinstall option is valid only if accompanied by a hardware order that includes the preinstalled AIX Version 5.3. Network installations are carried out using the AIX Network Install Manager (NIM).

Method of Installations
1) New and Complete overwrite:
On a new machine, New and complete Overwrite is the only possible method of installation. On an existing machine, if you want to completely overwrite the existing version of BOS, then you should use this method.
2) Preservation Install
Use this installation method when a previous version of BOS is installed on your system and you want to preserve the user data in the root volume group. This method will remove only the contents of /usr, / (root), /var and /tmp. The preservation install option will preserve page and dump devices as well as /home and other user created file systems. System configuration will have to be done after doing a preservation installation.
3) Migration Install
Use this installation method to upgrade an AIX version 5.1 or later system to AIX version 5.3, while preserving the existing root volume group. This method preserves all file systems except /tmp, as well as the logical volumes and system configuration files. Obsolete or selective fix files are removed. Migration is the default method for an AIX system running version 3.2 or 4.x.

The installation process determines which optional software products will be installed.

Install 64-bit and JFS2 Support

If you choose Yes and are installing with the New and Complete Overwrite method, the file systems are created with JFS2 instead of JFS. If you want the 64-bit kernel, but do not want JFS2 file systems, then select No. After the install completes, use the following commands to switch to the 64-bit kernel:
#ln –fs /usr/lib/boot/unix_64 /unix
#ln –fs /usr/lib/boot/unix_64 /usr/lib/boot/unix
#bosboot –ad /dev/ipldevice
Finally reboot your system