CCR2

A publication for the IBM Tivoli and zSeries community

Simplifying MVS Operations with IBM Tivoli AF/OPERATOR on z/OS—"Cradle to Z EOD"
from CCR2, Issue 1 - 2005

By Jack Brownwood
Senior solutions consultant
IBM Tivoli Software Group

The phrase "Cradle to Z EOD" is intended to convey that IBM Tivoli AF/OPERATOR® on z/OS can manage MVS host resources from the moment the power-on/load button is pressed on the Hardware Management Console (HMC) to the final shutdown commands. This article explains how this is done and the benefits realized by completing the process.

Why would you want to do this?
The objective of this step in automation is to remove manual intervention from the IPL and shutdown processes. Doing this accomplishes one more step in the process toward "lights out" MVS console automation, a long-held goal of most automation projects.

The reasons for this goal are varied: remote site support, consolidation of data centers and LPARs, cost control and elimination of operational errors. The list is lengthy, and the motivation is usually a combination of factors. Data center management is a significant corporate expense in a climate where every dollar counts, and the pressure to cut costs only increases over time. Most data centers are measured against "best practice" standards, and falling short puts them directly in the line of fire of outsourcers.

What are the base requirements?
There are two salient issues relating to this topic. The first is how and why you would run AF/OPERATOR in SUB=MSTR mode. The second relates to the overall objective of MVS console automation.

To reach the point where you can run multiple MVS images with little console intervention, or run an MVS site remotely, most operator functions must be automated and message suppression largely completed (somewhere upward of 90%). Presumably, the eventual outcome of the automation effort is a somewhat unattended environment, with most problem situations handled directly by the technicians best equipped to resolve them.

This leads back to the first point, running SUB=MSTR. Because the task then runs under the master subsystem rather than under JES, automation can perform all the console tasks related to system IPL and shutdown. What it cannot do, an outboard automation platform such as IBM Tivoli AF/REMOTE® can.

Running SUB=MSTR used to require that every dataset referenced in the started procedure be cataloged in the system master catalog. This is no longer a requirement: MVS can now locate datasets cataloged in user catalogs before JES is initialized.

The rationale behind this process is for automation to manage the entire system initialization. To do this, you must ensure that AF/OPERATOR initializes first during the IPL and is shut down last. If more than one type of startup is required (e.g., maintenance, checkpoint and normal), either several IEASYSxx members are needed or the operator is given a choice of which IPL is being initiated.

The second step is to remove the start of every other system task from the IEFSSNxx and COMMNDxx members. All of these changes are made by the MVS systems programmer once data center management has agreed to them. Much work must be done in automation to ensure these critical system tasks are started correctly and in the right order. Presumably, the automation started task manager in use is flexible and capable of starting IPL-only tasks and of following the chain of prerequisites that results in a fully functional system.
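As a minimal sketch of what "following the chain of prerequisites" means, the Python below orders started tasks with a topological sort and reverses that order for shutdown. The task names and dependencies in PREREQS are hypothetical, chosen only so that the resulting stop order mirrors the one described in this article (JES before LLA and VLF, AF/OPERATOR last). A real started task manager would typically also wait for each task's "active" message before starting its dependents, which this sketch does not model.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical started tasks: each maps to the set of tasks that must be
# up before it can start. AFOPER stands in for AF/OPERATOR (SUB=MSTR).
PREREQS = {
    "AFOPER": set(),
    "LLA": {"AFOPER"},
    "VLF": {"AFOPER"},
    "JES2": {"AFOPER", "LLA", "VLF"},
    "VTAM": {"JES2"},
    "TCPIP": {"VTAM"},
    "TSO": {"VTAM"},
}


def start_order(prereqs):
    """Order tasks so every task comes after all of its prerequisites."""
    return list(TopologicalSorter(prereqs).static_order())


def stop_order(prereqs):
    """Reverse the start order so dependents are stopped first."""
    return list(reversed(start_order(prereqs)))


if __name__ == "__main__":
    print("start:", start_order(PREREQS))
    print("stop: ", stop_order(PREREQS))
```

Deriving the stop order from the same table keeps IPL and shutdown consistent: a dependency added for startup automatically changes the shutdown sequence too.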

How would you begin the process?
If you already have substantial automation in operation and your IPL is managed largely by AF/OPERATOR or some other automation product, you may only need to extend the service from system load to where JES is initialized, and from when JES shuts down to Z EOD. Where do you start? You need a copy of the system messages and commands issued before SYSLOG begins. Here are three AF/OPERATOR traps you can use to capture those messages and write them to a user log.

"TRAP ADD($$MSGS) WTO('*') ENA AOTRAP NOLOG ACT('EX USRLOG')"
"TRAP ADD($$CMDS) CMD('*') ENA AOTRAP NOLOG ACT('EX USRLOG')"
"TRAP ADD($$STOPLOG) WTO('IEA630I *') ENA LOG",
     "AOTRAP ACT('EX STOPLOG')"

The USRLOG script simply calls the REXX function logmsg(aotext,'USER') (or logmsg(aotext,'RKOGLOGM')), and STOPLOG issues an AO FREE for the appropriate dataset and deletes the $$MSGS and $$CMDS traps. Copy the log dataset to a permanent dataset before the next IPL for analysis and development.

The last trap, $$STOPLOG, assumes you are running JES2, and the message it waits for is the one issued when the log becomes available. If you are running JES3, simply replace the message in that trap with the JES3 active message. STOPLOG itself needs only these commands:

"TRAP DEL($$MSGS) WTO"
"TRAP DEL($$CMDS) CMD"
"AO FREE(USERLG)"

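The capture logic of these three traps can be modeled offline. The Python below is a toy model, not AF/OPERATOR code: each event stands in for a WTO or a command, everything is copied to a list playing the role of the user log, and capture ends at the first WTO beginning with the $$STOPLOG message ID (IEA630I, as in the JES2 trap above). The sample message texts are placeholders.

```python
def capture_early_log(events, stop_prefix="IEA630I"):
    """Toy model of the $$MSGS, $$CMDS and $$STOPLOG traps.

    events is an iterable of (kind, text) pairs, kind being "WTO" or "CMD".
    Every WTO and command is copied to the user log; when a WTO beginning
    with stop_prefix arrives it is logged and capture ends, which is the
    point where STOPLOG deletes the other two traps and frees the log.
    """
    user_log = []
    for kind, text in events:
        if kind not in ("WTO", "CMD"):
            continue  # ignore anything the traps would not see
        user_log.append(f"{kind} {text}")
        if kind == "WTO" and text.startswith(stop_prefix):
            break
    return user_log


sample = [
    ("WTO", "FIRST IPL MESSAGE"),
    ("CMD", "S SOMETASK"),
    ("WTO", "IEA630I ..."),
    ("WTO", "LATER MESSAGE, NOT CAPTURED"),
]
for entry in capture_early_log(sample):
    print(entry)
```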
Why do you need the early log messages?
The early log messages should be scanned carefully to determine what needs to be added to your IPL code to truly automate the rest of the process. You may find start commands issued from a SYS1.PARMLIB member that are not yet in the code; in particular, these would be the COMMNDxx or IEFSSNxx members or IEACMD(SYS)xx. You also may find JCL or embedded code that ties several starts to CAS9, for instance. This must be removed and integrated into your start and stop processes. You may also find WTOR messages that require responses:

*4102 IOS120D I/O TIMED OUT FOR DEVICE 3036. REPLY 'WAIT' FOR I/O
COMPLETION OR 'CONT' TO CONTINUE WITH DEVICE OFFLINE

This WTOR requires a CONT response before the IPL can continue. Traps for these conditions must be added very early in the automation process.
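Once such a WTOR is found in the early log, the reply logic is simple to state. The Python sketch below is illustrative rather than AF/OPERATOR syntax: it extracts the reply ID and message ID from the first line of a WTOR and builds an MVS REPLY command (R id,text) from a table of approved responses. The RESPONSES table is a hypothetical example; which replies are safe to automate is a site decision.

```python
import re

# First line of a WTOR: "*", reply id, message id (e.g. IOS120D), text.
WTOR_LINE = re.compile(r"^\*(?P<rid>\d+)\s+(?P<msgid>[A-Z$#@]{3,8}\d{3,5}[A-Z])\s")

# Hypothetical table of replies the site has agreed to automate.
RESPONSES = {"IOS120D": "CONT"}


def reply_for(line, responses=RESPONSES):
    """Return an MVS REPLY command for a known WTOR, or None."""
    match = WTOR_LINE.match(line)
    if match is None:
        return None  # not an outstanding reply request
    answer = responses.get(match.group("msgid"))
    if answer is None:
        return None  # unknown WTOR: leave it for an operator
    return f"R {match.group('rid')},{answer}"


print(reply_for("*4102 IOS120D I/O TIMED OUT FOR DEVICE 3036. REPLY 'WAIT' FOR I/O"))
```

Only the first line of a multi-line WTOR is needed, because the reply ID and message ID both appear there.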

Configuring AF/OPERATOR
The first step in configuring AF/OPERATOR is to make sure the DASD message logs and user logs are defined and cataloged as disk datasets. If you select DISKMLOG in the CICAT configuration, the file definitions are added to the JCL; if this was not done during configuration, you will need to define them yourself. The DD names are RKOGLM01 through RKOGLM0n for the message logs (where n is the number of datasets you have defined) and RKOGLU01 for the user logs. They are physical sequential datasets and should be defined as fixed block with a logical record length of 133. With current DASD technology, 27930 (210 records of 133 bytes) works out to be an efficient block size. At this stage of your automation development, it is unlikely that you will need the 'Retain View' logs RKOGLH01/02. If you do use this facility, their logical record length is 480, with the rest of the parameters the same as for the log files except the block size, which must be a multiple of 480 for fixed-block records.
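The 27930 figure is not arbitrary. On a 3390 device the largest block that still fits two to a track is 27998 bytes, and a fixed-block BLKSIZE must be a whole multiple of LRECL, so a common rule of thumb is to use the largest multiple of LRECL that does not exceed 27998. The small Python helper below makes that arithmetic explicit and shows why the 480-byte 'Retain View' logs cannot reuse 27930.

```python
# Largest block that still fits two per track on a 3390 (half-track blocking).
HALF_TRACK_3390 = 27998


def half_track_blksize(lrecl, limit=HALF_TRACK_3390):
    """Largest fixed-block BLKSIZE that is a multiple of lrecl and <= limit."""
    if not 0 < lrecl <= limit:
        raise ValueError("lrecl must be between 1 and the block size limit")
    return (limit // lrecl) * lrecl


print(half_track_blksize(133))  # message and user logs: 27930
print(half_track_blksize(480))  # 'Retain View' logs: 27840
```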

In addition to defining the logs in the JCL (and removing the '//RKOGLOGM DD SYSOUT=&outclass' statement), you will need to specify DISKMLOG in the AF/OPERATOR parameter file. If checkpointing of TOD traps is in use and RELOAD is either TOD or ALL, the parameter TODISYNC should be set to "Y." If it is set to "N," the checkpointed TOD traps begin to run their scripts as soon as AF/OPERATOR starts, whereas you almost certainly want the rest of the environment to be initialized first. With TODISYNC set to "Y," the AO TODSTART command needs to be added to the system initialization process at some point after the tasks that the TOD activities affect are initialized. This is probably just after the system IPL is complete, or at least after the USER SYSTEM variables are added and any requisite files are allocated.
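The TODISYNC choice comes down to when reloaded TOD traps are allowed to become active. The toy Python model below captures only the sequencing described above, under the assumption that "Y" holds checkpointed traps until AO TODSTART and "N" arms them at startup. It does not model trap scheduling or AF/OPERATOR internals, and the trap name is hypothetical.

```python
class TodTrapGate:
    """Toy model of TODISYNC (an assumption, not AF/OPERATOR internals).

    todisync=True:  reloaded checkpointed TOD traps are held until todstart().
    todisync=False: reloaded traps are armed as soon as AF/OPERATOR starts.
    """

    def __init__(self, todisync):
        self.released = not todisync
        self.held = []
        self.armed = []

    def reload(self, trap_name):
        """Called for each checkpointed TOD trap restored at startup."""
        if self.released:
            self.armed.append(trap_name)
        else:
            self.held.append(trap_name)

    def todstart(self):
        """Stands in for AO TODSTART, issued once prerequisite tasks are up."""
        self.released = True
        self.armed.extend(self.held)
        self.held.clear()


gate = TodTrapGate(todisync=True)
gate.reload("NIGHTLY_BATCH")   # hypothetical trap name
print(gate.armed, gate.held)   # [] ['NIGHTLY_BATCH']
gate.todstart()
print(gate.armed, gate.held)   # ['NIGHTLY_BATCH'] []
```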

What happens in the middle?
Between IPL and shutdown, the day-to-day automation takes place. The level of automation support during "working hours" varies from site to site. Manual processes executed by MVS operations should be automated as far as practical, and alert management and response technology should eliminate manual intervention wherever possible. How far this goes depends on a number of variables, not least the available technology, the mission of the automation team and management support. This technology deserves an extended discussion of its own, but that is beyond the scope of this article, which is dedicated to the notion of a single-button start and stop process.

System shutdown
On the back end of the process, it is critical to ensure that system shutdown proceeds in an orderly manner. For instance, in a JES2 environment, you must have received the $HASP099 message before the $PJES2 command is issued. This message is issued only after all JES resources are down. So, as a pre-stop process for JES, issue a D A,L and parse the output to make sure all tasks (except JES, AF/OPERATOR, LLA, VLF and any other SUB=MSTR task that stops after JES) are down. OMVS creates subtasks that run under JES but do not show up in a D A,L display. For these, issue a D OMVS,A=ALL command, then trap and parse the output to make sure all OMVS components are down.

/* REXX */
/* Issue all requisite $P commands, lines, logons, initiators, etc., */
ret = redirect('line.','*')
"OPER 'D A,L' RESP"
do i = 5 to line.0
    line = line.i
    if wordindex(line,2) = 10 then do
        got_tso = 1
        leave i
    end
    parse upper var line task1 . 38 task2 .
/* If TASK1 or 2 aren't JES, AO, VLF, LLA, etc. stop or cancel */
end

"WAIT WTO('BPXO040I * DISPLAY OMVS *')",
    "SECONDS(60)",
    "AFTER('OPER ''D OMVS,A=ALL''')"
rc = glbvget('aocase')
if aocase = 1 then do
    rc = glbvget('aowtx#')
    do index = 7 to aowtx#
       rc = glbvget('aowtx'index)
       resp = value('aowtx'index)
       IF MATCH(resp,'LATCHWAIT*') then do
             parse var resp . v2 v3 v4 .
             if datatype(v2) = 'NUM' then do
                 "OPER 'F BPXOINIT,TERM="v3"'"
             end
             else do
                 "OPER 'F BPXOINIT,TERM="v4"'"
             end
       end
    end
end
"OPER 'F BPXOINIT,SHUTDOWN=FILESYS'"
"OPER 'F BPXOINIT,SHUTDOWN=FORKINIT'"

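For testing the column logic of the D A,L loop away from the host, the same parsing can be sketched in Python. This assumes, as the REXX does, that each body line carries one job name at the start and a second beginning at column 38, and that header lines (which the REXX skips by starting at line 5) have already been removed. The EXEMPT set and the sample lines are hypothetical; substitute your own JES, AF/OPERATOR, LLA, VLF and other SUB=MSTR procedure names.

```python
# Hypothetical names of tasks that must stay up until after JES is down.
EXEMPT = {"JES2", "AFOPER", "LLA", "VLF"}


def tasks_to_stop(body_lines, exempt=EXEMPT, second_col=38):
    """Job names in D A,L body lines that are not exempt.

    Like the REXX 'parse upper var line task1 . 38 task2 .', take the first
    word of the line and the first word starting at column 38 (1-based).
    """
    names = []
    for line in body_lines:
        for part in (line[: second_col - 1], line[second_col - 1 :]):
            words = part.upper().split()
            if words and words[0] not in exempt:
                names.append(words[0])
    return names


def stop_commands(names):
    """First-pass STOP (P) commands; escalation to CANCEL is not shown."""
    return [f"P {name}" for name in names]


lines = [
    "VTAM     VTAM     NET".ljust(37) + "TCPIP    TCPIP    TCPIP",
    "JES2     JES2     IEFPROC".ljust(37) + "LLA      LLA      LLA",
    "CICSA    CICSA    CICS",
]
print(stop_commands(tasks_to_stop(lines)))  # ['P VTAM', 'P TCPIP', 'P CICSA']
```

Generating P commands is only a first pass; tasks that do not respond would still need the stop-or-cancel decision noted in the REXX comment.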
The net result of the REXX script above is that JES can be brought down successfully without having to abend it. If the $HASP099 message does not appear, some task or subtask of JES is still active in the system, and it must be brought down before the $PJES2 command can be issued successfully. The $PJES2 command should not be issued until the '$HASP099 ALL AVAILABLE FUNCTIONS COMPLETE' message appears on the console. Issuing it earlier can adversely affect the stopping of the remaining resources and may force operations to abend JES, an action that is neither recommended nor necessary for a proper system shutdown. The long and short of this discussion is that you need something like:

"WAIT WTO('$HASP099*') ENA SECS(600) ACT('OPER '' $PJES2''')"

to stop JES. An ordinary WTO trap would also work, but a WAIT keeps control in the script, which matters because there are further actions to take after the JES shutdown message is received.
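The decision this WAIT makes can be written out as a small function. The Python sketch below assumes messages arrive in time order as (seconds since the wait began, text) pairs and returns the $PJES2 command only if $HASP099 arrives within the 600-second window, mirroring SECS(600). Returning None leaves the timeout case to an operator or escalation path rather than forcing JES down. The sample message text other than $HASP099 is a placeholder.

```python
def jes_stop_command(messages, timeout=600, target="$HASP099"):
    """Return "$PJES2" if target arrives within timeout seconds, else None.

    messages: (seconds_since_wait_began, text) pairs in time order.
    """
    for elapsed, text in messages:
        if elapsed > timeout:
            break  # window expired: do not force JES down
        if text.startswith(target):
            return "$PJES2"
    return None


print(jes_stop_command([(12, "SOME OTHER MESSAGE"),
                        (45, "$HASP099 ALL AVAILABLE FUNCTIONS COMPLETE")]))
```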

When JES terminates, stop LLA and VLF and wait for them to come down. These are simple P (STOP) commands that execute very quickly. There may be other stop commands you need to issue to get the system completely purged. In general, the system security task does not need to be stopped, but some shops run cleanup code after JES is down.

At this juncture the system is all but down. Only three things remain to be done:

  • Stop AF/OPERATOR
  • Issue the Z EOD command
  • Vary the LPAR out of the SYSPLEX

If this is the last LPAR in the SYSPLEX, the V XCF command must be issued manually or by an outboard automation source. If other LPARs remain and AF/OPERATOR is running on them, the vary can be handled by routing a command to a remaining LPAR. That LPAR then sets a TOD trap a few seconds in the future that issues the V XCF command to remove the departing system from the SYSPLEX. The command looks like:

rc = glbvget('aosmfid')
"OPER 'RO "plex_smfid",EX TODVXCF "aosmfid"'"

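… where plex_smfid is the focal LPAR for the SYSPLEX and presumably the last one down, and AOSMFID supplies the SMF ID of the system being shut down. TODVXCF is a script that adds a few seconds to the current time and sets a TOD trap that, at that time, executes this script: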

"WAIT WTO('##REP IXC101I *') SEC(10) ",
    "AFTER('OPER ''V XCF "targ_smfid",OFF''')"
/* if aocase = 1 then */
call glbvget 'REP'
"OPER 'R "rep",YES'"

In this example, the variable 'targ_smfid' is passed to the script from the TOD trap generated when the route command is received from the partner system.
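The two commands involved, and the "few seconds in the future" timing, can be sketched in Python. The five-second delay, the function names and the system names SYSA and SYSB are hypothetical; the sketch only shows the shape of the routed command and that the TOD time should be computed with date arithmetic so that it rolls over correctly at midnight.

```python
from datetime import datetime, timedelta


def route_command(plex_smfid, own_smfid):
    """Command routed to the focal LPAR to start TODVXCF for this system."""
    return f"RO {plex_smfid},EX TODVXCF {own_smfid}"


def todvxcf_plan(targ_smfid, now, delay_seconds=5):
    """Hypothetical: TOD trap time a few seconds ahead, plus the vary command."""
    fire_at = (now + timedelta(seconds=delay_seconds)).strftime("%H:%M:%S")
    return fire_at, f"V XCF,{targ_smfid},OFF"


print(route_command("SYSA", "SYSB"))  # RO SYSA,EX TODVXCF SYSB
print(todvxcf_plan("SYSB", datetime(2005, 1, 1, 23, 59, 58)))  # ('00:00:03', 'V XCF,SYSB,OFF')
```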

Now the final two steps:

  • Issue "OPER 'Z EOD'"
Then immediately:
  • Issue "OPER 'C &AOTASK'" (C is the abbreviation for CANCEL)

A cancel is acceptable if, and only if, all AF/OPERATOR resources are down. Once every other resource is down, all that AF/OPERATOR still holds are files that may need to be closed and freed, plus any other termination work that may be part of shutdown, such as displays of statistics. The cancel of AF/OPERATOR is the last command you can issue; the Z EOD waits until the last task is down before executing.

Concluding thoughts
With MVS automation in the data center for more than 15 years now, it is time to use it to its full potential. Issuing commands manually for almost any reason is at best slow and at worst error-prone. Mainframe availability is far too important to risk error when the correct solution is easily programmed.

For more information

IBM Tivoli AF/OPERATOR on z/OS

IBM Tivoli AF/OPERATOR on z/OS support

IBM Tivoli AF/OPERATOR on z/OS education
