Configuration File


One of the great strengths of InfoSphere DataStage is that, when designing parallel jobs, you don’t have to worry too much about the underlying structure of your system, beyond appreciating its parallel processing capabilities.

If your system changes, is upgraded or improved, or if you develop a job on one platform and implement it on another, you don’t necessarily have to change your job design.

InfoSphere DataStage learns about the shape and size of the system from the configuration file. It organizes the resources needed for a job according to what is defined in the configuration file. When your system changes, you change the file, not the jobs.

Unless you specify otherwise, the parallel engine uses a default configuration file that is set up when DataStage is installed.
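As a rough illustration of "unless you specify otherwise", the Python sketch below assumes the usual mechanism: the APT_CONFIG_FILE environment variable names the configuration file a job should use, and the default file applies only when it is unset. The function name resolve_config_path and the default path are hypothetical placeholders, not the real location on your installation.

```python
import os


def resolve_config_path(default_path: str) -> str:
    """Return the configuration file a job would read (simplified model).

    Assumption: a non-empty APT_CONFIG_FILE overrides the default file;
    default_path is supplied by the caller and is hypothetical here.
    """
    override = os.environ.get("APT_CONFIG_FILE", "").strip()
    return override if override else default_path


if __name__ == "__main__":
    # Hypothetical default location; check where your installation keeps it.
    print(resolve_config_path("C:/path/to/default.apt"))
```

Switching a job to a different system shape then means pointing this variable at another file, with no change to the job design.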

Opening the default configuration file

To open the default configuration file, select Tools > Configurations.

Example configuration file

The following example shows a default configuration file from a four-processor SMP computer system.

{
	node "node1"
	{
		fastname "R101"
		pools ""
		resource disk "C:/IBM/InformationServer/Server/Datasets" {pools ""}
		resource scratchdisk "C:/IBM/InformationServer/Server/Scratch" {pools ""}
	}
	node "node2"
	{
		fastname "R101"
		pools ""
		resource disk "C:/IBM/InformationServer/Server/Datasets" {pools ""}
		resource scratchdisk "C:/IBM/InformationServer/Server/Scratch" {pools ""}
	}
}


The default configuration file is created when InfoSphere DataStage is installed. Although the system has four processors, the configuration file specifies two processing nodes. Specify fewer processing nodes than there are physical processors to ensure that your computer has processing resources available for other tasks while it runs InfoSphere DataStage jobs.
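The example pairs four processors with two nodes. The following Python sketch generates a file of the same shape for any node count, including the braces that enclose the whole file and each node's entries. Both functions are hypothetical helpers, not part of DataStage, and the half-the-processors rule in suggested_node_count is an assumption that happens to match this example, not a DataStage recommendation.

```python
def suggested_node_count(cpu_count: int) -> int:
    """Hypothetical heuristic: half the processors, but at least one node.

    Mirrors the four-processor, two-node example; it is not a DataStage rule.
    """
    return max(1, cpu_count // 2)


def make_config(node_count: int, fastname: str, disk: str, scratch: str) -> str:
    """Build configuration text with node_count identical SMP nodes.

    Every node shares one fastname (one network connection on an SMP machine),
    belongs to the default pool (empty pools string), and uses the same
    disk and scratch directories.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    lines = ["{"]
    for i in range(1, node_count + 1):
        lines += [
            f'\tnode "node{i}"',
            "\t{",
            f'\t\tfastname "{fastname}"',
            '\t\tpools ""',
            # Doubled braces in an f-string produce the literal {pools ""}.
            f'\t\tresource disk "{disk}" {{pools ""}}',
            f'\t\tresource scratchdisk "{scratch}" {{pools ""}}',
            "\t}",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_config(
        suggested_node_count(4),
        "R101",
        "C:/IBM/InformationServer/Server/Datasets",
        "C:/IBM/InformationServer/Server/Scratch",
    ))
```

Keeping every node identical matches the default SMP file; a system whose nodes sit on different machines would need a different fastname per node, which this helper does not attempt.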

This file contains the following fields:

node
The name of the processing node that this entry defines.
fastname
The name of the node as it is referred to on the fastest network in the system. For an SMP system, all processors share a single connection to the network, so every node has the same fastname.
pools
Specifies that nodes belong to a particular pool of processing nodes. A pool of nodes typically has access to the same resource, for example, a high-speed network link or a mainframe computer. The pools string is empty for both nodes, which means that both nodes belong to the default pool.
resource disk
Specifies the name of the directory where the processing node writes data set files. When you create a data set or file set, you specify what the controlling file is called and where it is stored, but the controlling file points to other files that store the data. These files are written to the directory that is specified by the resource disk field.
resource scratchdisk
Specifies the name of a directory where intermediate, temporary data is stored.
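To make the field list concrete, the following Python sketch reads configuration text and collects, for each node, the fields described above. It assumes a simplified grammar: only node, fastname, pools, and resource entries are recognized, values are double-quoted with no escaped quotes, and each resource's own {pools ...} block is skipped. Node and parse_config are hypothetical names, not part of any DataStage API.

```python
import re
from dataclasses import dataclass, field

# A token is a quoted string, a brace, or a bare word such as node or fastname.
TOKEN = re.compile(r'"([^"]*)"|([{}])|([^\s{}"]+)')


@dataclass
class Node:
    name: str
    fastname: str = ""
    pools: list = field(default_factory=list)
    disks: list = field(default_factory=list)
    scratchdisks: list = field(default_factory=list)


def tokenize(text):
    """Split configuration text into (kind, value) pairs."""
    tokens = []
    for m in TOKEN.finditer(text):
        if m.group(1) is not None:
            tokens.append(("str", m.group(1)))
        elif m.group(2):
            tokens.append(("brace", m.group(2)))
        else:
            tokens.append(("word", m.group(3)))
    return tokens


def parse_config(text):
    """Return the nodes declared in configuration text (simplified grammar)."""
    tokens = tokenize(text)
    pos = 0
    nodes = []

    def strings():
        # Consume consecutive quoted strings, e.g. the pool names after pools.
        nonlocal pos
        values = []
        while pos < len(tokens) and tokens[pos][0] == "str":
            values.append(tokens[pos][1])
            pos += 1
        return values

    while pos < len(tokens):
        kind, value = tokens[pos]
        pos += 1
        if kind != "word":
            continue  # file and node braces carry no data in this sketch
        if value == "node":
            nodes.append(Node(name=strings()[0]))
        elif value == "fastname" and nodes:
            nodes[-1].fastname = strings()[0]
        elif value == "pools" and nodes:
            nodes[-1].pools = strings()  # [""] means the default pool
        elif value == "resource" and nodes and pos < len(tokens):
            rtype = tokens[pos][1]  # e.g. disk or scratchdisk
            pos += 1
            path = strings()[0]
            if rtype == "disk":
                nodes[-1].disks.append(path)
            elif rtype == "scratchdisk":
                nodes[-1].scratchdisks.append(path)
            # Skip the resource's own {pools ...} block, if present.
            if pos < len(tokens) and tokens[pos] == ("brace", "{"):
                while pos < len(tokens) and tokens[pos] != ("brace", "}"):
                    pos += 1
                pos += 1
    return nodes
```

Run against the example file, this yields two nodes that share the fastname "R101", each with one disk and one scratchdisk directory and a pools value of [""]. Unrecognized words are skipped rather than rejected, so treat the result as a quick inspection aid, not a validator.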

Configuration files can be more complex and sophisticated than the example file and can be used to tune your system to get the best possible performance from the parallel jobs that you design.


