Ask any network administrator what the most important developments of the last 20 years are, and most will name TCP/IP near the top of the list. Prior to the development of a standard network protocol, getting computers to exchange information was a difficult and arduous task. Each type of computer used its own method of communication, and converting between methods often involved several steps. Unfortunately, most database administrators still face that situation every day.
Although the world has standardized on a single network protocol, every RDBMS uses its own method for handling data. It is common to find one RDBMS at the branch or small-office level, while the central office uses another, and yet a third is used in decision- support functions.
Transferring data between different RDBMS systems can involve considerable difficulty and expense, especially if real-time data consistency is required. Often, the solution involves expensive middleware products and replicated data, which add another level of complexity and administrative headaches. If real-time data consistency is not required, however, you can often transfer data between databases by using the tools and utilities included with the RDBMS. This article describes just such a solution for transferring data between Microsoft SQLServer and IBM DB2 on an AS/400.
A Solution to a Problem
The problem involves a medium-sized company with about 20 branch offices running Windows NT as the primary file and print server. The central office uses an AS/400 running DB2 as its primary decision-support database. The company has decided to implement an SQLServer application at each of the branches but still needs a way to collect nightly summary information and import it into the central DB2 database. Since the data is only a daily summary, real-time data consistency between the two platforms is not a requirement. In addition, the customer is looking for the simplest and most cost-effective solution, preferably with a minimum of custom programming.
SQLServer includes native database replication and partitioning that can transfer data but only with other SQLServers. In addition, SQLServer 7.0 now includes a number of data-transformation utilities designed to move data to and from other databases.
However, these utilities are primarily graphical, and the lack of a local database administrator at each branch creates a need to find the simplest solution possible.
The answer to this problem involves an SQLServer command-line utility: the Bulk Copy Program (BCP). BCP is designed specifically to transfer information between SQLServer and a flat-text file. The text file can then be copied across the network and imported into any other RDBMS. The command-line nature of BCP makes it very easy to incorporate into batch files, which the Windows NT Scheduler service can then execute at specific times. In the end, the final solution takes the following form:
1. At each of the branch NT servers, the Windows NT Scheduler service calls a batch file every night at midnight.
2. The batch file first executes a BCP command that executes a query against the database and stores the output in a comma-delimited text file with the branch code as part of the filename.
3. The batch file then starts the NT FTP client by using a predefined script and transfers the results file to an FTP server on the AS/400.
4. A process on the AS/400 imports the files into the DB2 database.
5. Each morning, an administrator at the central office checks the results to be sure that all files have been received and that the import process has finished correctly. If any files have not been received correctly, the administrator recopies the file manually and reruns the import process on that file.
The solution proves extremely reliable, in part because of its simplicity. This article discusses the first three steps in the process: calling a batch file, using the batch file to call a BCP, and FTPing the file to an AS/400.
The BCP Command Line
By default, BCP is installed in the MSSQL7BINN directory and included with SQLServer specifically to provide a way to import and export data by using flat-text files. These text files can then be shared with many other programs, including spreadsheets, word processors, and, in this case, other databases.
Unfortunately, this speed and flexibility comes at a price. The complex nature of the program requires a large number of command-line parameters. You probably need to experiment by testing a BCP command before trying it in a production environment. Figure 1 shows several sample BCP command lines that illustrate most of the parameters commonly used with the utility along with definitions for the most common parameters. All these command lines are designed to work with the Pubs sample database that comes with SQLServer. For more information on all of the BCP command line options, you can refer to the Web sites listed at the end of this article.
You can execute BCP against any table or view, or BCP can execute an ad hoc SQL query, as shown in example C (Figure 1). In each of the examples, I have also specified the database name (Pubs) and the table or view owner (dbo); both of these items are optional. If no database is specified, BCP automatically executes against the users default database. However, if the user executing BCP is not the table owner and the table owner is not specified, an error is returned and the BCP operation aborts.
The next parameter defines the mode of operation for BCP. If a table or view is specified, you can transfer data either in or out. If used with an ad hoc query, BCP should use the queryout mode, which will execute the SQL statement included in the command line. This allows for more complex data manipulation before the text file is generated, such as column summation or table joins. Finally, you use the FORMAT mode with a format file to alter data as it is imported or exported. (The FORMAT mode is discussed later in more detail.) Each command line specifies the data file used in the operation (in this case, bcpdata.txt).
Having told BCP what mode to use, the next step is to give the command its security context. The -T switch, seen in example A, specifies that BCP use a trusted
connection, meaning that you must integrate the SQLServer with the Windows NT security subsystem. If SQLServer were installed to use its own security, you must give BCP a login ID and password by using the -U and -P switches respectively, as in example B. Finally, example B also shows the -S switch, which can determine the server that the BCP command is executed against. If the server is not listed, the BCP command executes against the local SQLServer.
A user can also specify the field and row terminators used during the BCP operation by using the -t and -r switches respectively. The default is to use a tab ( ) as the field separator and a new line ( ) as the row separator, although these are not needed if a format file is used. One very common change is to use the -t, switch to identify a comma- delimited data file, as shown in example A.
The -F and -L switches specify the first and last rows to be copied by using BCP. The default is for BCP to duplicate all rows in the table or view (-F1 to -L0). BCP also includes several methods for handling errors. The -e switch, seen in example B, records any error messages to the listed file, while the -m switch causes BCP to abort after it has reached the set number of errors.
Data Types and Format Files
Having selected from this wide range of parameters, you are now ready to perform your first BCP operation. As any experienced database administrator knows, however, the fun is just beginning. Most problems that arise while transferring data between databases involve the different data types that each RDBMS uses. Fortunately, BCP has many options for controlling the type of data being imported or exported. Several default switches can be used for standardized data types:
The -n switch indicates that the data file uses native database data types.
The -c switch indicates that the data file uses character data types for all fields. (This switch is the most portable format.)
The -w switch indicates that the data file uses Unicode data types. (This switch does not work with SQLServer 6.5 and earlier.)
The -N switch indicates that the data file uses Unicode for character data and native data types for numeric data. (This switch is a faster alternative to -w and also does not work with SQLServer 6.5 and earlier.)
The -6 switch indicates that the data file uses SQLServer 6.0 or 6.5 data types. You should try your BCP command by using several of these data types to see which one works best for your situation. Often, you can use the -n switch even when moving data between different RDBMSs, however, the -c switch will create the most portable data file. This will probably be the best option when transferring data to DB2/400. If none of these options works for your situation, never fear. BCP has another tool for manipulating data: the format file.
If you execute a BCP command without listing one of the five options above, the command prompts you for the name, data type, and length of each field in the table or view. The information you enter can then be stored in a format file by using the -f switch (FORMAT mode) to specify the name of the output file. You can also use the information with future BCP operations to provide customized data types. The FORMAT mode of BCP automatically generates a format file for the given table by using one of the five standard data typing schemes listed above. If none of these schemes proves satisfactory, however, you can manually create or edit the format file to fit your particular needs. Figure 2 shows a sample format file.
The first row of the format file specifies the version of BCP to be used, and the second is the number of fields or columns in the database table. Each subsequent row defines a field in the table, listing the name, length, data type, field terminator, and row terminator. For example, all the items in Figure 2 are of the generic SQLCHAR data type and use tabs as the field terminator. Figure 3 shows three different format files that were
generated from the discounts table in the Pubs database and that use three different sets of data types. In general, the character mode is the most useful mode when transferring data to another RDBMS, although care has to be taken to be sure that the data types are read correctly when importing data on the remote database.
In addition to controlling data types, format files are useful for combining or separating fields while moving data between a table and the data file. They are also useful for changing the order of the fields. (More information about using format files to change data types and field order is available from the Microsoft Web site at technet.microsoft.com/cdonline/content/complete/srvnetwk/SQL/manuals/admincmp/75517 c02.htm.)
FTP Scripts and Batch Files
Returning to the original problem of transferring data to DB2/400, you have now created a BCP command and format file that creates the desired output data file. You have also tested the data file with the import process on the AS/400 and determined that the data is being successfully entered into DB2. All that remains is to design a way to transfer the flat file from the NT server to the AS/400. Here again, you turn to simple command-line utilities to move the data file (namely FTP, batch files, and the NT Scheduler service).
Figure 4 shows a sample batch file that can start BCP and FTP the file to the AS/400. The batch file first uses BCP to create an output file from two different tables on the SQLServer. It then starts the FTP client and transfers those files to the AS/400 by using an FTP script similar to the one in Figure 5. The batch file is started by using the NT Scheduler service and AT command and is set to run every evening at midnight. (You can find more information about using FTP to transfer data between NT and the AS/400 in
Setting Up Windows NT to FTP to the AS/400, AS/400 Network Expert, September/October 1999.)
Simplify! Simplify! Simplify!
More can certainly be done to provide a more automated solution for the customer. There is no error checking anywhere in the process, other than in the BCP error output file. All error checking is performed the following morning by an employee at the central office, who checks to see that all files have been transferred to the AS/400 and properly imported into the DB2 database. The client has a very reliable network, and the simple nature of the solution means that problems are rare. However, in the event of an error, the employee reruns the BCP and FTP processes manually, remotely running the same batch file called by the Scheduler service. The new files are then reimported into DB2/400. You can automate this whole process easily by using any of the more complicated scripting hosts, such as Perl or Windows Script Host (WSH). The command-line nature of BCP means that it can be called from any programming language that can execute operating-system instructions. In this case, the customer is looking for an extremely simple solution. By using native utilities, you can deliver one and meet the clients needs.
References
Importing and Exporting Data, Microsoft TechNet: technet.microsoft.com/ cdonline/content/complete/srvnetwk/SQL/manuals/admincmp/75517c02.htm
Setting Up Windows NT to FTP to the AS/400, John P. Lyons, AS/400 Network Expert, September/October 1999
Related Materials
Command Prompt Utilities, Microsoft TechNet: technet.microsoft.com/cdonline/ content/complete/srvnetwk/SQL/manuals/utilref2/chpt4/75524c16.htm
Optimizing Utility and Tool Performance, Microsoft TechNet: technet. microsoft.com/cdonline/content/complete/srvnetwk/SQL/manuals/diag/part/75528c04.htm
IN BCP transfers data from the text file into the specified table OUT BCP transfers data from the table into a text file QUERYOUT BCP executes the SQL query from the command line FORMAT BCP creates a format file named with the -f switch
-T Use a trusted connection (integrated with NT security) -Uxxxx Specify a database login user ID
-Pxxxx Specify a password
-Sxxxx Specify the SQLServer to log in to
-t"x" Specify the field terminator in quotes (default is tab, ) -r"x" Specify the row terminator in quotes (default is a line feed, )
-Fx Specifies the number of the first row to copy from the table -Lx Specifies the number of the last row to copy from the table -e"x:xxx" Specifies the error output file
-m10 Specifies the maximum number of errors before the BCP process aborts
-n BCP will use native SQLServer 7.0 data types for the output -c BCP will use character output for all fields
-w BCP will use Unicode datatypes for the output
-N BCP will use Unicode for character data and native data types for numeric
-6 BCP will use SQL 6.0 and 6.5 datatypes for the output -f"x:xxx" Specifies the name of the format file to be created or used by this BCP operation Ex. a) bcp pubs.dbo.authors in bcpdata.txt -T -t"," -F100 -L200 -n Ex. b) bcp pubs.dbo.titles out bcpdata.txt -Smssql -Usa -Ppassworde"c:error.txt" -m10 -c Ex. c) bcp pubs.dbo "select * from authors where authors.state = ëCAí" queryout bcpdata.txt -T Ex. d) bcp pubs.dbo.titleview format bcpdata.txt -Usa -Ppassword -t","f"C:format.txt"
Figure 1: Here are common BCP command-line options and illustrations of their use.
Version Number of
Columns
Host File Field Order
7.0
5
1 SQLCHAR 0 4 1 pub_id 2 SQLCHAR 0 40 2 pub_name 3 SQLCHAR 0 20 3 city
4 SQLCHAR 0 2 4 state
5 SQLCHAR 0 30 5 country
Host File Data Type Host File Data Length
Prefix Length Terminator Server
Column Name
Figure 2: These items all use tabs as the field terminator.
BCP Format Files
Native Mode (-n)
7.0
5
1 SQLCHAR 2 40 1 discounttype
2 SQLCHAR 2 4 2 stor_id
3 SQLSMALLINT 1 2 3 lowqty
4 SQLSMALLINT 1 2 4 highqty
5 SQLDECIMAL 1 19 5 discount
Character Mode (-c)
7.0
5
1 SQLCHAR 0 40 1 discounttype
2 SQLCHAR 0 4 2 stor_id
3 SQLCHAR 0 7 3 lowqty
4 SQLCHAR 0 7 4 highqty
5 SQLCHAR 0 41 5 discount
Unicode Mode (-w)
7.0
5
1 SQLNCHAR 0 80 1 discounttype
2 SQLNCHAR 0 8 2 stor_id
3 SQLNCHAR 0 14 3 lowqty
4 SQLNCHAR 0 14 4 highqty
5 SQLNCHAR 0 82 5 discount
Figure 3: These format files use three different sets of data types.
dumpdb.bat
bcp db1.owner.table1 out c:dataoutserver1table1.dat -Usa -Ppassword -ftransfer1.fmt
bcp db1.owner.table2 out C:dataoutserver1table2.dat -Usa -Ppassword -ftransfer2.fmt
ftp -s:c:atas400ftp.cmd
Figure 4: This batch file can start BCP.
as400ftp.cmd
open ftp.as400.mycompany.com
myuserid
Server Column Order
mypassword
cd /pub/ftp-in/db2files
lcd d:dataout
binary
prompt
mput *.dat
quit
Figure 5: This FTP script transfers files.
LATEST COMMENTS
MC Press Online