Data Transfer Nodes
Data transfer nodes (DTNs) are dedicated, high-performance servers used specifically for data transfers. These servers are optimized for large-scale transfers and are typically best deployed within a Science DMZ infrastructure.
For DTN hardware, LLNL's Livermore Computing Center has compiled a document detailing hardware recommendations that work well for an ESGF environment.
ICNWG will be working with Globus-equipped DTNs (which use GridFTP-based software) for data replication between the ESGF sites. LLNL has installed the Globus Connect Server on their DTNs: information on how they configured the systems is available here.
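For orientation only, a minimal /etc/globus-connect-server.conf for Globus Connect Server v4 might look like the hypothetical sketch below. This is not LLNL's actual configuration; the user name, endpoint name, and hostname are placeholders, and the linked document should be treated as authoritative.

```ini
; Hypothetical example only; section and key names follow Globus Connect Server v4.
[Globus]
; Globus account that will own the endpoint (placeholder)
User = myglobususer

[Endpoint]
; Display name of the endpoint (placeholder)
Name = example-dtn
Public = True

[GridFTP]
; Hostname of the DTN running the GridFTP server (placeholder)
Server = dtn.example.org
```

After editing the file, running globus-connect-server-setup applies the configuration and registers the endpoint with the Globus service.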
ESnet has three high-performance data transfer hosts connected directly to the ESnet 100Gbps network backbone, which can be used to test disk-to-disk transfer performance against the Globus DTN endpoints. These are accessible to any university or science site.
- anl-diskpt1.es.net / anl-diskpt1-v6.es.net: near Chicago, IL
- bnl-diskpt1.es.net / bnl-diskpt1-v6.es.net: near New York, NY
- lbl-diskpt1.es.net / lbl-diskpt1-v6.es.net: Berkeley, CA
Globus Service Access
The test hosts are also available via the Globus Transfer service. They are configured for anonymous, read-only access.
anl-diskpt1.es.net is registered as the endpoint esnet#anl-diskpt1
bnl-diskpt1.es.net is registered as the endpoint esnet#bnl-diskpt1
lbl-diskpt1.es.net is registered as the endpoint esnet#lbl-diskpt1
Sample GridFTP test commands
If you don't have globus-url-copy installed, please refer to the GridFTP Quick Start Guide.
# make sure you can connect to the server
globus-url-copy -list ftp://lbl-diskpt1.es.net:2811/data1/
# copy a 10G file
globus-url-copy -vb -fast ftp://lbl-diskpt1.es.net:2811/data1/10G.dat file:///tmp/test.out
# copy a 10G file using 4 parallel streams
globus-url-copy -vb -fast -p 4 ftp://lbl-diskpt1.es.net:2811/data1/10G.dat file:///tmp/test.out
# write to /dev/null
globus-url-copy -vb -fast -p 4 ftp://lbl-diskpt1.es.net:2811/data1/10G.dat file:///dev/null
# read 1G from the server's /dev/zero
globus-url-copy -vb -fast -p 4 -len 1G ftp://lbl-diskpt1.es.net:2811/dev/zero file:///tmp/t.out
# Use UDT instead of TCP
globus-url-copy -vb -udt ftp://lbl-diskpt1.es.net:2811/data1/10G.dat file:///dev/null
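The /dev/zero and /dev/null tests above isolate the network path from the disks. Before blaming the network for a slow transfer, it can also help to baseline the local disk on its own. The dd command below is a rough sketch of such a baseline (the path and size are illustrative); conv=fdatasync forces the data to disk so the reported rate reflects the disk rather than the page cache.

```shell
# Write 100 MB of zeros to local disk and report the sustained rate.
# conv=fdatasync flushes to disk before dd exits, so the rate is honest.
dd if=/dev/zero of=/tmp/ddtest.out bs=1M count=100 conv=fdatasync
# Clean up the scratch file afterwards.
rm -f /tmp/ddtest.out
```

If this local write rate is lower than your expected network transfer rate, the disk, not the network, is the bottleneck.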
Each host has a high-performance disk array, mounted as /data1. The following test files are available on each server; they are generated from /dev/urandom, and each file's size matches its name:
/data1/1M.dat, /data1/10M.dat, /data1/50M.dat, /data1/100M.dat,
/data1/1G.dat, /data1/10G.dat, /data1/50G.dat, /data1/100G.dat
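To compare against local hardware, similar random test files can be generated on your own host. The sketch below is illustrative (paths are placeholders, and dd's "1M" is 2^20 bytes, slightly larger than a 1,000,000-byte file); /dev/urandom data is incompressible, so transfer results are not inflated by any on-the-wire compression.

```shell
# Generate small random test files comparable to the ones on the ESnet hosts.
# /dev/urandom produces incompressible data, which keeps throughput tests honest.
dd if=/dev/urandom of=/tmp/1M.dat bs=1M count=1
dd if=/dev/urandom of=/tmp/10M.dat bs=1M count=10
```

Larger files (1G and up) can be generated the same way by increasing count, though generating many gigabytes from /dev/urandom can itself be CPU-bound.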
In addition, there are currently several data sets composed of multiple files in a directory structure, intended for testing multi-file transfers. Each data set contains directories a through y, and each of those directories again contains directories a through y. Each leaf directory contains data files named for their place in the directory structure: for example, a-a-1M.dat is a 1,000,000-byte data file at the path 5GB-in-small-files/a/a/a-a-1M.dat. Note that the tiny-file data set is primarily for testing directory-creation performance, as the amount of data transferred is trivial.
The data sets are:
/data1/5MB-in-tiny-files - 1KB, 2KB, and 5KB files in each leaf directory
/data1/5GB-in-small-files - 1MB, 2MB, and 5MB files in each leaf directory
/data1/50GB-in-medium-files - 10MB, 20MB, and 50MB files in each leaf directory
/data1/500GB-in-large-files - 100MB, 200MB, and 500MB files in each leaf directory
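The layout implies the counts behind the data set names: 25 top-level directories (a through y) times 25 subdirectories gives 625 leaf directories, each with 3 files. A quick shell arithmetic check (using the 5GB-in-small-files set as the example):

```shell
# 25 top-level dirs (a-y) x 25 subdirs each = 625 leaf directories.
leaves=$((25 * 25))
# Each leaf directory holds 3 data files.
files=$((leaves * 3))
# 5GB-in-small-files: 1 MB + 2 MB + 5 MB = 8 MB per leaf directory.
total_mb=$((leaves * 8))
echo "$files files, $total_mb MB"
```

This prints "1875 files, 5000 MB", i.e. roughly 5 GB spread over 1875 files, matching the data set name; the same arithmetic scales to the medium and large sets.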
Sample commands for copying the complete data sets (these use the Berkeley DTN; substitute the other DTNs as needed, and replace the file:///tmp/... destinations with a directory on your test host):
# Copy using one stream only to test single-stream disk-to-disk performance
globus-url-copy -vb -p 1 -fast -cd -r ftp://lbl-diskpt1.es.net:2811/data1/5GB-in-small-files/ file:///tmp/5GB-in-small-files/
globus-url-copy -vb -p 1 -fast -cd -r ftp://lbl-diskpt1.es.net:2811/data1/50GB-in-medium-files/ file:///tmp/50GB-in-medium-files/
# Copy using 4 parallel streams
globus-url-copy -vb -p 4 -fast -cd -r ftp://lbl-diskpt1.es.net:2811/data1/5GB-in-small-files/ file:///tmp/5GB-in-small-files/
globus-url-copy -vb -p 4 -fast -cd -r ftp://lbl-diskpt1.es.net:2811/data1/50GB-in-medium-files/ file:///tmp/50GB-in-medium-files/
# Copy the big data set using 8 parallel streams
# (make sure your performance is good before doing this one!)
globus-url-copy -vb -p 8 -fast -cd -r ftp://lbl-diskpt1.es.net:2811/data1/500GB-in-large-files/ file:///tmp/500GB-in-large-files/