HPCC Data Handling

Boca Raton Documentation Team

We welcome your comments and feedback about this document via email to
docfeedback@hpccsystems.com. Please include "Documentation Feedback" in
the subject line and reference the document name, page numbers, and
current Version Number in the text of the message.

LexisNexis and the Knowledge Burst logo are registered trademarks
of Reed Elsevier Properties Inc., used under license. Other products,
logos, and services may be trademarks or registered trademarks of their
respective companies. All names and example data used in this manual are
fictitious. Any similarity to actual persons, living or dead, is purely
coincidental.

Introduction

There are a number of different ways in which data may be
transferred to, from, or within an HPCC system. For each of these data
transfers, there are a few key parameters that must be known.

Prerequisites for most file movements:

Logical filename
Physical filename
Record size (fixed)
Source directory
Destination directory
Dali IP address (source and/or destination)
Landing Zone IP address

The above parameters are used for these major data handling methods:

Import - Spraying Data from the Landing Zone to Thor
Export - Despraying Data from Thor to Landing Zone
Copy - Replicating Data from Thor to Thor (within same Dali File System)
Copying Data from Thor to Thor (between different Dali File Systems)

Data Handling Terms

A spray or import is the
relocation of a data file from one location (such as a Landing Zone) to
a Data Refinery cluster. The term spray was adopted due to the nature of
the file movement – the file is partitioned across all nodes within a
cluster.

A despray or export is the relocation of a data file from a Data
Refinery cluster to a single machine location (such as a Landing Zone).
The term despray was adopted due to the nature of the file movement –
the file is reassembled from its parts on all nodes in the cluster and
placed in a single file on the destination.

A copy is the replication of a data file from one Data Refinery cluster
to another cluster within the same environment.

A Remote copy is the replication of a data file from one Data Refinery
cluster to another cluster in a different environment.

A Landing Zone (or Drop Zone) is a physical storage location defined in
your system's environment. There can be one or more of these locations
defined. A daemon (DaFileSrv) must be running on that server to enable
file sprays and desprays.

Working with data files

Once you start working with your HPCC system, you will want to
process some real data. This section shows you how to load data to your
HPCC system.

Before you begin

First, you should consider the size of the data and the capacity
of your system. A typical production HPCC system would have much more
data capacity than a development system. The size of the files you
wish to work with is limited by the capacity of your system.

Uploading a file

For smaller data files (a maximum of 2 GB), you can use the
upload/download file utility in ECL Watch.

In your browser, go to the ECL Watch URL, for example,
http://nnn.nnn.nnn.nnn:8010, where nnn.nnn.nnn.nnn is your ESP
Server's IP address.

Your IP address could be different from the ones provided in the example
images. Please use the IP address provided by your installation.

From the ECL Watch page, click on the Upload/download File link in the
menu on the left side.

[Figure: Upload/download]

Once you click on the Upload/download File link, it takes you to the
Dropzones and Files page, where you can choose to Browse your machine
for a file to upload.

[Figure: Dropzones]

Press the Browse button to browse the files on your local machine,
select the file to upload, and then click the Open button.

The file you selected should appear in the Select a file to upload field.

Press Upload Now to complete the file upload.

Uploading files with a Secure Copy Client

To upload a large file for processing to your virtual machine,
you will need a tool that supports the secure copy protocol. In this
section, we discuss using WinSCP. There are other tools available, but
the steps are similar.

Open the WinSCP tool and log in to your Virtual Machine's IP address
using the username and password given.

Login ID: hpccdemo
Password: hpccdemo

Once logged in, it should navigate automatically to the landing zone
folder (/var/lib/LexisNexis/mydropzone).

In the left part of the window, navigate to where your local file is.

[Figure: WinSCP]

Select the data file to send and copy it to the landing zone using
drag-and-drop.

Data Handling Methods

There are several ways to spray, despray, or copy data files:

The DFU interface in ECL Watch
The DFU Plus command line utility (see the Client Tools manual for details)
ECL code and FileServices library functions (see the ECL Language Reference for details)

Data Handling Using ECL Watch

Log in to ECL Watch for the environment.

The URL is the IP address where the ESP Server is installed
plus the port to which the wssmc service is bound. The default
port is 8010. For example:

http://<ESPserverIP>:8010/

Click on the Browse Logical Files hyperlink below DFU Files in the menu
on the left.

The Logical Files page displays, showing all files with logical entries
in the Dali Server’s Distributed File System.

From this page, you can despray or copy any file.

Desprays

Locate the file to despray in the list of files, then
select the arrow graphic on the left-hand side, then select Despray from
the pop-up menu.

Check the Source information that is already filled in. Provide the
Destination information.

Destination-Machine: Use the drop-list to select the machine to despray to. The items in the list are landing zones defined in the system’s configuration.
Destination-IP Address: This is prefilled based upon the selected machine.
Destination-Local Path: Provide the complete file path of the destination, including file name and extension.
Destination-Network Path: The complete network path of the destination, including file name and extension (read only).
Overwrite: Check this box to overwrite a file with the same name if it exists.

Press the Submit button.

The DFU Workunit displays.

Press the Refresh button periodically until the status of your request
indicates it is Finished, or click on the View Progress hyperlink to see
a progress indicator.

The Progress window shows a green progress bar indicating the percentage
of completion, as well as other information related to the operation.

If a job fails, information related to the cause of the failure also
displays.

Spray Fixed

Click on the Spray Fixed
hyperlink below DFU in the menu on the left.

The Spray Fixed page displays.

Fill in the Source information (Machine, IP, File Path, and record
length) and the Destination information (Group and Label).

Source:
Machine: Use the drop-list to select the machine where the source file is located.
IP: IP address of the machine from which to spray. This is automatically completed when you select the Source Machine.
Local Path: The file path of the source file to spray.
Record Length: The size of each record.

Destination:
Group: Select the name of the THOR cluster to spray to.
Label: The logical name that you choose for the file.

Options:
Overwrite: Check this box to overwrite files of the same name.
Replicate: Check this box to create backup copies of all file parts in the backup directory (by convention, on the secondary drive of the next node in the cluster).
Compress: Check this box to compress the files.

Press the Submit button.

The DFU Workunit displays.

Press the Refresh button periodically until the status of your request
indicates it is Finished, or click on the View Progress hyperlink to see
a progress indicator.

You can use the Choose File button to look up files on the selected
source machine.

Spray CSV

Click on the Spray CSV
hyperlink below DFU in the menu on the left.

The Spray CSV page displays.

Fill in the Source information (Machine, IP, File Path, and record
information) and the Destination information (Group and Label).

Source:
Machine: Use the drop-list to select the machine where the source file is located.
IP: IP address of the machine from which to spray. This is automatically completed when you select the Source Machine.
Local Path: The file path of the source file to spray.
Max Record Length: The length of the longest record in the file.
Separator: The character used as a separator in the source file.
Line Terminator: The character used as a line terminator in the source file.
Quote: The character used as a quote in the source file.

Destination:
Group: Select the name of the THOR cluster to spray to.
Label: The logical name that you choose for the file.

Options:
Overwrite: Check this box to overwrite files of the same name.
Replicate: Check this box to create backup copies of all file parts in the backup directory (by convention, on the secondary drive of the next node in the cluster).
Compress: Check this box to compress the files.

Press the Submit button.

The DFU Workunit displays.

Press the Refresh button periodically until the status of your request
indicates it is Finished, or click on the View Progress hyperlink to see
a progress indicator.

You can use the Choose File button to look up files on the selected
source machine.

Spray XML

Click on the Spray
XML hyperlink below DFU in the menu on the left.

The Spray XML page displays.

Fill in the Source information (Machine, IP, File Path, and record
information) and the Destination information (Group and Label).

Source:
Machine: Use the drop-list to select the machine where the source file is located.
IP: IP address of the machine from which to spray. This is automatically completed when you select the Source Machine.
Local Path: The file path of the source file to spray.
Format: Select the file format from the drop-list.
Max Record Length: The length of the longest record in the file.
Row Tag: The record separator tag in the XML file.

Destination:
Group: Select the name of the THOR cluster to spray to.
Label: The logical name that you choose for the file.

Options:
Overwrite: Check this box to overwrite files of the same name.
Replicate: Check this box to create backup copies of all file parts in the backup directory (by convention, on the secondary drive of the next node in the cluster).
Compress: Check this box to compress the files.

Press the Submit button.

The DFU Workunit displays.

Press the Refresh button periodically until the status of your request
indicates it is Finished, or click on the View Progress hyperlink to see
a progress indicator.

You can use the Choose File button to look up files on the selected
source machine.

Copy

Locate the file to copy in the list of files, then click
on the arrow icon, then select Copy from the pop-up menu.

Fill in the Destination and Options information.

Destination:
Group: Select the name of the THOR cluster to copy to. Note: You can only choose from THOR clusters within the current environment.
Logical File: The logical name for the copied file.
File Mask: Automatically updated based on the logical file name entered.

Options:
Replicate: Check this box to create backup copies of all file parts in the backup directory (by convention, on the second drive of the next node in the cluster).
Wrap: Check this box to keep the number of parts the same and wrap if the target cluster is smaller than the original.
Overwrite: Check this box to overwrite files of the same name.
Compress: Check this box to compress the files.
Retain Superfile Structure: Check this box to retain the superfile structure.

Press the Submit button.

The DFU Workunit displays.

Press the Refresh button periodically until the status of your request
indicates it is Finished, or click on the View Progress hyperlink to see
a progress indicator.

Remote Copy

Remote Copy allows you to copy data from a Thor cluster
outside your environment to the one in your environment.

Click on the Remote Copy hyperlink below DFU in the menu on the left.

The Copy File page displays.

Fill in the Source, Destination, and Options information.

Source:
Logical File: The logical file name in the remote environment.
Source Dali: The Dali Server in the remote environment.
Source Username: A valid user in the remote environment.
Source Password: The password for the user in the remote environment.

Destination:
Group: Select the name of the THOR cluster to copy to. Note: You can only choose from THOR clusters within the current environment.
Logical Name: The logical name for the copied file.

Options:
Replicate: Check this box to create backup copies of all file parts in the backup directory (by convention, on the secondary drive of the next node in the cluster).
Wrap: Check this box to keep the number of parts the same and wrap if the target cluster is smaller than the original.
Overwrite: Check this box to overwrite files of the same name.
Compress: Check this box to compress the files.
Retain Superfile Structure: Check this box to retain the superfile structure.

Press the Submit button.

The DFU Workunit displays.

Press the Refresh button periodically until the status of your request
indicates it is Finished, or click on the View Progress hyperlink to see
a progress indicator.

Modifying a DFU Workunit

From the DFU Workunit page, you can modify and run any DFU
Spray, Despray, Copy, or Remote Copy action using the modified
settings. This allows you to run similar jobs without filling in
similar details again. It also allows you to correct any errors that
might have caused a DFU workunit to fail.

From any DFU Workunit page:

Press the Modify button.

The page for the original action displays.

Modify the details, as needed.

Press the Submit button.

The DFU Workunit displays.

Press the Refresh button periodically until the status of your request
indicates it is Finished, or click on the View Progress hyperlink to see
a progress indicator.
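
Every procedure in this section ends the same way: refresh until the request reports Finished, or inspect the failure information. If you script around these operations, that step amounts to a polling loop. The following is a minimal Python sketch of the idea only, not an HPCC API. The get_state callable is hypothetical and stands in for however you read the workunit status. The state names other than Finished ("queued", "running", "failed") are assumptions made for illustration.

```python
import time
from typing import Callable


def wait_for_workunit(
    get_state: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it reports a terminal state.

    get_state is a hypothetical, caller-supplied function that returns the
    current workunit status as text; it is not part of any HPCC library.
    Returns "finished" or "failed" (assumed state names, lower-cased).
    Raises TimeoutError if neither is seen within max_polls attempts.
    """
    for _ in range(max_polls):
        state = get_state().strip().lower()
        if state in ("finished", "failed"):
            return state
        # Scripted equivalent of waiting before pressing Refresh again.
        sleep(poll_seconds)
    raise TimeoutError(f"workunit not finished after {max_polls} polls")


if __name__ == "__main__":
    # Simulated status sequence so the sketch runs without a cluster.
    states = iter(["queued", "running", "running", "Finished"])
    print(wait_for_workunit(lambda: next(states), poll_seconds=0.0))
```

Capping the number of polls keeps a stalled request from blocking a script indefinitely. A "failed" result is returned rather than raised so the caller can decide whether to modify the workunit and resubmit it.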