Address, Postcode and Bank solutions

- Refiner Manual

OPENING YOUR DATABASE

From the Refiner 'File' menu, click on 'Open Database' . Locate and select the database to be cleaned and a pop-up menu will then allow selection of the specific database table containing the address data to be cleaned.

Supported Database Formats

Refiner can operate with Microsoft Access databases (Access 2 to Access 2002), xBase files (dBase, Foxpro, Clipper, etc), and Paradox tables.  It also has native support for delimited (e.g. quoted comma separated) and fixed record length ASCII Text files.  

Most other databases can be accessed through Refiner's ODBC support, providing a DSN for that data source has been set-up on your system.

Opening Non-ODBC Databases

Use the 'File' Menu 'Open Database' (Ctrl+D) option to open non-ODBC databases.  Select the database file you wish to work on and Refiner pops up a menu. From this menu choose whether to work on a straight table of data (an Access database may offer several tables within the database file), or whether to use an SQL query.  If you select to use SQL Query, a simple SQL Query is provided for access to the data in the first table. This is a starting point and the Query can be modified to suit your needs.

At the end of the SQL Database Query used to open the database, Refiner automatically adds 'ORDER BY Postcode', (where 'Postcode' is the name of your postcode field if Refiner can determine it). This causes records to be processed in postcode order which, due to the way caching works, improves Refiner's performance.  To process the database in record order simply remove this part of the query and click 'Query' to update.  If 'ORDER BY Postcode is not added, or the field name is incorrect, just add or correct it to enhance performance.

Opening Delimited Text Files

Refiner includes very flexible delimited text file support for working with data in text format.  This is particularly useful if your data format is not supported, as you can convert a copy to one which is, so that it can be processed. Most databases can export to Comma Separated Text (sometimes known as CSV or ASCII).  As delimited text has it's limitations, If your database supports exporting fixed format text you may prefer to use that instead (see next section). Refiner can only write back to fixed record length files it is reading data from - so if you process a delimited text file,  a new file is created with the file suffix '.new', e.g. processing 'Address.Txt' would create 'Address.New'. It is helpful if the first 'Record' in your text file has the names of the fields in each field - like 'Street', 'Town'.  Refiner can then display these during attachment. SQL cannot be used to work with Text Files - so queries and sorting should be carried out before processing through Refiner.

When you open a delimited text file you will be presented with the following options to specify the format of that text file:

The delimiter is the character which separates each field in the text file, e.g. a comma is the most common here.  The qualifier is the character used to help prevent field cross-over as mentioned previously.  If there is no qualifier present you can select None from the drop down list.  Address list data is especially susceptible to spurious delimiter characters - like quote marks or commas. House names entered as "The Willows" or address lines like Rose Cottage, High Street, can mean that any data process thinks there are more fields than there really are. This is a problem.  It's best to use Quoted Comma Delimited text which is less sensitive to errors.  If there are problems in your data export Refiner protects against the address record cross-over that can occur in these circumstances - and limits damage, usually just to one record .  It remains far better if these characters are removed before creation of your processing file.  The use first record for field names option specifies that the first record in the text file contains the field names - it is recommended you include these when exporting your data to make identifying and mapping fields much easier.  

Opening Fixed Record Length Text Files

Fixed record length text files are a very efficient file format and all Refiner features are supported when using them.  If your database is not directly supported, or you want maximum speed and flexibility from Refiner this may be the best format for you.  The only format which is as efficient as Fixed Record length text files for Refiner to process is dBase files, Paradox tables come a very close third.

When you open a fixed record length text file you will be presented with the following options to specify the format of that text file:

Refiner will attempt to automatically detect the format of the text file, but if you need to adjust this because there is no record separator so the record length cannot be determined or you need to tweak field positions then you can enter these directly here.  You can also review some of the records in the file on the right using the navigation controls above the Cancel button to ensure the format is correctly specified.

Opening ODBC Databases

Use the 'File' Menu 'Open ODBC Connection' option to open a connection to an ODBC database that you have configured a DSN for on your system.  The screen will change to allow you to select a DSN, and type your user identifier (uid) and password (pwd) if required and any additional parameters you may require to connect to that database through ODBC.  You can then enter the SQL Query to access the data to be cleaned.  Then click the 'Query' button to connect to your database.

To improve performance, add  'ORDER BY Postcode', (where Postcode is the name of your postcode field) at the end of the SQL Query when opening Microsoft Access or ODBC databases.  If the postcode is held in two separate 'outcode' and 'incode'  fields this can be specified by using 'ORDER BY Outcode,Incode' etc.

Using SQL

Using SQL to query a database is a powerful, flexible way to control the cleaning process. To use it means learning some simple SQL statements. SQL can be used to restrict cleaning to just parts of the database (perhaps where postcodes are missing), to re-start a cleaning session where it was left-off - or as shown to speed up processing by presenting data in postcode order - which is far more efficient on disk access.

The precise form of these examples will vary with the name of the database and the name of the fields within it, but should be easy to translate:

Select * from [Addresses]

       Selects all fields from a table called 'Addresses'. Every address record will be processed.

Select * from [Addresses] where Postcode is null

       As above - but only processes address records with a blank postcode field.

Select * from [Addresses] where Len(Postcode) < 6

       As above - but only processes records where the data in the postcode field is too short - e.g. B1 instead of B1 1AA.

Select * from [Addresses] where Postcode is not null order by Postcode

       As above - but only processes records with postcode field populated and processes the database in postcode order. (this can significantly improve processing speed!).

For full details of supported SQL statements please refer to your database engine's own documentation.

Use Progress Bar

The Progress Bar on the Process Tab gives an ongoing visual indication of how much of a database has been cleansed.  Refiner counts every record when a database is opened or re-queried so that an accurate Progress Bar can be displayed. This database scan takes time and may be undesirable.  The progress bar can be switched on / off from the 'Progress Bar' option on the ' Options' menu.

Any Questions? Call now on 01624 811711
  Refiner

  Manual Contents

  Appendix