DataU

intelligent data when YOU need it!


2015-03-05 By Benjamin

Duplication and why it costs

In this quick post, we’ll touch on an issue that a number of data sets face: duplication. A duplicate is essentially the same entity existing more than once within the same data set. Duplicates creep in for a number of reasons, but the main one is that multiple data sets are combined to create a single repository of information, whether from purchased lists, web-based data, or inbound leads. When all of this data is combined, some leads carrying the same information will have been imported from more than one source, and so we have duplication within the database. A simple enough concept to understand…
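To make that concrete, here is a minimal Python sketch of what happens when two sources are simply appended into one repository. The sources, field names and email values are illustrative assumptions, not real data:

```python
from collections import Counter

# Two illustrative sources that both know about the same person.
purchased_list = [
    {"name": "Jane Smith", "email": "jane@example.com", "source": "purchased list"},
    {"name": "Acme Pty Ltd", "email": "sales@acme.example", "source": "purchased list"},
]
inbound_leads = [
    {"name": "Jane Smith", "email": "jane@example.com", "source": "inbound lead"},
]

# Combining the sources into a single repository with no de-duplication step.
combined = purchased_list + inbound_leads

# The same entity now exists more than once within the same data set.
counts = Counter(record["email"].lower() for record in combined)
print({email: n for email, n in counts.items() if n > 1})
# {'jane@example.com': 2}
```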

So, why do we care about duplicates? Appearance, perception and minimising wastage: those are three key points to consider when debating why you care.

What makes us want to ensure that there are no duplicates within the database? If we have duplicated data, we’re creating a situation where we’re doubling, or even tripling, our effort for no further monetary gain. Not only does this cost us time, money and effort, it also creates a perception of unprofessionalism. Perhaps this is the data geek inside me, but if a business that wants me as a customer doesn’t take its data seriously, how seriously is it going to service me as a customer?

Organisations that care about their data can take a number of proactive steps to remove duplicates from their systems. Many front-end applications allow some form of rudimentary de-duplication when importing data, for example excluding records that match on a phone number or an address. Depending upon your application, this can often be sufficient, yet for the majority it is just far too basic. We want something that will not only handle the easy-to-find duplicates, but will also find those duplicates we could never have hoped to find through normal channels. We personally use a selection of third-party and in-house algorithms to combat these issues. A rough sketch of the rudimentary, key-based approach appears below, and the scenario that follows it is one I’m sure will resonate with the majority of readers out there:
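To show what the “rudimentary” end of that spectrum looks like, here is a minimal Python sketch: an import-time de-dupe keyed on a normalised phone number, plus a crude fuzzy name check of the kind that catches what an exact key match misses. The field names, sample records and threshold are illustrative assumptions, not the third-party or in-house algorithms we actually run:

```python
import re
from difflib import SequenceMatcher

def phone_key(phone: str) -> str:
    """Strip everything but digits so '(02) 9999-1234' and '02 9999 1234' collide."""
    return re.sub(r"\D", "", phone or "")

def dedupe_by_phone(records):
    """Rudimentary import-time de-dupe: keep the first record seen for each phone key."""
    seen, kept = set(), []
    for rec in records:
        key = phone_key(rec.get("phone", ""))
        if key and key in seen:
            continue  # same phone number already imported from another source
        if key:
            seen.add(key)
        kept.append(rec)
    return kept

def looks_like_same_name(a: str, b: str, threshold: float = 0.85) -> bool:
    """A crude fuzzy check for the near-miss duplicates an exact key match won't find."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

records = [
    {"name": "Jane Smith", "phone": "(02) 9999-1234"},
    {"name": "J Smith", "phone": "02 9999 1234"},
]
print(len(dedupe_by_phone(records)))                      # 1 - second record shares the key
print(looks_like_same_name("Jane Smith", "Jayne Smyth"))  # True - a fuzzy near-miss
```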

Most businesses have a database of customers and prospects. Normally these two data sets live in distinct tables or systems, or are simply identified by a status field within the record. If, as a business owner or manager, you want to generate additional long-term revenue, you’ll look at sending out an acquisition campaign to increase your customer base. This campaign could be something basic, purely to entice prospects to come on board. It’s usually time-sensitive in nature, and has enticements embedded in the offer that exist purely to acquire additional customers. In other words, the offer isn’t designed to return a profit if an existing customer takes it up. Each mail piece will cost in excess of $1, the enticement has an associated cost, and so does the labour involved in the response or the subsequent follow-up. You can see how the cost of sending this offer to existing customers quickly adds up. If your database were effectively cleaned and duplicate-free, that exposure would be removed entirely.
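To put rough numbers on that exposure, here is a back-of-the-envelope calculation. Every figure in it is an illustrative assumption rather than a DataU benchmark; only the “in excess of $1 per piece” point comes from the scenario above:

```python
# Back-of-the-envelope cost of mailing an acquisition offer to existing customers.
# All of these figures are illustrative assumptions.
campaign_size   = 50_000  # "prospect" records mailed
duplicate_rate  = 0.04    # share of those records that are really existing customers
cost_per_piece  = 1.20    # print-and-post cost per mail piece (the "in excess of $1")
enticement_cost = 25.00   # cost of honouring the acquisition enticement
redemption_rate = 0.10    # existing customers who take the offer up anyway

wasted_pieces = campaign_size * duplicate_rate
wasted_spend = (wasted_pieces * cost_per_piece
                + wasted_pieces * redemption_rate * enticement_cost)

print(f"{wasted_pieces:.0f} pieces to existing customers, roughly ${wasted_spend:,.2f} wasted")
# 2000 pieces to existing customers, roughly $7,400.00 wasted
```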

There is a great deal to think about when you come to remove duplicates from your database, but the actual process isn’t complex at all.  Each business scenario is potentially different, which is why when we’re tasked with this, our approach will vary.

So just think about your database, and ask yourself, ‘When was the last time we did a proper audit on the level of duplication within the system?’

Filed Under: Data Management

2015-01-07 By Benjamin

Address validation, it’s not just for Australia Post’s benefit!

Address validation plays a number of roles in the business, foremost among them ensuring AMAS compliance. Secondary to this, yet equally important, is its role in effective data management. Through the use of Industry Leading Software, we can “batch” process our databases offline and take a more effective and comprehensive approach to our data management, without any need for human interaction. These fully automated processes are easily integrated into existing database systems, and provide a wealth of additional information and surety.

Address validation:

• allows us to standardise our data.
• ensures all possible address points have a DPID.
• provides conformity to AMAS standards.
• provides a “leg up” to our de-duplication methods.
• allows us to export the data with multiple address elements.

When we think of address validation, we usually think of an activity that saves money when sending out direct mail, or a process that keeps the address data in our systems “clean”. We validate to ensure we can append a DPID to as many records as possible, which in turn allows us to apply a barcode to the data and ultimately save money on direct mail.

Our focus today, however, is on the added benefits of a fully integrated and automated solution built on Industry Leading Software. Its components are easily integrated into any existing database solution, which has the potential to reduce the labour component of day-to-day data management. That resource can then be diverted to more involved processes that still require human interaction.

Address validation, utilising Industry Leading Software, allows a data owner or manager to perform a number of tasks beyond those already used to manage data being sent to Australia Post. A key component of this is a highly effective ability to parse the different address elements.

Address validation and parsing technology comes into its own for a number of reasons, the most prevalent being:

• Because of the advanced nature of certain parsing engines, you’re able to repair address records that could not have been fixed otherwise.
• An effective parsing engine lets you better manipulate and store your data.
• Ensuring that all addresses are in a “standard” format means your de-duping algorithms have one less issue to deal with (think of it as “apples & apples” rather than “apples & oranges”).

Properly parsed data looks like this:

[Figure: an example of structured data, with each address element in its own field]

You’ll notice in this example that the different address elements have been split into their respective fields.
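To show the same idea in code rather than a screenshot, here is a deliberately simple Python sketch that splits a single-line address into those fields. The pattern is an illustration of parsing in general, not the Industry Leading Software’s engine, and it handles nowhere near the variation a real AMAS-approved engine does:

```python
import re

# Illustrative only: a real parsing engine copes with PO boxes, lot numbers,
# misspelt localities and far more street types than this sketch does.
ADDRESS_PATTERN = re.compile(
    r"^(?:(?P<unit>\d+)/)?"                  # optional unit, e.g. '12/'
    r"(?P<number>\d+[A-Za-z]?)\s+"           # street number, e.g. '345' or '7B'
    r"(?P<street>.+?)\s+"                    # street name
    r"(?P<type>St|Rd|Ave|Dr|Ct|Pl)\s*,\s*"   # street type
    r"(?P<locality>.+?)\s+"                  # suburb / locality
    r"(?P<state>NSW|VIC|QLD|SA|WA|TAS|NT|ACT)\s+"
    r"(?P<postcode>\d{4})$"
)

def parse_address(raw: str):
    """Split a single-line address into separate fields, or return None if it won't parse."""
    match = ADDRESS_PATTERN.match(raw.strip())
    return match.groupdict() if match else None

print(parse_address("12/345 George St, Sydney NSW 2000"))
# {'unit': '12', 'number': '345', 'street': 'George', 'type': 'St',
#  'locality': 'Sydney', 'state': 'NSW', 'postcode': '2000'}
```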

Once you have your address data validated and normalised, the job of actually managing your data becomes a great deal easier. In many cases you’ll be able to de-dupe directly on this data using basic SQL statements, and you’ll be able to match it against existing data sets, or against new data sets as they come into the environment. And when it comes to any type of migration work, the old adage of “rubbish in, rubbish out” really hits home.
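As a minimal sketch of that “basic SQL” case, the snippet below groups on the parsed address elements to surface records that exist more than once. The table, column names and sample rows are illustrative assumptions, not a real schema:

```python
import sqlite3

# Illustrative only: the schema and rows here are assumptions for the example.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE contacts (
        id       INTEGER PRIMARY KEY,
        name     TEXT,
        number   TEXT,
        street   TEXT,
        locality TEXT,
        state    TEXT,
        postcode TEXT
    );
    INSERT INTO contacts (name, number, street, locality, state, postcode) VALUES
        ('Jane Smith', '345', 'GEORGE ST', 'SYDNEY', 'NSW', '2000'),
        ('J. Smith',   '345', 'GEORGE ST', 'SYDNEY', 'NSW', '2000');
""")

# Once addresses are standardised, a plain GROUP BY on the parsed elements is often
# enough to surface duplicate entities.
rows = con.execute("""
    SELECT number, street, locality, state, postcode, COUNT(*) AS copies
    FROM contacts
    GROUP BY number, street, locality, state, postcode
    HAVING COUNT(*) > 1
""").fetchall()
print(rows)  # [('345', 'GEORGE ST', 'SYDNEY', 'NSW', '2000', 2)]
```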

So, as you’ve seen, all of this highlights the opportunity for Industry Leading Software to become a fully integrated and automated component of your day-to-day data management practices. It’s something every business should explore, as the benefits of ensuring compliance within your databases are easily measurable.

Filed Under: Data Management

Copyright © 2019 · DataU Pty Ltd ABN 37 520 672 829