My family began a quest this year to discover our roots. More specifically, we wanted to unearth the true identity of our grandfather. Because he died in 1969, he was never more than a name in my life. Every holiday we'd spin yarns about who he really was, comparing notes or poring over scraps of paper in the hopes of finding some clue. He left precious few, and the ones he did leave weren't enough to grow a family tree.

The advent of DNA testing changed that. Thanks to thousands of tiny bits of DNA, we began to piece together a tale of youth, excess, and intrigue with its origins in the sultry swamps of nineteenth-century New Orleans. Born at the turn of the twentieth century, my grandfather grew up in a well-to-do family and became a bookkeeper, only to succumb to financial temptation. To escape his troubles, he boarded a ship for the tropics and never returned to New Orleans.

The DNA of Big Data in the Cloud

How Big Data tells a story

The family Bible was the original Big Data vault, storing marriages, births, and deaths. If the family tended this tree with care and dedication, it often grew to immense proportions, offering a rich and accurate history. Neglected, it became incomplete and riddled with gaps. Worse, it filled with faulty data and became unreliable.

The same technical challenges that made it difficult to keep full and accurate historical records centuries ago remain today, as IT professionals grapple with data integrity, reliability, and visibility concerns.

1. Data Integrity

Keeping your data accurate is a twofold responsibility. First, those who report data must do so honestly and in good faith. We found many family trees suffered from flawed reasoning, family shame, and bad grammar. Big Data suffers from the same human ego, obfuscation, and inconsistent entry. Second, the IT professional must always be on the lookout for ways to minimize improper access and ensure the data is represented accurately.
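Some of that vigilance against inconsistent entry can be automated. As a minimal sketch, the Python below checks a record against a few simple consistency rules before it is accepted. The `PersonRecord` shape and the `validate_record` helper are hypothetical, invented for illustration rather than taken from any genealogy site or database product.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PersonRecord:
    # Hypothetical record shape, for illustration only.
    name: str
    birth_year: Optional[int]
    death_year: Optional[int]


def validate_record(rec: PersonRecord, current_year: Optional[int] = None) -> List[str]:
    """Return a list of problems found; an empty list means the record passed."""
    if current_year is None:
        current_year = date.today().year
    problems = []
    if not rec.name.strip():
        problems.append("missing name")
    if rec.birth_year is not None and rec.birth_year > current_year:
        problems.append("birth year is in the future")
    if rec.death_year is not None and rec.death_year > current_year:
        problems.append("death year is in the future")
    # A person cannot die before being born: a classic data-entry slip.
    if (rec.birth_year is not None and rec.death_year is not None
            and rec.death_year < rec.birth_year):
        problems.append("death year precedes birth year")
    return problems


print(validate_record(PersonRecord("", 1969, 1900)))
```

Rules like these catch impossible entries, but they cannot catch a plausible lie. That is why honest reporting remains the first half of the responsibility.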

2. Data Reliability

Reliable access to data is critical. Fault-tolerant systems protect vast stores of information, while geographically diverse locations not only speed up access but also ensure business continuity. During our family search, we repeatedly found that records had been lost to Hurricane Katrina. One storm wiped out hundreds of years of records in a single flood. Personal losses were great, and official records suffered tragic damage as well.
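To make geographic redundancy concrete, here is a minimal Python sketch under simplifying assumptions. Each in-memory dictionary stands in for a storage site in a different region, and `replicate` and `read_any` are hypothetical helper names. A SHA-256 checksum recorded at write time lets a reader reject a damaged copy. Real fault-tolerant systems also handle networking, consensus, and repair, all of which this sketch omits.

```python
import hashlib
from typing import Dict, List, Optional

# Each "site" is an in-memory dict standing in for storage in a separate region.
Site = Dict[str, bytes]


def checksum(data: bytes) -> str:
    """SHA-256 digest used to detect damaged copies."""
    return hashlib.sha256(data).hexdigest()


def replicate(sites: List[Site], key: str, data: bytes) -> str:
    """Write the same record to every site and return its checksum."""
    for site in sites:
        site[key] = data
    return checksum(data)


def read_any(sites: List[Site], key: str, expected: str) -> Optional[bytes]:
    """Return the first intact copy, skipping sites where it is missing or corrupted."""
    for site in sites:
        data = site.get(key)
        if data is not None and checksum(data) == expected:
            return data
    return None  # every copy was lost or damaged


sites: List[Site] = [{}, {}, {}]
digest = replicate(sites, "record-1", b"birth record")
sites[0].clear()                      # one site is wiped out entirely
sites[1]["record-1"] = b"damaged"     # another holds a corrupted copy
print(read_any(sites, "record-1", digest))
```

Here the read still succeeds from the third site. Had every copy lived in a single location, one flood would have been enough to lose the record for good.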

3. Data Visibility

With large troves of data comes the need for effective methods of access. Today's public and private cloud services offer access to these data stores in ways users can easily understand. Solutions like Hadoop and MongoDB are designed to manage today's Big Data, and sites like Experian, Ancestry.com, and others rely on these vast collections to give users seamless access to information they can make sense of.

The Future from the Past

Families carry on, and so do businesses. In the wake of a natural disaster, businesses must recover. And when family histories are shrouded in mystery, grandchildren are compelled to dig through the data for details. In an age when many worry that too much data is being stored, it's reassuring to know that our future is being stored securely today.

What are your thoughts about data integrity and creating a smarter business and personal future from historical information?