AI & Data Cleansing

Written by Riley Duncan

Just a guy being a dude. Lead WordPress developer for the BIF Technologies website.

October 01, 2019

The future of our development efforts at BIF Technologies depends significantly on the ability to transition from passive BI to active BI. Our vision for BrainJack™ is to enable it to deliver business intelligence directly to users. This initiative requires BrainJack™ to deliver dashboards and alerts based either on a schedule or on certain indicators. For example, a user may have a dashboard that provides information on CPU performance and memory utilization. The user could then indicate that an alert should be sent if CPU performance falls below a certain threshold.
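To make the idea of a user-defined alert concrete, here is a minimal sketch in Python. It is not BrainJack™ code: the `AlertRule` structure, the metric names, and the threshold values are all hypothetical and exist only to illustrate the pattern of checking a reading against a condition the user has set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AlertRule:
    """A user-defined condition on one dashboard metric (hypothetical structure)."""
    metric: str
    threshold: float
    # Comparison that decides whether a reading should trigger the alert.
    triggers: Callable[[float, float], bool]
    message: str


def evaluate_rules(readings: Dict[str, float], rules: List[AlertRule]) -> List[str]:
    """Return the message of every rule whose condition holds for the readings."""
    alerts = []
    for rule in rules:
        value = readings.get(rule.metric)
        # Metrics missing from this batch of readings are skipped.
        if value is not None and rule.triggers(value, rule.threshold):
            alerts.append(f"{rule.message} ({rule.metric}={value})")
    return alerts


# Assumed metric names and thresholds, chosen only for illustration.
rules = [
    AlertRule("cpu_performance_score", 60.0, lambda v, t: v < t,
              "CPU performance below threshold"),
    AlertRule("memory_utilization_pct", 90.0, lambda v, t: v > t,
              "Memory utilization above threshold"),
]

readings = {"cpu_performance_score": 42.5, "memory_utilization_pct": 71.0}
alerts = evaluate_rules(readings, rules)
print(alerts)
```

In this sketch only the CPU rule fires, because the memory reading stays under its threshold. A scheduled delivery would simply call `evaluate_rules` on a timer instead of on each new reading.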

This would be a basic implementation of active BI: a business intelligence tool that delivers reports and data only when they are significant to the user. The tool’s knowledge, however, is not truly its own, but rather the user’s knowledge expressed as a set of conditions. These conditions are inherently limited by the user’s own ability to interpret and anticipate the significance of data. A user cannot realistically match the computational capacity of a machine, yet human insight has historically been necessary for interpreting data and making decisions.

With that in mind, artificial intelligence becomes significant for the future development of tools such as BrainJack™. At its core, AI involves building machines that can emulate human cognitive abilities. Through machine learning, an AI engine could enable a system to learn to make decisions without human intervention or direction. One caveat is that massive amounts of data are required for a machine to learn to make decisions in a consistent and accurate manner.

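An example is AlphaGo Zero, a computer program that learned to play the Chinese game of Go. Starting with only the rules of the game, AlphaGo Zero taught itself through trial and error by playing against itself, and within three days of training it had surpassed the earlier version of AlphaGo that defeated champion Lee Sedol.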

This showcases how the level of experience a machine acquires by making decisions and learning from data directly correlates with the quality of the decisions and predictions it can be expected to make.

It is important to note that the intelligence developed through machine learning is only as good as the data that informs it. It comes as no surprise, then, that up to 80% of data analysts’ time is spent on a process called data cleansing, according to Forrester Research.

Data cleansing is the process of ensuring high-quality data by detecting and then correcting or removing corrupt or inaccurate records in a data set. The implementation of this process can vary greatly in practice. It can involve taking a series of random samples from the data set to check for anomalies, including inconsistent summary statistics for numerical data, data that does not abide by the rules for its expected field, and unexpected null values. Data cleansing can be done manually with tools such as Excel or Google Sheets for relatively small sets of records (fewer than 1 million) with easily observable trends in the data. Scripting languages can be used for more computationally demanding data sets that require more detailed documentation of data cleansing progress and more complex logic for correcting and processing data. Various tools are available to companies looking to streamline the data cleansing process, such as Trifacta and Hi-Tech BPO Services.
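As a rough illustration of the scripted approach, the sketch below applies the three kinds of checks mentioned above (unexpected nulls, field rules, and summary statistics) to a handful of records. The field names, the hostname pattern, and the sample rows are assumptions made up for this example, and for simplicity it inspects every record rather than a random sample.

```python
import re
import statistics
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]

# Hypothetical field rule: hostnames look like "srv-" followed by digits.
HOSTNAME_RULE = re.compile(r"^srv-\d+$")


def check_records(records: List[Record]) -> Tuple[List[str], List[float]]:
    """Return a list of detected issues and the CPU values that passed all checks."""
    issues: List[str] = []
    valid_cpu: List[float] = []
    for i, rec in enumerate(records):
        host, cpu = rec.get("host"), rec.get("cpu_pct")
        # Check 1: unexpected null values.
        if host is None or cpu is None:
            issues.append(f"row {i}: unexpected null value")
            continue
        # Check 2: data must abide by the rule for its field.
        if not HOSTNAME_RULE.match(host):
            issues.append(f"row {i}: host {host!r} breaks the field rule")
        try:
            value = float(cpu)
        except ValueError:
            issues.append(f"row {i}: cpu_pct {cpu!r} is not numeric")
            continue
        # A percentage outside 0-100 would distort the summary statistics.
        if 0.0 <= value <= 100.0:
            valid_cpu.append(value)
        else:
            issues.append(f"row {i}: cpu_pct {value} is outside 0-100")
    return issues, valid_cpu


# Made-up sample rows, one per kind of problem.
records = [
    {"host": "srv-01", "cpu_pct": "41.5"},
    {"host": "srv-02", "cpu_pct": None},
    {"host": "SERVER 3", "cpu_pct": "38.0"},
    {"host": "srv-04", "cpu_pct": "420"},
    {"host": "srv-05", "cpu_pct": "n/a"},
]

issues, valid_cpu = check_records(records)
for issue in issues:
    print(issue)
# Check 3: summary statistics over the values that survived the checks.
print("mean:", statistics.mean(valid_cpu), "median:", statistics.median(valid_cpu))
```

The script only detects and reports problems. Whether a flagged row is corrected or removed is a separate decision that depends on the data set, which is part of why the issue list is kept as a record of cleansing progress.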

Although data cleansing is not often discussed, it is becoming crucial as AI continues to permeate the world of data. According to NewVantage Partners’ annual Executive Survey, 92.7% of executives from companies such as American Express, Capital One, and Verizon report that their companies have launched AI or Big Data initiatives. As a business anticipating expansion into the field of artificial intelligence, it is imperative that we, along with any data-driven service, do not neglect the importance of data intel.
