Building Trustworthy Big Data Algorithms

Northwestern University Newscenter (01/29/15) Emily Ayshford

Northwestern University researchers recently tested latent Dirichlet allocation, one of the leading big data algorithms for finding related topics within unstructured text, and found it was neither as accurate nor as reproducible as a leading topic modeling algorithm should be. In response, the researchers developed a new topic modeling algorithm, called TopicMapping, which they say has shown very high accuracy and reproducibility during tests. TopicMapping begins by preprocessing the data to replace words with their stems. It then builds a network of connected words and identifies “communities” of related words. The researchers found TopicMapping was able to perfectly separate the documents according to language and was able to reproduce its results. Northwestern professor Luis Amaral says the results show the need for more testing of big data algorithms and more research into making them accurate and reproducible. “Companies that make products must show that their products work,” Amaral says. “They must be certified. There is no such case for algorithms. We have a lot of uninformed consumers of big data algorithms that are using tools that haven’t been tested for reproducibility and accuracy.”
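
The three steps described above (stem the words, build a word network, find groups of related words) can be illustrated with a toy sketch. This is not the researchers' implementation: the function names `crude_stem`, `build_word_network`, and `word_communities` are hypothetical, the stemmer is a deliberately naive suffix stripper, and plain connected components stand in for community detection, since the summary does not say which method TopicMapping uses.

```python
from collections import defaultdict
from itertools import combinations


def crude_stem(word):
    """Toy stemmer (an assumption for illustration): strip a few English suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_word_network(documents):
    """Link two stems whenever they appear together in the same document."""
    neighbors = defaultdict(set)
    for text in documents:
        stems = {crude_stem(w) for w in text.lower().split()}
        for a, b in combinations(sorted(stems), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def word_communities(neighbors):
    """Group stems into connected components.

    This is a simple stand-in for community detection, not the method
    used by TopicMapping.
    """
    seen, communities = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(neighbors[node] - seen)
        communities.append(group)
    return communities


# Made-up two-language corpus, purely for illustration.
documents = [
    "cats chasing mice",
    "mice fearing cats",
    "el gato persigue ratones",
    "ratones temen el gato",
]
communities = word_communities(build_word_network(documents))
for group in communities:
    print(sorted(group))
```

In this toy corpus the English and Spanish stems share no documents, so they fall into two separate groups. A real corpus would have words that bridge topics, which is why a proper community detection method is needed rather than connected components.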
