
China’s Policing Robot: Cattle Prod Meets Supercomputer

Computerworld (10/31/16) Patrick Thibodeau

Chinese researchers have developed AnBot, an “intelligent security robot” deployed in a Shenzhen airport. AnBot’s backend is linked to China’s Tianhe-2 supercomputer, giving it access to cloud services. AnBot uses these resources to conduct patrols, recognize threats, and identify people using multiple cameras and facial recognition. The cloud services give the robots petascale processing power, well beyond the processing capabilities of the robot itself. The supercomputer connection enhances the devices’ intelligent learning capabilities and human-machine interface, according to a U.S.-China Economic and Security Review Commission report on China’s autonomous systems development efforts. The report finds that progress in robotics depends on linking artificial intelligence (AI), data science, and computing technologies. In addition, the report notes that the simultaneous development of high-performance computing systems and robotic mechanical manipulation gives AI the potential to unleash smarter robotic devices that can learn and integrate inputs from large databases. The report says the U.S. government should increase its own efforts to develop manufacturing technology in critical areas and should monitor China’s growing investments in robotics and AI companies in the U.S.


Reducing Big Data Using Ideas From Quantum Theory Makes It Easier to Interpret

Queen Mary, University of London (04/23/15) Will Hoyles

Researchers from Queen Mary University of London (QMUL) and Rovira i Virgili University have developed a new method that simplifies the way big data is represented and processed. Borrowing ideas from quantum theory, the team adapted techniques used to quantify the difference between two quantum states. The researchers applied the method to several large publicly available data sets and were better able to identify which relationships in a system are similar enough to be considered redundant. The researchers say their method can significantly reduce the amount of information that has to be displayed and analyzed separately, making it easier to understand. Moreover, the approach reduces the computing power needed to process large amounts of multidimensional relational data. “We’ve been trying to find ways of simplifying the way big data is represented and processed and we were inspired by the way that the complex relationships in quantum theory are understood,” says QMUL’s Vincenzo Nicosia. “With so much data being gathered by companies and governments nowadays, we hope this method will make it easier to analyze and make sense of it, as well as reducing computing costs by cutting down the amount of processing required to extract useful information.”
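The summary does not name the exact measure, but a standard quantum-theoretic way to quantify how different two states are is the quantum Jensen-Shannon divergence. The sketch below assumes each layer of a multilayer network (one type of relationship among the same set of nodes) is turned into a density matrix by rescaling its graph Laplacian, and then compares two layers. All function names are hypothetical, and this illustrates the general idea rather than reproducing the QMUL team's implementation. It uses only the standard library, with a small Jacobi eigenvalue routine in place of a linear-algebra package.

```python
import math


def laplacian_density(adj):
    """Density matrix rho = L / tr(L) from the combinatorial Laplacian L = D - A.

    Assumption: layers are undirected, unweighted, free of self-loops,
    and defined on the same node set.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    total = sum(deg)  # tr(L) = sum of degrees = 2 * number of edges
    return [[((deg[i] if i == j else 0) - adj[i][j]) / total for j in range(n)]
            for i in range(n)]


def symmetric_eigenvalues(m, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in m]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # update columns p and q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def von_neumann_entropy(rho):
    """S(rho) = -sum(lambda * log2(lambda)) over the positive eigenvalues of rho."""
    return -sum(lam * math.log2(lam) for lam in symmetric_eigenvalues(rho) if lam > 1e-12)


def quantum_jsd(adj_a, adj_b):
    """Quantum Jensen-Shannon divergence between two network layers (0 = identical)."""
    ra, rb = laplacian_density(adj_a), laplacian_density(adj_b)
    n = len(ra)
    mix = [[(ra[i][j] + rb[i][j]) / 2 for j in range(n)] for i in range(n)]
    return von_neumann_entropy(mix) - (von_neumann_entropy(ra) + von_neumann_entropy(rb)) / 2
```

Layers whose pairwise divergence is close to zero carry nearly the same structure, which makes them natural candidates for merging; where to set that cutoff is a modeling choice the sketch leaves open.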


Building Trustworthy Big Data Algorithms

Northwestern University Newscenter (01/29/15) Emily Ayshford

Northwestern University researchers recently tested latent Dirichlet allocation, one of the leading big data algorithms for finding related topics within unstructured text, and found it was neither as accurate nor as reproducible as a leading topic modeling algorithm should be. In response, the researchers developed a new topic modeling algorithm that they say showed very high accuracy and reproducibility in tests. The algorithm, called TopicMapping, begins by preprocessing the data to replace words with their stems. It then builds a network connecting words and identifies “communities” of related words. The researchers found that TopicMapping was able to perfectly separate test documents by language and to reproduce its results. Northwestern professor Luis Amaral says the results show the need for more testing of big data algorithms and more research into making them accurate and reproducible. “Companies that make products must show that their products work,” Amaral says. “They must be certified. There is no such case for algorithms. We have a lot of uninformed consumers of big data algorithms that are using tools that haven’t been tested for reproducibility and accuracy.”
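The pipeline described above (stem words, link related words into a network, then find communities of related words) can be sketched in a few lines. This is a toy under stated assumptions, not TopicMapping itself: the suffix-stripping `stem` function is a deliberately crude, hypothetical stand-in for a real stemmer, two words are linked when they co-occur in at least `min_cooccur` documents (a hypothetical parameter), and connected components stand in for the more careful community detection a real topic model would need.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, very crude suffix list; a real system would use a proper stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def word_communities(docs, min_cooccur=1):
    """Group stemmed words into communities of a document co-occurrence network."""
    # Count, for each word pair, how many documents contain both words.
    pair_counts = defaultdict(int)
    for doc in docs:
        terms = sorted({stem(w) for w in doc.split()})
        for a, b in combinations(terms, 2):
            pair_counts[(a, b)] += 1

    # Keep only edges that pass the co-occurrence threshold.
    graph = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_cooccur:
            graph[a].add(b)
            graph[b].add(a)

    # Connected components (iterative DFS) as a stand-in for community detection.
    seen, communities = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        communities.append(component)
    return communities
```

On a toy corpus mixing short English and Spanish sentences, the English and Spanish words land in separate components, a miniature version of the language-separation check described above.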


Stanford Researchers Use Big Data to Identify Patients at Risk of High-Cholesterol Disorder

Stanford University (01/29/15) Tracie White

Stanford University researchers have launched a project designed to identify hospital patients who may have a genetic disease that causes a deadly buildup of cholesterol in their arteries. The project uses big data and software that can learn to recognize patterns in electronic medical records and identify patients at risk of familial hypercholesterolemia (FH), which often goes undiagnosed until a heart attack strikes. The project is part of a larger initiative called Flag, Identify, Network, Deliver FH, which aims to use innovative technologies to identify individuals with the disorder who are undiagnosed, untreated, or undertreated. For the project, researchers will teach a program how to recognize a pattern in the electronic records of Stanford patients diagnosed with FH. The program then will be directed to analyze Stanford patient records for signs of the pattern, and the researchers will report their findings to the patients’ personal physicians, who can encourage screening and therapy. “These techniques have not been widely applied in medicine, but we believe that they offer the potential to transform healthcare, particularly with the increased reliance on electronic health records,” says Stanford professor Joshua Knowles. If the project is successful at Stanford, it will be tested at other academic medical centers.
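The summary does not say which learning method the Stanford team uses. As a hedged illustration of "learn a pattern from labeled records, then flag unlabeled records for physician review," the sketch below trains a plain logistic regression by batch gradient descent. The two features (LDL cholesterol in mg/dL divided by 100, and a 0/1 flag for family history of early heart disease), the made-up training records, and the 0.5 flagging threshold are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def train_logistic(records, labels, lr=0.5, epochs=5000):
    """Batch gradient descent on the mean cross-entropy loss; returns (weights, bias)."""
    n, n_features = len(records), len(records[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(records, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def risk_score(model, x):
    """Predicted probability that a record matches the learned FH pattern."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def flag_for_review(model, patients, threshold=0.5):
    """IDs of unlabeled patients whose score exceeds the (hypothetical) threshold."""
    return sorted(pid for pid, x in patients.items() if risk_score(model, x) > threshold)


# Toy, made-up training records: [LDL mg/dL / 100, family-history flag]; 1 = diagnosed FH.
TRAIN_X = [[3.2, 1], [2.8, 1], [3.5, 0], [2.6, 1], [1.2, 0], [1.5, 0], [1.0, 1], [1.4, 0]]
TRAIN_Y = [1, 1, 1, 1, 0, 0, 0, 0]

model = train_logistic(TRAIN_X, TRAIN_Y)
print(flag_for_review(model, {"patient-A": [3.0, 1], "patient-B": [1.3, 0]}))  # prints ['patient-A']
```

In a real screening setting the flagged list would only prompt a physician to consider testing; the score itself is not a diagnosis.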


Evolutionary Approaches to Big-Data Problems

MIT News (01/14/15) Eric Brown

The Massachusetts Institute of Technology’s (MIT) AnyScale Learning For All (ALFA) group investigates a wide range of big data challenges. ALFA works with raw data taken directly from the source and analyzes it with a variety of techniques, most of which involve scalable machine learning and evolutionary computing algorithms. “Machine learning is very useful for retrospectively looking back at the data to help you predict the future,” says ALFA director Una-May O’Reilly. “Evolutionary computation can be used in the same way, and it’s particularly well suited to large-scale problems with very high dimensions.” Within the evolutionary field, O’Reilly is particularly interested in genetic programming. “We distribute the genetic programming algorithms over many nodes and then factor the data across the nodes,” she says. The researchers have shown that ensemble-based models are more accurate than a single model trained on all the data. One of ALFA’s most successful projects has been developing algorithms to help design wind farms. “You must find out how much wind is required for the site and then acquire the finer detailed information about where the wind is coming from and in what quantities,” O’Reilly says. The researchers also are trying to extract useful information from the growing volume of physiological data collected by medical sensors.
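O’Reilly’s description (factor the data across nodes, evolve a model on each, combine the results as an ensemble) can be illustrated with a small sketch. It does not implement genetic programming: each "node" here is just an in-process data shard, and a (1+1) evolution strategy with a success-based step-size rule evolves the two coefficients of a linear model on each shard, a deliberately simple stand-in. The shard count, generation budget, and function names are hypothetical; the ensemble prediction is the average of the shard models’ predictions.

```python
import random


def mse(params, data):
    """Mean squared error of the linear model y = a * x + b on (x, y) pairs."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in data) / len(data)


def evolve_linear(data, generations=3000, seed=0):
    """(1+1) evolution strategy: mutate, keep the child if it is no worse."""
    rng = random.Random(seed)
    parent, sigma = (0.0, 0.0), 1.0
    best = mse(parent, data)
    for _ in range(generations):
        child = tuple(p + rng.gauss(0.0, sigma) for p in parent)
        err = mse(child, data)
        if err <= best:
            parent, best = child, err
            sigma *= 1.5  # widen the search after a success
        else:
            sigma *= 0.9  # narrow it after a failure (roughly a 1/5th-success rule)
    return parent


def shard(data, n_shards):
    """Round-robin split, standing in for distributing data across nodes."""
    return [data[k::n_shards] for k in range(n_shards)]


def train_ensemble(data, n_shards=4):
    """Evolve one model per shard, each with its own random seed."""
    return [evolve_linear(part, seed=k) for k, part in enumerate(shard(data, n_shards))]


def ensemble_predict(models, x):
    """Average the predictions of all shard models."""
    return sum(a * x + b for a, b in models) / len(models)
```

Round-robin sharding keeps every shard spread across the full input range, so each shard’s model sees a representative sample of the data.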


NIH Makes $32 Million in Awards to Mine Big Data

Science Insider (09/10/14) Jocelyn Kaiser; Emily Underwood

The U.S. National Institutes of Health (NIH) recently announced $32 million in new awards to support research that will make biological data sets easier to analyze and use. The awards are part of NIH’s Big Data to Knowledge (BD2K) initiative, launched last year to help researchers manipulate and make sense of large data sets such as those found in genomics, protein research, and medical imaging. The awards will provide $2 million to $3 million a year over four years to 11 “centers of excellence” researching topics ranging from modeling cell signaling in cancer to integrating data from wearable sensors worn by health study volunteers. One of the programs is a global brain data-collection effort called ENIGMA, which is studying DNA data to see whether genetic causes can be found for psychiatric disorders. Another BD2K-funded program is a data discovery coordination effort led by the University of California, San Diego, which is working with eight other institutions to develop methods that make it easier for researchers to find and use scientific data. NIH plans to commit $656 million to the BD2K initiative by 2020.


Big Data Reaches to the Stratosphere

HPC Wire (04/03/14) Tiffany Trader

A position paper by Berlin Technical University professor Volker Markl, prepared for the recent Big Data and Extreme-scale Computing workshop, outlines the goals and challenges of big data analytics. “Today’s existing technologies have reached their limits due to big data requirements, which involve data volume, data rate and heterogeneity, and the complexity of the analysis algorithms, which go beyond relational algebra, employing complex user-defined functions, iterations, and distributed state,” Markl writes. He argues that overcoming these limits requires bringing declarative language concepts to big data systems. That effort presents several challenges, including designing a programming language specification that does not demand systems programming skills, mapping programs expressed in this language onto a computing platform of the system’s own choosing, and executing them in a scalable fashion. Markl says next-generation big data analytics frameworks such as Stratosphere can enable deeper data analysis. Stratosphere combines the advantages of MapReduce/Hadoop with programming abstractions in Java and Scala and a high-performance runtime to support massively parallel in-situ data analytics. Markl says Stratosphere is so far the only big data analytics system featuring a query optimizer for advanced data analysis programs that go beyond relational algebra, and the goal is to let data scientists concentrate on their main task without spending too much time engineering for scalability.
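The separation Markl argues for (the programmer declares what to compute, the system decides how to run it in parallel) can be sketched with a toy dataflow builder. The `Plan` class and its methods are hypothetical and are not Stratosphere’s actual Java or Scala API. There is no query optimizer, and the "workers" are just hash partitions inside one process. The user supplies only the user-defined functions, while the degree of parallelism is an execution-time setting.

```python
from collections import defaultdict
from functools import reduce


class Plan:
    """Hypothetical mini dataflow: users declare UDFs, the runtime decides partitioning."""

    def __init__(self, source):
        self.source = list(source)
        self.steps = []

    def map(self, udf):
        # udf: record -> iterable of (key, value) pairs
        self.steps.append(("map", udf))
        return self

    def reduce(self, udf):
        # udf: (value, value) -> value, assumed associative
        self.steps.append(("reduce", udf))
        return self

    def execute(self, parallelism=4):
        records = self.source
        for kind, udf in self.steps:
            if kind == "map":
                records = [pair for record in records for pair in udf(record)]
            else:
                # Hash-partition by key; each partition stands in for one worker.
                partitions = [defaultdict(list) for _ in range(parallelism)]
                for key, value in records:
                    partitions[hash(key) % parallelism][key].append(value)
                # functools.reduce folds each key's values with the user's UDF.
                records = [(key, reduce(udf, values))
                           for part in partitions for key, values in part.items()]
        return sorted(records)


word_count = (Plan(["to be or not to be", "be quick"])
              .map(lambda line: [(w, 1) for w in line.split()])
              .reduce(lambda a, b: a + b))
print(word_count.execute(parallelism=3))
```

Changing `parallelism` alters how keys are spread across partitions but not the result, which is the property a declarative system relies on when it picks an execution strategy on the user’s behalf.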