New Programming Approach Seeks to Make Large-Scale Computation More Reliable

UChicago News (IL) (10/07/15) Benjamin Recchie

As computer components shrink and more transistors are packed into less space than ever before, errors become more likely in the computations the hardware carries out. This effect is likely to be especially pronounced in high-performance computers, and researchers at the University of Chicago’s Computation Institute are looking for a new method to correct for these errors. Most computers today use a technique called checkpoint restart, which periodically saves the state of a calculation so that, if an error occurs, the computer can revert to an earlier state without having to start over completely. However, even this method is likely to be insufficient as complexity grows and errors increase. Andrew Chien and his colleagues at the Computation Institute are experimenting with a new technique called Global View Resilience, which enables applications not only to save work that is underway, but also to perform flexible error checking and self-repair while in operation. Chien’s group has tested this method on the Midway supercomputing cluster located on the university’s Hyde Park campus. Chien says the new method has proven very reliable at compensating for errors introduced deliberately by the researchers.
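
The checkpoint restart idea described above can be illustrated with a minimal Python sketch. This is not the Computation Institute's code and it does not use Global View Resilience. The function names (`save_checkpoint`, `load_checkpoint`, `run`, `demo`), the pickle file format, the toy sum-of-squares workload, and the simulated fault are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename it over the old checkpoint.

    The rename is atomic, so a crash mid-write cannot corrupt the checkpoint.
    """
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "total": 0}


def run(path, n_steps, checkpoint_every, fail_at=None):
    """Sum the squares of 0..n_steps-1, resuming from the last checkpoint.

    `fail_at` is a hypothetical knob that injects a fault at a given step.
    """
    state = load_checkpoint(path)
    for step in range(state["step"], n_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated fault at step {step}")
        state["total"] += step * step
        state["step"] = step + 1
        # Save progress periodically rather than after every step.
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["total"]


def demo():
    """Crash partway through, then restart and finish from the checkpoint."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "checkpoint.pkl")
        try:
            run(path, n_steps=100, checkpoint_every=10, fail_at=57)
        except RuntimeError:
            pass  # the first attempt dies; its in-memory state is lost
        resumed_from = load_checkpoint(path)["step"]
        total = run(path, n_steps=100, checkpoint_every=10)
        return resumed_from, total


if __name__ == "__main__":
    print(demo())
```

In this sketch the work done between the last checkpoint and the fault is lost and must be repeated after the restart. That repeated work is one reason a plain checkpoint restart scheme becomes more costly as errors grow more frequent.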
