SmartData Collective

On-Demand Supercomputing

Theodore Omtzigt
Except for the folks at Cray, most people are unaware of the unique requirements that set supercomputing infrastructure apart from cloud computing infrastructure. In its simplest form, the difference is between latency and capacity. For business intelligence applications such as optimization and logistics, many servers are required to solve a single problem, and low-latency communication between the servers is instrumental for performance. The intuition behind this is easy to understand: a modern microprocessor executes 4-5 instructions every 250ps, and thus 10GbE packet latencies (between 5 and 50usec) are roughly equivalent to 100k to 1M processor instructions. If a processor depends on results computed by another processor, it has to idle until the data is available. Cumulatively, across a couple hundred servers, this can lead to sustained performance that is only 1-5% of peak.
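The latency arithmetic above can be checked with a short sketch. It assumes that 250ps is one clock cycle (a 4GHz core) and that the processor could otherwise have sustained the full 4-5 instructions per cycle. The function and constant names are illustrative only. Under those assumptions the range works out to roughly 80k to 1M instructions per network wait, in line with the figures quoted.

```python
# Back-of-the-envelope estimate of the instructions a core forgoes
# while it waits for one packet to arrive over the network.
CYCLE_TIME_S = 250e-12            # assumption: 250 ps is one clock cycle (4 GHz)
INSTRUCTIONS_PER_CYCLE = (4, 5)   # range quoted in the text
LATENCIES_S = (5e-6, 50e-6)       # 10GbE packet latency range quoted in the text


def instructions_lost(latency_s: float, ipc: int,
                      cycle_time_s: float = CYCLE_TIME_S) -> float:
    """Instructions a core could have executed during one network wait."""
    cycles = latency_s / cycle_time_s
    return cycles * ipc


low = instructions_lost(LATENCIES_S[0], INSTRUCTIONS_PER_CYCLE[0])
high = instructions_lost(LATENCIES_S[1], INSTRUCTIONS_PER_CYCLE[1])
print(f"{low:,.0f} to {high:,.0f} instructions forgone per wait")
```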

Supercomputing applications are defined by these types of tightly connected concurrent processes, which puts more emphasis on the performance of the interconnect, in particular its latency. Running a traditional supercomputing application on an infrastructure designed for elastic applications, such as AWS or Azure, typically yields slow-downs by a factor of 50 to 100. Measured in terms of cost, such applications would be 50 to 100 times more expensive to execute on a typical public cloud computing infrastructure.

Most supercomputing applications are associated with very valuable economic activities of the business. As mentioned earlier, production optimization and logistics applications save companies like ExxonMobil and FedEx billions of dollars per year. Those applications are tightly integrated into the business operations and strategic decision making of these organizations and pay for themselves many times over. For the SMB market, these supercomputing applications offer great opportunity for revenue growth and margin improvements as well. However, their economic value is attenuated by the revenue stream they optimize: a 10% improvement for a $10B revenue stream yields a $1B net benefit, but for a $10M revenue stream the benefit is just $1M, not enough to compensate for the risk and cost that deploying a supercomputer would require.

Enter On-Demand Supercomputing.

In 2011, we were asked to design, construct, and deploy an On-Demand Supercomputing service for a Chinese cloud vendor. The idea was to build an interconnected set of supercomputer centers in China and offer a multi-tenant on-demand service for high-value, high-touch applications, such as logistics, digital content creation, and engineering design and optimization. The pilot program consisted of one supercomputer center in Beijing and one in Shanghai. The basic building block was a quad-rack, redundant QDR IB fat-tree architecture with blade chassis at the leaves. The architecture was inspired by the observation that for the SMB market, the granularity of deployment would fall in the range of 16 to 32 processors, which would be serviced by a single chassis, keeping all communication traffic local to the chassis. The topology is shown in the following figure:

[Figure: Redundant QDR IB Network Topology for On-Demand Supercomputing]
 
The chassis structure is easy to spot as the clusters of 20 servers at the leaves of the tree. The redundancy of the IB network is also clearly visible in the pairs of connections between all the layers in the tree. The quad configuration is a symmetric setup of two rack pairs, each pair holding one side of the redundant IB network, storage, and compute nodes. So half the quad can fall away, and the system would still have full connectivity between storage and compute nodes. To lower the cost of the system, storage was designed around IB-based storage servers that plugged into the same infrastructure as the compute nodes. QDR throughput is balanced with PCIe gen2, and thus we were able to deliver ephemeral blades that get their personality from the storage servers and then dynamically connect via iSCSI services to whatever storage volumes they require. This is less expensive than designing a separate NAS storage subsystem, and it gives the infrastructure the flexibility to build high-performance storage solutions. It was this system that set a new world record by being the first trillion-triple semantic database system, leveraging a Lustre file system consisting of 8 storage servers (trillion-triple-semantic-database-record).
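The redundancy argument can be illustrated with a toy graph model. This is only a sketch under loud assumptions: the real fat tree has several switch layers and 20-server chassis, whereas the model below collapses each side of the redundant network into one hypothetical leaf switch and one hypothetical spine switch, and all node names are invented for illustration. It shows the property described above: when one rail and half of the endpoints fail, the surviving storage and compute nodes can still reach each other.

```python
from collections import deque


def build_dual_rail(num_chassis=4, num_storage=2):
    """Toy dual-rail fabric: every endpoint links to a leaf switch on rail A and on rail B."""
    endpoints = [f"chassis{i}" for i in range(num_chassis)]
    endpoints += [f"storage{i}" for i in range(num_storage)]
    links = set()
    for rail in ("A", "B"):
        leaf, spine = f"leaf{rail}", f"spine{rail}"
        links.add((leaf, spine))
        for node in endpoints:
            links.add((node, leaf))
    return endpoints, links


def reachable(start, links, failed):
    """Breadth-first search over the links that do not touch a failed node."""
    adjacency = {}
    for a, b in links:
        if a in failed or b in failed:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def fully_connected(endpoints, links, failed=frozenset()):
    """True if every surviving endpoint can reach every other surviving endpoint."""
    alive = [n for n in endpoints if n not in failed]
    if not alive:
        return False
    seen = reachable(alive[0], links, failed)
    return all(n in seen for n in alive)


endpoints, links = build_dual_rail()
half_quad_down = {"leafA", "spineA", "chassis0", "chassis1", "storage0"}
print(fully_connected(endpoints, links, half_quad_down))
print(fully_connected(endpoints, links, {"leafA", "leafB"}))
```

The first check models losing one side of the quad; the second models losing both leaf switches, in which case no endpoint can reach any other.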
 
The provisioning of on-demand supercomputing infrastructure is bare metal, mostly to avoid the I/O latency degradation that virtualization injects. Given the symmetry between storage and compute and the performance offered by QDR IB, a network boot mechanism can be used to put any personality on the blades without any impact on performance. The blades have local disk for scratch space, but run their OS and data volumes off the storage servers, thus avoiding the problem of disaster recovery (DR) of state on the blades.
 
The QDR IB infrastructure was based on Voltaire switches and Mellanox HCAs. Intel helped us tune the infrastructure, using their cluster libraries for the processors we were using, and Mellanox was instrumental in getting the IB switches in shape. Over a three-week period, we went from 60% efficiency to about 94% efficiency. The full quad has a peak performance of 19.2TFlops, and after tuning the infrastructure we were able to consistently deliver 18TFlops of sustained performance.
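As a quick consistency check on these numbers, efficiency here is taken to mean sustained performance divided by peak performance, which is an assumption about how the figure was computed. The sketch below uses only the values quoted above; the pre-tuning sustained figure is derived from the 60% number and was not reported directly.

```python
PEAK_TFLOPS = 19.2        # peak performance of the full quad (from the text)
SUSTAINED_TFLOPS = 18.0   # sustained performance after tuning (from the text)
PRE_TUNING_EFFICIENCY = 0.60  # efficiency before tuning (from the text)


def efficiency(sustained: float, peak: float) -> float:
    """Fraction of peak performance that is actually delivered."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return sustained / peak


tuned = efficiency(SUSTAINED_TFLOPS, PEAK_TFLOPS)
# Derived value, not reported in the text: what 60% efficiency implies in TFlops.
pre_tuning_tflops = PRE_TUNING_EFFICIENCY * PEAK_TFLOPS

print(f"tuned efficiency: {tuned:.1%}")
print(f"implied pre-tuning sustained performance: {pre_tuning_tflops:.2f} TFlops")
```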
 
The total cost of the core system was of the order of $3.6M. The On-Demand Supercomputing service offers a full dual-socket server with 64GB of memory for about $5/hr, providing a cost-effective service for SMBs interested in leveraging high-performance computing. For example, a digital content creation firm in Beijing leveraged about 100 servers as burst capacity for post-production. Their cost to leverage a state-of-the-art supercomputer was less than $20k per month. Similarly, a materials science application was developed by a chemical manufacturer to study epitaxial growth. This allowed the manufacturer to optimize the process parameters for a thin-film process, work that would not have been cost-effective on a cloud infrastructure designed for elastic web applications.
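The pricing figures can be put in perspective with a little arithmetic. The sketch below uses only the numbers quoted above and assumes flat hourly billing with no other charges; the helper names are illustrative. It estimates how many hours of a 100-server burst fit into a $20k monthly bill, and how many billed server-hours would match the quoted system cost, ignoring power, staff, and every other operating expense.

```python
RATE_PER_SERVER_HOUR = 5.0    # USD per server-hour, approximate (from the text)
BURST_SERVERS = 100           # servers used by the content creation firm (from the text)
MONTHLY_BILL = 20_000.0       # upper bound on their monthly cost (from the text)
SYSTEM_COST = 3_600_000.0     # approximate cost of the core system (from the text)


def burst_hours(budget: float, servers: int, rate: float) -> float:
    """Hours a burst of `servers` machines can run within `budget`."""
    return budget / (servers * rate)


def server_hours_to_cover(cost: float, rate: float) -> float:
    """Billed server-hours needed to match `cost`, ignoring operating expenses."""
    return cost / rate


hours = burst_hours(MONTHLY_BILL, BURST_SERVERS, RATE_PER_SERVER_HOUR)
breakeven = server_hours_to_cover(SYSTEM_COST, RATE_PER_SERVER_HOUR)

print(f"up to {hours:.0f} hours of 100-server burst per month")
print(f"{breakeven:,.0f} server-hours to match the system cost")
```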
 
The takeaway from this project echoes the findings in the missing middle reports for digital manufacturing (Digital Manufacturing Report). There is tremendous opportunity for SMBs to improve business operations by leveraging the same techniques as their enterprise brethren. But the cost of commercial software for HPC is not consistent with the value provided for SMBs. Furthermore, the IT and operational skills required both to set up and to manage a supercomputing infrastructure are beyond the capabilities of most SMBs. On-demand HPC services, as we have demonstrated with the supers in Beijing and Shanghai, can overcome many of these issues. Most importantly, they enable a new level of innovation by domain experts, such as professors and independent consultants, who do have the skills necessary to leverage supercomputing techniques but until now have not had access to public supercomputing capability and services.