SmartData Collective
© 2008-25 SmartData Collective. All Rights Reserved.

Data-Centric Firms Address Athena Shortcomings with Smart Indexing

As a data-driven business, you need to take some of Athena's shortcomings into consideration. These guidelines will help.

Ryan Kh

Data scalability brings many benefits. The data that enterprises have to deal with keeps growing in both size and variety, which makes it more complex to manage.

Contents
  • AWS Athena and S3
  • Limits of Athena
  • Shared resources
  • Indexing capabilities
  • Partition limits
  • How to improve indexing
  • Wrapping up

Traditional relational databases provide certain benefits, but they are not suited to handling large and varied data. That is when data lake products started gaining popularity, and since then, more and more companies have made data lakes part of their data infrastructure. As demand for these solutions grew, cloud providers like AWS jumped in and began offering managed data lake services such as AWS Athena and S3. These services have powerful and convenient features. However, they are not a perfect fit for every user or use case. In this article, we will discuss the indexing shortcomings of Athena and S3 and how to deal with them.

AWS Athena and S3

AWS Athena and S3 are separate services. AWS Athena is a serverless query service, managed by AWS, that lets users analyze data in S3 using standard SQL. Like other AWS serverless services, Athena lets you pay only for what you use; it charges based on the amount of data each query scans. S3 is one of AWS's first-generation services: an object store where you can keep files of any type and use them as cloud storage. Combined, the two let you use SQL to query whatever is stored in S3.
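Athena bills each query by the amount of data it scans, so query cost grows directly with scan size. The minimal Python sketch below estimates that cost. The price of $5.00 per TB, the rounding up to the next megabyte and the 10 MB per-query minimum are assumptions modeled on Athena's per-query pricing; they vary by region and over time, so check the current price list. The function name `estimate_query_cost` is hypothetical.

```python
# Assumptions, not official figures: adjust to the current Athena price list.
PRICE_PER_TB_USD = 5.00   # assumed price per TB scanned
MIN_BILLED_MB = 10        # assumed minimum billed per query
BYTES_PER_MB = 1024 ** 2
BYTES_PER_TB = 1024 ** 4


def estimate_query_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the USD cost of one query from the number of bytes it scanned."""
    # Round up to the next whole megabyte (ceiling division on integers).
    billed_mb = -(-bytes_scanned // BYTES_PER_MB)
    # Apply the assumed per-query minimum.
    billed_mb = max(billed_mb, MIN_BILLED_MB)
    return billed_mb * BYTES_PER_MB / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    # 132.0 GB, treating 1 GB as 1024**3 bytes.
    print(estimate_query_cost(132 * 1024 ** 3))  # 0.64453125
```

Under these assumptions, a scan of 132.0 GB, the figure reported for Athena in the benchmark later in this article, would cost about $0.64 per run. Because the cost depends only on bytes scanned, anything that lets a query read less data also makes it cheaper.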

Limits of Athena

Although Athena has great features and cost benefits, you will run into some limitations as you use it.


Shared resources

When you use Athena, you cannot control the compute resources that run your queries. Queries from all Athena users in your region compete for shared capacity, so a query you submit can sit in a queue before AWS processes it. If you run a query at a busy time, you will have to wait longer for it to be processed and for the results to come back. In this environment you cannot guarantee consistent performance, which can hurt your ability to meet service-level agreements with your customers.
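The effect of a shared queue can be illustrated with a toy model: a fixed pool of workers serving everyone's queries first-in, first-out. This Python sketch is hypothetical and does not describe how AWS actually schedules Athena queries; the function `fifo_wait_times`, the worker count and the durations are illustrative assumptions. It shows how the same query waits longer when many others arrive at the same time.

```python
import heapq
from typing import List


def fifo_wait_times(arrivals: List[float], durations: List[float], workers: int) -> List[float]:
    """Toy model: queries from many users share one FIFO queue and a fixed
    pool of workers. Returns each query's wait before a worker picks it up,
    in the same order as the inputs. Not a model of Athena's real scheduler."""
    # Min-heap holding the time at which each worker next becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    waits = [0.0] * len(arrivals)
    # First in, first out: serve queries in arrival order (stable for ties).
    for i in sorted(range(len(arrivals)), key=lambda j: arrivals[j]):
        start = max(arrivals[i], heapq.heappop(free_at))
        waits[i] = start - arrivals[i]
        heapq.heappush(free_at, start + durations[i])
    return waits


if __name__ == "__main__":
    # Quiet period: queries arrive spread out, so nobody waits.
    print(fifo_wait_times([0.0, 10.0, 20.0], [5.0] * 3, workers=2))  # [0.0, 0.0, 0.0]
    # Busy period: six queries arrive at once and compete for two workers.
    print(fifo_wait_times([0.0] * 6, [5.0] * 6, workers=2))  # [0.0, 0.0, 5.0, 5.0, 10.0, 10.0]
```

Every query in this model takes the same 5 seconds to run; only the waiting time changes with load, which is why response times on shared capacity are hard to guarantee.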

Indexing capabilities

In traditional relational database engines, users can create indexes to improve performance. Athena, however, does not let you create indexes the way a relational database does. When you run a query, Athena reads the files in the targeted S3 location to find the rows your query asks for; without an index, it has no way to jump straight to them. For example, to find one rider's trips, Athena may have to read every file in the table, even if only a handful of rows match. This might not make much difference when your data is small, but when your data is big, the extra reading makes a big difference. To mitigate this performance issue, AWS recommends partitioning.
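The difference between scanning and an index lookup can be shown with a small Python sketch. Each "file" is a list of `(rider_id, t_hour)` rows, loosely modeled on the rider query discussed later in this article; the data layout and helper names are hypothetical. `full_scan` reads every row, as an engine without an index must, while `indexed_lookup` reads only the rows a prebuilt index points to.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, int]       # hypothetical row: (rider_id, t_hour)
Position = Tuple[int, int]  # (file number, row number within the file)


def full_scan(files: List[List[Row]], rider_id: int, lo: int, hi: int) -> Tuple[List[Row], int]:
    """Read every row of every file, as an engine without an index must."""
    matches: List[Row] = []
    rows_read = 0
    for rows in files:
        for row in rows:
            rows_read += 1
            if row[0] == rider_id and lo <= row[1] <= hi:
                matches.append(row)
    return matches, rows_read


def build_index(files: List[List[Row]]) -> Dict[int, List[Position]]:
    """Map each rider_id to every position where it occurs."""
    index: Dict[int, List[Position]] = defaultdict(list)
    for file_no, rows in enumerate(files):
        for row_no, row in enumerate(rows):
            index[row[0]].append((file_no, row_no))
    return index


def indexed_lookup(files: List[List[Row]], index: Dict[int, List[Position]],
                   rider_id: int, lo: int, hi: int) -> Tuple[List[Row], int]:
    """Read only the rows the index points to, then apply the hour filter."""
    matches: List[Row] = []
    rows_read = 0
    for file_no, row_no in index.get(rider_id, []):
        row = files[file_no][row_no]
        rows_read += 1
        if lo <= row[1] <= hi:
            matches.append(row)
    return matches, rows_read


if __name__ == "__main__":
    # Four "files", each holding 50 riders x 24 hours = 1,200 rows.
    files = [[(r, h) for r in range(50) for h in range(24)] for _ in range(4)]
    print(full_scan(files, 7, 7, 10)[1])                           # 4800 rows read
    print(indexed_lookup(files, build_index(files), 7, 7, 10)[1])  # 96 rows read
```

Both functions return the same 16 matching rows, but the scan reads all 4,800 rows while the lookup reads 96. The gap widens as the table grows, which is the behavior described above.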

Partition limits

You can improve query performance by partitioning your data. However, partitioning has its own limits and is not easy to get right. You have to decide carefully which column to partition on. If you choose the wrong column, re-partitioning can force you to move all of your data to a new S3 location, alter the table to point to that location, and then delete the old data.
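Partitioning pays off through partition pruning. When data is laid out under Hive-style `column=value` prefixes, a layout Athena can use for partitions, a query that filters on the partition column only needs to read the matching prefixes. This hypothetical Python sketch (the key names and the `prune` helper are illustrative, not an AWS API) also shows the cost of a poorly chosen column: a filter on a column the data is not partitioned by prunes nothing.

```python
from typing import Callable, Dict, List


def parse_partitions(key: str) -> Dict[str, str]:
    """Extract Hive-style column=value segments from an object key."""
    return dict(seg.split("=", 1) for seg in key.split("/") if "=" in seg)


def prune(keys: List[str], column: str, keep: Callable[[str], bool]) -> List[str]:
    """Return the object keys a query filtering on `column` would still read.

    If the data is not partitioned by `column`, nothing can be skipped.
    """
    selected = []
    for key in keys:
        parts = parse_partitions(key)
        if column not in parts or keep(parts[column]):
            selected.append(key)
    return selected


if __name__ == "__main__":
    # Hypothetical layout: one prefix per hour of the day.
    keys = [f"trips_data/t_hour={h}/part-0000.parquet" for h in range(24)]
    # Filter on the partition column: only 4 of 24 prefixes are read.
    print(len(prune(keys, "t_hour", lambda v: 7 <= int(v) <= 10)))  # 4
    # Filter on a column the data is not partitioned by: all 24 are read.
    print(len(prune(keys, "rider_id", lambda v: v == "3380311")))   # 24
```

The same data serves one query well and the other not at all, which is why the choice of partition column matters so much and why changing it later means rewriting the data.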

Because Athena works on top of storage that behaves like a file system, it does not let you update or delete data at the row or column level. Instead, you can run CREATE TABLE AS SELECT (CTAS) or INSERT INTO queries. However, each of these queries can create at most 100 partitions in the destination table. That may sound like plenty, but depending on which column you partition by, you can hit the limit surprisingly fast: partitioning by day, for example, produces 365 partitions for a single year of data.
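One common way around the 100-partition limit is to split a large write into several INSERT INTO statements, each covering at most 100 partition values. The Python sketch below generates such statements. The table names `trips_raw` and `trips_by_day`, the string partition column `dt` and the helper name are hypothetical, and the generated SQL is illustrative rather than tested against Athena.

```python
from datetime import date, timedelta
from typing import List

MAX_PARTITIONS_PER_QUERY = 100  # the CTAS / INSERT INTO limit described above


def batched_inserts(target: str, source: str, column: str, values: List[str],
                    batch_size: int = MAX_PARTITIONS_PER_QUERY) -> List[str]:
    """Split one large write into INSERT INTO statements that each write at
    most `batch_size` distinct partition values.

    Assumes `values` lists every distinct partition value in `source`, so a
    BETWEEN range over consecutive sorted values matches exactly one batch,
    and that source and target share a column order with the partition
    column last. Values are interpolated directly: use trusted input only.
    """
    ordered = sorted(set(values))
    statements = []
    for start in range(0, len(ordered), batch_size):
        batch = ordered[start:start + batch_size]
        statements.append(
            f"INSERT INTO {target} SELECT * FROM {source} "
            f"WHERE {column} BETWEEN '{batch[0]}' AND '{batch[-1]}'"
        )
    return statements


if __name__ == "__main__":
    # One year of daily partitions, as ISO-formatted date strings.
    days = [(date(2023, 1, 1) + timedelta(days=d)).isoformat() for d in range(365)]
    for stmt in batched_inserts("trips_by_day", "trips_raw", "dt", days):
        print(stmt)  # four statements, the first covering 2023-01-01 to 2023-04-10
```

ISO date strings sort in calendar order, so each `BETWEEN` range selects one contiguous block of days. A year of daily data becomes four statements instead of one query that would exceed the limit.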

How to improve indexing

Every problem is also an opportunity. Because Athena is one of the most popular data lake query services, many users run into these problems, and companies have developed solutions to eliminate the inconvenience and the performance issues. When a shortcoming is hard to overcome within AWS, people sometimes look outside it for a solution.

For the indexing and partitioning limitations of Athena, users could consider Varada's big data indexing technology, which automatically indexes columns according to workload demands. It breaks the data in any column into nano-blocks and then automatically selects the most efficient index for each nano-block based on the data's content and structure. In the back end, machine-learning optimization tools monitor cluster performance and data usage to detect bottlenecks and track query performance. When they find an optimization opportunity, they apply improvements automatically.
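The article does not describe how Varada chooses an index for each nano-block, and that choice is driven by machine learning. As a stand-in, the hypothetical Python sketch below splits a column into fixed-size blocks and picks an index type for each block with a simple hand-written heuristic: a bitmap index for blocks with few distinct values, a range index for already-sorted blocks, and a hash index otherwise. It only illustrates the idea of per-block index selection; it is not Varada's algorithm, and the thresholds are arbitrary assumptions.

```python
from typing import List, Sequence


def choose_index(block: Sequence[int], low_cardinality: int = 16) -> str:
    """Pick an index type for one block with a simple, hand-written heuristic
    (a stand-in for ML-driven selection; thresholds are arbitrary)."""
    if len(set(block)) <= low_cardinality:
        return "bitmap"  # few distinct values: one bitmap per value is compact
    if all(a <= b for a, b in zip(block, block[1:])):
        return "range"   # sorted block: min/max bounds plus binary search
    return "hash"        # many unsorted distinct values: hash for point lookups


def index_column(values: Sequence[int], block_size: int) -> List[str]:
    """Split a column into fixed-size blocks and choose an index per block."""
    return [choose_index(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


if __name__ == "__main__":
    column = ([i % 4 for i in range(64)]                   # low cardinality
              + list(range(1000, 1064))                    # already sorted
              + [(i * 7919) % 10007 for i in range(64)])   # scattered values
    print(index_column(column, block_size=64))  # ['bitmap', 'range', 'hash']
```

The point of choosing per block rather than per column is that one column can hold very differently shaped data in different places, and each region gets the index that suits it.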

The result is faster queries at an optimized cost. A published benchmark compares the two across several performance metrics, and the most noticeable difference shows up in the first experiment: a query that looks up a specific rider ID within a specific range of hours.

...
FROM
	demo_trips.trips_data
WHERE
	rider_id = 3380311
	AND t_hour BETWEEN 7 AND 10

The result showed that Athena took 40.96 seconds and scanned 132.0 GB, while Varada took 0.57 seconds and scanned 245 KB.
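To put the gap in perspective, here is the arithmetic on the reported figures, assuming decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes); with binary units the scan ratio comes out at about 5.6 × 10^5 instead.

$$
\frac{40.96\ \text{s}}{0.57\ \text{s}} \approx 71.9
\qquad\qquad
\frac{132.0 \times 10^{9}\ \text{B}}{245 \times 10^{3}\ \text{B}} \approx 5.4 \times 10^{5}
$$

In other words, the indexed query ran roughly 72 times faster and read more than half a million times less data.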

Wrapping up

The result shows that how your data is indexed and partitioned can make a massive difference. Data engineering involves many areas besides partitioning, and if engineers have to manage partitions by hand, other important tasks can slip. If your data lake infrastructure runs on AWS, a third-party solution like Varada is worth considering.

Ryan Kh is an experienced blogger and digital content and social media marketer. He is the founder of Catalyst For Business and a contributor to search giants like Yahoo Finance and MSN. He is passionate about covering topics like big data, business intelligence, startups and entrepreneurship. Email: ryankh14@icloud.com
