Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    predictive analytics risk management
    How Predictive Analytics Is Redefining Risk Management Across Industries
    7 Min Read
    data analytics and gold trading
    Data Analytics and the New Era of Gold Trading
    9 Min Read
    composable analytics
    How Composable Analytics Unlocks Modular Agility for Data Teams
    9 Min Read
    data mining to find the right poly bag makers
    Using Data Analytics to Choose the Best Poly Mailer Bags
    12 Min Read
    data analytics for pharmacy trends
    How Data Analytics Is Tracking Trends in the Pharmacy Industry
    5 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: Data-Centric Firms Address Athena Shortcomings with Smart Indexing
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Big Data > Data-Centric Firms Address Athena Shortcomings with Smart Indexing
Big DataExclusive

Data-Centric Firms Address Athena Shortcomings with Smart Indexing

You have to take some Athena shortcomings into consideration as a data-driven business, so these guidelines will help you out.

Ryan Kh
Ryan Kh
7 Min Read
dealing with data limitations with athena
Shutterstock Photo License - Billion Photos
SHARE

There are a lot of benefits of data scalability. The size and the variety of data that enterprises have to deal with have become more complex and larger.

Contents
  • AWS Athena and S3
    • Limits of Athena
    • Shared resources
    • Indexing capabilities
    • Partition limits
    • How to improve indexing
    • Wrapping up

Traditional relational databases provide certain benefits, but they are not suitable to handle big and various data. That is when data lake products started gaining popularity, and since then, more companies introduced lake solutions as part of their data infrastructure. As the demand for the data solutions increased, cloud companies like AWS also jumped in and began providing managed data lake solutions with AWS Athena and S3. These services have powerful and convenient features. However, they are not perfect for all users and use cases. In this article, we will discuss shortcomings of indexing in Athena and S3 and how we can deal with them.

AWS Athena and S3

AWS Athena and S3 are separate services. AWS Athena is a query service that allows users to analyze data in S3 using standard SQL syntax. Athena is serverless and managed by AWS. Athena and other AWS serverless services have a similar pricing structure – it lets you pay only for what you use. S3 is one of the first-generation services of AWS. You can store different types of files and use them like cloud storage. Both combined, you use SQL to query what’s stored in S3.

Limits of Athena

Although Athena has great features and provides cost benefits, as you use it, you will find some limitations of Athena.

More Read

how will AI change gaming industry
How Will AI Change Gaming Industry In The Next 20 Years?
Brands Are Developing AI-Driven Pre-Webinar Sales Strategies
Why It’s Time To Get Smart About Recovery
How BI and Data Analytics Professionals Used Twitter in November
Big Data Proliferation and the Growth of Robotic Process Automation Certification

Shared resources

When you use Athena, the computation resources to run your queries are not something you can control. When you execute an Athena query, a request goes to the shared queue that comes from all Athena users in your region and AWS processes the requested query sequentially. This means when you execute a query in a busy time, you will have to wait longer to get your query processed and result back. Under this environment, you can not guarantee consistent performance, which can have a negative impact on service agreement with your customers.

Indexing capabilities

In traditional relational database engines, users can plan indexing to improve performance. However, Athena does not use indexing by default. When you run a query, Athena goes to the targeted S3 bucket and starts opening each file until it meets the requests of your query. For example, when the data is located at the last file, your query will take longer than when you can find your data from the first scanned file. It might not make much difference when your data size is small. However, when your data is big, this makes a big difference. To mitigate this performance issue, AWS recommends partitioning.

Partition limits

You can improve query performance by partitioning your data. However, partitioning also has limits, and it is not easy to use. You have to carefully decide based on which column you want to partition. When you choose a wrong column, re-partitioning can make you move the entire data into a new bucket location, alter the table to refer to the new bucket location, and then delete the old data.

Because Athena uses the data storage that works like a file system, it does not allow you to update or delete at a row or a column level. Alternatively, you can run CTAS (Create Table AS) or INSERT INTO query. However, when you use them, you can only create up to 100 partitions in a destination table. That may sound large enough. Depending on what base column you use for partitioning, that limit can be reached unexpectedly fast.

How to improve indexing

When there is a problem, it becomes an opportunity. Since Athena is one of the most popular data lake query services, many users experience these problems and companies develop solutions to eliminate the inconvenience and performance issues. When it is hard to overcome shortcomings within AWS, people sometimes look outside to find a solution.

For the indexing and partitioning limitations of AWS, users could consider Varada’s big data indexing technology; it automatically indexes columns according to workload demands. Their indexing data breaks data, across any column, into nano blocks and then automatically selects the most efficient index for each nano-block considering data content and structure. In the back-end, their machine-learning optimization tools monitor cluster performance and data usage to detect bottlenecks and query performances. When it finds an optimization opportunity, it automatically applies improvements.

The result is a faster query result and optimized cost. This source shares performance comparisons across different metrics. One noticeable difference is the first experiment. The query was to find a specific ID and between specific time ranges as below.

...
FROM
	demo_trips.trips_data
WHERE
	rider_id = 3380311
AND    t_hour between 7 AND 10

The result showed that Athena took 40.96 seconds and 132.0GB scanned while Varada took 0.57 and 245KB scanned.

Wrapping up

The result tells you that depending on your partition, there can be a massive difference. In data engineering, besides partitioning, there are many areas to be taken care of. If engineers have to manage partitioning, it can slow down other important tasks. When you have data lake infrastructure in AWS, relying on a third party solution like Varada is something you can consider.

TAGGED:athenadata-driven businessdata-driven organizations
Share This Article
Facebook Pinterest LinkedIn
Share
ByRyan Kh
Follow:
Ryan Kh is an experienced blogger, digital content & social marketer. Founder of Catalyst For Business and contributor to search giants like Yahoo Finance, MSN. He is passionate about covering topics like big data, business intelligence, startups & entrepreneurship. Email: ryankh14@icloud.com

Follow us on Facebook

Latest News

street address database
Why Data-Driven Companies Rely on Accurate Street Address Databases
Big Data Exclusive
predictive analytics risk management
How Predictive Analytics Is Redefining Risk Management Across Industries
Analytics Exclusive Predictive Analytics
data analytics and gold trading
Data Analytics and the New Era of Gold Trading
Analytics Big Data Exclusive
student learning AI
Advanced Degrees Still Matter in an AI-Driven Job Market
Artificial Intelligence Exclusive

Stay Connected

1.2kFollowersLike
33.7kFollowersFollow
222FollowersPin

You Might also Like

data analytics helps with linkedin ads targeting
Analytics

6 Ways Data Analytics Can Improve Targeting with LinkedIn Ads

10 Min Read
ways to use big data in business
Big Data

4 Ways To Grow Your Business With Big Data

7 Min Read
prevent spam
Big Data

Why Spam Prevention is Crucial for for Data-Driven Business

12 Min Read
benefits of data-driven outsourcing for businesses
Big DataExclusive

Why Data-Driven Outsourcing Can Be a Massive Win for Your Business

10 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

ai chatbot
The Art of Conversation: Enhancing Chatbots with Advanced AI Prompts
Chatbots
ai is improving the safety of cars
From Bolts to Bots: How AI Is Fortifying the Automotive Industry
Artificial Intelligence

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Go to mobile version
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?