SmartData Collective
© 2008-23 SmartData Collective. All Rights Reserved.

Data-Centric Firms Address Athena Shortcomings with Smart Indexing

As a data-driven business, you have to take some Athena shortcomings into consideration. These guidelines will help you deal with them.

Ryan Kh
Last updated: 2022/02/23 at 8:47 PM

Data scalability brings many benefits. The data that enterprises have to deal with has grown in both size and variety.

Contents
  • AWS Athena and S3
  • Limits of Athena
  • Shared resources
  • Indexing capabilities
  • Partition limits
  • How to improve indexing
  • Wrapping up

Traditional relational databases provide certain benefits, but they are not well suited to large and varied data. That is when data lake products started gaining popularity, and since then more companies have introduced data lake solutions as part of their data infrastructure. As demand for these solutions increased, cloud providers such as AWS began offering managed data lake solutions built on AWS Athena and S3. These services have powerful and convenient features. However, they are not a perfect fit for every user and use case. In this article, we will discuss the shortcomings of indexing in Athena and S3 and how to deal with them.

AWS Athena and S3

AWS Athena and S3 are separate services. AWS Athena is a query service that lets users analyze data in S3 using standard SQL syntax. Athena is serverless and managed by AWS. Like other AWS serverless services, it uses a pricing structure in which you pay only for what you use. S3 is one of the first-generation AWS services. It stores many types of files and works like cloud storage. Combined, the two let you use SQL to query what is stored in S3.
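To make the pay-for-what-you-use idea concrete, here is a minimal sketch that estimates a query's charge from the amount of data it scans, which is how Athena bills queries. The function name, the 200 GB query, and the price per terabyte are all assumptions for illustration. The sketch also ignores per-query minimums and rounding, so check current AWS pricing before relying on any number.

```python
def estimate_scan_cost(bytes_scanned, price_per_tb):
    """Estimate a query's charge from the bytes it scans.

    Uses decimal units (1 TB = 10**12 bytes), which is an assumption.
    """
    return bytes_scanned / 10**12 * price_per_tb


# Both values below are hypothetical and only serve the example.
ASSUMED_PRICE_PER_TB = 5.00        # assumed price, not an official figure
HYPOTHETICAL_SCAN = 200 * 10**9    # a made-up query that scans 200 GB

cost = estimate_scan_cost(HYPOTHETICAL_SCAN, ASSUMED_PRICE_PER_TB)
print(f"Estimated charge: ${cost:.2f}")
```

Because the charge follows the bytes scanned, anything that lets a query read less data also lowers the bill.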

Limits of Athena

Although Athena has great features and cost benefits, you will run into some limitations as you use it.

Shared resources

When you use Athena, you cannot control the computation resources that run your queries. When you execute an Athena query, the request goes into a queue shared by all Athena users in your region, and AWS processes the queued queries in order. This means that when you execute a query at a busy time, you will wait longer for it to be processed and for the result to come back. In this environment you cannot guarantee consistent performance, which can have a negative impact on service agreements with your customers.

Indexing capabilities

In traditional relational database engines, users can plan indexes to improve performance. Athena, however, does not use indexes by default. When you run a query, Athena goes to the targeted S3 bucket and opens each file in turn until it has satisfied your query. For example, when the data you need is in the last file, your query takes longer than when it is in the first file scanned. With a small data set this may not make much difference, but with a large one it does. To mitigate this performance issue, AWS recommends partitioning.
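The effect of partitioning can be illustrated with a toy model. The sketch below is not how Athena is implemented. It only assumes that data is laid out under Hive-style `column=value/` key prefixes and compares how many objects a query has to open with and without restricting the scan to the relevant prefixes. All names and data are hypothetical.

```python
# Toy model of an S3 prefix: object key -> rows stored in that object.
# Keys use a Hive-style "column=value/" layout (assumed for illustration).
objects = {
    f"t_hour={hour}/part-0": [
        {"rider_id": 1000 + hour, "t_hour": hour},
        {"rider_id": 2000 + hour, "t_hour": hour},
    ]
    for hour in range(24)
}


def scan(objects, predicate, key_prefixes=None):
    """Return (matching_rows, objects_opened).

    With key_prefixes=None every object is opened (no partition pruning).
    Otherwise only objects whose key starts with one of the prefixes are opened.
    """
    opened = 0
    matches = []
    for key, rows in objects.items():
        if key_prefixes is not None and not key.startswith(tuple(key_prefixes)):
            continue  # pruned: this object is never read
        opened += 1
        matches.extend(row for row in rows if predicate(row))
    return matches, opened


def wanted(row):
    return row["rider_id"] == 1007 and 7 <= row["t_hour"] <= 10


all_rows, opened_all = scan(objects, wanted)
pruned_rows, opened_pruned = scan(
    objects, wanted, key_prefixes=[f"t_hour={h}/" for h in range(7, 11)]
)
print(opened_all, opened_pruned)
```

Both scans return the same row, but the pruned scan opens 4 objects instead of 24. The saving depends entirely on the query filtering on the partition column.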

Partition limits

You can improve query performance by partitioning your data. However, partitioning also has limits, and it is not easy to use. You have to decide carefully which column to partition on. If you choose the wrong column, re-partitioning can require you to move all of the data to a new bucket location, alter the table to refer to the new location, and then delete the old data.

Because Athena uses data storage that works like a file system, it does not allow you to update or delete at the row or column level. As an alternative, you can run a CTAS (CREATE TABLE AS SELECT) or INSERT INTO query. However, each such query can create at most 100 partitions in the destination table. That may sound like plenty, but depending on which column you partition on, the limit can be reached unexpectedly fast.
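One common way to work within the 100-partition limit is to load the destination table in batches: one CTAS query for the first batch of partition values, then one INSERT INTO query for each remaining batch. The sketch below only builds the SQL strings. The table names, the column name, and the helper functions are hypothetical. It also assumes string-typed partition values and that the partition column is the last column of the source table, which CTAS requires of partition columns.

```python
MAX_PARTITIONS_PER_QUERY = 100  # the limit described above


def chunk(values, size=MAX_PARTITIONS_PER_QUERY):
    """Split sorted partition values into batches of at most `size`."""
    values = sorted(values)
    return [values[i:i + size] for i in range(0, len(values), size)]


def build_statements(source, target, partition_col, partition_values):
    """Build one CTAS statement plus INSERT INTO statements for later batches."""
    statements = []
    for index, batch in enumerate(chunk(partition_values)):
        in_list = ", ".join(f"'{value}'" for value in batch)
        # SELECT * assumes the partition column is the last column in `source`.
        select = f"SELECT * FROM {source} WHERE {partition_col} IN ({in_list})"
        if index == 0:
            statements.append(
                f"CREATE TABLE {target} "
                f"WITH (partitioned_by = ARRAY['{partition_col}']) AS {select}"
            )
        else:
            statements.append(f"INSERT INTO {target} {select}")
    return statements


# Hypothetical example: 250 distinct partition values need three queries.
values = [f"day-{i:03d}" for i in range(250)]
statements = build_statements("trips_raw", "trips_by_day", "trip_day", values)
print(len(statements))
```

With 250 distinct values this produces one CTAS statement and two INSERT INTO statements, each touching no more than 100 partitions.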

How to improve indexing

Every problem is also an opportunity. Because Athena is one of the most popular data lake query services, many users run into these problems, and companies have developed solutions to remove the inconvenience and performance issues. When it is hard to overcome shortcomings within AWS, people sometimes look outside it for a solution.

For the indexing and partitioning limitations of Athena, users could consider Varada's big data indexing technology, which automatically indexes columns according to workload demands. Its indexing breaks data, across any column, into nano-blocks and then automatically selects the most efficient index for each nano-block based on the data's content and structure. In the back end, its machine-learning optimization tools monitor cluster performance and data usage to detect bottlenecks and track query performance. When they find an optimization opportunity, they apply the improvement automatically.
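The general idea of choosing an index per block can be sketched with a toy example. This is not Varada's implementation, whose internals are not described here. It is a simplified stand-in that splits a column into small blocks, stores a set of distinct values for low-cardinality blocks and a min/max range otherwise, and uses those indexes to skip blocks during a lookup. The block size, the threshold, and all names are assumptions.

```python
def build_block_indexes(values, block_size=4, max_distinct=2):
    """Split a column into blocks and pick a simple index for each block."""
    indexes = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        distinct = set(block)
        if len(distinct) <= max_distinct:
            index = ("value_set", distinct)               # few distinct values
        else:
            index = ("min_max", (min(block), max(block)))  # range summary
        indexes.append((start, block, index))
    return indexes


def lookup(indexes, target):
    """Return (row_positions, blocks_read), skipping blocks the index rules out."""
    positions, blocks_read = [], 0
    for start, block, (kind, data) in indexes:
        if kind == "value_set" and target not in data:
            continue
        if kind == "min_max" and not (data[0] <= target <= data[1]):
            continue
        blocks_read += 1
        positions.extend(start + i for i, v in enumerate(block) if v == target)
    return positions, blocks_read


# Hypothetical column of rider IDs, four values per block.
rider_ids = [5, 5, 5, 5, 10, 11, 12, 13, 40, 41, 42, 43, 7, 7, 9, 9]
indexes = build_block_indexes(rider_ids)
positions, blocks_read = lookup(indexes, 42)
print(positions, blocks_read)
```

Here the lookup for rider ID 42 reads only one of the four blocks, because the other three are ruled out by their indexes before any rows are examined.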

The result is faster queries and optimized cost. This source shares performance comparisons across different metrics. One noticeable difference appears in the first experiment, where the query looks up a specific rider ID within a specific range of hours, as shown below.

...
FROM
    demo_trips.trips_data
WHERE
    rider_id = 3380311
    AND t_hour BETWEEN 7 AND 10

The result showed that Athena took 40.96 seconds and scanned 132.0GB, while Varada took 0.57 seconds and scanned 245KB.
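To put those figures in proportion, the short sketch below computes the ratios between the two results. It assumes decimal units (1 GB = 10**9 bytes, 1 KB = 10**3 bytes) because the reported numbers do not say which convention was used. The variable and function names are illustrative.

```python
# Figures quoted above. Decimal units are an assumption.
athena = {"seconds": 40.96, "bytes_scanned": 132.0 * 10**9}
varada = {"seconds": 0.57, "bytes_scanned": 245 * 10**3}


def ratio(baseline, improved, key):
    """How many times larger the baseline value is than the improved one."""
    return baseline[key] / improved[key]


speedup = ratio(athena, varada, "seconds")
scan_reduction = ratio(athena, varada, "bytes_scanned")

print(f"Query time: about {speedup:.0f}x faster")
print(f"Data scanned: about {scan_reduction:,.0f}x less")
```

Under these assumptions the query ran roughly 72 times faster and scanned over 500,000 times less data. The gap in data scanned is far larger than the gap in time.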

Wrapping up

The result shows that, depending on how your data is partitioned, there can be a massive difference in performance. In data engineering there are many areas besides partitioning that need attention, and if engineers have to manage partitioning by hand, it can slow down other important tasks. When your data lake infrastructure is in AWS, a third-party solution like Varada is worth considering.

Ryan Kh is an experienced blogger, digital content & social marketer. Founder of Catalyst For Business and contributor to search giants like Yahoo Finance, MSN. He is passionate about covering topics like big data, business intelligence, startups & entrepreneurship. Email: ryankh14@icloud.com