SmartData Collective
© 2008-25 SmartData Collective. All Rights Reserved.
esProc Improves Text Processing: Fetching Data from a Batch of Files

raqsoft
During text processing, we sometimes need to fetch certain data from many files spread across a multi-level directory. This is too complicated to do well at the command line. High-level languages can do it, but the code is tedious to write, and large files make it harder still. esProc, however, can import large files through cursors and call a script recursively, so it can fetch the data in batch. The following example shows how.
 

The directory D:\files contains several levels of subdirectories, and each subdirectory holds many text files. The task is to fetch a specified line (say, the second line) from each of these files and write all of those lines into a new file, result.txt. Part of the structure of D:\files is as follows:


esProc code for doing this: 


First, define a parameter named path and set its initial value to “D:\files”, the directory the data will be fetched from, as shown below:


A1=directory@p(path)


The directory function gets the list of files located directly in the directory specified by the parameter path. The @p option makes it return each file name with its full path. The following shows some of the results:
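For readers more at home in Python, here is a rough analogue of A1. It is a sketch under our own assumptions, not esProc code: list_files is a hypothetical helper, and sorting the paths is our choice to get a stable order, which may differ from the order esProc's directory@p returns.

```python
import os
import tempfile
from typing import List


def list_files(path: str) -> List[str]:
    """Hypothetical analogue of directory@p(path).

    Returns the full paths of the regular files directly inside `path`.
    Subdirectories are not listed, and the result is sorted (our assumption).
    """
    return sorted(entry.path for entry in os.scandir(path) if entry.is_file())


if __name__ == "__main__":
    # Usage: build a tiny throwaway directory and list its files.
    with tempfile.TemporaryDirectory() as root:
        for name in ("a.txt", "b.txt"):
            with open(os.path.join(root, name), "w", encoding="utf-8") as f:
                f.write("line 1\nline 2\n")
        os.mkdir(os.path.join(root, "sub"))  # a subdirectory is not listed
        for p in list_files(root):
            print(p)
```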

A2=A1.(file(~).cursor@s())

This line of code opens each of A1’s files as a cursor. A1.(…) processes A1’s members in order, and “~” represents the current member. The file function creates a file object, and the cursor function returns a cursor object based on that file object.
 

By default, the cursor function uses tab as the separator and names the columns _1, _2, …, _n. The @s option ignores the separator and imports each line as a single string, in one column named _1. Note that this code only creates the cursor objects; it does not fetch any data. Data fetching starts when the fetch function is used. The results of A2 are as follows:
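The lazy behaviour of A2 has a close Python counterpart in generators. In the sketch below (a loose analogue, not esProc's cursor), line_cursor and make_cursors are hypothetical helpers and UTF-8 encoding is our assumption. Creating a generator reads nothing; the file is opened only when iteration starts, and each line stays whole as one string, much like cursor@s().

```python
import os
import tempfile
from typing import Iterator, List


def line_cursor(path: str) -> Iterator[str]:
    """Hypothetical, loose analogue of file(path).cursor@s().

    Yields each line as a single string (newline removed). The file is not
    opened when the generator is created, only on the first next() call.
    """
    with open(path, encoding="utf-8") as f:  # UTF-8 is an assumption
        for line in f:
            yield line.rstrip("\n")


def make_cursors(paths: List[str]) -> List[Iterator[str]]:
    # Like A2=A1.(file(~).cursor@s()): one cursor per file, no data read yet.
    return [line_cursor(p) for p in paths]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "demo.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("header\tline\nsecond line\n")
        cursors = make_cursors([path])  # nothing has been read yet
        print(list(cursors[0]))  # each line kept whole as one string
```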


A3=A2.((~.skip(1),~.fetch@x(1)))

This line of code fetches the second row from each of A2’s file cursors. A2.(…) computes A2’s cursors one by one. (~.skip(1),~.fetch@x(1)) evaluates the expressions in the parentheses in order and returns the result of the last one. ~.skip(1) skips one row, and ~.fetch@x(1) fetches the row at the current position (that is, the second row); the @x option closes the cursor automatically once the data have been fetched. The parentheses operator therefore returns the result of ~.fetch@x(1).
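The skip-then-fetch step maps naturally onto itertools.islice. The sketch below is a Python analogue of A3 under our own assumptions (hypothetical helper names, UTF-8 files, None for files with fewer than two lines). A with-block closes each file right after the fetch, which is the job @x does in esProc.

```python
import itertools
from typing import Iterable, List, Optional


def second_row(rows: Iterable[str]) -> Optional[str]:
    """Skip one row, then take the next, like (~.skip(1),~.fetch@x(1)).

    Returns None if there are fewer than two rows (our own convention).
    """
    return next(itertools.islice(rows, 1, 2), None)


def second_lines(paths: List[str]) -> List[Optional[str]]:
    """Hypothetical analogue of A3: the second line of every file in `paths`."""
    result: List[Optional[str]] = []
    for path in paths:
        # Leaving the with-block closes the file right after the fetch,
        # playing the role of the @x option.
        with open(path, encoding="utf-8") as f:
            line = second_row(f)
        result.append(None if line is None else line.rstrip("\n"))
    return result
```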
 
The skip function can skip any number of rows and the fetch function can fetch any number of rows; in both cases a parameter sets how many. For example, to skip the first 10 rows and fetch the next two (the 11th and 12th rows), write (~.skip(10),~.fetch@x(2)).
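The same skip/fetch arithmetic can be checked in Python. This is an illustrative analogue with a hypothetical helper name, not esProc: skipping n rows and fetching m returns rows n+1 through n+m (1-based), and fewer rows if the input runs out.

```python
import itertools
from typing import Iterable, List


def skip_then_fetch(rows: Iterable[str], skip: int, count: int) -> List[str]:
    """Skip `skip` rows, then return up to `count` of the rows that follow.

    With skip=10 and count=2 this yields the 11th and 12th rows, the rows
    selected by (~.skip(10),~.fetch@x(2)).
    """
    return list(itertools.islice(rows, skip, skip + count))


if __name__ == "__main__":
    rows = [f"row {i}" for i in range(1, 21)]
    print(skip_then_fetch(rows, 10, 2))
```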
 

The following shows some of the results of A3: 


A4=A3.union()

This line of code combines the results in A3. The union function performs a union operation, removing duplicate data along the way. For example, the union of the two sets [1,2] and [2,3] is computed with [[1,2],[2,3]].union(), and the result is [1,2,3]. If duplicates should be kept, use the conj function (for concatenation) instead. Some of the results of A4 are as follows:
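To make the difference between union and conj concrete, here is a small Python sketch of the two behaviours described in the text. union and conj are hypothetical helpers named after the esProc functions, not esProc itself; we assume hashable values and that union keeps the first occurrence of each value in order, which matches the [1,2,3] example but is our assumption in general.

```python
from typing import Hashable, List, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def union(seqs: Sequence[Sequence[T]]) -> List[T]:
    """Merge the sequences in order, keeping only the first copy of each value."""
    seen = set()
    result: List[T] = []
    for seq in seqs:
        for item in seq:
            if item not in seen:
                seen.add(item)
                result.append(item)
    return result


def conj(seqs: Sequence[Sequence[T]]) -> List[T]:
    """Concatenate the sequences in order, keeping duplicates."""
    return [item for seq in seqs for item in seq]


if __name__ == "__main__":
    print(union([[1, 2], [2, 3]]))  # [1, 2, 3]
    print(conj([[1, 2], [2, 3]]))   # [1, 2, 2, 3]
```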


A5=file("d:\\result.txt").export@a(A4)

This line of code exports the results of A4 to result.txt. The export function writes data to a file, and the @a option appends to the file instead of overwriting it.
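Append mode is what lets every recursive call add its own lines to one shared result file. A minimal Python sketch of that idea, assuming plain one-line-per-string output in UTF-8 (export_append is a hypothetical helper, not esProc's export):

```python
import os
import tempfile
from typing import Iterable


def export_append(path: str, lines: Iterable[str]) -> None:
    """Append each line to `path`, creating the file if it does not exist.

    Mode "a" plays the role of export@a: earlier content is kept, so
    repeated calls (e.g. one per directory) accumulate in the same file.
    """
    with open(path, "a", encoding="utf-8") as out:
        for line in lines:
            out.write(line + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "result.txt")
        export_append(target, ["from directory 1"])
        export_append(target, ["from directory 2"])
        with open(target, encoding="utf-8") as f:
            print(f.read())  # both lines, in call order
```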

At this point, the required data have been fetched from every file in the current directory. What remains is to get the subdirectories of the current directory and call this script recursively on each of them.
 
A6=directory@dp(path)

Here the directory function fetches the subdirectories of the current directory. The d option makes it return subdirectory names, and the p option makes it return their full paths. Thus A6 gets the subdirectories of D:\files:
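As with A1, a rough Python analogue of A6 is sketched below. list_subdirs is a hypothetical helper, and the sorted order is our assumption. Only one level is listed; deeper levels are reached by the recursive call in the next step.

```python
import os
import tempfile
from typing import List


def list_subdirs(path: str) -> List[str]:
    """Hypothetical analogue of directory@dp(path).

    Returns the full paths of the immediate subdirectories of `path`,
    sorted for a stable order. Files and deeper levels are not included.
    """
    return sorted(entry.path for entry in os.scandir(path) if entry.is_dir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "level1", "level2"))
        print(list_subdirs(root))  # only .../level1, not .../level2
```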


A7=A6.(call("c:\\readfile.dfx",~))

This line of code processes A6’s members (the subdirectories) one by one. For each of them, it calls the esProc script c:\readfile.dfx and passes the current member (one subdirectory) as the input parameter. Note that readfile.dfx is the name of this very script, so the call is recursive.
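Putting the steps together, the whole script can be sketched as one recursive Python function. This is a hypothetical counterpart of readfile.dfx, not a translation verified against esProc. It assumes UTF-8 files, skips files with fewer than two lines, and keeps every fetched line (like conj) instead of applying A4's union. As in the original, the result file should sit outside the tree being read, or it would be picked up as an input.

```python
import itertools
import os
import tempfile


def read_dir(path: str, result_path: str) -> None:
    """Append the second line of every file under `path` to `result_path`.

    Hypothetical Python counterpart of readfile.dfx: A1 to A5 handle the
    files directly in `path`, then A6/A7 recurse into each subdirectory.
    """
    files = sorted(e.path for e in os.scandir(path) if e.is_file())  # A1
    lines = []
    for name in files:  # A2 and A3: open lazily, skip 1 row, fetch 1, close
        with open(name, encoding="utf-8") as f:
            line = next(itertools.islice(f, 1, 2), None)
        if line is not None:
            lines.append(line.rstrip("\n"))
    with open(result_path, "a", encoding="utf-8") as out:  # A5: append
        out.writelines(text + "\n" for text in lines)
    for sub in sorted(e.path for e in os.scandir(path) if e.is_dir()):  # A6
        read_dir(sub, result_path)  # A7: the recursive call


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        tree = os.path.join(root, "files")
        os.makedirs(os.path.join(tree, "sub1", "sub2"))
        for rel in ("a.txt", os.path.join("sub1", "b.txt")):
            with open(os.path.join(tree, rel), "w", encoding="utf-8") as f:
                f.write(f"first line of {rel}\nsecond line of {rel}\n")
        result = os.path.join(root, "result.txt")  # outside the tree being read
        read_dir(tree, result)
        with open(result, encoding="utf-8") as f:
            print(f.read())
```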
 

With the recursive call in A7, esProc fetches the required line from every file throughout the multi-level directory D:\files. You can see the final result in result.txt:

