Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    big data analytics in transporation
    Turning Data Into Decisions: How Analytics Improves Transportation Strategy
    3 Min Read
    sales and data analytics
    How Data Analytics Improves Lead Management and Sales Results
    9 Min Read
    data analytics and truck accident claims
    How Data Analytics Reduces Truck Accidents and Speeds Up Claims
    7 Min Read
    predictive analytics for interior designers
    Interior Designers Boost Profits with Predictive Analytics
    8 Min Read
    image fx (67)
    Improving LinkedIn Ad Strategies with Data Analytics
    9 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: esProc Improves Text Processing: Fetching Data from a Batch of Files
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Uncategorized > esProc Improves Text Processing: Fetching Data from a Batch of Files
Uncategorized

esProc Improves Text Processing: Fetching Data from a Batch of Files

raqsoft
raqsoft
6 Min Read
SHARE
Sometimes we need to fetch certain data from multiple files of a multi-level directory during text processing. The operation is too complicated to be well performed at the command line. Though it can be realized in high-level languages, the code is difficult to write; and the involvement of big files will increase the difficulty. esProc, however, can import big files with cursors and call the script recursively and thus can process the data fetching in batch. The following example will show its way of doing it.
Sometimes we need to fetch certain data from multiple files of a multi-level directory during text processing. The operation is too complicated to be well performed at the command line. Though it can be realized in high-level languages, the code is difficult to write; and the involvement of big files will increase the difficulty. esProc, however, can import big files with cursors and call the script recursively and thus can process the data fetching in batch. The following example will show its way of doing it.
 

A directory – “D:\files” – has subdirectories of multiple levels. Each subdirectory has many files of text format. We are asked to fetch a specified line (say the second line) from each of these files and write them into a new file – result.txt. Part of the structure of D:\files is as follows:


esProc code for doing this: 


First define a parameter, path, and set its initial value as “D:\files” so as to get data from this directory, as shown below: 


A1=directory@p(path)


directory function is used to get the file list in the root directory of the parameter, path. @p option means file names should be presented with full path. The following shows some of the results: 

A2=A1.(file(~).cursor@s()) . This line of code opens A1’s files respectively in the form of cursors. A1.(…) means processing A1’s members in proper order; “~” represents the current member; filefunction is used to create a file object and cursor function will return a cursor object according to the file object.
 

Tab is used as the default separatorin cursor function. Default column names are 1,_2…_n. @s function means ignoring the separator and importing the file content as the strings in a single column with _1 being the column name. Note that the code only creates the cursor objects but doesn’t fetch data. The data fetching will be started by the use of fetchfunction. The results of A2 are as follows: 


A3=A2.((~.skip(1),~.fetch@x(1)))This line of code fetches the second row from A2’s each file cursor. A2.(…) means computing A2’s cursors one by one. (~.skip(1),~.fetch@x(1)) means computing the expression in the parentheses in order and returning the last computed result. ~.skip(1) means skipping a row. ~.fetch@x(1) means fetching the row at the current position (i.e. the second row) and closing the cursor. @x means closing the cursor automatically after the data are fetched. ~.fetch@x(1)represents the result which the parentheses operator will return.
 
skip function skips multiple rows. You can determine how many rows need to be skipped through a parameter. fetch function fetches multiple rows. Fetch two rows starting from the 10th row, for example, the code is ~.skip(10),fetch@x(2).
 

The following shows some of the results of A3: 


A4=A3.union()This line of code unions the results in A4 together. union function is used to realize the union operation, removing the duplicate data at the same time. For example, the code for computing the union of two sets: [1,2] and [2,3] is [1,2],[2,3]].union() and the result is [1,2,3]. If duplicate data are wanted, conjfunction (for concatenation) should be used. Some of the results of A4 are as follows: 


A5=file(“d:\\result.txt”).export@a(A4)This line of code exports the results of A4 to result.txt. export function is used to write data to a file. @a option means appending.

At this point, all data have been fetched as required from the current directory. The rest of the work is to fetch the subdirectories of the current directory and to call this script recursively.
 
A6=directory@dp(path)directory function is used to fetch all the subdirectories from the current directory. One of the options, d, means fetching the subdirectory names and the other one, p, means fetching the full paths. Thus A6 gets the subdirectories from D:\files: 


A7=A6.(call(“c:\\readfile.dfx”,~))This line of code deals with A6’s members (the subdirectories). The operation is to call the esProc script – c:\\readfile.dfx, and makes the current member (one of the subdirectories) as the input parameter. Note that readfile.dfx is the name of this script.
 

Through the recursive call in A7, esProc will fetch data from a batch of files of the multilevel directory of D:\files. You can see the final result in result.txt: 


TAGGED:codingesProc
Share This Article
Facebook Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

AI role in medical industry
The Role Of AI In Transforming Medical Manufacturing
Artificial Intelligence Exclusive
b2b sales
Unseen Barriers: Identifying Bottlenecks In B2B Sales
Business Rules Exclusive Infographic
data intelligence in healthcare
How Data Is Powering Real-Time Intelligence in Health Systems
Big Data Exclusive
intersection of data
The Intersection of Data and Empathy in Modern Support Careers
Big Data Exclusive

Stay Connected

1.2kFollowersLike
33.7kFollowersFollow
222FollowersPin

You Might also Like

programming languages to learn
ExclusiveProgramming

Top Programming Languages For Data Developers In 2019

8 Min Read
which JS framework is best
Big DataExclusiveProgramming

Which JS Framework Is Best For Big Data Development?

6 Min Read
Perl mod_rewrite
ExclusivePerlSoftware

The Syntax Structure Of Perl Mod_Rewrite Method

7 Min Read
benefits of no-code platforms for data science
Data Science

5 Reasons No-Code Platforms Are the Future of Data Science and AI

9 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

ai in ecommerce
Artificial Intelligence for eCommerce: A Closer Look
Artificial Intelligence
AI chatbots
AI Chatbots Can Help Retailers Convert Live Broadcast Viewers into Sales!
Chatbots

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Go to mobile version
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?