By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    data Analytics instagram stories
    Data Analytics Helps Marketers Make the Most of Instagram Stories
    15 Min Read
    analyst,women,looking,at,kpi,data,on,computer,screen
    What to Know Before Recruiting an Analyst to Handle Company Data
    6 Min Read
    AI analytics
    AI-Based Analytics Are Changing the Future of Credit Cards
    6 Min Read
    data overload showing data analytics
    How Does Next-Gen SIEM Prevent Data Overload For Security Analysts?
    8 Min Read
    hire a marketing agency with a background in data analytics
    5 Reasons to Hire a Marketing Agency that Knows Data Analytics
    7 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-23 SmartData Collective. All Rights Reserved.
Reading: esProc Improves Text Processing: Fetching Data from a Batch of Files
Share
Notification Show More
Aa
SmartData CollectiveSmartData Collective
Aa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Uncategorized > esProc Improves Text Processing: Fetching Data from a Batch of Files
Uncategorized

esProc Improves Text Processing: Fetching Data from a Batch of Files

raqsoft
Last updated: 2015/06/02 at 4:31 AM
raqsoft
6 Min Read
SHARE
Sometimes we need to fetch certain data from multiple files of a multi-level directory during text processing. The operation is too complicated to be well performed at the command line. Though it can be realized in high-level languages, the code is difficult to write; and the involvement of big files will increase the difficulty. esProc, however, can import big files with cursors and call the script recursively and thus can process the data fetching in batch. The following example will show its way of doing it.

Sometimes we need to fetch certain data from multiple files of a multi-level directory during text processing. The operation is too complicated to be well performed at the command line. Though it can be realized in high-level languages, the code is difficult to write; and the involvement of big files will increase the difficulty. esProc, however, can import big files with cursors and call the script recursively and thus can process the data fetching in batch. The following example will show its way of doing it.
 

A directory – “D:\files” – has subdirectories of multiple levels. Each subdirectory has many files of text format. We are asked to fetch a specified line (say the second line) from each of these files and write them into a new file – result.txt. Part of the structure of D:\files is as follows:


esProc code for doing this: 


First define a parameter, path, and set its initial value as “D:\files” so as to get data from this directory, as shown below: 

More Read

benefits of no-code platforms for data science

5 Reasons No-Code Platforms Are the Future of Data Science and AI

The Syntax Structure Of Perl Mod_Rewrite Method
Top Programming Languages For Data Developers In 2019
Which JS Framework Is Best For Big Data Development?
Using esProc to Compute the Online Time of Users [PART 2]


A1=directory@p(path)


directory function is used to get the file list in the root directory of the parameter, path. @p option means file names should be presented with full path. The following shows some of the results: 

A2=A1.(file(~).cursor@s()) . This line of code opens A1’s files respectively in the form of cursors. A1.(…) means processing A1’s members in proper order; “~” represents the current member; filefunction is used to create a file object and cursor function will return a cursor object according to the file object.
 

Tab is used as the default separatorin cursor function. Default column names are 1,_2…_n. @s function means ignoring the separator and importing the file content as the strings in a single column with _1 being the column name. Note that the code only creates the cursor objects but doesn’t fetch data. The data fetching will be started by the use of fetchfunction. The results of A2 are as follows: 


A3=A2.((~.skip(1),~.fetch@x(1)))This line of code fetches the second row from A2’s each file cursor. A2.(…) means computing A2’s cursors one by one. (~.skip(1),~.fetch@x(1)) means computing the expression in the parentheses in order and returning the last computed result. ~.skip(1) means skipping a row. ~.fetch@x(1) means fetching the row at the current position (i.e. the second row) and closing the cursor. @x means closing the cursor automatically after the data are fetched. ~.fetch@x(1)represents the result which the parentheses operator will return.
 
skip function skips multiple rows. You can determine how many rows need to be skipped through a parameter. fetch function fetches multiple rows. Fetch two rows starting from the 10th row, for example, the code is ~.skip(10),fetch@x(2).
 

The following shows some of the results of A3: 


A4=A3.union()This line of code unions the results in A4 together. union function is used to realize the union operation, removing the duplicate data at the same time. For example, the code for computing the union of two sets: [1,2] and [2,3] is [1,2],[2,3]].union() and the result is [1,2,3]. If duplicate data are wanted, conjfunction (for concatenation) should be used. Some of the results of A4 are as follows: 


A5=file(“d:\\result.txt”).export@a(A4)This line of code exports the results of A4 to result.txt. export function is used to write data to a file. @a option means appending.

At this point, all data have been fetched as required from the current directory. The rest of the work is to fetch the subdirectories of the current directory and to call this script recursively.
 
A6=directory@dp(path)directory function is used to fetch all the subdirectories from the current directory. One of the options, d, means fetching the subdirectory names and the other one, p, means fetching the full paths. Thus A6 gets the subdirectories from D:\files: 


A7=A6.(call(“c:\\readfile.dfx”,~))This line of code deals with A6’s members (the subdirectories). The operation is to call the esProc script – c:\\readfile.dfx, and makes the current member (one of the subdirectories) as the input parameter. Note that readfile.dfx is the name of this script.
 

Through the recursive call in A7, esProc will fetch data from a batch of files of the multilevel directory of D:\files. You can see the final result in result.txt: 


TAGGED: coding, esProc
raqsoft June 2, 2015
Share This Article
Facebook Twitter Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

ai low code frameworks
AI Can Help Accelerate Development with Low-Code Frameworks
Artificial Intelligence
data Analytics instagram stories
Data Analytics Helps Marketers Make the Most of Instagram Stories
Analytics
data breaches
How Hospital Security Breaches Devastate Local Communities
Policy and Governance
analyst,women,looking,at,kpi,data,on,computer,screen
What to Know Before Recruiting an Analyst to Handle Company Data
Analytics

Stay Connected

1.2k Followers Like
33.7k Followers Follow
222 Followers Pin

You Might also Like

benefits of no-code platforms for data science
Data Science

5 Reasons No-Code Platforms Are the Future of Data Science and AI

9 Min Read
Perl mod_rewrite
ExclusivePerlSoftware

The Syntax Structure Of Perl Mod_Rewrite Method

7 Min Read
programming languages to learn
ExclusiveProgramming

Top Programming Languages For Data Developers In 2019

8 Min Read
which JS framework is best
Big DataExclusiveProgramming

Which JS Framework Is Best For Big Data Development?

6 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

ai in ecommerce
Artificial Intelligence for eCommerce: A Closer Look
Artificial Intelligence
ai is improving the safety of cars
From Bolts to Bots: How AI Is Fortifying the Automotive Industry
Artificial Intelligence

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
Go to mobile version
Welcome Back!

Sign in to your account

Lost your password?