Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    data driven insights
    How Data-Driven Insights Are Addressing Gaps in Patient Communication and Equity
    8 Min Read
    pexels pavel danilyuk 8112119
    Data Analytics Is Revolutionizing Medical Credentialing
    8 Min Read
    data and seo
    Maximize SEO Success with Powerful Data Analytics Insights
    8 Min Read
    data analytics for trademark registration
    Optimizing Trademark Registration with Data Analytics
    6 Min Read
    data analytics for finding zip codes
    Unlocking Zip Code Insights with Data Analytics
    6 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: esProc Improves Text Processing: Fetching Data from a Batch of Files
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Uncategorized > esProc Improves Text Processing: Fetching Data from a Batch of Files
Uncategorized

esProc Improves Text Processing: Fetching Data from a Batch of Files

raqsoft
raqsoft
6 Min Read
SHARE
Sometimes we need to fetch certain data from multiple files of a multi-level directory during text processing. The operation is too complicated to be well performed at the command line. Though it can be realized in high-level languages, the code is difficult to write; and the involvement of big files will increase the difficulty. esProc, however, can import big files with cursors and call the script recursively and thus can process the data fetching in batch. The following example will show its way of doing it.
Sometimes we need to fetch certain data from multiple files of a multi-level directory during text processing. The operation is too complicated to be well performed at the command line. Though it can be realized in high-level languages, the code is difficult to write; and the involvement of big files will increase the difficulty. esProc, however, can import big files with cursors and call the script recursively and thus can process the data fetching in batch. The following example will show its way of doing it.
 

A directory – “D:\files” – has subdirectories of multiple levels. Each subdirectory has many files of text format. We are asked to fetch a specified line (say the second line) from each of these files and write them into a new file – result.txt. Part of the structure of D:\files is as follows:


esProc code for doing this: 


First define a parameter, path, and set its initial value as “D:\files” so as to get data from this directory, as shown below: 


A1=directory@p(path)


directory function is used to get the file list in the root directory of the parameter, path. @p option means file names should be presented with full path. The following shows some of the results: 

A2=A1.(file(~).cursor@s()) . This line of code opens A1’s files respectively in the form of cursors. A1.(…) means processing A1’s members in proper order; “~” represents the current member; filefunction is used to create a file object and cursor function will return a cursor object according to the file object.
 

Tab is used as the default separatorin cursor function. Default column names are 1,_2…_n. @s function means ignoring the separator and importing the file content as the strings in a single column with _1 being the column name. Note that the code only creates the cursor objects but doesn’t fetch data. The data fetching will be started by the use of fetchfunction. The results of A2 are as follows: 


A3=A2.((~.skip(1),~.fetch@x(1)))This line of code fetches the second row from A2’s each file cursor. A2.(…) means computing A2’s cursors one by one. (~.skip(1),~.fetch@x(1)) means computing the expression in the parentheses in order and returning the last computed result. ~.skip(1) means skipping a row. ~.fetch@x(1) means fetching the row at the current position (i.e. the second row) and closing the cursor. @x means closing the cursor automatically after the data are fetched. ~.fetch@x(1)represents the result which the parentheses operator will return.
 
skip function skips multiple rows. You can determine how many rows need to be skipped through a parameter. fetch function fetches multiple rows. Fetch two rows starting from the 10th row, for example, the code is ~.skip(10),fetch@x(2).
 

The following shows some of the results of A3: 


A4=A3.union()This line of code unions the results in A4 together. union function is used to realize the union operation, removing the duplicate data at the same time. For example, the code for computing the union of two sets: [1,2] and [2,3] is [1,2],[2,3]].union() and the result is [1,2,3]. If duplicate data are wanted, conjfunction (for concatenation) should be used. Some of the results of A4 are as follows: 


A5=file(“d:\\result.txt”).export@a(A4)This line of code exports the results of A4 to result.txt. export function is used to write data to a file. @a option means appending.

At this point, all data have been fetched as required from the current directory. The rest of the work is to fetch the subdirectories of the current directory and to call this script recursively.
 
A6=directory@dp(path)directory function is used to fetch all the subdirectories from the current directory. One of the options, d, means fetching the subdirectory names and the other one, p, means fetching the full paths. Thus A6 gets the subdirectories from D:\files: 


A7=A6.(call(“c:\\readfile.dfx”,~))This line of code deals with A6’s members (the subdirectories). The operation is to call the esProc script – c:\\readfile.dfx, and makes the current member (one of the subdirectories) as the input parameter. Note that readfile.dfx is the name of this script.
 

Through the recursive call in A7, esProc will fetch data from a batch of files of the multilevel directory of D:\files. You can see the final result in result.txt: 


TAGGED:codingesProc
Share This Article
Facebook Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

accountant using ai
AI Improves Integrity in Corporate Accounting
Exclusive
ai and law enforcement
Forensic AI Technology is Doing Wonders for Law Enforcement
Artificial Intelligence Exclusive
langgraph and genai
LangGraph Orchestrator Agents: Streamlining AI Workflow Automation
Artificial Intelligence Exclusive
ai fitness app
Will AI Replace Personal Trainers? A Data-Driven Look at the Future of Fitness Careers
Artificial Intelligence Big Data Exclusive

Stay Connected

1.2kFollowersLike
33.7kFollowersFollow
222FollowersPin

You Might also Like

benefits of no-code platforms for data science
Data Science

5 Reasons No-Code Platforms Are the Future of Data Science and AI

9 Min Read
which JS framework is best
Big DataExclusiveProgramming

Which JS Framework Is Best For Big Data Development?

6 Min Read

Simple Inter-row Computation: esProc Keeps It Simple!

7 Min Read
Perl mod_rewrite
ExclusivePerlSoftware

The Syntax Structure Of Perl Mod_Rewrite Method

7 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

ai is improving the safety of cars
From Bolts to Bots: How AI Is Fortifying the Automotive Industry
Artificial Intelligence
ai chatbot
The Art of Conversation: Enhancing Chatbots with Advanced AI Prompts
Chatbots

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Go to mobile version
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?