IRobotSoft Robot
08/07/20 at 16:48:27
News: Welcome to the IRoboSoft Visual Web Scraping and Web Automation forum.
Question about threads (Read 3934 times)
itsme2333
Ex Member



Question about threads
07/08/07 at 11:47:23
 
One of the options your program offers is Multi-thread, which comes with the warning CAUTION.
 
I don't know if I will ever need that capacity or not, but I would like to know a little bit more about it.
 
Would you mind explaining a little bit about the Multi-thread option?
 
thanks
IRobotSoft Administrator
IRobotSoft Administrator
*****


IRobotSoft, the Best
Internet Robot
System

Posts: 1608
Gender: male
Re: Question about threads
Reply #1 - 07/08/07 at 16:38:05
 
The multi-thread mode speeds up the retrieval of Web pages. It works as follows. In multi-thread mode, background threads are started to prefetch pages that are predicted to be retrieved soon. The main thread runs in the regular mode. However, if a page has already been prefetched by the background threads, the main thread uses the prefetched page. In this way the robot can run faster.
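The prefetching idea described above can be sketched roughly as follows. This is only an illustration in Python, not IRobotSoft's actual implementation: the class name `PrefetchingFetcher`, its methods, and the `fake_fetch` stand-in are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


class PrefetchingFetcher:
    """Main-thread fetcher backed by background prefetch threads (hypothetical)."""

    def __init__(self, fetch, workers=4):
        self._fetch = fetch            # callable: url -> page content
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = {}             # url -> Future for a prefetched page

    def prefetch(self, urls):
        """Start background fetches for pages predicted to be needed soon."""
        for url in urls:
            if url not in self._pending:
                self._pending[url] = self._pool.submit(self._fetch, url)

    def get(self, url):
        """Return the prefetched page if there is one, else fetch it now."""
        future = self._pending.pop(url, None)
        if future is not None:
            return future.result()     # waits if the prefetch is still running
        return self._fetch(url)

    def close(self):
        self._pool.shutdown(wait=True)


def fake_fetch(url):
    # Stand-in for a real HTTP request so the sketch runs offline.
    return "content of " + url


fetcher = PrefetchingFetcher(fake_fetch)
fetcher.prefetch(["http://example.com/a", "http://example.com/b"])
page_a = fetcher.get("http://example.com/a")   # served from the prefetch
page_c = fetcher.get("http://example.com/c")   # not prefetched, fetched directly
fetcher.close()
```

In this sketch `prefetch` and `get` are called only from the main thread, so the dictionary of pending pages needs no lock.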
 
Limitations for multiple-threading are:  
1) Only static links will be prefetched.  Dynamic links will not be prefetched.    
2) If the Web pages are dynamic pages that depend on session cookies, the prefetched page may not be correct. An unaware user may not notice such an error. This is the main reason for the CAUTION.
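To make the second limitation concrete, here is a small offline sketch in Python. It assumes one possible failure mode, namely that the background prefetch happens before the session cookie exists; the `fake_server` function, the URL, and the cookie name are hypothetical and only serve the illustration.

```python
def fake_server(url, cookies):
    # Hypothetical dynamic page: the body depends on a session cookie.
    session = cookies.get("session")
    if session is None:
        return "Please log in"
    return "Results for session " + session


url = "http://example.com/results"

# A background thread prefetches the page before the session cookie is set...
prefetched = fake_server(url, cookies={})

# ...while the main thread, in regular mode, would have sent the cookie.
expected = fake_server(url, cookies={"session": "abc123"})

# The prefetched copy differs from what the robot should have seen,
# and nothing in the run itself signals the mismatch.
stale = prefetched != expected
```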
 
Our experiments show that multi-threading may reduce the running time by only about half. The gain is modest because we try to avoid unnecessary prefetching and have to preserve the correct program logic.
The Administrator.
p3lewis
IRobotSoft Newbies
*


I Love IRobotSoft

Posts: 4
Gender: male
Re: Question about threads
Reply #2 - 12/28/15 at 13:34:48
 
Hi Admin, I want to re-open this one. I am trying to do multithreading against a certain set of Twitter feeds.
I can't get it to work. I tried using your guide in the documentation without much success. Can you help? What do you need from me to help with it?
 
Thank you for your time in advance!
IRobotSoft Administrator
Re: Question about threads
Reply #3 - 12/28/15 at 14:45:21
 
Currently, there are some limitations for irobot multithreading, which may affect your Twitter feed:
1. It does not support HTTPS websites.
2. It cannot run any JavaScript in the web page.
 
So I guess you are not able to do multithreading for Twitter.
p3lewis
Re: Question about threads
Reply #4 - 12/28/15 at 14:56:47
 
Quote from IRobotSoft Administrator on 12/28/15 at 14:45:21:
Currently, there are some limitations for irobot multithreading, which may affect your Twitter feed:
1. It does not support HTTPS websites.
2. It cannot run any JavaScript in the web page.

So I guess you are not able to do multithreading for Twitter.

 
Thanks for the prompt response. Is there a possible workaround? Is there a plan to fix this limitation?
IRobotSoft Administrator
Re: Question about threads
Reply #5 - 12/29/15 at 06:21:08
 
We will try to add HTTPS support for multithreading.
 
As a workaround, you can:
1. Divide your feeds into multiple input files.
2. Use the command line to run multiple irobots simultaneously, each using a different input file.
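The two workaround steps could be scripted along these lines. This is a sketch in Python under stated assumptions: the input is taken to be a text file with one feed per line, the helper names `split_input` and `launch_all` are made up for the example, and the actual irobot command line is deliberately not shown. The caller supplies it through `command_for`, based on the product documentation.

```python
import subprocess
from pathlib import Path


def split_input(input_path, parts, out_dir):
    """Split the non-empty lines of input_path round-robin into `parts` files."""
    lines = [ln for ln in Path(input_path).read_text().splitlines() if ln.strip()]
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(parts):
        chunk = lines[i::parts]        # every `parts`-th line, starting at i
        path = out_dir / f"input_{i + 1}.txt"
        path.write_text("\n".join(chunk) + ("\n" if chunk else ""))
        paths.append(path)
    return paths


def launch_all(command_for, input_files):
    """Start one process per input file and wait for all of them.

    command_for is a caller-supplied function that maps an input file path
    to an argument list; the real irobot command line is an assumption
    left to the caller.
    """
    procs = [subprocess.Popen(command_for(p)) for p in input_files]
    return [p.wait() for p in procs]   # exit codes, in input-file order
```

Round-robin splitting keeps the files close to equal in size, so the parallel runs should finish at roughly the same time if the feeds take similar effort.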
 
IRobotSoft Administrator
Re: Question about threads
Reply #6 - 01/18/16 at 01:53:38
 
The new release allows you to parallelize HTTPS scraping. Note that you will need to extract https.dll from http://irobotsoft.com/https.zip and put it in the IROBOT\system directory.
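If you prefer to script the extraction step, something like the following Python sketch may help. It assumes https.zip has already been downloaded, and that you pass in the location of your own IROBOT directory; the function name `install_https_dll` is hypothetical, and the layout inside the archive is not known here, so the code searches for the file by name.

```python
import zipfile
from pathlib import Path


def install_https_dll(zip_path, irobot_dir):
    """Extract https.dll from the archive into <irobot_dir>/system."""
    system_dir = Path(irobot_dir) / "system"
    system_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Find the member by file name, in case it sits inside a subfolder.
        # Raises StopIteration if the archive contains no https.dll.
        member = next(
            name for name in archive.namelist()
            if Path(name).name.lower() == "https.dll"
        )
        target = system_dir / "https.dll"
        target.write_bytes(archive.read(member))
    return target
```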
 