IRobotSoft Robot
08/07/20 at 17:43:54
News: Welcome to the IRobotSoft Visual Web Scraping and Web Automation forum.
callParallel browser limitation... (Read 2991 times)
BrentH
Junior Member
**




Posts: 16
callParallel browser limitation...
09/20/16 at 12:58:38
 
On a particular website I can successfully get data using callTask.
 
When I switch to callParallel, I get data that says 'please upgrade your browser'.
 
I am guessing that the socket browser cannot handle certain pages or is out of date. Or perhaps the page detects that javascript cannot be run and produces this message?
 
Can you enlighten me as to how callParallel works and suggest alternatives/workarounds if possible?
 
I know that when I use callTask, irobot will eventually run out of memory (as I understand it, this is a known issue with IE and garbage collection). I am trying to avoid using callTask to parse thousands of detail pages in a new window.
 
Thanks
Win7, 64bit, latest version of irobot (visual)
IRobotSoft Administrator
IRobotSoft Administrator
*****


IRobotSoft, the Best
Internet Robot
System

Posts: 1608
Gender: male
Re: callParallel browser limitation...
Reply #1 - 09/20/16 at 20:45:28
 
callParallel uses the socket browser to crawl through pages, so it will not work for pages that require running javascript.
 
The typical design is to use the visual browser to handle the javascript-heavy pages, and to separate the actions that do not need javascript into a subtask. Then you can use callParallel on the subtask.
 
The Administrator.
BrentH
Junior Member
**




Posts: 16
Re: callParallel browser limitation...
Reply #2 - 09/21/16 at 12:09:44
 
I understand that javascript will not execute when using callParallel, and that if javascript is needed to build the updated view/DOM that holds the data, then callParallel should not be used.
 
However, I would think that callParallel should still return the source code for the page, regardless of whether it contains javascript.
 
Example page: http://www.yellowpages.com/minneapolis-mn/mip/tasty-pizza-4154101?lid=1000120596103
 
This page uses a lot of javascript, but the javascript DOES NOT need to execute to create the view/DOM that is needed for extraction. The source code is perfect.
This is what happens:
 
callTask returns the source code for that page (correct).
callParallel does not return the source code for the page (incorrect). It returns the source code for another page that states 'please upgrade your browser'.
So it seems that callParallel/the socket browser is allowing 'something' to execute, which returns a different result.
 
Note: I have tested other tools that do not execute javascript to retrieve the page, and they all return the source code as expected.
 
Is there a function or setting that forces initial/uninterpreted source code retrieval?
 
Thanks for your time
BrentH
Junior Member
**




Posts: 16
Re: callParallel browser limitation...
Reply #3 - 10/10/16 at 21:58:14
 
yellowpages.com only supports IE 9 and above (I tested with IE emulation mode).
 
I now believe this is a browser version issue and nothing to do with javascript.
 
What version is the embedded socket browser? Is there a method to switch versions?
 
I also tried Advanced --> Main Browser --> IE Browser (run as admin).
With IE 11 installed I still get the 'browser not supported' page. This should work, right?
 
Thanks
 
IRobotSoft Administrator
IRobotSoft Administrator
*****


IRobotSoft, the Best
Internet Robot
System

Posts: 1608
Gender: male
Re: callParallel browser limitation...
Reply #4 - 10/11/16 at 09:28:50
 
The socket browser does not support any browser version. You can use the menu Advanced -> Server Test Mode -> Test Server-mode Full to see whether the socket browser works for the website. Otherwise, you have to use the embedded IE browser. You can parallelize by dividing the job and running multiple irobot instances on your computer.
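 
The "divide the job" step can be sketched outside of irobot, for example in Python. This is only an illustration, assuming the detail-page URLs have already been collected into a list; `split_job` and the example.com URLs are hypothetical and not part of IRobotSoft. How each part is then handed to an irobot instance is not covered here.

```python
def split_job(urls, instances):
    """Split urls into `instances` contiguous parts of nearly equal size."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    # Every part gets `size` items; the first `extra` parts get one more.
    size, extra = divmod(len(urls), instances)
    parts = []
    start = 0
    for i in range(instances):
        end = start + size + (1 if i < extra else 0)
        parts.append(urls[start:end])
        start = end
    return parts


if __name__ == "__main__":
    # Hypothetical detail-page URLs, used only for illustration.
    urls = [f"http://example.com/detail/{n}" for n in range(10)]
    for number, part in enumerate(split_job(urls, 3), start=1):
        print(f"instance {number}: {len(part)} URLs")
```

Contiguous parts keep the original page order within each instance, which makes it easier to resume a part if one instance stops early.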
BrentH
Junior Member
**




Posts: 16
Re: callParallel browser limitation...
Reply #5 - 10/11/16 at 21:49:17
 
I tried testing using: Advanced -> Server Test Mode -> Test Server-mode Full  
...this did not work.
 
Is a 'user agent' string being passed with the website request? Something is telling yellowpages.com that the request is coming from an unsupported browser.  What is it?
 
Thanks
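 
One way to check whether the User-Agent header is what triggers the 'please upgrade your browser' page is to request the same URL outside of irobot with different User-Agent values and compare the responses. The Python sketch below uses only the standard library. The function names and the marker text are assumptions made for illustration, and it says nothing about which headers the socket browser actually sends.

```python
import urllib.request

# Assumed wording, based on the message reported earlier in this thread.
UPGRADE_MARKER = "upgrade your browser"


def build_request(url, user_agent=None):
    """Build a GET request, optionally with an explicit User-Agent header."""
    headers = {}
    if user_agent is not None:
        headers["User-Agent"] = user_agent
    return urllib.request.Request(url, headers=headers)


def looks_like_upgrade_page(html):
    """Return True if the page text contains the upgrade message."""
    return UPGRADE_MARKER in html.lower()


def fetch(url, user_agent=None, timeout=30):
    """Download the raw page source; no javascript is executed."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    url = "http://www.yellowpages.com/minneapolis-mn/mip/tasty-pizza-4154101?lid=1000120596103"
    # None lets urllib send its own default User-Agent (Python-urllib/x.y);
    # the second value is an IE 11 style string.
    agents = (None, "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko")
    for agent in agents:
        print(repr(agent), looks_like_upgrade_page(fetch(url, agent)))
```

If the two requests give different results, the User-Agent header is a likely cause; if both give the same result, something else in the request is probably being checked.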