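Sites built as single-page applications (SPAs) cannot be scraped as-is, because their content is rendered by JavaScript in the browser.
With PhantomJsCloud, you can render JavaScript-built pages such as SPAs into HTML and then scrape that HTML.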
API KEY
Copy your API key from PhantomJsCloud into KEY. The code below runs in Google Apps Script (it uses UrlFetchApp).
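```javascript
// Google Apps Script: render a JavaScript-built page with PhantomJsCloud.
const URL = 'http://*.com/*'; // page to scrape
const KEY = '*';              // your PhantomJsCloud API key

// Page request: render the page as HTML and wrap the result in a JSON response.
let payload = {
  url: URL,
  renderType: 'HTML',
  outputAsJson: true
};
// The request is sent as URL-encoded JSON in the "request" query parameter.
payload = JSON.stringify(payload);
payload = encodeURIComponent(payload);
const fetchUrl = 'https://phantomjscloud.com/api/browser/v2/' + KEY + '/?request=' + payload;

// res is the response body as a string (JSON, because outputAsJson is true).
const res = UrlFetchApp.fetch(fetchUrl).getContentText('UTF-8');
```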
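If you render more than one page, it can help to wrap the URL-building steps above in a function. The sketch below does the same thing as the code above (JSON.stringify, then encodeURIComponent, then append to the v2 browser endpoint); the name `buildRequestUrl` is hypothetical and not part of any PhantomJsCloud library.

```javascript
/**
 * Hypothetical helper: build a PhantomJsCloud GET URL for a page request.
 * Mirrors the steps above: JSON-encode the request, URL-encode it,
 * and append it as the "request" query parameter of the v2 browser endpoint.
 */
function buildRequestUrl(key, pageRequest) {
  const encoded = encodeURIComponent(JSON.stringify(pageRequest));
  return 'https://phantomjscloud.com/api/browser/v2/' + key + '/?request=' + encoded;
}

// Usage in Apps Script (KEY and URL as defined above):
// const fetchUrl = buildRequestUrl(KEY, { url: URL, renderType: 'HTML', outputAsJson: true });
// const res = UrlFetchApp.fetch(fetchUrl).getContentText('UTF-8');
```

Keeping the page request as a plain object until the last moment makes it easy to add further request options later without touching the encoding logic.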
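How to get an API key
1. Sign up for PhantomJsCloud.
2. Choose Email as the sign-up method and register with your email address.
3. Your account is created.
4. On the dashboard, https://dashboard.phantomjscloud.com/dash.html, you can check your remaining free credit balance.
5. The ApiKey shown there is the value to use for KEY.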
Once the free credits are used up, requests fail with HTTP 402 and UrlFetchApp throws an exception like this:
Exception: Request failed for https://phantomjscloud.com returned code 402. Truncated server response: {"name":"HttpStatusCodeException","message":"OUT OF CREDITS: Your account is out of both Daily Subscription Credits and Prepaid Credits. Either wai... (use muteHttpExceptions option to examine full response)
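The exception above is truncated and points to the `muteHttpExceptions` option. A minimal sketch of using it: with `muteHttpExceptions: true`, `UrlFetchApp.fetch` returns the response instead of throwing, so the status code and the full message can be read. The function name `fetchRendered` is hypothetical, and the fetch function is passed in as a parameter (an assumption of this sketch) so the logic can be tested outside Apps Script.

```javascript
/**
 * Hypothetical helper: fetch a PhantomJsCloud URL and return the body on 200.
 * fetchFn(url, params) should behave like UrlFetchApp.fetch; in Apps Script pass
 * (url, params) => UrlFetchApp.fetch(url, params).
 */
function fetchRendered(fetchFn, requestUrl) {
  // muteHttpExceptions: true makes UrlFetchApp return 4xx/5xx responses instead of throwing.
  const response = fetchFn(requestUrl, { muteHttpExceptions: true });
  const code = response.getResponseCode();
  const body = response.getContentText('UTF-8');
  if (code !== 200) {
    // Treat anything other than 200 (OK) as an error and keep the untruncated server message.
    throw new Error('PhantomJsCloud returned ' + code + ': ' + body);
  }
  return body;
}

// Usage in Apps Script, with fetchUrl built as in the code above:
// const res = fetchRendered((url, params) => UrlFetchApp.fetch(url, params), fetchUrl);
```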
Status codes
From https://phantomjscloud.com/docs/ > Debugging Page Errors: Status Codes
200: OK The target page was captured properly.
400: Bad Request Your request had an error in it. Fix it before resubmitting.
401: Unauthorized You are using an invalid Api Key. Please check for typos, or create an account.
402: Payment Required Your account is out of credits. Login and either upgrade your Subscription or add Prepaid Credits.
403: Forbidden Your request was flagged due to abuse. Read the response for steps you should take to resolve the situation.
424: Failed Dependency The target page was not reachable (the request timed out). Check and make sure your target URL is valid.
429: Too Many Simultaneous Requests You sent a sudden spike of simultaneous requests. PhantomJsCloud can handle hundreds of simultaneous requests, but we require you to gracefully increase the number of concurrent requests over time, not send a sudden spike. Please increase the number of your simultaneous requests according to the schedule shown in the 'Testing and Performance Optimization' section of the docs page. (add +1 simultaneous requests every 3 seconds, or +10 simultaneous every 30 seconds). You may retry this request immediately, with no modifications.
500: Internal Server Error The PhantomJsCloud instance suffered an internal error. You can retry your request immediately, without modifications. If errors still occur, these are the known causes:
More time needed, retry with larger pageRequest.requestSettings.maxWait value.
An incompatible webfont is causing PhantomJs to crash, try blacklisting any font resources (.otf, .ttf, .woff) for example:
pageRequest.requestSettings.resourceModifier:[{regex:'.*ttf.*|.*otf.*|.*woff.*',isBlacklisted:true}]
If you still have problems, please submit your request to Support@PhantomJsCloud for diagnosis.
502: Bad Gateway Your request did not reach PhantomJsCloud due to a network failure. You can retry your request immediately, without modifications. If errors still occur, see the "502 Bad Gateway" Troubleshooting item in the PhantomJsCloud docs.
503: Server Too Busy SERVER TOO BUSY: The server is temporarily overwhelmed with other requests, and its request backlog is very large. We are returning this to you to prevent risk of an HTTP timeout occurring instead. Support@PhantomJsCloud.com has been notified and will investigate. You may retry this request immediately, with no modifications.
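According to the list above, 429, 500, 502 and 503 can be retried without changing the request, while 400, 401, 402, 403 and 424 need a fix first (request, key, credits or target URL). A minimal retry sketch under those assumptions follows. `isRetryable`, `fetchWithRetry`, the attempt limit and the doubling delay are choices of this sketch, not something PhantomJsCloud prescribes (the docs allow an immediate retry). It expects responses fetched with `muteHttpExceptions: true`, so error codes are returned rather than thrown.

```javascript
// Status codes the list above says can be retried with no modifications.
const RETRYABLE_CODES = [429, 500, 502, 503];

function isRetryable(code) {
  return RETRYABLE_CODES.indexOf(code) !== -1;
}

/**
 * Hypothetical helper: call fetchOnce() up to maxAttempts (>= 1) times,
 * retrying only on retryable codes. fetchOnce must return an object with
 * getResponseCode(), like UrlFetchApp's HTTPResponse. sleepFn(ms) waits
 * between attempts; the 1 s, 2 s, 4 s... schedule is an assumption.
 */
function fetchWithRetry(fetchOnce, maxAttempts, sleepFn) {
  let response = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    response = fetchOnce();
    const code = response.getResponseCode();
    if (!isRetryable(code) || attempt === maxAttempts) {
      // Success, a non-retryable error, or no attempts left: hand the response back.
      return response;
    }
    sleepFn(1000 * Math.pow(2, attempt - 1));
  }
  return response;
}
```

In Apps Script this could be called as `fetchWithRetry(() => UrlFetchApp.fetch(fetchUrl, { muteHttpExceptions: true }), 3, ms => Utilities.sleep(ms))`. A 500 that keeps recurring will not be fixed by retrying alone; per the list above, raise `pageRequest.requestSettings.maxWait` or blacklist font resources.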