I'm trying to webscrap a website (actually I'm downloading a .xlsx file) with R, but I got stuck since I don't know too much about both R and HTML.
I found the code below here, but it is not doing all the job. On the website I need to select (on a dropdown menu) the department and the checklist type, and then click on a download button (which opens the "download url" from the code bellow). The problem is the website allows me to download data from one department at a time.
So I would like to find a way to do this with R, and download the files from all the departments just changing the ID (from both department and checklist).
install.packages("rvest")
library(rvest)
url <- "https://website.com/auth/login?redirectUrl="
download_url <- "https://website.com/departments/file.xlsx"
session <- html_session(url)
form <- html_form(session)[[1]]
filled_form <- set_values(form,
email = "myusername", # "email" is an element from website HTML code
password = "mypassword") # "mypassword" is an element from website HTML code
## Save main page url
main_page <- submit_form(session, filled_form)
#after login and submit_form do this:
download <- jump_to(main_page, download_url)
# write file to current working directory
writeBin(download$response$content, basename(download_url))
The area ID has the format in the HTML code bellow. I guess the JS function gets this setIdPlantaAtual(123) value and filter the database when I select the download button. The other filter seems to be based on the setModuloAtual parameter (equivalent to checklist type).
// Department HTML code
<h5><a onclick="setIdPlantaAtual(123)" href="javascript:;">Department ABC <span class="badge badge-success">selecionar</span></a></h5>
// Checklist HTML code
<li class="new" onclick="setModuloAtual('Checklist1')" style="cursor:pointer;">
Bellow are the two functions I found on the website code.
function setIdPlantaAtual(_idplantaatual) {
$.blockUI({
message: '<h1> Wait...</h1>'
});
$.post(baseUrl + '/index/idplantaatual', {
id: _idplantaatual
}, function(res) {
if (res == 'OK') {
location.reload();
} else {
$.unblockUI();
}
});
}
function setModuloAtual(modulo) {
if (modulo == "Tratativas de NCs") {
modulo = "Tratativas de NC\'s";
}
$.post(baseUrl + '/index/moduloatual', {
modulo: modulo
}, function(res) {
console.log('aaa',res)
if (res == 'OK') {
// location.reload();
window.location.replace(baseUrl);
} else if (res.split('$%$').length > 0) {
var new_modulo = res.split('$%$')[1];
if (new_modulo != undefined) {
window.location.replace(baseUrl + '/' + new_modulo);
} else {
window.location.replace(baseUrl + '/');
}
} else {
console.log(res);
}
});
}
I think the same approach from my first attempt would work to merge the department ID (setIdPlantaAtual()) and the checklist ID (setModuloAtual()) into the download url, but I don't know how to do that.
I appreciate any help!
Via Active questions tagged javascript - Stack Overflow https://ift.tt/Wiv3CQa
Comments
Post a Comment