
Questions tagged [web-crawler]

A Web crawler (also known as a Web spider) is a computer program that browses the World Wide Web in a methodical, automated manner. Other terms for Web crawlers are ants, automatic indexers, bots, Web spiders, Web robots, or – especially in the FOAF community – Web scutters.

0 votes
0 answers
10 views

Android App Crawler is not running with API 34

App Crawler works perfectly with API 28 but not with API 34. I use the same command: java -jar crawl_launcher.jar --apk-file app-debug.apk --app-package-name com.example.myapplication --...
Ahmed Osama
0 votes
0 answers
11 views

Issues with Mapping Discovered URLs to Original URLs in Katana CLI Batch Processing

I'm using the Katana CLI for web crawling, with a Python wrapper to manage batch processing and output parsing. My goal is to map all discovered URLs back to their original URLs, but I'm facing issues ...
Luis Solorzano
0 votes
0 answers
15 views

What is the problem with this public API? I can't get a response back

I'm trying to crawl FlyToday.ir for foreign hotel price rates. The problem is that there is an API visible in the network tab, called via POST, that returns a response there, but when I want to call it (with all ...
Mohamad Alizade
0 votes
0 answers
15 views

How can I get the data fast from Firestore

I have a hotel booking website with a dynamic /hotels-in-a-city page that gets the list of all the hotels in a particular city through the getServerSideProps() function, and when I tap on a ...
Rishabh Sharma
0 votes
1 answer
57 views

Unable to Stop Running Sync Job in AWS Bedrock Knowledge Base

I have an issue with AWS Bedrock Knowledge Base with a Web crawler as a data source. I accidentally entered 2 Wikipedia URLs (e.g., "https://en.wikipedia.org/wiki/article1 and a second URL: "https:...
Andrey
-3 votes
1 answer
45 views

Crawl data from the IMDb Top 250 Movies

Please, I need some help. I can't understand why I only crawl 25 movies instead of 250. My code: import pandas as pd import requests from bs4 import BeautifulSoup headers = {'User-Agent': '...
Vu-Hoang Duong
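The 25-versus-250 problem above is typically lazy loading: the chart renders only the first 25 rows as plain HTML, while the full list ships inside an embedded JSON-LD `<script>` block (the IMDb Top 250 page embeds such a block at the time of writing). A minimal stdlib sketch of parsing that block; the function name and the inline sample are illustrative, not taken from the asker's code:

```python
import json
import re

def titles_from_jsonld(html: str) -> list[str]:
    """Pull movie titles out of an embedded application/ld+json ItemList.

    Assumes the chart page ships its full list as JSON-LD even though only
    the first rows are rendered as plain HTML.
    """
    match = re.search(
        r'<script type="application/ld\+json">(.*?)</script>', html, re.S
    )
    data = json.loads(match.group(1))
    return [entry["item"]["name"] for entry in data["itemListElement"]]

# Tiny inline sample mimicking the structure, so the parser can be
# exercised without a network call:
sample = """<html><head>
<script type="application/ld+json">
{"@type": "ItemList", "itemListElement": [
  {"@type": "ListItem", "item": {"name": "Movie A"}},
  {"@type": "ListItem", "item": {"name": "Movie B"}}
]}
</script>
</head><body>only the first rows rendered here</body></html>"""

print(titles_from_jsonld(sample))  # ['Movie A', 'Movie B']
```

In the real crawl you would fetch the page with requests (keeping the browser-like User-Agent header the asker already sets) and pass response.text to the function.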
-1 votes
0 answers
12 views

Does Webflow pagination hurt SEO? [closed]

I'm using Webflow for a certain website, and a lot of paginated pages end up in the GSC tab "Crawled - currently not indexed". For example: https://www.example.com/blog?65b097f7_page=5 Is this hurting ...
Ruben
0 votes
1 answer
30 views

How to exclude div classes 'modal-content' and 'modal-body' from pyppeteer web scraper?

I'm building a scraper that gets text data from a list of articles. A common pattern in the text content I'm scraping at the moment is that at the bottom there is this message: "As a subscriber, ...
Shehzadi Aziz
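For the pyppeteer question above, one route is to strip the unwanted subtrees after the fact: grab the rendered HTML with `await page.content()` and filter it in Python. A stdlib sketch, where the class names come from the question but the extractor itself is a hypothetical helper:

```python
from html.parser import HTMLParser

class ModalFreeText(HTMLParser):
    """Collects text while skipping subtrees whose class list contains
    an excluded class (here the 'modal-content'/'modal-body' divs)."""

    EXCLUDE = {"modal-content", "modal-body"}
    VOID = {"br", "hr", "img", "input", "link", "meta"}  # no closing tag

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # >0 while inside an excluded subtree

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return  # void elements never close, so don't touch the depth
        classes = set((dict(attrs).get("class") or "").split())
        if self.skip_depth or classes & self.EXCLUDE:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())

html = ('<article><p>Real article text.</p>'
        '<div class="modal-content"><div class="modal-body">'
        'As a subscriber, ...</div></div>'
        '<p>More article text.</p></article>')
parser = ModalFreeText()
parser.feed(html)
print(parser.parts)  # ['Real article text.', 'More article text.']
```

Alternatively the nodes can be removed inside the browser before reading the text, e.g. with pyppeteer's `page.evaluate` running a `document.querySelectorAll('.modal-content, .modal-body').forEach(el => el.remove())` snippet.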
0 votes
0 answers
12 views

Sudden increase in requests received

My application suddenly had a huge increase in the number of requests being made to it. I believe the only change of note was adding a sitemap.xml, and I believe the increase in requests is due to ...
egauzens
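If the surge in the question above turns out to be well-behaved crawlers discovering the new sitemap, a robots.txt served next to it is the usual throttle. A sketch under assumptions (the delay value and the disallowed path are illustrative; note that Crawl-delay is honored by some crawlers such as Bingbot but ignored by Googlebot):

```
User-agent: *
# Seconds between requests -- respected by some bots, ignored by Googlebot
Crawl-delay: 10
# Hypothetical section you don't want crawled at all
Disallow: /internal/

Sitemap: https://www.example.com/sitemap.xml
```

Misbehaving bots ignore robots.txt entirely, so rate limiting at the server or CDN layer is the fallback.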
0 votes
0 answers
10 views

GitHub Action: problem overwriting/replacing/updating a .json file

I want to use the Google web API to scrape some coffee shop info from my country. There is already an original .json file in my repo to use, but if some new coffee shop is created, I need to ...
Yatayork
0 votes
0 answers
19 views

AWS crawler creating Null values for partition columns

I have some country-level partitioned data in S3, and the crawler is crawling this root folder and creating a table. There is no Null value for the country code, but when looking in Athena, there ...
Ananth
-3 votes
0 answers
50 views

Download ICD-10 codes (International Classification of Diseases)

We can easily browse the ICD-10 codes: https://icd.who.int/browse10/2019/en Unfortunately, there is no way to download all of the codes as a TXT (or XLS) file in order to parse them with Python, or import ...
JoyfulPanda
-1 votes
0 answers
20 views

Crawler for the Rotten Tomatoes website: problem with pages

I'm trying to crawl the Rotten Tomatoes website, but I have a problem: to get the HTML for page 5 and above of the movies, for example: https://www.rottentomatoes.com/browse/movies_at_home/?page=**8** ...
Nadav Goldin
1 vote
1 answer
62 views

Scrapy Spider does not work with multiple urls

I wrote a Scrapy spider and used Selenium in it to scrape the products on devgrossonline.com. It does not work with multiple category URLs, but it works when I provide only one URL. Here is my spider: ...
serkan ertas
-1 votes
0 answers
22 views

The time obtained by the Python crawler is incorrect when getting comments

When I use Python to crawl stock comments from a website, the time parsed from the website is different from the time obtained by my crawler. For example, when I use F12 to inspect the website, I find ...
Ohhhhh
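Mismatches like the one above are very often a timezone problem: many sites ship comment times as Unix-epoch milliseconds in UTC and render them in the viewer's local zone, while the crawler converts them naively. A sketch of the timezone-aware conversion; the raw value and the UTC+8 display zone are assumptions for illustration:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical raw field as it might appear in the site's JSON payload:
raw_ms = 1700000000000  # Unix epoch in milliseconds, UTC

# Make the datetime timezone-aware first, then convert to the zone the
# site's frontend displays (UTC+8 assumed here).
aware_utc = datetime.fromtimestamp(raw_ms / 1000, tz=timezone.utc)
local = aware_utc.astimezone(timezone(timedelta(hours=8)))

print(aware_utc.isoformat())  # 2023-11-14T22:13:20+00:00
print(local.isoformat())      # 2023-11-15T06:13:20+08:00
```

Comparing the raw payload value in F12 against what the page displays usually reveals which zone the frontend applies.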
