Perform February Crawl #85

Closed
katehausladen opened this issue Feb 6, 2024 · 3 comments
@katehausladen
Collaborator

I talked to Daniel, and this week is the best week for me to have the computer. I started the crawl last night.

@katehausladen
Collaborator Author

The crawl is finished! GPP implementation nearly doubled since December!

@katehausladen
Collaborator Author

I reopened the issue to merge the code used for this crawl. Since I crawled the whole crawl set with these changes, I went ahead and merged them. The changes were: (1) cap the debugging table entries at 4,000 characters, since that is what our table allows; (2) add another human check regular expression; and (3) update the README to reflect wellknown changes.
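
For anyone reading along, changes (1) and (2) can be sketched roughly as below. This is an illustrative sketch only, not the merged code: the names `cap_entry`, `HUMAN_CHECK_PATTERNS`, and `looks_like_human_check` are hypothetical, and the two regular expressions are made-up examples, since the actual pattern that was added is not shown in this thread. Only the 4,000-character limit comes from the comment above.

```python
import re

# Limit taken from the comment above; the table and column names are not given here.
MAX_ENTRY_CHARS = 4000


def cap_entry(value, limit=MAX_ENTRY_CHARS):
    """Convert a debugging entry to text and truncate it to at most `limit` characters."""
    text = value if isinstance(value, str) else str(value)
    return text[:limit]


# Hypothetical patterns for illustration; the crawler's real human-check
# expressions may differ.
HUMAN_CHECK_PATTERNS = [
    re.compile(r"verify (that )?you are (a )?human", re.IGNORECASE),
    re.compile(r"captcha", re.IGNORECASE),
]


def looks_like_human_check(page_text):
    """Return True if any human-check pattern appears in the page text."""
    return any(p.search(page_text) for p in HUMAN_CHECK_PATTERNS)
```

Truncating before the insert keeps an oversized debug message from being rejected by the table, at the cost of losing the tail of the message.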

@katehausladen
Collaborator Author

Here's the updated analysis flow / architecture PowerPoint:
web-crawler-architecture.pptx

4 participants