search all silo posts for links to users' sites and send mentions #456

Closed
snarfed opened this issue Sep 1, 2015 · 28 comments
Comments

@snarfed
Owner

snarfed commented Sep 1, 2015

spun out of #51. from #51 (comment):

an idea for expanding this: search silos for any posts, from anyone, that link to the user's domain(s), and send webmentions for them too. these are effectively mentions.
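For reference, sending a webmention for one of these search hits comes down to a form-encoded POST of `source` and `target` to the target site's webmention endpoint, per the W3C Webmention spec. Below is a minimal Python sketch. It assumes the endpoint has already been discovered (discovery via the `Link` header or `<link rel="webmention">` is omitted) and uses the silo post URL directly as `source`; Bridgy may instead use its own page that renders the silo post as mf2. `build_webmention_request` is a hypothetical helper, not Bridgy's actual code.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_webmention_request(endpoint: str, source: str, target: str) -> Request:
    """Build (but don't send) a Webmention POST.

    source: URL of the post that links to the user's site (hypothetically
            the silo post itself here).
    target: URL on the user's site that the post links to.
    """
    body = urlencode({"source": source, "target": target}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Usage with URLs from later in this thread; the endpoint is a made-up example.
req = build_webmention_request(
    "https://example.com/webmention",
    "https://twitter.com/anarcho/status/643921641664200704",
    "https://kylewm.com/2015/09/repost-of-glenn-greenwald-the-new-revolving-door",
)
```

Passing the returned request to `urllib.request.urlopen()` sends it. Building it separately keeps it easy to inspect or test.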

silo support for this is mixed:

@snarfed snarfed changed the title seach all silo posts for links to users' sites and send mentions Sep 1, 2015
@snarfed snarfed self-assigned this Sep 12, 2015
@snarfed
Owner Author

snarfed commented Sep 13, 2015

cc @kylewm in case you're interested in adding flickr search support... (see above)

@snarfed
Owner Author

snarfed commented Sep 14, 2015

the remaining part here is to send mention posts themselves, not just their responses. this needs a new post response type connected to the post mf2 handler.

snarfed added a commit that referenced this issue Sep 19, 2015
snarfed added a commit that referenced this issue Sep 19, 2015
snarfed added a commit to snarfed/granary that referenced this issue Sep 20, 2015
snarfed added a commit that referenced this issue Sep 20, 2015
snarfed added a commit that referenced this issue Sep 20, 2015
snarfed added a commit that referenced this issue Sep 20, 2015
@snarfed
Owner Author

snarfed commented Sep 20, 2015

finally soft launched this, and it worked well, but evidently has a memory leak, so i had to roll it back.

Exceeded soft private memory limit of 256 MB with 328 MB after servicing 2 requests total.

ugh.

there's FUD here and there about the sockets API maybe causing memory leaks due to badly handled range requests, but i can't tell how real it is or if it could be causing this. i suspect i've just been wasteful with memory, e.g. lots of string concatenations and copy.deepcopys, and it's finally time to pay the piper. whee, can't wait to heap profile. 😭
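One way to do that heap profiling is to diff two `tracemalloc` snapshots around a suspect code path and check which source lines grew the most. This is a sketch for modern Python 3 (`tracemalloc` has been in the stdlib since 3.4, so it isn't available on a Python 2.7 runtime). `top_allocations` and `wasteful` are hypothetical names, and `wasteful` just stands in for the suspected string concatenation and `copy.deepcopy` patterns.

```python
import copy
import tracemalloc


def top_allocations(func, *args, limit=5):
    """Run func(*args) and return (result, the `limit` source lines whose
    allocated memory changed the most while it ran)."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    result = func(*args)
    # result is still referenced here, so its memory appears in this snapshot.
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Group differences by source line; largest changes come first.
    stats = after.compare_to(before, "lineno")
    return result, stats[:limit]


def wasteful(n):
    """Hypothetical stand-in for memory-hungry request handling."""
    s = ""
    copies = []
    base = {"items": list(range(100))}
    for i in range(n):
        s += str(i)                          # repeated string concatenation
        copies.append(copy.deepcopy(base))   # deep copies of nested data
    return s, copies


if __name__ == "__main__":
    _, stats = top_allocations(wasteful, 1000)
    for stat in stats:
        print(stat)
```

`compare_to(..., "lineno")` returns `StatisticDiff` objects sorted by the size of the change, so the biggest growers print first.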

silver lining: at least i know the window of commits where the leak was introduced!

@snarfed
Owner Author

snarfed commented Sep 20, 2015

the little orange bump of 500s here is our instances flapping (OOMing, restarting, and OOMing again):

[chart image]

here's a snippet of individual requests at peak flap. the red !!! ones are OOMs. not pretty!

[screenshot, 2015-09-20 12:24 pm: individual requests at peak flap]

@snarfed
Owner Author

snarfed commented Sep 20, 2015

silver lining: it's working ok, at least! e.g. the top response here: https://www.brid.gy/twitter/kylewmahan#responses is this tweet: https://twitter.com/anarcho/status/643921641664200704 which propagated as a mention to https://kylewm.com/2015/09/repost-of-glenn-greenwald-the-new-revolving-door

@kylewm
Contributor

kylewm commented Sep 21, 2015

wow, that mention is hidden behind a redirect too, pretty cool!

@snarfed
Owner Author

snarfed commented Sep 24, 2015

some of this might be just because our slow poll frequency is once a day, so we're still working through the first set of search results for many users. that should be done by around noon PST. i'll revisit if latency is still consistently bad after that.

@snarfed
Owner Author

snarfed commented Sep 24, 2015

scratch that, we'll be caught up by ~1:30pm PST today, since we're ~90m behind. math!

@snarfed
Owner Author

snarfed commented Sep 25, 2015

poll latency is looking better now. averaging 5-10s, higher than ~4s before, but still reasonable.

[screenshot, 2015-09-25 11:42 am: poll latency]

@snarfed
Owner Author

snarfed commented Sep 25, 2015

the poll queue is still behind by 45m :/, but i'm hoping some of that was due to #490. i pushed out a change there (1ebfe1c) a few hours ago that adds a bunch of shortlink generator domains to the blacklist and checks the blacklist before searching for a domain, so i'm hoping that will help some too.
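Roughly, checking the blacklist before searching means dropping shortlink domains from the list of domains to search before any query gets built, so they never cost a search API call. A sketch follows. `BLACKLIST` holds a few well-known shortlink domains as illustrative examples (not the actual contents of the blacklist in 1ebfe1c), and `is_blacklisted` / `domains_to_search` are hypothetical helpers. Matching parent domains, so subdomains are caught too, is an assumption of this sketch.

```python
# Illustrative entries only; not the real blacklist.
BLACKLIST = frozenset({"bit.ly", "t.co", "goo.gl", "tinyurl.com"})


def is_blacklisted(domain: str, blacklist=BLACKLIST) -> bool:
    """True if the domain or any of its parent domains is blacklisted."""
    parts = domain.lower().rstrip(".").split(".")
    # Check "www.tinyurl.com", then "tinyurl.com", then "com".
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))


def domains_to_search(domains, blacklist=BLACKLIST):
    """Drop blacklisted domains before building any search queries."""
    return [d for d in domains if not is_blacklisted(d, blacklist)]
```

For example, `domains_to_search(["snarfed.org", "bit.ly", "www.tinyurl.com"])` keeps only `"snarfed.org"`.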

@snarfed
Owner Author

snarfed commented Sep 26, 2015

tentatively closing. this has been running in prod and stable for a few days. I'm sure there are more bugs left to fix, but we can open new issues for them.

@singpolyma
Contributor

Does brid.gy also turn @-mentions of my twitter username into webmentions to my domain? That would be similar to this and very nice.

@snarfed
Owner Author

snarfed commented Oct 22, 2015

@singpolyma not right now, but that's an interesting feature request. just to confirm, you're proposing they'd be sent to your front page, e.g. target=https://singpolyma.net/?

@singpolyma
Contributor

@snarfed yes. or whatever URL is on my twitter profile

@snarfed
Owner Author

snarfed commented Dec 6, 2015

i currently craft search queries by stripping the scheme (i.e. http://), putting quotes around the remaining domain and path, and ORing all of those together, e.g. "snarfed.org" OR "instagram.com/snarfed". sadly, this has been returning both false positives and false negatives in both G+ and Twitter. :/
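A sketch of that query construction in Python, reproducing the example above. `build_search_query` is a hypothetical helper. Dropping a trailing slash and any query string (`urlsplit` keeps only the domain and path here) are assumptions, not necessarily what Bridgy does.

```python
from urllib.parse import urlsplit


def build_search_query(urls):
    """Strip the scheme from each URL, quote the rest, and OR them together."""
    terms = []
    for url in urls:
        parts = urlsplit(url)
        if parts.netloc:
            # Has a scheme: keep domain + path, minus any trailing slash.
            term = (parts.netloc + parts.path).rstrip("/")
        else:
            # Already schemeless, e.g. "snarfed.org".
            term = url.rstrip("/")
        terms.append('"%s"' % term)
    return " OR ".join(terms)
```

For example, `build_search_query(["http://snarfed.org/", "https://instagram.com/snarfed"])` returns `"snarfed.org" OR "instagram.com/snarfed"`. Keeping the scheme instead would just mean quoting `url` unchanged.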

i added the scheme back to G+ searches in 485af73, and it looks like that cut out the false positives but didn't add any false negatives.

still working on Twitter. here's some research so far for the example domain hypothes.is, including links to searches:

hrmph.

@snarfed
Owner Author

snarfed commented Dec 6, 2015

i'm now thinking about still using the "hypothes.is" style search for twitter and filtering out the false positives manually.
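Filtering manually could look something like this: run the broad search, then keep only tweets that carry at least one link whose host is one of the user's domains or a subdomain of one. This sketch assumes Twitter API v1.1-style tweet objects, where links appear in `entities.urls[].expanded_url`. `filter_false_positives` and `links_to_domain` are hypothetical helpers. The sketch doesn't follow redirects, so links hidden behind shortlinks or redirects (like the one kylewm noticed above) would be dropped unless they're resolved first.

```python
from urllib.parse import urlsplit


def links_to_domain(url, domain):
    """True if url's host is domain or a subdomain of it."""
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def filter_false_positives(tweets, domains):
    """Keep only tweets with at least one link to one of the user's domains."""
    kept = []
    for tweet in tweets:
        urls = [u.get("expanded_url") or u.get("url", "")
                for u in tweet.get("entities", {}).get("urls", [])]
        if any(links_to_domain(u, d) for u in urls for d in domains):
            kept.append(tweet)
    return kept
```

Checking the parsed host, rather than substring-matching the tweet text, is what rejects look-alikes such as `nothypothes.is`.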

@snarfed
Owner Author

snarfed commented Dec 6, 2015

@singpolyma
Contributor

Filtering false positives seems like an essential thing to do. Probably best to get as much as possible, then filter afterward.

@snarfed
Owner Author

snarfed commented Dec 6, 2015

i wish! sadly many users' domains are common words, or have common words in them, so their false positive rate can be 1K:1 or even 1M:1 for domains with words like blog or web. :/ and bridgy is approaching 1k twitter users, so I'd like to try to cut down that workload (and cost) a bit.

@singpolyma
Contributor

filter out common words and only search for the unique part maybe?
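A rough sketch of that suggestion: split the domain into tokens and drop common words, leaving the distinctive part to search for. `STOP_WORDS` is a deliberately tiny hypothetical list and `distinctive_terms` is a hypothetical helper. It only splits on punctuation, so a word embedded in a token (e.g. the "blog" in `myblog.net`) survives. A domain made entirely of stop words yields nothing, so a caller would need to fall back to the full domain.

```python
import re

# Hypothetical, tiny stop list; a real one would need ongoing upkeep.
STOP_WORDS = frozenset({"blog", "web", "www", "com", "org", "net", "the"})


def distinctive_terms(domain):
    """Split a domain on non-alphanumerics and drop common words."""
    tokens = re.split(r"[^a-z0-9]+", domain.lower())
    return [t for t in tokens if t and t not in STOP_WORDS]
```

For example, `distinctive_terms("www.example-blog.com")` leaves just `["example"]`.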

@snarfed
Owner Author

snarfed commented Dec 6, 2015

oh boy, and now i'm in the business of maintaining a stop word list and search query rewriter. :P you're definitely right, it's doable, i'm just not sure i want to take that plunge...

@singpolyma
Contributor

Sorry, just a thought.

@snarfed
Owner Author

snarfed commented Dec 6, 2015

np! definitely appreciated. 👬

4 participants