Spam filter blocked legitimate post

By @daveb
    2024-05-07 11:11:18.853Z

    In our forum, a user tried to share some device logs as plain text. The spam filter blocked the post for containing too many URLs, which was a frustrating experience for the user. We don't encourage users to paste logs as plain text (we ask for a link to the logs instead), but it happens occasionally.

    I'm unsure how the current spam filter works, so I don't know which URLs in the post triggered it. But I do know that the logs our users share contain URLs pointing to our own domain.

    Do multiple links to our own domain(s) trigger the spam filter? Is there a way to create a 'safe list' of URLs for our forum that don't trigger the spam filter?
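    A safe list like the one asked about here could work by skipping links to allowlisted domains when counting links. Below is a minimal Python sketch, assuming the safe list is a per-forum set of domains and that subdomains of a safe domain also count as safe. `SAFE_DOMAINS`, `is_safe`, `count_untrusted_links`, the URL regex, and `example.com` are hypothetical placeholders, not an existing Talkyard setting or API.

```python
import re
from urllib.parse import urlparse

# Hypothetical URL pattern: http(s) links up to the next whitespace, bracket or quote.
URL_PATTERN = re.compile(r"https?://[^\s<>()\"']+")

# Hypothetical per-forum safe list; "example.com" stands in for the forum's own domain.
SAFE_DOMAINS = {"example.com"}


def is_safe(url: str, safe_domains: set[str]) -> bool:
    """True if the URL's host is a safe domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # Require a dot before the domain so "evil-example.com" doesn't match "example.com".
    return any(host == d or host.endswith("." + d) for d in safe_domains)


def count_untrusted_links(text: str, safe_domains: set[str] = SAFE_DOMAINS) -> int:
    """Count URLs in the text, ignoring those that point to safe domains."""
    return sum(1 for url in URL_PATTERN.findall(text) if not is_safe(url, safe_domains))
```

    With this approach, a pasted log full of links to the forum's own domain would contribute zero links toward the limit, while links to other domains would still be counted.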

    • 4 replies
    1. KajMagnus @KajMagnus
          2024-05-07 23:06:52.949Z, 2024-05-08 07:45:35.448Z

      > Do multiple links to our own domain(s) trigger the spam filter?

      Yes, multiple links (not just 2 but many, doesn't matter what domains)
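      To illustrate the kind of check described here, this is a minimal Python sketch of a too-many-links check that ignores which domain each link points to. The regex, the name `has_too_many_links`, and the limit of 5 are assumptions for illustration; the thread doesn't say what the actual limit or URL pattern is.

```python
import re

# Hypothetical URL pattern: http(s) links up to the next whitespace, bracket or quote.
URL_PATTERN = re.compile(r"https?://[^\s<>()\"']+")

# Hypothetical limit; the real threshold isn't stated in the thread.
MAX_LINKS = 5


def has_too_many_links(text: str, max_links: int = MAX_LINKS) -> bool:
    """Flag a post if it contains more than max_links URLs, whatever their domain."""
    return len(URL_PATTERN.findall(text)) > max_links
```

      Under these assumptions, a pasted log in which ten lines each contain a link to the forum's own domain would be flagged, even though none of the links are spam.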

      > Is there a way to create a 'safe list' of URLs

      Not currently. But maybe it's better if this spam check was removed completely?

      It seems to me that nowadays "everyone" generates spam using ChatGPT and other LLMs, and the days of long lists with spammy links are over?

      I'm trying to release a new version this weekend. Then I can either disable or remove this too-many-links "spam check", or add a feature flag to turn it off.

      Later, there'll be ways to plug in LLMs to detect spam (generated by other LLMs, hmm). Such spam-detector LLMs would probably recognize spammy lists of links as such, so no separate configuration would be needed (?). But the LLM would probably want to know the addresses of one's own domains, so some configuration is needed after all.

      Sorry for the trouble. (What did the user end up doing in the end? Were they able to submit their post, or contact you some other way?)

      1. @daveb
          2024-05-08 12:38:02.993Z

          > Yes, multiple links (not just 2 but many, doesn't matter what domains)

          Thanks!

          > maybe it's better if this spam check was removed completely?

          I'm not sure. Historically, we haven't encountered much spam. I don't know whether that's because the filter catches malicious posters or because we just don't get many bad actors on our forum. In any case, we've been able to quickly delete any spam that did get posted.

          If the check only flags spam if the post contains multiple URLs, then it may be worth removing. The spam we have seen has only been a block of text with single links (just SEO spam I guess).

          This particular post was a reply to one of my posts, so perhaps being more lax with some types of post would be helpful? For example: a user creates a post, an admin replies, and the same user replies to the admin. The spam filter could then be more lax for that reply, because the user is now deemed 'safe'.
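          The suggestion above could be expressed as a rule that raises the link limit when a post replies to a staff member's post in a thread that the same author started. A minimal Python sketch; the `Post` fields, both limits, and `link_limit_for` are hypothetical and not Talkyard's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    author_id: int
    author_is_staff: bool
    parent: Optional["Post"] = None  # the post being replied to, if any
    thread_author_id: Optional[int] = None  # who started the thread


# Hypothetical limits.
DEFAULT_MAX_LINKS = 5
RELAXED_MAX_LINKS = 50


def link_limit_for(post: Post) -> int:
    """Allow more links when the author replies to staff in a thread they started."""
    parent = post.parent
    if (
        parent is not None
        and parent.author_is_staff
        and post.thread_author_id == post.author_id
    ):
        return RELAXED_MAX_LINKS
    return DEFAULT_MAX_LINKS
```

          Restricting the relaxed limit to the thread's own author means a stranger who jumps into the thread after an admin reply still gets the default limit.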

          I think this instance was the second time a user has mentioned URLs in a log triggering the spam filter, so it's not something we encounter often.

          > What did the user end up doing in the end?

          In the end, they just didn't share the log and expressed frustration that they couldn't post it, which prompted my questions here.

          1. > If the check only flags spam if the post contains multiple URLs

            Yes, that's what it did. It was part of a simplistic "quick-spam-check" function, which is called before possibly making network requests to check for spam.
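            The ordering described above, a cheap local check that runs before any network request to a spam-check service, could look roughly like this Python sketch. The `quick_check_enabled` parameter mirrors the turn-it-off feature flag mentioned earlier in the thread; it, `quick_spam_check`, `check_for_spam`, and `remote_check` (a stand-in for the network call) are hypothetical names, not Talkyard's actual code.

```python
import re
from typing import Callable, Optional

# Hypothetical URL pattern: http(s) links up to the next whitespace, bracket or quote.
URL_PATTERN = re.compile(r"https?://[^\s<>()\"']+")


def quick_spam_check(text: str, max_links: int = 5) -> Optional[str]:
    """Cheap local heuristic; returns a reason string if the post looks like spam."""
    if len(URL_PATTERN.findall(text)) > max_links:
        return "too many links"
    return None


def check_for_spam(
    text: str,
    remote_check: Callable[[str], Optional[str]],
    quick_check_enabled: bool = True,
) -> Optional[str]:
    """Run the quick local check first; call the slower remote check only if it passes."""
    if quick_check_enabled:
        reason = quick_spam_check(text)
        if reason is not None:
            return reason  # rejected locally, no network request made
    return remote_check(text)
```

            The design choice is that the local check can only reject, never approve: anything it lets through still goes to the remote check, so disabling it removes false positives like the one in this thread without skipping the network check.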

            It will be gone in the upcoming version. (It's quick to add back if it turns out it did something useful.)

            > The spam we have seen has only been a block of text with single links (just SEO spam I guess).

            Ok, interesting. Same here (from what I remember)

            > This particular post was a reply to one of my posts, so perhaps being more lax with some types of post would be helpful?

            Yes, that's a good idea. I think the prompt to a spam-check LLM could include:

            • The description of the forum, maybe from the forum-intro-text.
            • The description of the current forum category.
            • The text in the parent and ancestor comments, back up to the Original Post & title. If the thread is long, maybe just the OP and parent comment.
            • The addresses of one's own websites (domains), and brief descriptions of them
            • The post author:
              • How long they've been a member,
              • their trust level,
              • how many OK posts they've posted before, how many spam posts.
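            The inputs listed above could be assembled into a prompt along these lines. This Python sketch only builds the prompt text and does not call any LLM API; `SpamCheckContext`, `build_spam_check_prompt`, its field names, and the `MAX_ANCESTORS` cutoff are hypothetical placeholders. It follows the suggestion to keep only the Original Post and the parent comment when a thread is long.

```python
from dataclasses import dataclass


@dataclass
class SpamCheckContext:
    forum_description: str       # e.g. from the forum intro text
    category_description: str
    thread_title: str
    ancestor_texts: list[str]    # Original Post first, parent comment last
    own_domains: dict[str, str]  # domain -> brief description
    member_days: int
    trust_level: int
    ok_post_count: int
    spam_post_count: int


MAX_ANCESTORS = 4  # hypothetical cutoff for what counts as a "long" thread


def build_spam_check_prompt(ctx: SpamCheckContext, new_post: str) -> str:
    """Assemble the suggested context into a single prompt string."""
    ancestors = ctx.ancestor_texts
    if len(ancestors) > MAX_ANCESTORS:
        # Long thread: keep only the Original Post and the parent comment.
        ancestors = [ancestors[0], ancestors[-1]]
    domains = "\n".join(f"- {d}: {desc}" for d, desc in ctx.own_domains.items())
    thread = "\n---\n".join(ancestors)
    return (
        f"Forum description: {ctx.forum_description}\n"
        f"Category description: {ctx.category_description}\n"
        f"Thread title: {ctx.thread_title}\n"
        f"Earlier posts in the thread:\n{thread}\n"
        f"The forum's own domains (links to these are expected):\n{domains}\n"
        f"Author: member for {ctx.member_days} days, trust level {ctx.trust_level}, "
        f"{ctx.ok_post_count} OK posts, {ctx.spam_post_count} spam posts.\n"
        f"New post to classify as spam or not spam:\n{new_post}\n"
    )
```

            Listing the forum's own domains explicitly is what would let the model treat a log full of links to those domains as expected rather than spammy.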

            (Some organizations might want to disable this once it's implemented, say, for an internal forum where everyone is an employee. Or they might want to use their own in-house AI, with no third parties involved.)

        • In reply to daveb:

          (Status update: I'm about to code-review a large batch of changes from the last few weeks and deploy a new server. Then this spam filter fix will go live, in about 1.5 weeks I'm guessing.)

          1. Progress with handling this problem
          2. @KajMagnus marked this topic as Planned 2024-05-07 23:07:22.946Z.