Author Topic: New forum search  (Read 3535 times)

Offline ngld

  • Administrator
  • 29
  • Knossos dev
There've been quite a few complaints about the existing forum search. So I thought I'd write my own as a fun side project.
You can see the result here: https://hlp-search.tproxy.de/

It's indexing every publicly available post. However, the indexing process hasn't finished (at the time of writing) so quite a few posts might be missing. The above link will let you know how far along the indexing process is.
The search engine itself is fairly straightforward. You enter your search query and get results. Advanced searches (e.g. find this -but -not -this, "Find this exact sentence.", show only results where +this word appears, etc.) are also possible. Here's the full documentation of the query syntax. The input field has autocomplete but that's just based on all of the indexed thread titles so it might lead to some interesting results. You'll also get suggestions like "Did you mean ...?" if you misspelled something.
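
As a rough illustration of the operators above, here is a minimal sketch of how a raw query could be handed to Elasticsearch using its `query_string` query, which understands `+word`, `-word` and quoted phrases. Whether this search actually uses `query_string` is an assumption, and the field names `title` and `body` are hypothetical since the index layout isn't described here.

```python
import json


def build_search_body(user_query, fields=("title", "body")):
    """Wrap a raw user query in an Elasticsearch `query_string` query.

    The field names are hypothetical; the real index mapping is not
    described in this thread.
    """
    return {
        "query": {
            "query_string": {
                "query": user_query,
                "fields": list(fields),
            }
        }
    }


if __name__ == "__main__":
    # +word must appear, -word must not appear, quotes match an exact phrase.
    body = build_search_body('+knossos -installer "forum search"')
    print(json.dumps(body, indent=2))
```

The function only builds the request body; sending it to a server is left out on purpose.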

Anyway, this was a fun project. Let me know if this is actually useful to anyone, if you're interested in the source code / how it works or want some kind of change.

EDIT: Mentioned issues were fixed. New posts are added to the index every 2 hours.
« Last Edit: December 06, 2018, 05:21:06 pm by ngld »

 

Offline jr2

  • The Mail Man
  • 212
  • It's prounounced jayartoo 0x6A7232
    • Steam
Can we get a link to it in the top navbar?  Maybe Search > ngld improved search -OR- legacy search?

 

Offline Goober5000

  • HLP Loremaster
  • Administrator
  • 214
    • Goober5000 Productions
That's jumping the gun; this should be thoroughly tested before we put links anywhere.  We don't want to overload ngld with tech support requests for this and Knossos. :)  Plus this is limited to publicly available posts, not the whole forum.

But this is cool!  Forum search has sucked for quite a long time; this is a welcome development.  If it proves to be substantially more accurate than the forum search, then I would be in favor of indexing the entire forum and making this the default.

 

Offline Novachen

  • 28
  • The one and only capella supernova
A nice addition, sure :)

I personally never had issues with the forum search, however. Was able to find everything so far without problems :).

I think the main problem people on forums have is that they simply search wrong :D
Female FreeSpace 2 pilot since 1999.
Global moderator in the German FreeSpace Galaxy Forum.

Responsible for the German translation project FreiRaum - FreeSpace auf Deutsch.
Also responsible for the Nova Upgrade Project, which upgrades and fixes older campaigns to make them playable and solvable again with current builds and MediaVPs.

Release List:
German Translations:
Between the Ashes 1, FreeSpace Port, Silent Threat: Reborn, The Destiny of Peace, Awakenings & Deneb III.

Nova Upgrades:
A Walk in the Sun, Into the Halls of Valhalla, Luyten Civil War, Renegade Resurgence, Rogues!, Storm Front Saga, The Deuterium Connection & Venice Mirror.

 

Offline ngld

  • Administrator
  • 29
  • Knossos dev
From now on the indexer should add new posts every 2 hours, which should allow you to find most recent posts (I can do this more frequently but I don't want to put unnecessary load on the HLP server).

I've noticed that it's kind of slow if no one's used the search in a while (results take ~6 seconds in that case) but quickly gets faster with every following search request (something like ~3s, ~1.5s, ~0.9s). I haven't looked into why this happens but I guess the search server (Elasticsearch) empties its memory cache when it sits idle for too long. Either that or the memory gets swapped out. I might be able to fix that but I'm not sure if anyone's actually interested in that. So far, I've seen fewer than 1 search per day.

Adding additional filters (e.g. show only results from these threads / forums / ...) is possible and fairly simple. Indexing private stuff is more complicated because then the search frontend somehow needs to know which forums the user can access and restrict the search results to those forums. Most of the complicated stuff here is somehow retrieving that list (either from the forum DB or through a PHP script).
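
To make the restriction step concrete, here is a minimal sketch of an Elasticsearch request body that combines the user's query with a filter on the forums they are allowed to see. It assumes every indexed post stores a `board_id` field and that the list of allowed IDs has already been fetched somehow; both the field name and the helper are hypothetical.

```python
def build_filtered_body(user_query, allowed_board_ids):
    """Combine a text query with a filter on the boards a user may read.

    `board_id` is a hypothetical field name. The `bool`, `must`, `filter`
    and `terms` clauses are standard Elasticsearch query DSL.
    """
    return {
        "query": {
            "bool": {
                # The text query decides relevance (scoring).
                "must": [{"query_string": {"query": user_query}}],
                # The filter only includes or excludes; it does not score.
                "filter": [{"terms": {"board_id": sorted(allowed_board_ids)}}],
            }
        }
    }


if __name__ == "__main__":
    print(build_filtered_body("new leviathan", {3, 1, 2}))
```

Putting the permission check into `filter` rather than `must` keeps it from influencing the ranking of the results.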

Finally, it'd be possible to make the default search fields in the forum use my search but as Goober said, I'd rather make sure it works as expected first. I could probably add a link to my search engine which allows you to run the same search using the forum's search. Something like "retry with the old forum search". Though that'd only help if we made my search engine the new default. We could probably make this a profile setting or something like that... then everyone can decide for themselves which engine they want to use by default.

 

Offline Nightmare

  • 210
If you think it's stable enough you could post it in the "News:"-line on top of every page. Ask people for feedback/testing, stuff like that.

 

Offline ngld

  • Administrator
  • 29
  • Knossos dev
It's not going to crash or anything like that... worst possible outcome would be that it doesn't return the results you expect it to. Though that's pretty unlikely given that all of the actual searching and indexing are handled by Elasticsearch. The only thing I did was write some Python scripts to feed it the forum posts (~600 lines) and a simple web frontend that allows you to send queries to the server and formats the results (~100 lines). With that amount of code, there's not much that can go wrong...

  

Offline Nightmare

  • 210
Well then go for it! You're an Admin now! :D
(and I'd guess most people have read about FSMods in the meantime)

 

Offline ngld

  • Administrator
  • 29
  • Knossos dev
Eh, why not? Would be nice if there was a way to hide old news entries instead of just deleting them... would give you a way to reenable them later. Anyway, I've saved the old news entry in case someone wants me to restore it.

 

Offline Nightmare

  • 210
Couldn't you just put them under each other? If you turn the first line into a one-liner, say, "ngld is testing a new forum search which hopefully works better than the existing one. It currently only indexes public posts. Feedback is welcome!", there should be plenty of space (and the other message wasn't that long either).

 

Offline ngld

  • Administrator
  • 29
  • Knossos dev
I could just leave it enabled. The forum would then cycle between the two but as you said before... it's been a while since we restored FSMods so the message isn't that relevant anymore.

 

Offline PIe

  • 28
  • GTVA POLICE
    • freespace3.com
It's possible to add additional filters (i.e. show only results from these threads / forum /...) and fairly simple. Indexing private stuff is more complicated because then the search frontend somehow needs to know which forums the user can access and restrict the search results to those forums. Most of the complicated stuff here is somehow retrieving that list (either from the forum DB or through a PHP script).
That would be nice.  Another enhancement would be to search among a specific user's posts and topics.
Right now I don't have a lot of feedback because I don't search the forum that much, but when I do, I'll try to compare results.  One thing I do like is that, unlike in the forum search, the query is embedded in the URL.
[6:59 PM] Admiral Nelson: who made the "New Folder" campaign?
[7:00 PM] Cobbles: Never heard of that one
[7:00 PM] PIe: best campaign name ever
[7:00 PM] PIe: I really liked the sequel "Copy of New Folder"
[7:01 PM] Admiral Nelson: no way
[7:01 PM] Admiral Nelson: Copy of New Folder (2) was waayyy better
[7:01 PM] GayShivan: Now now
[7:01 PM] GayShivan: Let's talk about the spinoff, Shortcut of Copy of New Folder (3)

[6:23 PM] PIe: why do I have the feeling that I shouldn't be able to give orders to 22nd armored hq
[6:24 PM] Axem: 22nd armored hq, i order you to get me a cup of coffee
[6:24 PM] PIe: and donuts
[6:24 PM] PIe: BECAUSE THIS IS THE GTVA POLICE
[6:25 PM] Axem: :O
[6:25 PM] Axem: am i under arrest
[6:26 PM] [`_`]/: no, just please step out of the myrmidon
[6:26 PM] [`_`]/: you have so much to fred for

 

Offline jr2

  • The Mail Man
  • 212
  • It's prounounced jayartoo 0x6A7232
    • Steam
Discord has a nice list of options for searching:

from: user
mentions: user
has: link, embed, or file
before: date
during: date
after: date
(replace in: channel with in: thread # or title?)


Not saying I'm requesting all of that, just having a working forum search is great! But if the list inspires you to implement a few of those, even better!  :cool:

 

Offline Nightmare

  • 210
One thing I'm wondering about- does this search somehow mess with the way the HLP forum software registers page views? Yesterday evening, the "Shattered Stars"-thread had around 71.000 views, now it already has 2500 views more and I've noticed a similar rapid increase on a few other threads as well. I'm sorry if I'm wrong and some botnet is responsible for that (though it doesn't do any harm, it just alters the statistics), I'm just surprised as forum activity doesn't seem to explain that alone.

 

Offline ngld

  • Administrator
  • 29
  • Knossos dev
@PIe: Thanks!

@jr2: Noted. User / before / during / after are definitely possible since that's already info I have stored (just need to parse the dates). Links and mentions sound more complicated since they require me to analyze the post content. Should be possible though.
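
A minimal sketch of what parsing such filters could look like, assuming a `key:value` syntax without a space after the colon and ISO dates (`YYYY-MM-DD`). Both are assumptions made for illustration, not how the search currently works.

```python
from datetime import date

# Filter names borrowed from the Discord list; "during" is left out for brevity.
FILTER_KEYS = {"from", "before", "after"}


def parse_query(raw):
    """Split 'from:ngld after:2018-12-01 some words' into filters and free text."""
    filters = {}
    words = []
    for token in raw.split():
        key, sep, value = token.partition(":")
        if sep and value and key in FILTER_KEYS:
            if key in ("before", "after"):
                # Raises ValueError if the date is not in YYYY-MM-DD form.
                value = date.fromisoformat(value)
            filters[key] = value
        else:
            # Anything that is not a known filter stays part of the text query.
            words.append(token)
    return filters, " ".join(words)


if __name__ == "__main__":
    print(parse_query("from:ngld after:2018-12-01 forum search"))
```

Tokens with an unknown prefix (for example a pasted URL) fall through to the free-text part instead of being treated as filters.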

The indexer just looks at the recent posts page and sometimes goes to a forum to get a list of recent topics. Those are the only pages it automatically accesses. If the forum software is counting accessing the recent posts page as a view for each thread listed... that would be pretty dumb and still couldn't explain this since that comes out at 10*12 = 120 requests per day (the recent posts page has 10 pages and the indexer runs every 2 hours so 12 times per day). And that would be the worst case scenario since the indexer stops going through those pages once it sees a post it has already seen.
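
The stop-early behaviour described above can be sketched roughly like this. It is an illustration only, not the actual indexer code: `fetch_page` is a hypothetical callable that returns the posts on one page of the recent posts list, newest first.

```python
def collect_new_posts(fetch_page, seen_ids, max_pages=10):
    """Walk the recent-posts pages, newest first, until a known post appears.

    `fetch_page(n)` is a hypothetical callable returning a list of post
    dicts (each with an "id") for page n.
    Returns the new posts and the number of pages requested.
    """
    new_posts = []
    requests = 0
    for page in range(1, max_pages + 1):
        requests += 1
        for post in fetch_page(page):
            if post["id"] in seen_ids:
                # Everything older than this was indexed on an earlier run.
                return new_posts, requests
            new_posts.append(post)
    return new_posts, requests


if __name__ == "__main__":
    # Fake forum: 3 pages of 2 posts each, newest first.
    pages = {
        1: [{"id": 6}, {"id": 5}],
        2: [{"id": 4}, {"id": 3}],
        3: [{"id": 2}, {"id": 1}],
    }
    posts, requests = collect_new_posts(lambda n: pages.get(n, []), {3}, max_pages=3)
    print([p["id"] for p in posts], requests)

    # Worst case from the post: 10 pages per run, one run every 2 hours.
    print(10 * (24 // 2))
```

In the fake run the loop stops on the second page, so the third page is never requested; the worst case of 120 requests per day only applies when no known post turns up at all.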
« Last Edit: December 08, 2018, 06:26:19 am by ngld »

 
Is there a way to sort search results by date?

 

Offline Nightmare

  • 210
Ah OK my bad, I don't know anything about how search engines work, sry... :sigh:

Still I'm curious what's going on there, the Shattered Stars thread gained another 10.000 views since yesterday (85.220).

 

Offline PIe

  • 28
  • GTVA POLICE
    • freespace3.com
I've submitted a DuckDuckGo bang for !hlp.  If you don't use DDG, bangs are a quick way of searching a specific site.  For instance,
Code:
!gh freespace 2
will search GitHub for "freespace 2".
I don't know how long it takes to get a new bang approved or even how strict they are at approving them.
As for results, after just a few searches, it does seem better.  For instance, I searched for "new leviathan" and the integrated forum search was practically useless while this one came up with some useful results.

 

Offline jr2

  • The Mail Man
  • 212
  • It's prounounced jayartoo 0x6A7232
    • Steam

Nice