Feature request: Search API
I'm not sure whether we've discussed this elsewhere.
As far as I remember, we talked about it quite a bit, but that was a very long time ago.
We're doing various integrations now, where we try to make it easier for users to find the forum and to use it more.
A great way to get them to do that is to integrate forum search into our app and website.
The API should ideally be accessible publicly without authentication, but we can also live with it if it has to go through our server first (at least for a start - even though it will make things a bit slower for the user).
We could live with the API just returning the top 5 or top 10 results as a starting point.
It would be ideal if it returns:
- Title
- Short summary (possibly with emphasized keywords)
- Link
- KajMagnus @KajMagnus 2020-04-20 03:43:29.932Z 2020-04-20 03:55:10.522Z
Good idea, what about this public API in the upcoming version:
GET http:// ty server /-/v0/search?q=UX+improvements
and the response (note: matching phrases are marked with the HTML <mark> tag, in htmlWithMarks: ... below):
{
  "searchResults": [
    {
      "pageTitle": "support-chat",
      "pageUrl": "http://site-3.localhost/-31",
      "postHits": [
        {
          "isPageTitle": false,
          "isPageBody": false,
          "htmlWithMarks": [
            "Probably such an iframe could be a bit better looking and <mark>UX</mark> friendly (maybe clickable author names)"
          ]
        }
      ]
    },
    {
      "pageTitle": "Potential UX improvements",
      "pageUrl": "http://site-3.localhost/-334",
      "postHits": [
        {
          "isPageTitle": false,
          "isPageBody": true,
          "htmlWithMarks": [
            "All of this is about trying to <mark>improve</mark> the forum so it doesn't require so much interaction from us on",
            "Right now it isn't an <mark>improvement</mark> with the draft UI interleaved.",
            "Generally, <mark>UX</mark> changes that are only half-done means we have to spend time with reporting feedback when",
            "helpful if there could be a more well-tested/documented approach for TY to introduce changes to the <mark>UX</mark>"
          ]
        }
      ]
      ...
The response, as TypeScript interfaces:
interface SearchResultsApiResponse {
  searchResults: PageAndHits[];
}
interface PageAndHits {
  pageTitle: string;
  pageUrl: string;
  postHits: PostHit[];
}
interface PostHit {
  isPageTitle?: boolean;
  isPageBody?: boolean;
  htmlWithMarks: string[];
}
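A minimal sketch of how a client might use the two flags in PostHit to tell title hits, original-post hits and replies apart. The interfaces are the ones from this post; `HitKind`, `classifyHit` and `snippetsToShow` are hypothetical helper names, not part of Talkyard.

```typescript
// Response shapes, as given in the post.
interface PostHit {
  isPageTitle?: boolean;
  isPageBody?: boolean;
  htmlWithMarks: string[];
}

interface PageAndHits {
  pageTitle: string;
  pageUrl: string;
  postHits: PostHit[];
}

// Hypothetical helper type: where in the page a hit was found.
type HitKind = 'title' | 'body' | 'reply';

// A hit that is neither the page title nor the page body is a reply.
function classifyHit(hit: PostHit): HitKind {
  if (hit.isPageTitle) return 'title';
  if (hit.isPageBody) return 'body';
  return 'reply';
}

// Hypothetical helper: collect the highlighted snippets for one page,
// skipping title hits so the title text is not shown twice.
function snippetsToShow(page: PageAndHits): string[] {
  const snippets: string[] = [];
  for (const hit of page.postHits) {
    if (classifyHit(hit) === 'title') continue;
    for (const html of hit.htmlWithMarks) {
      snippets.push(html);
    }
  }
  return snippets;
}
```

Since htmlWithMarks contains HTML, a client rendering these snippets would need to insert them as HTML (or strip the <mark> tags) rather than as plain text.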
isPageTitle can be good to know, because maybe you don't want to both show the title, and include a highlighted matching phrase from the title (because then the title text gets included twice).
(What's a good thing to call the Original Post? Above, it's "Page body": isPageBody?: boolean. But maybe people confuse that with the <body> html tag? Maybe isOrigPost would be better? But what if it's not a forum post, but an article? What about isArticleText? But what if it's not an article, but a forum post? Hmm)
If !isPageTitle && !isPageBody, then the post is a reply (to the orig post, or to someone else).
Maybe some time later, there could be an isAcceptedSolution field too?
- Christian Scheuer @chrscheuer
Yay - looks great!
I think we'd like to have the category path and the last modified date. By category path I mean for example Packages -> Soundminer (since we have subcategories). These paths should have some kind of ID with them as well.
Would it make sense to have the username of the posts and/or pages that were hit? At least the author of the page I think would be good to have so we can show them with a little image.
Wrt using GET and querystring, I'm thinking this would be the start of the API, but it would likely be something that we'd want to augment in the future.
For example to add:
- Search only in certain categories
- Search only in certain tags
- Potentially paging
For these reasons, I feel like a POST with json could potentially be more flexible. I seriously hate URL serialization/deserialization haha, everybody always gets it wrong.
We also need to think about if it returns only public material (I think it should by default)
- KajMagnus @KajMagnus 2020-04-25 06:45:34.105Z
category path and the last modified date
Yes (and that'd be nice to include on Talkyard's own search results page too).
the username of the posts and/or pages that were hit? [...] author of the page
Yes
so we can show them with a little image
The person's avatar image?
POST with json could potentially be more flexible
I think so too. Internally, Talkyard has both a GET API, so queries can be linked via a URL, and a POST API, for the reasons you mentioned. I've now changed the public API to POST. A basic version (without the things mentioned above, unfortunately) will be included in the upcoming version.
The API wants JSON that looks like:
{ searchQuery: { queryText: "..... " }, pretty?: bool }
If the queryText is like: " ... text text categories:category-url-slug,another-cat-slug" then only those categories will get searched.
We can add a separate categoryRefs: ... field next to queryText later, and then you can refer to the categories via ext-id instead, so the search functionality won't break if you change their URL slugs.
- KajMagnus @KajMagnus 2020-04-26 22:14:16.189Z
@chrscheuer — I'm adding author names and avatar url, + category name and URL (not the complete category path yet though).
Someone mentioned an API endpoint for listing popular pages in a category (here),
and I thought it'd be nice to implement both the search API and that other list-things API,
and see how they can share a bit of code and TypeScript interfaces, with author names etc. included.
- Christian Scheuer @chrscheuer
Super cool. Let me know when it's up on either server so I can make some tests :)
With regards to the tagging system, also let me know when/if you'd like to discuss it further. I think we may start implementing our own tagging system for now so we can get something up and running very quickly and then we can switch to the forum's system once it's ready.
- KajMagnus @KajMagnus 2020-04-30 07:09:59.122Z 2020-04-30 07:19:28.020Z
I've upgraded this server, Ty .io. Your server, Ty .net, will follow in 2 days I'd think (that is, Saturday).
Meanwhile — here's the modified Search API:
https://github.com/debiki/talkyard/blob/40ff70deb434d16f5d833ae8005158f873671637/tests/e2e/pub-api.ts#L292
(The changes: the search query field is renamed from queryText to freetext. The search results are in a thingsFound array, instead of searchResults, and postsHit is now postsFound. "Found" sounds nicer than "Hit" I think, and some time later, when searching for people, it'd be ParticipantFound[] instead of ParticipantHit[].)
(If you scroll up and look at type FindWhat = 'Pages' | 'Members' | ... and interface LookWhere { ..., then ignore the comment about ReferencedThings object. I forgot to delete that comment.)
B.t.w. the only thing I've actually implemented this far, is:
POST /-/v0/search { searchQuery: { freetext: "... search query ..." } }
( + a list query, for listing the most popular pages, in a specific category:
/-/v0/list { listQuery: { findWhat: 'Pages', lookWhere: { inCategories: ['extid:the_categorys_ext_id'] }, } }
)
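A rough sketch of how a client could build these two requests. The paths and body shapes are taken from this post; the helper names (`makeJsonPost`, `makeSearchRequest`, `makeListRequest`), the `JsonPost` type and the Content-Type header are assumptions for illustration, and the server origin is a placeholder.

```typescript
// Request bodies as described in the post.
interface SearchRequestBody {
  searchQuery: { freetext: string };
}

interface ListRequestBody {
  listQuery: {
    findWhat: 'Pages';
    lookWhere: { inCategories: string[] };
  };
}

// Hypothetical helper type: everything needed to send one HTTP POST.
interface JsonPost {
  url: string;
  method: 'POST';
  headers: { 'Content-Type': string };
  body: string;
}

function makeJsonPost(serverOrigin: string, path: string, payload: object): JsonPost {
  return {
    url: serverOrigin + path,
    method: 'POST',
    // Assumption: the server expects a JSON request body.
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  };
}

// POST /-/v0/search with a free-text query.
function makeSearchRequest(serverOrigin: string, freetext: string): JsonPost {
  const payload: SearchRequestBody = { searchQuery: { freetext } };
  return makeJsonPost(serverOrigin, '/-/v0/search', payload);
}

// POST /-/v0/list for pages in the categories with the given external IDs.
function makeListRequest(serverOrigin: string, categoryExtIds: string[]): JsonPost {
  const payload: ListRequestBody = {
    listQuery: {
      findWhat: 'Pages',
      lookWhere: { inCategories: categoryExtIds.map(id => 'extid:' + id) },
    },
  };
  return makeJsonPost(serverOrigin, '/-/v0/list', payload);
}
```

The returned object could then be handed to fetch, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.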
- Christian Scheuer @chrscheuer
This all looks brilliant - great with your ElasticSearch guides on compound queries as well!
Love the scrollCursor placeholder too.
- Christian Scheuer @chrscheuer
Does lookWhere.writtenBy accept ssoid user IDs?
- KajMagnus @KajMagnus 2020-05-02 18:08:00.815Z 2020-05-02 18:15:34.228Z
accept ssoid user IDs?
Not yet, but yes, that's the idea:
writtenBy: ['ssoid:...', 'username:...', 'username:could_be_a_group' ]
Sorry, it seems I won't upgrade the server until tomorrow.
- In reply to chrscheuer ⬆: KajMagnus @KajMagnus 2020-04-30 08:07:28.750Z
tagging system, also let me know when/if you'd like to discuss it further. I think we may start implementing our own tagging system for now so we can get something up and running very quickly and then we can switch to the forum's system once it's ready
I don't think I'll have time to look into the tagging system in the coming weeks. Probably I should do OpenID Connect first. Also, maybe in a way it'd be good if you build your own tags? Then you can tell me how to implement tags in Talkyard in a way that works for you (and you seem to have a slightly more advanced need for tags than most organizations (?), so what works fine for you would work fine for almost everyone I'd think).
B.t.w. one thing: I think I'd like the unique identifier of a tag to be a numeric ID, rather than the tag label, so one can rename a tag without having to re-index all pages tagged with that tag. (In ElasticSearch, the page would be connected to that never-changing numeric tag ID, so there's no need to reindex the pages when renaming a tag label: the ID didn't change.)
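To illustrate the idea, here is a small in-memory sketch in which pages refer to tags by numeric ID and labels live in one separate lookup. All names (`TaggedPage`, `TagStore`, `setLabel`, `labelsFor`) are hypothetical; this is not how Talkyard or ElasticSearch store tags, only the shape of the argument.

```typescript
// A page stores tag IDs only, never tag labels.
interface TaggedPage {
  pageId: string;
  tagIds: number[];
}

// Hypothetical store: the single place where tag labels live.
class TagStore {
  private labelsById = new Map<number, string>();

  // Creates a tag, or renames it if the ID already exists.
  setLabel(tagId: number, label: string): void {
    this.labelsById.set(tagId, label);
  }

  // Resolves a page's tag IDs to their current labels at display time.
  labelsFor(page: TaggedPage): string[] {
    return page.tagIds.map(id => this.labelsById.get(id) ?? `(unknown tag ${id})`);
  }
}
```

Renaming a tag touches only the one entry in the store; every page that carries the tag's ID picks up the new label without being modified or reindexed.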
- Christian Scheuer @chrscheuer
Completely agree. That's also why I just thought we could start on our own - it will be easier to show you what we want by having something that already works :)
- Progress with doing this idea
- KajMagnus @KajMagnus 2020-05-03 14:21:49.969Z
@chrscheuer Now I've upgraded the Ty .net server,
so now you can try out the search API / start using it. Here's an end-to-end test:
https://github.com/debiki/talkyard/blob/40ff70deb434d16f5d833ae8005158f873671637/tests/e2e/specs/api-search-full-text.test.ts
Here's the search API request:
https://github.com/debiki/talkyard/blob/40ff70deb434d16f5d833ae8005158f873671637/tests/e2e/utils/server.ts#L494
- Christian Scheuer @chrscheuer
I'm getting a CORS error when trying to test this:
Access to fetch at 'https://forum.soundflow.org/-/v0/search' from origin 'http://localhost:8080' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
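The request that triggered this error isn't shown in the thread, but a cross-origin POST with a JSON body is not a "simple" CORS request, so the browser first sends an OPTIONS preflight, which is what fails here. A simplified sketch of that rule follows; it looks only at the method and the Content-Type and ignores any other custom headers, and `needsPreflight` is a hypothetical name.

```typescript
// Methods and content types that do not, on their own, trigger a preflight.
const SIMPLE_METHODS = ['GET', 'HEAD', 'POST'];
const SIMPLE_CONTENT_TYPES = [
  'application/x-www-form-urlencoded',
  'multipart/form-data',
  'text/plain',
];

// Simplified check: returns true if a cross-origin request with this method
// and Content-Type would be preceded by an OPTIONS preflight request.
function needsPreflight(method: string, contentType?: string): boolean {
  if (SIMPLE_METHODS.indexOf(method.toUpperCase()) === -1) return true;
  if (contentType === undefined) return false;
  // Drop parameters such as "; charset=utf-8" before comparing.
  const semicolonAt = contentType.indexOf(';');
  const mimeType = (semicolonAt === -1 ? contentType : contentType.slice(0, semicolonAt))
    .trim()
    .toLowerCase();
  return SIMPLE_CONTENT_TYPES.indexOf(mimeType) === -1;
}
```

Under this rule, `needsPreflight('POST', 'application/json')` is true, which matches the "Response to preflight request" wording in the error above.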
- Christian Scheuer @chrscheuer
It's okay that I can't test from localhost, but would it be possible to white-list the soundflow.org domain for CORS requests?
- KajMagnus @KajMagnus 2020-05-31 17:32:46.328Z replies to chrscheuer ⬆:
What if I add a CORS domain whitelist config field in the admin area? Then the site admins can decide (i.e. you'd type soundflow.org or maybe *.soundflow.org in that field).
- Christian Scheuer @chrscheuer
Yea that'd be great!
We have a very tight beta deadline again this round by the way :) Releasing a large new version on June 15, which means this coming week is the cutoff for features so we have enough time to beta test.
Do you think this and the other issue is possible to get looked at this week? If not that's just good to know, then we'll build in some workarounds for the features (search via our server etc.).
- KajMagnus @KajMagnus 2020-06-01 07:48:26.028Z replies to chrscheuer ⬆:
Do you think this and the other issue is possible to get looked at this week?
1) The other issue, yes. 2) This CORS issue: I think so but I'm not totally certain — looks more complicated to add per site CORS to the web framework, than what I thought.
I can post a status update tomorrow (that'd be fine? I mean, not too late)
- Christian Scheuer @chrscheuer
That would be great - update tomorrow is fine. We can work around the CORS issue by sending through our own servers (even though it will make it slow for users) so would be great to get the markdown issue fixed and then see how far we can get with CORS.
- KajMagnus @KajMagnus 2020-06-02 15:21:24.114Z replies to chrscheuer ⬆:
I got the markdown issue fixed (not code reviewed yet).
I can add CORS headers via Nginx, I'll give this a try later today or tomorrow. (That's a better approach than using the app server for that, anyway, long term, I think.)
- Christian Scheuer @chrscheuer
Cool - thanks for the update!
- KajMagnus @KajMagnus 2020-06-03 17:25:17.577Z replies to chrscheuer ⬆:
Hi @chrscheuer — I could repro the "Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header ... " problem and make it work by adding CORS headers (and some other changes — lots of CORS related things to read about).
These CORS requests are for /-/v0/search, right? Would you want people to do these search requests as "strangers", meaning, not logged in?
Or, if a user is logged in, would you want his/her session cookie to be included in the CORS request, so the response can include topics not publicly visible? (but visible to him/her if s/he is logged in)
(Or, not sure if / how this could work, but maybe in the distant future, using a Bearer token or Basic Auth password somehow. Seems tricky to distribute such secret things on a per user basis though, hmm)
- Christian Scheuer @chrscheuer
Nice!
Yea these are for /-/v0/search. We log users in to their SF account when displaying the help panel which this is part of. But also roundtripping the forum SSO might be overkill for the search for now.
So basically yea it would be okay for the search to only display public results for now, and probably for any foreseeable future.
I don't think the setting allowing the soundflow.org domain to post requests has anything to do with this question though, right? The CORS header is about allowing Javascript hosted on the domain soundflow.org to send HTTP POST requests to the search endpoint - not about whether or not we should send session cookies - or am I misunderstanding something?
- KajMagnus @KajMagnus 2020-06-04 14:33:22.309Z replies to chrscheuer ⬆:
The CORS header is about allowing Javascript hosted on the domain soundflow.org to send HTTP POST requests to the search endpoint
Yes (and also GET, PUT, DELETE).
not about whether or not we should send session cookies
Those POST requests can optionally include one's session cookie. Then there'd be CORS POST requests with the Soundflow user's Talkyard session cookie. Talkyard would look at the cookie, know who the user over at soundflow.org is, and could include in the response access-restricted topics s/he is allowed to see.
But by default CORS requests don't send cookies. I think that's a good start (i.e. to skip cookies), to see how things work out, with a bit less complexity now in the beginning, fewer things that can go wrong.
(Here's a bit about CORS and cookies — they call it "Requests with credentials": https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS#Requests_with_credentials )
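On the client side, the difference between the two variants comes down to the `credentials` option of fetch. A minimal sketch, assuming the search request body from earlier in the thread; `makeSearchInit` and `SearchFetchInit` are hypothetical names.

```typescript
// The three credentials modes defined by the Fetch standard.
type CredentialsMode = 'omit' | 'same-origin' | 'include';

// Hypothetical type: the options one would pass as fetch's second argument.
interface SearchFetchInit {
  method: 'POST';
  headers: { 'Content-Type': string };
  body: string;
  credentials: CredentialsMode;
}

// Builds fetch options for a search request. With sendSessionCookie = false
// the request is anonymous, so only public results can come back.
function makeSearchInit(freetext: string, sendSessionCookie: boolean): SearchFetchInit {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ searchQuery: { freetext } }),
    // 'include' sends cookies cross-origin; 'omit' never sends them.
    credentials: sendSessionCookie ? 'include' : 'omit',
  };
}
```

With 'include', browsers additionally require the response to carry `Access-Control-Allow-Credentials: true` and a specific (non-wildcard) `Access-Control-Allow-Origin`; whether Talkyard will support that is left open in this thread.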
it would be okay for the search to only display public results for now, and probably for any foreseeable future
Ok that sounds good (simpler & safer for now :- ))
- Christian Scheuer @chrscheuer
Great :) Yea makes perfect sense to take it in two separate steps. Both in terms of ease of implementation now, security aspects and the fact that we can get very far with just un-authenticated (public) search.
How do you think the timeline would look for the simple CORS case?
- KajMagnus @KajMagnus 2020-06-05 05:52:22.629Z replies to chrscheuer ⬆:
think the timeline would look for the simple CORS case?
I've made that work (I think). I have in mind to do code review today, then I can deploy an early version only to this server, Ty .io, tomorrow, allowing CORS requests from soundflow.org, and then you can try it out?
Maybe you / we will find something we didn't think about.
- Christian Scheuer @chrscheuer
Sounds great - that would be perfect.
Our release is scheduled for June 15, which means we'll be doing videos and integration tests next week. So it's important we plan for whatever we make work now to not stop working before the release... Just so we get the admin option to add the domain added before rolling back the feature (if we get it working). Hope that makes sense.
- KajMagnus @KajMagnus 2020-06-06 18:51:29.126Z replies to chrscheuer ⬆:
I'm building the new server now, will upgrade early tomorrow morning it seems.
whatever we make work now to not stop working before the release
I think it's very unlikely that there's anything in this CORS stuff that needs to be rolled back
(Actually I don't completely understand this sentence: "Just so we get the admin option to add the domain added before rolling back the feature (if we get it working)". Maybe there's some cut and paste weirdness?)
- KajMagnus @KajMagnus 2020-06-07 04:22:44.024Z 2020-06-07 04:43:38.422Z replies to chrscheuer ⬆:
Now the new server is running here on Ty .io, and it allows CORS from: http://localhost:8080 and https://soundflow.org.
You can do this:
- Copy this HTML page with CORS test helper Javascript and cURL to your localhost:
  https://raw.githubusercontent.com/debiki/talkyard/master/tests/e2e/utils/ext-cors-site.html
  (it's this: https://github.com/debiki/talkyard/blob/master/tests/e2e/utils/ext-cors-site.html )
- Start a server at 8080:
  ./node_modules/.bin/http-server -p8080 dir/with/that/html/page/
- Open dev-tools and type:
  corsFetch({ url: 'https://www.talkyard.io/-/v0/search', POST: { searchQuery: { freetext: 'pri' + 'sm' }}, onDone: function(rsp) { logToPageAndConsole(rsp) }});
  You can also try the cURL examples, and change -H "Origin: http://localhost:8080" to -H "Origin: http://the.wrong.origin" to see what'll happen.
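What the cURL experiment exercises can be pictured as an allowlist check on the Origin request header. The sketch below is a generic illustration of that pattern, not Talkyard's actual implementation (which, per the thread, involved Nginx and other changes); `corsHeadersFor` is a hypothetical name.

```typescript
// Returns the CORS response headers for a request, or no headers at all
// if the request's Origin is missing or not on the allowlist.
function corsHeadersFor(
  requestOrigin: string | undefined,
  allowedOrigins: string[],
): Record<string, string> {
  if (requestOrigin === undefined) return {};
  // Exact string comparison: scheme, host and port must all match.
  if (allowedOrigins.indexOf(requestOrigin) === -1) return {};
  return {
    // Echo the matching origin back to the browser.
    'Access-Control-Allow-Origin': requestOrigin,
    // Tell caches that the response depends on the Origin header.
    'Vary': 'Origin',
  };
}
```

For a disallowed origin the header is simply absent, and the browser then blocks the response. With exact comparison, an allowlist entry ending in a slash would never match, since browsers send the Origin header without a trailing slash.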
- CChristian Scheuer @chrscheuer
It works!! Thank you so much for this quick fix. Let me know when this is ready on forum.soundflow.org :)
If there's any chance of it working tomorrow morning (we're doing a live demo with a press reporter) that would be amazing.
- KajMagnus @KajMagnus 2020-06-09 09:22:59.563Z replies to chrscheuer ⬆:
I just upgraded the server. You can go here: /-/admin/settings/features
and check Enable Cross-Origin Resource Sharing (CORS),
and then type, on 2 separate lines:
http://localhost:8080
https://soundflow.org
in the text box that then appears (but don't end with a slash, don't: https://soundflow.org/ ).
- Christian Scheuer @chrscheuer
AMAZING!! It works :)
Here it is in action in our app:
THANK YOU MAGNUS!!!
- KajMagnus @KajMagnus 2020-06-09 09:39:59.419Z replies to chrscheuer ⬆:
Ok :- ) Looks nice in the app I think, seems user friendly
- Christian Scheuer @chrscheuer
Haha yea... One of these days when you get a Mac we have to get you on board ;)