API for ChatGPT

By @Mr.Nobody
    2024-01-23 04:26:36.699Z

    ChatGPT now allows users to train their ChatGPT with their own data. This is very useful for forums and for companies that would like to train custom AI bots for many applications.

    Are there any plans to make an API like that available on Talkyard?

    From the ChatGPT Help Files:



    It would be nice to get an API that returned:

    • each article with title + body + replies + URLs.
    • and only including "public" data: no data from hidden forum categories, no user data like emails, no forum settings or any metadata. Just the content the AI needs to make smart decisions.
    • Include info like "best answer", "likes", "replies" etc. to help the AI weigh what content to prioritize.

    Simplified example:

    {
        "forum": {
            "topics": [
                {
                    "topic_id": "001",
                    "title": "The Future of AI",
                    "comments": [
                        {
                            "comment_id": "001-01",
                            "author": "User123",
                            "timestamp": "2024-01-23T10:00:00Z",
                            "content": "What do you think the future of AI looks like in the next 10 years?",
                            "url": "https://forum.example.com/the-future-of-ai/001-01"
                        },
                        {
                            "comment_id": "001-02",
                            "author": "User456",
                            "timestamp": "2024-01-23T11:00:00Z",
                            "content": "I believe AI will become even more integrated into our daily lives, from smart homes to autonomous vehicles.",
                            "url": "https://forum.example.com/the-future-of-ai/001-02"
                        }
                    ]
                },
                {
                    "topic_id": "002",
                    "title": "Healthy Eating Habits",
                    "comments": [
                        {
                            "comment_id": "002-01",
                            "author": "User789",
                            "timestamp": "2024-01-23T12:00:00Z",
                            "content": "What are some healthy eating habits that you have incorporated into your lifestyle?",
                            "url": "https://forum.example.com/healthy-eating-habits/002-01"
                        },
                        {
                            "comment_id": "002-02",
                            "author": "User234",
                            "timestamp": "2024-01-23T13:00:00Z",
                            "content": "I've started meal prepping with a focus on whole foods and balanced macros.",
                            "url": "https://forum.example.com/healthy-eating-habits/002-02"
                        }
                    ]
                }
            ]
        }
    }
    
    Solved in post #19 (19 replies)

    1. KajMagnus @KajMagnus 2024-01-24 16:24:19.937Z 2024-01-24 16:31:44.598Z

      I have questions. But first:

      From what I've read (https://news.ycombinator.com/item?id=38759877), one typically doesn't train an LLM on one's own data, but instead uses RAG (Retrieval-Augmented Generation): one generates a search query based on an end user's question, and then the AI gets both the question and the search results. Looking at all that, it then writes something helpful to the end user.
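      The three RAG steps just described can be sketched roughly like this. It is a toy illustration only: the keyword matching stands in for a real search engine or vector database, the two documents are taken from the "Simplified example" in the OP, and all function names are made up for this sketch.

```python
import re

# Toy stand-in for a forum. In a real setup these would come from a search API.
DOCS = [
    {"url": "https://forum.example.com/the-future-of-ai/001-02",
     "text": "AI will become even more integrated into our daily lives."},
    {"url": "https://forum.example.com/healthy-eating-habits/002-02",
     "text": "I've started meal prepping with a focus on whole foods."},
]

STOPWORDS = {"the", "a", "an", "of", "is", "what", "how", "do", "you", "in", "to", "will"}


def words(text):
    """Lowercase a text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def make_search_query(question):
    """Step 1: turn the end user's question into search terms (here: drop stopwords)."""
    return words(question) - STOPWORDS


def search(query_terms, docs, limit=3):
    """Step 2: rank documents by how many query terms they contain."""
    scored = [(len(query_terms & words(d["text"])), d) for d in docs]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [d for score, d in ranked if score > 0][:limit]


def build_prompt(question, hits):
    """Step 3: give the LLM both the question and the search results."""
    sources = "\n".join(f"- {h['text']} ({h['url']})" for h in hits)
    return f"Answer using only these forum excerpts:\n{sources}\n\nQuestion: {question}"
```

      The string returned by build_prompt() is what would be sent to the LLM; the model call itself is left out here.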

      Talkyard has a search API that could be used for that, if ElasticSearch search query responses are good enough. (Talkyard uses ElasticSearch.)

      However, vector search databases are better than standard ElasticSearch for these things (there is an Elasticsearch Relevance Engine for vector search, though, but I don't know much about it).

      Maybe you've read about RAG and vector databases already? If not, then, in the above-linked discussion, someone describes vector databases & RAG & LLMs working together: https://news.ycombinator.com/item?id=38761209 (If you start reading there, & continue a little bit downwards to "Essentially you take any decent model trained on ...", that comment explains vector databases).

      Now, questions:

      Would you want to 1) input the forum data into a vector database yourself, and thereafter connect that vector database with a GPT?

      Or, 2) maybe the GPT custom actions would connect directly to Talkyard's API, and build a search database for you? And you instead just need an API that lets the GPT download all public data? (I'm thinking this looks closest to what you're describing in the OP?)

      Or 3) you'd like to tell the GPT what API endpoints it can send search queries to, and get back a list of results and links? (that'd be /-/v0/search?q=...)

      Or 4) something else :- )

      It would be nice to get an API that returned:
      each article with title + body + replies+ URL's. [...]

      There's an API for exporting one's whole site as JSON. Maybe if this API could be restricted so only publicly visible things were exported in the JSON dump, that'd be what you needed?

      I'd guess the current JSON structure is not the best for consumption by a vector database. Maybe there could be an external tool that reshapes the JSON into [a better structure for vector databases], or there could be some Talkyard API parameter that says "Give me a JSON dump in vector-database-input friendly format"? Hmm.

      1. @Mr.Nobody
          2024-01-24 17:44:37.926Z 2024-01-24 17:55:00.507Z

          Hey Kaj, that goes over my head. I'm not coding AI tools from the ground up so I don't go that deep into how it works in the background.

          I simply provide the AI with access to files, and it learns about the things in there. This can be as simple as exporting a text file. I am currently using the JSON export from Talkyard:
          https://forum.talkyard.io/-/export-site-json

          And this works really well. The AI can answer any questions in any way I phrase it. This worked in both ChatGPT's AI builder as well as Taskade's AI builder.

          However, while this export works for testing purposes, it exports all the users' private data with it, which we can't use for public-access AIs. Also: I have to first save it as a text file, since the AI can't access that URL on its own. So it takes a lot of manual work each time.

          So for my current use, a URL like the site export, but without hidden categories, without forum configuration/settings, and without users' personal data, would be more than enough. API access would be safer, of course, but its absence is not a deal breaker.

          I read the link you shared and that makes sense, but as an end-user I don't really care HOW the AI learns about the data, as long as it does 😆
          I am guessing that the AI tools do the work described in the article on their own when they say they are "training the AI based on the data".
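          To make the "site export, but public only" idea concrete, here is a rough sketch of such a filter. The field names (categories, isPublic, pages, categoryId, posts, authorEmail, settings, members) are hypothetical placeholders and are not Talkyard's real export format; the point is only the three rules: drop hidden categories, drop settings, drop personal data.

```python
import copy


def public_only(dump):
    """Return a copy of `dump` that contains only publicly visible content.

    All field names used here are hypothetical, chosen for illustration.
    """
    public_category_ids = {
        c["id"] for c in dump.get("categories", []) if c.get("isPublic")
    }
    pages = []
    for page in dump.get("pages", []):
        if page.get("categoryId") not in public_category_ids:
            continue  # skip pages in hidden categories
        page = copy.deepcopy(page)  # leave the original dump untouched
        for post in page.get("posts", []):
            post.pop("authorEmail", None)  # strip personal data
        pages.append(page)
    # Settings and member records are deliberately left out of the result.
    return {"pages": pages}
```

          The result could then be saved as a text file and handed to the AI builder, in the same way as the full export is today.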

          1. KajMagnus @KajMagnus 2024-01-29 14:36:54.079Z 2024-01-29 14:45:35.660Z

            Then I understand better :- ) I just signed up for an OpenAI account (was going to do anyway) and now I can have a better look at GPTs and how to create them.

            How interesting that you've tried the JSON export already :- ) Yes sounds like a good idea to, for a start, use that JSON export, but with some changes: no private user data, and only public categories.

            And, it sounds as if it'd be nice if there was some way to automatically & incrementally update one's GPT with new topics & comments that have been posted in the forum since forum data was uploaded to the GPT the first time. Maybe it's possible somehow, I guess I'll find out (or I or you can ask in their support forum).

            And it'd be nice if it was possible to not just add more & more recently created topics to the GPT, but also update old already existing topics in the GPT: Maybe an answer got lots of Like votes and was selected as a good answer, some days after that discussion had already been uploaded to the GPT.

            So, incrementally "backing up" one's forum to the GPT's "memory", and also incrementally overwriting out-of-date parts of its "memory". (I'd suppose the "memory" in this case, is a vector database.)
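            A minimal sketch of that "add new, overwrite out-of-date" logic, with a plain dict standing in for the GPT's "memory". The updated_at, likes and is_best_answer fields are assumptions added for this example (the OP's simplified JSON only has timestamp); timestamps are assumed to be ISO 8601 strings in UTC with the same format, so that comparing them as strings gives the right order.

```python
def sync(memory, fetched_comments):
    """Upsert comments into `memory`, a dict keyed by comment_id.

    A comment is stored if it is new, or if its updated_at is more recent
    than the stored copy. Returns the ids that were added or overwritten.
    """
    changed = []
    for comment in fetched_comments:
        old = memory.get(comment["comment_id"])
        if old is None or old["updated_at"] < comment["updated_at"]:
            memory[comment["comment_id"]] = comment
            changed.append(comment["comment_id"])
    return changed
```

            With a real vector database, the returned ids would presumably be the entries to re-embed and overwrite; everything else could be left alone.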

            1. @Mr.Nobody
                2024-01-30 02:26:19.536Z

                I have been wondering if I could store the data in a Google Docs online document, and share that with the GPT. That way, whenever the file is updated, the GPT can in theory resync.
                I use Taskade (https://www.taskade.com/) and they have some really nice customizable AI Agents, and those accept Google Docs URLs as input files for the knowledge base.

                In other words: if you could serve it as an online document, or in a format that can be cleanly imported into Google Sheets (CSV?), then it could become super easy to update the data.

                I might try that with a simple Word document to store the JSON, just to confirm that it updates the knowledge when the document is updated. But I'm 90% sure it works, there is even a "resync" button to rescan the knowledge base.
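                As an illustration of the CSV idea, this sketch flattens JSON shaped like the "Simplified example" in the OP into one row per comment. It assumes exactly that structure (forum > topics > comments); a real export would look different and need its own mapping.

```python
import csv
import io


def forum_to_csv(data):
    """Flatten {"forum": {"topics": [...]}} into CSV text, one row per comment."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(
        ["topic_id", "title", "comment_id", "author", "timestamp", "content", "url"]
    )
    for topic in data["forum"]["topics"]:
        for c in topic["comments"]:
            writer.writerow([
                topic["topic_id"], topic["title"], c["comment_id"],
                c["author"], c["timestamp"], c["content"], c["url"],
            ])
    return out.getvalue()
```

                The csv module quotes commas and line breaks inside comment text, so the output should import cleanly into a spreadsheet.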

                1. I'll write a bit more later, just a quick reply: I spoke with ChatGPT a bit yesterday, and I came to think that what you (and I) want, is in fact the search API, and a fetch-the-relevant-parts-of-a-discussion API. Then, the GPT will use that API, and always have up-to-date data, and there's no need to upload files.

                  So, the GPT would use the /-/v0/search?q=... and /-/v0/get API endpoints, to search for and then fetch the relevant Q&A pages. Instead of export-site-json. It wants an OpenAPI specification of these endpoints, so it understands how to use them ... I'll look into creating such a specification, and maybe tweaking the endpoints a bit so they'll be more AI friendly. Probably while doing this I'll experiment with creating a GPT myself too, to know that it actually works.

                  1. @Mr.Nobody
                      2024-02-11 21:20:57.497Z

                      That sounds amazing! This will be so good to have!

                      1. And fun too :- )

            2. In reply to Mr.Nobody:

              About the Custom Actions in the GPT docs:

              You can make third-party APIs available to your GPT by providing details about the endpoints, parameters, and a description about how the model should use it

              How would you want the model to use the API? I guess that's what I'm trying to ask in my other comment.

              If you had the API in the "Simplified example" in the OP, what would [the description to a GPT about how to use the API] look like?

              1. @Mr.Nobody
                  2024-01-30 02:23:06.225Z

                  Oh, for some reason I was assuming that the ChatGPT endpoint would return something akin to your JSON export. Just an endpoint that returns "everything".
                  However, after reading your question, I realize that might not be the way it's intended to be used.

                  I don't have the answer to that. I'll see if I can dig anything up on that.

                • In reply to Mr.Nobody:

                  Status update: I'm adding some API data that ChatGPT said it wants, hmm I hope I asked the right questions. Mostly done, code review remains.

                  1. In reply to Mr.Nobody:

                    Now a new get-Q&A-pages API endpoint for LLMs is live here at the Ty .io forum :- ) I'm about to try it out by creating an OpenAI GPT ... it's not impossible I'll need to make small changes.

                    1. The GPT works, a little bit. This is using an OpenAPI spec and the /-/v0/search endpoint only (I'll test with the new fetch-whole-Q&A-pages endpoint too, soon):

                      You
                      Does Talkyard have a ChatGPT integration? When calling TySeV0Search endpoint, always specify a relevant search query in the "params.searchQuery.freetext" request body field (otherwise the server will error out).

                      Ty Test Bot
                      [debug] Calling HTTP endpoint

                      {
                        "domain": "forum.talkyard.io",
                        "method": "post",
                        "path": "/-/v0/search",
                        "operation": "TySeV0Search",
                        "operation_hash": "4f0ff6ed9290a6c56f2c5eb74294f70e120dc13d",
                        "is_consequential": true,
                        "params": {
                          "searchQuery": {
                            "freetext": "ChatGPT integration"
                          }
                        }
                      }
                      

                      [debug] Response received

                      Talked to forum.talkyard.io

                      Yes, there is a discussion on the Talkyard forum about the possibility of integrating ChatGPT into Talkyard. A user initiated a topic titled "API for ChatGPT", expressing interest in the usefulness of ChatGPT for forums and companies wishing to train custom AI bots for various applications. The discussion explores the idea of making an API available on Talkyard that could facilitate this kind of integration.

                      This indicates that there is interest and consideration for ChatGPT integration within the Talkyard community, and efforts may be underway to explore or implement features that allow ChatGPT to be utilized within Talkyard's platform. However, for the most current status or detailed information on such integration, it would be best to check directly on the Talkyard forum or contact their support.
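                      For anyone who wants to try the same call outside a GPT: the debug output above shows a POST to /-/v0/search with a searchQuery.freetext field. Here is a sketch that builds such a request, assuming the "params" object is sent as the JSON request body as-is. The helper name is made up, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_search_request(base_url, freetext):
    """Build (but don't send) a POST request to the /-/v0/search endpoint."""
    if not freetext.strip():
        # The thread notes that the server errors out without a search query.
        raise ValueError("freetext must not be empty")
    body = json.dumps({"searchQuery": {"freetext": freetext}}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/-/v0/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

                      Passing the returned object to urllib.request.urlopen() would send it; the shape of the response isn't shown in this thread, so parsing it is left out.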

                    2. In reply to Mr.Nobody:

                      B.t.w. what user interface are your end users going to use? Will there, for example, be a chat widget on your main website? Which uses an OpenAI GPT that looks in the Talkyard forum, but maybe also looks at your website and documentation?

                      (Or, if not a chat widget, how is it going to work :- ))

                      1. @Mr.Nobody
                          2024-02-16 22:12:45.930Z

                          We want to have a chat widget and a Discord Bot, but for starters we would use it in the default ChatGPT view, just add it as a custom GPT so we can use it internally for our team, and later expand on that. ChatGPT has been releasing new products one after another, so I'm hoping they will release new tools to make those integrations easier.
                          But our main use case is to help with the team's documentation effort.

                          1. I'll have a look at this again tomorrow or on Wednesday, sorry for the delay. Thanks for the info

                        • In reply to Mr.Nobody:
                          KajMagnus @KajMagnus 2024-02-28 05:26:13.448Z 2024-02-28 05:36:03.989Z

                          Here's a GPT (anyone with the link can view) that answers questions about Talkyard by searching this forum:

                          https://chat.openai.com/g/g-9wQgy4F2g-ty-test-bot

                          Here's its configuration: (it's called Ty Test Bot)

                          Instructions

                          Ty Test Bot specializes in answering questions related to the software called Talkyard. It leverages an API to access a forum dedicated to Talkyard, where it can search for, fetch, and analyze pages of questions and ideas previously posted by users. When a GPT user asks a question, Ty Test Bot will run a search query through the forum's API (that's the "/-/v0/search" URL path), look at the search response text and compose a reply. When calling the search API, always specify a relevant search query in the "params.searchQuery.freetext" request body field (otherwise the server will error out).

                          The bot is designed to navigate the specifics of Talkyard software efficiently, making it an invaluable resource for users seeking solutions or insights into using Talkyard. Ty Test Bot avoids topics outside the scope of Talkyard software to ensure relevance and accuracy in its answers.

                          If the search API response is not enough, Ty Test Bot will also retrieve relevant Q&A pages, by calling the TySeV0Get API to fetch the most relevant looking pages in the search response, and use the information found to compose accurate and helpful responses.

                          (I haven't added any custom action that makes that last paragraph work, i.e. retrieve Q&A pages. Right now it uses the search API only. B.t.w. ChatGPT generated most of the above instructions itself, by asking me questions. Seems it likes buzzwords like "leverage", "invaluable", "dedicated" etc :- ))

                          Custom Action authentication

                          None. It finds only public pages. Hmm, but your forum is private, so this won't work for you? I wonder, what parts of your forum do you want a GPT to have access to? Maybe some parts are for moderators only?

                          You can generate an API secret, but it would let the GPT search all parts of the forum (the secrets are too coarse-grained, currently). There are going to be API secrets that can see only what a user group of your choosing can see, but this hasn't been implemented. Maybe now is the time to do that?

                          Custom Action API spec

                          Here's an OpenAPI spec for a Custom Action that searches this forum:
                          https://github.com/debiki/talkyard/blob/main/docs/api/openapi-TySeV0Search-test.json
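                          To give an idea of what such a spec contains, here is a rough sketch that generates a minimal OpenAPI document for one search operation. It is not the file linked above. The path, the operation name TySeV0Search and the searchQuery.freetext field come from this thread; the title, summary and response description are placeholders.

```python
import json


def search_action_spec(server_url):
    """Return a minimal OpenAPI 3.1 document describing one search operation."""
    return {
        "openapi": "3.1.0",
        "info": {"title": "Forum search (sketch)", "version": "0.0.1"},
        "servers": [{"url": server_url}],
        "paths": {
            "/-/v0/search": {
                "post": {
                    "operationId": "TySeV0Search",
                    "summary": "Full-text search of public forum pages",
                    "requestBody": {
                        "required": True,
                        "content": {"application/json": {"schema": {
                            "type": "object",
                            "required": ["searchQuery"],
                            "properties": {"searchQuery": {
                                "type": "object",
                                "required": ["freetext"],
                                "properties": {"freetext": {"type": "string"}},
                            }},
                        }}},
                    },
                    "responses": {"200": {"description": "Search results"}},
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(search_action_spec("https://forum.talkyard.io"), indent=2))
```

                          Marking freetext as required in the schema is one way to tell the model that it must always send a search query, in addition to saying so in the instructions.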

                          Solution
                          1. In reply to Mr.Nobody:
                            @Mr.Nobody
                              2024-03-29 21:12:28.934Z

                              Hi @KajMagnus, it took me a while to find time to test all this out. I had to tinker a bit until I got it running, but now it seems to be working fine!

                              it is a bit "dumb" as in: it will only mention things found in the current search results, which makes sense.
                              so to add to this, I'm gonna add an additional knowledgebase file for the bot to have an overview of things, as well as better knowledge of the forum structure and categories.

                              but all in all, this seems to be working nicely, good job man! and as a bonus: the bot doesn't have access to the restricted categories, which is a win for us :)

                              I couldn't get the API access to work for some reason, but once the forum is live we wouldn't need that anyway :)

                              1. it will only mention things found in the current search results

                                Yes, I think this is a first step only. Some time later, I can look at getting a get-whole-Q&A-pages endpoint working, so ChatGPT can fetch more details and won't be constrained to only the info in the search response.
                                And write docs about ChatGPT and API access (if you want some private categories to be searchable e.g. for people who are already members).

                                It'll still be based on the search results. That's more or less how all AI answer bots work: RAG, retrieval-augmented generation (retrieving, via search). Still, I think there's lots of room for improvement, e.g. improving the search results, and combining different search techniques, e.g. vector-based search.

                                1. In reply to Mr.Nobody:

                                  add an additional knowledgebase file for the bot to have an overview of things, as well as better knowledge of the forum structure and categories

                                  That sounds like a good idea. If you get it working, could you send me a message? Maybe it'll be something Talkyard can generate automatically some time in the future, in an AI friendly format somehow

                                2. Progress
                                  with doing this idea
                                3. @KajMagnus marked this topic as Planned 2024-01-29 14:46:56.789Z.
                                4. @KajMagnus marked this topic as Started 2024-02-13 14:01:09.168Z.