How to Import Discussions to Talkyard

By KajMagnus (@KajMagnus) · 2021-10-18 21:34:07.101Z · 2021-10-19 12:58:00.439Z

If you already have a discussion forum, you can import it to Talkyard by generating JSON for the users, discussion pages and categories there.

Do as follows:

  1. Generate JSON with the structure described under Import Format v0.2021 below. This means writing (lots of?) code, or reusing an import script someone else has already created for the discussion software you're exporting from. (Currently we've done this for Disqus blog comments only.)

  2. As admin, generate an API secret — see Talkyard API authentication.

  3. Clone the Talkyard Git repository and build the to-talkyard Nodejs app:

    git clone https://github.com/debiki/talkyard.git
    cd talkyard/
    cd to-talkyard/
    yarn build
    

    Instead of installing Nodejs and Yarn yourself, you can install Nix-shell, and type nix-shell in the talkyard/ directory — then you'll get a shell with Nodejs 14 and Yarn. See the docs, scroll down to "Install Nix-shell".

  4. Send the JSON (via an HTTP request) to the /-/v0/upsert endpoint (which will probably get renamed to /-/v0/import later):

    nodejs to-talkyard/dist/to-talkyard/src/to-talkyard.js  \
        --talkyardJsonPatchFile=JSON_FILE  \
        --sysbotApiSecret=SECRET  \
        --sendTo=TALKYARD_SITE_ORIGIN
    

Import Format v0.2021

Concepts

ID numbers

(These ID numbers might be a bit unnecessarily complicated. Some day in the future, they won't be needed.)

The things in your JSON import file will refer or "link" to each other. For example, a reply in a discussion refers to the reply it's replying to, and also to the discussion page (forum topic) where they both are placed. And the replies and pages refer to the users (people) who authored them.

However the things in the JSON import file don't yet have any real ID numbers — because they haven't yet been imported into Talkyard; they don't yet "exist" from Talkyard's perspective. So there're no real IDs to refer to.

Instead, the script that generates your JSON file needs to create its own ID sequences, and they should all start at 2 000 000 001 (or -2 000 000 001 for guest accounts, see below).

For example:

  • If you import 3 pages, you'd give them ID numbers 2 000 000 001 and 2 000 000 002 and 2 000 000 003.

  • And 3 replies (need not be on the same page) would get ID numbers 2 000 000 001, 2 000 000 002 and 2 000 000 003.

  • Replies also need a "post number", which is a per discussion page unique number. It's good if you sort these by time, so older posts get lower numbers. These numbers should start at 2 000 000 001. (Post numbers need to be unique per page only.)

  • The Original Post, which is also a "post", shall have post number 1. And the title shall have post number 0. These are special post numbers, so Talkyard knows where the discussion starts.

  • Guest IDs (for one-time lightweight blog commenting accounts) should start at -2 000 000 001. That is, guest accounts have negative ID numbers; they aren't "real" accounts.

  • For simplicity, you could probably reuse the same ID sequence for everything. Talkyard doesn't care what the numbers are, as long as they are >= 2 000 000 001 (or <= -2 000 000 001).
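The ID rules above could be sketched in Python roughly like this. The helper name make_id_sequence is made up for this example; it's not part of Talkyard or of the to-talkyard app.

```python
import itertools

FIRST_IMPORT_ID = 2_000_000_001   # lowest ID for imported pages, posts, categories
FIRST_GUEST_ID = -2_000_000_001   # guest IDs are negative and count downwards


def make_id_sequence(start, step=1):
    """Return a function that gives start, start + step, ... on each call.

    Hypothetical helper, only for this sketch.
    """
    counter = itertools.count(start, step)
    return lambda: next(counter)


next_page_id = make_id_sequence(FIRST_IMPORT_ID)
next_post_id = make_id_sequence(FIRST_IMPORT_ID)
next_guest_id = make_id_sequence(FIRST_GUEST_ID, step=-1)

# Page IDs are strings in the import format, post and guest IDs are numbers.
page_ids = [str(next_page_id()) for _ in range(3)]
post_ids = [next_post_id() for _ in range(3)]
guest_ids = [next_guest_id() for _ in range(2)]
```

Separate sequences per type are used here only because that mirrors the examples in the list; one shared sequence should work too, as the last bullet says.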

External IDs

If you re-generate your JSON import file some time later, maybe now it'll include more things — maybe your previous forum was still active and people posted new things. Then, it's good if you can re-import the new (and a bit bigger) JSON file, without Talkyard duplicating everything you imported the first time.

To prevent Talkyard from duplicating the things you insert, you can give each thing (each page, reply, user account) its own "external ID". They could be Disqus comment unique identifiers or maybe Reddit comment permalinks. (Thus they refer to things in some external system unknown to Talkyard — that's why they're called "external IDs" in Talkyard.)

Talkyard remembers the external IDs, and when you re-import the almost-same-but-a-bit-bigger JSON file, Talkyard will look at the external IDs in the JSON file and ignore the things that have already been imported.
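To illustrate the idea (this is a toy model of the behavior described above, not Talkyard's actual implementation), re-importing a bigger file could be thought of like this. The function name and the sample external IDs are invented for the example.

```python
def upsert_by_ext_id(already_imported, incoming):
    """Add only those incoming things whose extId hasn't been seen before.

    Toy model of "ignore what's already imported"; hypothetical helper.
    """
    known = {thing["extId"] for thing in already_imported}
    added = []
    for thing in incoming:
        if thing["extId"] not in known:
            known.add(thing["extId"])
            added.append(thing)
    return already_imported + added


first_file = [{"extId": "disqus-comment-1"}, {"extId": "disqus-comment-2"}]
# Later, the old forum has one more comment, so the new file is a bit bigger.
second_file = first_file + [{"extId": "disqus-comment-3"}]

site = upsert_by_ext_id([], first_file)
site = upsert_by_ext_id(site, second_file)
```

For this to work, your script needs to generate the same external ID for the same thing every time, so derive it from something stable in the old system, such as a comment ID or permalink.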

The JSON structure

Here's the JSON structure, top level fields:

{
  guests: [{... a guest...}, {... another...},  ...],
  users: [...],
  groups: [...],
  pages: [...],
  pagePaths: [...],
  posts: [...],
  categories: [...]
}
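A script could assemble that top-level object and serialize it roughly as below. The samples on this page leave out the quotes around field names, but a real JSON file needs them; Python's json module adds them for you. The function name build_patch is made up, and whether empty lists like users and groups can be left out entirely is an assumption this sketch avoids by always including them.

```python
import json


def build_patch(guests, pages, page_paths, posts, categories):
    """Collect the lists into one top-level object (hypothetical helper)."""
    return {
        "guests": guests,
        "users": [],    # not used when importing everyone as guests
        "groups": [],
        "pages": pages,
        "pagePaths": page_paths,
        "posts": posts,
        "categories": categories,
    }


patch = build_patch(
    guests=[],
    pages=[],
    page_paths=[],
    posts=[],
    categories=[{"id": 2000000001, "extId": "imported_from_reddit"}],
)

# Field names get double quotes here, as proper JSON requires.
patch_json = json.dumps(patch, indent=2)
```

You'd then write patch_json to a file and pass that file to to-talkyard via --talkyardJsonPatchFile.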

Guests

It's simplest if you import your external users as guest accounts (instead of as users and groups), for now. Then you don't need their email address, which you might not have if they're Reddit or Facebook or Disqus users.

When you import someone as a guest, s/he currently won't be able to log in to that guest account in Talkyard. But if you do have his/her email address, you can include it in the JSON object, which gives you a way to contact the person (if you're admin, you can see the email address on his/her user profile page and send an email).

Here're two Guest JSON objects: (with quotes " removed around the field names)

guests: [{
  id: -2000000001,  // guest account ID, so a post can refer to this account as the author
  extId: "user-id-123001-in-external-system",
  createdAt: 1551661323000,   // Unix time, milliseconds
  fullName: "Sandra",
  emailAddress: "sandra@example.com"
}, {
  id: -2000000002,   // increases downwards; guests have negative ids
  extId: "user-id-123002-in-external-system",
  createdAt: 1557104523000,
  fullName: "Sandy",
  emailAddress: "sandy@example.com"
}, {
  ...
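Guest objects like the ones above could be built from your old forum's user records along these lines. The shape of the external user records (the keys "id", "name", "created_ms" and "email") is an assumption made up for this sketch; your export will look different.

```python
FIRST_GUEST_ID = -2_000_000_001


def to_guest(guest_id, ext_user):
    """Convert one external user record into a Talkyard guest object.

    The ext_user keys are hypothetical; adapt them to your export format.
    """
    guest = {
        "id": guest_id,
        "extId": ext_user["id"],
        "createdAt": ext_user["created_ms"],   # Unix time, milliseconds
        "fullName": ext_user["name"],
    }
    # The email address is optional; include it only if you have it.
    if ext_user.get("email"):
        guest["emailAddress"] = ext_user["email"]
    return guest


external_users = [
    {"id": "user-id-123001-in-external-system", "name": "Sandra",
     "created_ms": 1551661323000, "email": "sandra@example.com"},
    {"id": "user-id-123002-in-external-system", "name": "Sandy",
     "created_ms": 1557104523000},
]

# Guest IDs count downwards: -2000000001, -2000000002, ...
guests = [to_guest(FIRST_GUEST_ID - i, user)
          for i, user in enumerate(external_users)]
```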

Pages

Here's JSON for two pages, annotated with comments:

{
  // An ID, so the posts (replies, the Original Post and title post)
  // have something to refer to.
  // (Currently page ids are actually strings (text); hence the quotes.)
  id: "2000000001",

  // So Talkyard won't import this page many times.
  extId: "maybe-a-permalink",

  // 12 means Discussion (it's a Typescript enum value).
  pageType: 12,

  // Hmm shouldn't be needed. You can set to 1. Talkyard uses this field to
  // know if a page should get re-rendered, and bumps it itself e.g. if a
  // new reply gets posted, or something gets edited.
  version: 1,

  // Unix time, milliseconds. You can set all these to the same value.
  createdAt: 1549882800000,
  updatedAt: 1549882800000,
  publishedAt: 1549882800000,

  // Refers to a category id in the `categories: [...]` list, see below.
  categoryId: 2000000001,

  // If you're importing embedded blog comments, Talkyard needs to know what
  // URL this discussion page is for.
  // embeddingPageUrl: "https://blog.example.com/2019/a-blog-post.html",
  // (then also use: pageType: 5  — that's the Embedded Comments page type).

  // Hmm shouldn't be needed, when importing. Talkyard could figure this out,
  // by looking at the Original Post.
  authorId: 1
}, {

  // Another page:

  id: "2000000002",  // 2000000002 comes after 2000000001.
  extId: "permalink-2",
  pageType: 12,
  version: 1,
  createdAt: 1546426800000,
  updatedAt: 1546426800000,
  publishedAt: 1546426800000,
  categoryId: 2000000001,
  authorId: 1
}
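The createdAt, updatedAt and publishedAt fields are Unix times in milliseconds. If your old forum exports ISO 8601 timestamps, a conversion could look like the sketch below. The function name is made up, and treating timestamps without a timezone as UTC is an assumption you may need to change.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_millis(iso_timestamp):
    """Convert an ISO 8601 timestamp string to Unix time in milliseconds."""
    dt = datetime.fromisoformat(iso_timestamp)
    if dt.tzinfo is None:
        # Assumption: timestamps without a timezone are in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division by a timedelta avoids floating point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


created_at = to_unix_millis("2019-02-11T11:00:00+00:00")   # 1549882800000
```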

Page Paths

Above, we have pages. But what are the URL paths to the pages? — That's specified by the Page Path objects, and they look as follows:

pagePaths: [{
  // The page this path is for.
  pageId: "2000000001",

  // The "folder" is the URL path up to the page id and slug,
  // e.g.:  "/fol/der/-123/page-slug" if you set folder to "/fol/der/".
  folder: "/",

  // Whether or not the page id should be included in the URL path. It's usually good
  // to include it, because then links will work also if someone types the wrong slug.
  // A page path with id shown:  /-1234/imported-page-one
  //          and if not shown:  /imported-page-one
  showId: true,

  // The "name" of the page in the URL.
  slug: "imported-page-one",

  // The "canonical" path is the main URL path to a page. — A page can have many URL paths
  // and all of them will redirect to the canonical URL path.
  // If you change a page URL path, all old paths will still work — and they'll redirect
  // to the new main (canonical) path.
  canonical: true
}, {
  pageId: "2000000002",  // that's the other page ID in this sample JSON
  folder: "/",
  showId: true,
  slug: "imported-page-two",
  canonical: true
}]
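To make the folder, showId and slug fields more concrete, here's a sketch of how they seem to combine into a URL path, based only on the examples in the comments above. It's not Talkyard's actual code. Note the assumption that the ID shown in the URL is the real page ID Talkyard assigns after the import (1234 below is just a placeholder), not the 2000000001 import ID.

```python
def url_path(page_path, real_page_id):
    """Combine a Page Path object into a URL path (illustrative only)."""
    path = page_path["folder"]
    if page_path["showId"]:
        path += f"-{real_page_id}/"
    return path + page_path["slug"]


page_path = {
    "pageId": "2000000001",
    "folder": "/",
    "showId": True,
    "slug": "imported-page-one",
    "canonical": True,
}

with_id = url_path(page_path, 1234)                              # /-1234/imported-page-one
without_id = url_path({**page_path, "showId": False}, 1234)      # /imported-page-one
```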

Posts

Now we have pages, but what about the page text and replies? That's the Post objects. Below, we have a page title, then a page body. Then, a reply. And then a reply to the reply. So, 4 sample "posts".

posts: [{
  // Another ID sequence, for posts. For example, Like votes need a post ID to refer to.
  id: 2000000001,

  // Post nr 0 is the page title. (The "title post".)
  nr: 0,

  // So won't get duplicated, if re-imported.
  extId: "/page/permalink:title-post",

  // Refers to the first page above.
  pageId: "2000000001",

  // The post type. Always 1 works fine.
  // (There are other types too, e.g. for "Meta Posts" such as page status changes.)
  postType: 1,

  createdAt: 1549882800000,

  // The post author. Should be the same as the Original Post author (since this is the page title).
  createdById: 1,

  // Who composed the latest version of the post (latest edit).
  currRevById: 1,

  // You can set this to the same as createdAt.
  currRevStartedAt: 1549882800000,

  // Leave at 1. It's a revision number, in case the post has gotten edited.
  // (Post edit revisions aren't yet documented.)
  currRevNr: 1,

  // The actual text.
  approvedSource: "Page One Title",
  approvedAt: 1549882800000,

  // Who approved the text. 1 = System works fine.
  approvedById: 1,

  // Same as currRevNr = 1 above.
  approvedRevNr: 1
}, {
  // After 2000000001 comes 2000000002.
  id: 2000000002,

  // This post is the page body, a.k.a. Original Post. Always nr 1.
  nr: 1,

  extId: "/page/permalink:body-post",

  // The same page id as for the title post.
  pageId: "2000000001",

  postType: 1,
  createdAt: 1549882800000,
  createdById: 1,
  currRevById: 1,
  currRevStartedAt: 1549882800000,
  currRevNr: 1,

  // The page text.
  approvedSource:
      "Page body text. And a sample link, <a href=\"https://example.com/link\">link text</a>.",

  approvedAt: 1549882800000,
  approvedById: 1,
  approvedRevNr: 1
}, {
  // This is a reply to the Original Post.

  id: 2000000003,

  // So won't get duplicated, if re-importing.
  extId: "permalink-to-this-reply-e.g.:",

  // The same page id.
  pageId: "2000000001",

  // Only the page title and body have special post numbers (namely 0 and 1).
  // The replies instead have numbers 2 000 000 001, 2, 3, 4 ... .
  nr: 2000000001,

  postType: 1,
  createdAt: 1551661323000,
  createdById: -2000000001,
  currRevById: -2000000001,
  currRevStartedAt: 1551661323000,
  currRevNr: 1,
  approvedSource: "<p>The first reply on the page</p>",
  approvedAt: 1551661323000,
  approvedById: 1,
  approvedRevNr: 1
}, {
  // This is a reply to the reply above.

  id: 2000000004,
  extId: "...",
  pageId: "2000000001",  // the same page id

  // The post numbers need to be unique, per page only.
  nr: 2000000002,

  // This parentNr says that this post replies to the reply above, that is,
  // post nr 2000000001. (Which in turn replies to the Original Post, nr 1.)
  parentNr: 2000000001,

  postType: 1,
  createdAt: 1559880306000,
  createdById: -2000000003,
  currRevById: -2000000003,
  currRevStartedAt: 1559880306000,
  currRevNr: 1,
  approvedSource: "<p>A reply to a reply.</p>",
  approvedAt: 1559880306000,
  approvedById: 1,
  approvedRevNr: 1
},

  // ... more posts, for the other page, and they'd have:
  // pageId: "2000000002",   // page two
]
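Assigning post numbers and parentNr values for the replies on one page could be done roughly as follows. The input keys ("ext_id", "created_ms", "parent_ext_id") are invented for this sketch. Like the samples above, it leaves out parentNr for replies directly to the Original Post.

```python
FIRST_REPLY_NR = 2_000_000_001


def number_replies(replies):
    """Give the replies on ONE page post numbers, oldest first.

    Each input reply is a dict with the hypothetical keys "ext_id",
    "created_ms" and "parent_ext_id" (None if replying to the Original Post).
    """
    ordered = sorted(replies, key=lambda reply: reply["created_ms"])
    nr_by_ext_id = {reply["ext_id"]: FIRST_REPLY_NR + i
                    for i, reply in enumerate(ordered)}
    posts = []
    for reply in ordered:
        post = {
            "extId": reply["ext_id"],
            "nr": nr_by_ext_id[reply["ext_id"]],
            "createdAt": reply["created_ms"],
        }
        if reply["parent_ext_id"] is not None:
            # Translate the old forum's parent link to a Talkyard post number.
            post["parentNr"] = nr_by_ext_id[reply["parent_ext_id"]]
        posts.append(post)
    return posts


replies = [
    {"ext_id": "c2", "created_ms": 1559880306000, "parent_ext_id": "c1"},
    {"ext_id": "c1", "created_ms": 1551661323000, "parent_ext_id": None},
]
posts = number_replies(replies)
```

You'd still need to add the other fields (id, pageId, postType, approvedSource and so on) to each post. Call the function once per page, since post numbers only need to be unique per page.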

Categories

Lastly, Talkyard wants to know in which category to place the imported pages.

Let's say we'd like to place the imported pages in a category that already exists. Then, first you'd need to give that category an "external ID" that we can refer to, from the JSON import file.

Do that by going to the category (in your browser) and clicking Edit Category. Then scroll down to the bottom of the dialog, click the "External ID" field and type, say, "imported_from_reddit".

Then, in the JSON file:

categories: [{
  // An ID, so the pages have a category ID to refer to.
  id: 2000000001,

  // The external ID you typed in the Edit Category dialog.
  extId: "imported_from_reddit"
}],

This category won't get created (since it already exists). It is however needed in the JSON file, so Talkyard knows that category id: 2000000001 is the one you just gave the external id "imported_from_reddit".
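Since everything in the file refers to everything else by ID, it can be worth checking the references before you send the file. Here's a minimal sketch of such a check. It's not part of to-talkyard, it only looks at guests (not users or groups), and it assumes that ID 1 (System, as in the samples) is always a valid author.

```python
def find_dangling_refs(patch):
    """Return messages about IDs that are referred to but not defined."""
    page_ids = {page["id"] for page in patch["pages"]}
    category_ids = {cat["id"] for cat in patch["categories"]}
    # Assumption: 1 (System) is always a valid author, as in the samples.
    author_ids = {guest["id"] for guest in patch["guests"]} | {1}

    problems = []
    for page in patch["pages"]:
        if page["categoryId"] not in category_ids:
            problems.append(
                f"page {page['id']}: unknown categoryId {page['categoryId']}")
    for post in patch["posts"]:
        if post["pageId"] not in page_ids:
            problems.append(
                f"post {post['id']}: unknown pageId {post['pageId']}")
        if post["createdById"] not in author_ids:
            problems.append(
                f"post {post['id']}: unknown createdById {post['createdById']}")
    return problems


patch = {
    "guests": [{"id": -2000000001}],
    "categories": [{"id": 2000000001, "extId": "imported_from_reddit"}],
    "pages": [{"id": "2000000001", "categoryId": 2000000001}],
    "posts": [
        {"id": 2000000001, "pageId": "2000000001", "createdById": 1},
        # This author isn't in the guests list, so it gets reported.
        {"id": 2000000003, "pageId": "2000000001", "createdById": -2000000003},
    ],
}
problems = find_dangling_refs(patch)
```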
