Dokuwiki To Markdown



  • This page is a replacement for http://danpeirce.net46.net/dokuwiki/doku.php?id=python_markdown

New updates will be made here rather than there (at least that is what I am thinking now).

Table of Contents
  • Python Markdown
    • Auto-Generating HTML5 with a Table of Contents
    • Syntax Highlighting Using Pygments

Auto-Generating HTML5 with a Table of Contents

One of the things I like about dokuwiki is that it automatically generates a table of contents. When I saw the extensions for Markdown at pythonhosted.org it occurred to me that Markdown might actually provide what I need. The Python file that was used to convert the Markdown file make_toc.md to an HTML5 file is make_toc.py. The Python file is short enough to show here.

This is a combination of actual Python with multi-line string objects that contain some pre-written HTML.

The line that converts the Markdown text to html is

Markdown does not include code for opening the top of the HTML file or closing at the end, so literal multi-line strings are used to add those and the HTML converted from Markdown is inserted in the middle.
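
The make_toc.py listing itself is not reproduced on this page. A minimal sketch of the arrangement described above might look like the following. It assumes the third-party Python-Markdown package (`markdown`) with its `toc` extension. The names `HEADER`, `FOOTER`, `build_page` and `convert_file` are illustrative only and are not taken from make_toc.py.

```python
import html

# Pre-written HTML that Markdown does not generate: the top of the file...
HEADER = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{title}</title>
<link rel="stylesheet" href="{css}">
</head>
<body>
"""

# ...and the closing tags at the end.
FOOTER = """
</body>
</html>
"""


def build_page(body_html, title="Untitled", css="white.css"):
    """Wrap already-converted HTML in the literal header and footer strings."""
    head = HEADER.format(title=html.escape(title), css=html.escape(css))
    return head + body_html + FOOTER


def convert_file(md_path, html_path):
    """Convert one Markdown file to a complete HTML5 file (hypothetical helper)."""
    import markdown  # third-party: Python-Markdown

    with open(md_path, encoding="utf-8") as f:
        text = f.read()
    # The 'toc' extension replaces a [TOC] marker with a table of contents.
    body = markdown.markdown(text, extensions=["toc"])
    with open(html_path, "w", encoding="utf-8") as f:
        f.write(build_page(body))
```

Keeping the wrapper in `build_page` separate from the conversion means the header and footer can be checked without Python-Markdown installed.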

The Python file was invoked as follows:

The CSS Style Sheets Used

I copied the CSS style sheets used here from the web since they were available and shared freely.

  • white.css was modified from avenir-white.css
  • default.css was taken from default.css -- used with Pygments for syntax highlighting.

Syntax Highlighting Using Pygments

Installing Pygments

On my Debian system Pygments was not pre-installed, but it is a standard Debian package.

On an MS Windows computer that is not going to work. See installing-python-pygments-on-windows

Example of C with Syntax Highlighting
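
The highlighted example from the original page is not reproduced here. As a stand-in, this is a minimal sketch of how Pygments can turn a C snippet into HTML that a stylesheet such as default.css can colour. The snippet `C_SOURCE` and the helper `highlight_c` are made up for illustration. The sketch calls Pygments directly, which may differ from how make_toc.py invokes it.

```python
C_SOURCE = """#include <stdio.h>

int main(void)
{
    printf("Hello, world\\n");
    return 0;
}
"""


def highlight_c(source):
    """Return HTML for a C snippet, marked up with Pygments CSS classes."""
    from pygments import highlight  # third-party: Pygments
    from pygments.formatters import HtmlFormatter
    from pygments.lexers import CLexer

    # HtmlFormatter wraps its output in <div class="highlight">; the colours
    # themselves come from a separate stylesheet.
    return highlight(source, CLexer(), HtmlFormatter())
```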

Meta Extension to Read Title from Markdown File

The Markdown file requires the following on the first line:

For this to be seen as metadata it must occur on the first line. The first blank line in the file indicates the end of the metadata.

Code has been added to the make_toc.py file to read the title and add it between title tags in the generated HTML5 file.
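
The added code is not shown on this page. A rough sketch of reading a title through Python-Markdown's `meta` extension is given below. The extension expects `Key: value` lines at the top of the file, so a first line of the form `Title: Some Title` would work. The exact line used in the original is not reproduced here. The helper names `title_from_meta` and `convert_with_title` are hypothetical.

```python
def title_from_meta(meta, default="Untitled"):
    """Return the first 'title' value from a Python-Markdown Meta dict."""
    # The meta extension lower-cases keys and stores each value as a list of lines.
    values = meta.get("title", [])
    return values[0] if values else default


def convert_with_title(text):
    """Convert Markdown text and return (title, body_html)."""
    import markdown  # third-party: Python-Markdown

    md = markdown.Markdown(extensions=["meta", "toc"])
    body = md.convert(text)  # md.Meta is filled in during conversion
    return title_from_meta(md.Meta), body
```

The returned title can then be placed between the title tags of whatever header string wraps the converted body.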

Converting from DokuWiki

Dokuwiki Markdown Import

I have worked out a method to convert my dokuwiki pages to static HTML using Markdown as the new working file format. The method was inspired by http://donaldmerand.com/code/2012/07/20/how-i-actually-convert-dokuwiki-to-latex.html. Despite the name given to that page, it is relevant since he actually talks a lot about converting dokuwiki to Markdown.

The steps I am using are

  1. Convert the DokuWiki page to HTML. This is done using dokucli.php, which should be put in the dokuwiki/bin folder. On my Raspberry Pi it is at /var/www/danp/dokuwiki/bin/dokucli.php.

    The command is

    The file dokucli.php comes from https://www.dokuwiki.org/tips:dokuwiki_parser_cli

  2. The new HTML file may require some cleanup: references to the path on the Raspberry Pi need to be removed, and any non-ASCII characters that are present need to be removed or converted to ASCII (Markdown will not work with non-ASCII characters in the file).

    (note that the input file ends in one underscore and the output file name ends in two underscores)

    I wrote cleanhtml.py myself and I am adding new non-ASCII characters as I go. Notes on the webpage that gave me insight into this have been lost. See cleanhtml.py.

  3. Now the file should be ready to be converted to Markdown. The command is ./html2text.py < newfile.html__ > newfile.md (the name ending in two underscores is the cleaned file; use it if cleanup was needed).

    html2text.py is from https://github.com/aaronsw/html2text

  4. A few adjustments are edited into the Markdown file. This is generally easy to do in a text editor (I typically use vim).

    1. I am using the meta extension and put

      at the top of each markdown page.

    2. I put TOC in square brackets where I want the Table of Contents to go (using the TOC extension).

    3. In dokuwiki the local links are just the name of the page without an extension. I add the html extension and tweak the path if needed.

    4. Images that are in a subdirectory require that the : be replaced with a / in the path.

  5. The markdown file is converted to an HTML file.

    My makehtml.py was modified from make_toc.py. See make_toc.html.
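
The contents of cleanhtml.py are not shown on this page. As a sketch only, the cleanup in step 2 and the link and image tweaks in step 4 could be expressed along these lines. The replacement table and the function names are assumptions made for illustration, not the author's code. The hand edits described in step 4 were done in a text editor rather than scripted.

```python
# Hypothetical starter table; extend it as new characters turn up.
ASCII_REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
}


def clean_html(text, server_path=""):
    """Drop a server path prefix and reduce the text to pure ASCII."""
    if server_path:
        text = text.replace(server_path, "")
    for char, replacement in ASCII_REPLACEMENTS.items():
        text = text.replace(char, replacement)
    # Anything still outside ASCII is removed rather than converted.
    return text.encode("ascii", "ignore").decode("ascii")


def fix_image_path(path):
    """Turn a local DokuWiki namespace path like 'a:b.png' into 'a/b.png'."""
    return path.replace(":", "/")


def add_html_extension(page_name):
    """Give a local DokuWiki page name an .html extension if it has none."""
    return page_name if page_name.endswith(".html") else page_name + ".html"
```

`fix_image_path` is only meant for local paths. A full URL contains a colon after the scheme and should be left alone.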

BeautifulSoup for Extracting Section of Web Page

Sometimes one needs only part of an existing webpage for conversion to Markdown. BeautifulSoup can be used to extract some part of a bigger page. In one simple example, I wanted to back up only particular divisions of a page generated by a CMS before the site was migrated to a different CMS.

This finds the div tag whose id is set to 'content' and the div tag whose id is set to 'sidebar'.
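
The script itself is not shown on this page. A minimal sketch of that kind of extraction, assuming the third-party `beautifulsoup4` package and its built-in `html.parser` backend, could look like this. The function name `extract_divs` is hypothetical.

```python
def extract_divs(html_text, ids=("content", "sidebar")):
    """Return the markup of each <div> whose id is listed, skipping any not found."""
    from bs4 import BeautifulSoup  # third-party: beautifulsoup4

    soup = BeautifulSoup(html_text, "html.parser")
    found = []
    for div_id in ids:
        div = soup.find("div", id=div_id)  # first matching <div>, or None
        if div is not None:
            found.append(str(div))
    return "\n".join(found)
```

The returned markup can be saved to a file and then passed through the same HTML-to-Markdown step as any other page.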