
Create a CSV list of all pages in an Apple WikiServer wiki (OS X Server)
An interesting request came in today from a coworker. She wanted to create a spreadsheet listing all of the pages in our intranet's wiki (which runs on the Apple WikiServer), presumably because Apple doesn't provide an easy way to "list all pages" in the wiki itself. Along with each page's title, she also wanted its internal ID, its URL, and the times the page was created and last modified.

I spent about an hour this afternoon looking into this, and it turns out that much of the information is readily available on the filesystem in the Apple WikiServer's data store. I whipped up the following shell script to extract it in CSV format, exactly as requested. I'm posting the script here in case someone else wants similar "export a list of WikiServer pages to a comma-separated values (CSV) file" functionality but isn't sure how to go about getting it.

To use this script, edit the WS_URI_PREFIX line so that it refers to the wiki base URI of your own server (in place of http://my-server.example.com/groups/wiki/), make the script executable (chmod a+x script_name), and then run it as root.
#!/bin/sh -
#
# Script to extract data from an Apple WikiServer's data store by querying the
# filesystem itself. Creates a 'wikipages.csv' file that's readable by any
# spreadsheeting application, such as Numbers.app or Microsoft Excel.app.
#
# USAGE:   To use this script, change to the WikiServer's pages directory, then
#          just run this script. A file named wikipages.csv will be created in
#          your current directory. For instance:
#
#              cd /Library/Collaboration/Groups/mygroup/wiki  # dir to work in
#              wikipages2csv.sh                               # run the script
#              cp wikipages.csv ~/Desktop                     # save output
#
# WARNING: Since the WikiServer's files are only accessible as root, this script
#          must be run as root to function. Additionally, this is not extremely
#          well tested, so use at your own risk.
#
# Author:  Meitar Moscovitz
# Date:    Mon Sep 22 15:03:54 EST 2008

##### CONFIGURE HERE ########

# The prefix to append to generated links. NO SPACES!
WS_URI_PREFIX=http://my-server.example.com/groups/wiki/

##### END CONFIGURATION #####
# DO NOT EDIT PAST THIS LINE
#############################

# exit immediately if any command fails
set -e

WS_CSV_OUTFILE=wikipages.csv
WS_PAGE_IDS_FILE=`mktemp ws-ids.tmp.XXXXXX`

# Print the plist value on the line immediately following
# <key>$1</key> in ./page.plist (relies on Apple writing one
# key or value per line).
extractPlistValueByKey () {
   head -n \
     $(expr 1 + `grep -n "<key>$1</key>" page.plist | cut -d ':' -f 1`) page.plist | \
       tail -n 1 | cut -d '>' -f 2 | cut -d '<' -f 1
}

# Turn a page title into the URL-safe form the WikiServer uses.
linkifyWikiServerTitle () {
   echo "$1" | sed -e 's/ /_/g' -e 's/&/_/g' -e 's/>/_/g' -e 's/</_/g' -e 's/\?//g'
}

# Reformat an ISO 8601 timestamp (e.g. 2008-09-22T15:03:54Z) into a
# spreadsheet-friendly "2008-09-22 15:03:54".
formatISO8601date () {
   echo "$1" | sed -e 's/T/ /' -e 's/Z$//'
}

# Wrap a field in double quotes if it contains a comma.
csvQuote () {
   if echo "$1" | grep -q ','; then
       echo '"'"$1"'"'
   else
       echo "$1"
   fi
}

# List the page directories (IDs are five lowercase hex digits) and
# strip the trailing '.page' to get bare page IDs, one per line.
ls -d [^w]*.page | \
 sed -e 's/^\([a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]\)\.page$/\1/' > $WS_PAGE_IDS_FILE

echo "Title,ID,Date Created,Last Modified,URI" > $WS_CSV_OUTFILE
while read id; do
   cd $id.page
   title=$(extractPlistValueByKey title)
   created_date="$(formatISO8601date $(extractPlistValueByKey createdDate))"
   modified_date="$(formatISO8601date $(extractPlistValueByKey modifiedDate))"
   link=$WS_URI_PREFIX"$id"/`linkifyWikiServerTitle "$title"`.html
   cd ..
   echo `csvQuote "$title"`,$id,$created_date,$modified_date,`csvQuote "$link"` >> $WS_CSV_OUTFILE
done < $WS_PAGE_IDS_FILE
rm $WS_PAGE_IDS_FILE
Note: This script was originally posted on my own personal weblog.
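One caveat about the script above: its csvQuote helper only handles commas, so a title containing a double quote would still produce malformed CSV. A stricter quoter might look like this sketch (csv_quote is a hypothetical replacement, not part of the original script):

```shell
# Quote a CSV field if it contains a comma or a double quote, and
# double any embedded quotes, per the usual CSV convention.
csv_quote () {
    case $1 in
        *,*|*\"*)
            printf '"%s"\n' "$(printf '%s' "$1" | sed 's/"/""/g')"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

csv_quote 'Plain title'        # -> Plain title
csv_quote 'Hello, world'       # -> "Hello, world"
csv_quote 'The "big" page'     # -> "The ""big"" page"
```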

Create a CSV list of all pages in an Apple WikiServer wiki | 6 comments
Another useful script for the wiki server
Authored by: eableson on Sep 24, '08 08:13:22AM
Here's a link to another useful script for the wiki/blog server. If your server is public-facing and you want to use the Google webmaster tools to make sure it's indexed (and to find out all sorts of other useful stuff), you need a sitemap file.

Here's a script to generate a sitemap of all of the blog and wiki content, ready for pickup by Google (and other services as well):

http://www.infrageeks.com/groups/infrageeks/wiki/7e740/OS_X_Wiki_Sitemap_.html


Create a CSV list of all pages in an Apple WikiServer wiki
Authored by: dblack on Sep 24, '08 07:49:49PM
You can also just query the wiki database with a one-liner to get all the pages.
For example....

sqlite3 /Library/Collaboration/Groups/MyGroupWiki/index.db "select title from pages"
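Taking the same idea a step further: sqlite3 can emit the whole CSV itself via its -csv and -header flags. The sketch below builds a throwaway database with (a subset of) the pages columns so it's runnable anywhere; against a real server you'd point it at the group's index.db instead (read as root):

```shell
# Build a stand-in for the WikiServer's index.db, then export the
# same columns the hint's script produces, straight to CSV.
DB=$(mktemp /tmp/wikipages.XXXXXX)
sqlite3 "$DB" "CREATE TABLE pages (uid TEXT, title TEXT,
    createdDate TEXT, modifiedDate TEXT, deleted INTEGER);
  INSERT INTO pages VALUES ('7e740', 'Home, sweet home',
    '2008-09-22T15:03:54Z', '2008-09-23T10:00:00Z', 0);"

# -csv sets CSV output mode; -header prints the column names first.
csv=$(sqlite3 -csv -header "$DB" \
    "SELECT title, uid, createdDate, modifiedDate
     FROM pages WHERE deleted = 0;")
printf '%s\n' "$csv"
rm -f "$DB"
```

Note that sqlite3 quotes "Home, sweet home" because of the embedded comma, which is what the hint's csvQuote function does by hand.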

Create a CSV list of all pages in an Apple WikiServer wiki
Authored by: dblack on Sep 24, '08 08:00:11PM
A quick run of....
sqlite3 /Library/Collaboration/Groups/<mygroup>/index.db ".schema pages"
will show the schema of the 'pages' table....
uid
path
kind
title
author
authorLongName
createdDate
lastModifiedAuthor
lastModifiedAuthorLongName
modifiedDate
content
strippedContent
deleted
tombstoned
edited
tags


Create a CSV list of all pages in an Apple WikiServer wiki
Authored by: bblog on Sep 25, '08 12:31:33PM
The shell script got mangled!
Authored by: guns on Sep 26, '08 10:23:39AM

Curious, I pasted your script into a text editor and noticed some really strange shell redirections: "&gt" "&lt" ?!

But then I realized that either you had encoded your [&<>]s for HTML, or it got mangled by the comment processor.

Just thought I'd put it out there before someone tries to run the script as is.



The shell script got mangled!
Authored by: meitar on Sep 28, '08 12:55:58AM

You're right, it did get mangled. Since it's interesting to a few folks, I've decided to host it in proper source-controlled fashion on GitHub. You can get a copy of the latest version here.

---
-Meitar Moscovitz
Professional: http://MeitarMoscovitz.com/
Personal: http://maymay.net/
