<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Installing Wikipedia part 4 of N: getting additional wikipedia metadata</title>
	<atom:link href="http://arunxjacob.wordpress.com/2008/01/30/installing-wikipedia-part-4-of-n-wikipedia-metadata/feed/" rel="self" type="application/rss+xml" />
	<link>http://arunxjacob.wordpress.com/2008/01/30/installing-wikipedia-part-4-of-n-wikipedia-metadata/</link>
	<description>sometimes that's nice, sometimes not so much...</description>
	<lastBuildDate>Tue, 05 May 2009 15:19:45 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: arunxjacob</title>
		<link>http://arunxjacob.wordpress.com/2008/01/30/installing-wikipedia-part-4-of-n-wikipedia-metadata/#comment-47</link>
		<dc:creator>arunxjacob</dc:creator>
		<pubDate>Thu, 19 Jun 2008 04:40:40 +0000</pubDate>
		<guid isPermaLink="false">http://arunxjacob.wordpress.com/?p=39#comment-47</guid>
		<description>I was more interested in the raw schema than the rendered contents, which is why I didn&#039;t go the MediaWiki route. The pagelinks table records the links between pages, and the number of links pointing to a page correlates with that page&#039;s relevance.</description>
		<content:encoded><![CDATA[<p>I was more interested in the raw schema than the rendered contents, which is why I didn&#8217;t go the MediaWiki route. The pagelinks table records the links between pages, and the number of links pointing to a page correlates with that page&#8217;s relevance.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: wantondevious</title>
		<link>http://arunxjacob.wordpress.com/2008/01/30/installing-wikipedia-part-4-of-n-wikipedia-metadata/#comment-45</link>
		<dc:creator>wantondevious</dc:creator>
		<pubDate>Fri, 13 Jun 2008 00:08:25 +0000</pubDate>
		<guid isPermaLink="false">http://arunxjacob.wordpress.com/?p=39#comment-45</guid>
		<description>Hi,

I hadn&#039;t found this before I started this project myself. However, the approach I took was to install MediaWiki, which in turn creates the database (and also lets you render the dump). Currently I&#039;m at 4.7 million pages after 1 day, and I&#039;m cursing that I didn&#039;t turn off binary logging and DISABLE the page and text indexes.

I&#039;m not sure whether this would make it faster, but turning off indexes usually does.
In general, the binary log is OK in MySQL (the docs claim only 1% overhead), but we&#039;re loading 12-13 GB of text, so this creates 12-13 GB worth of logs.

My question is: what does the pagelinks table actually DO? MediaWiki seems to render everything just fine without this extraction. How would it generate this table in normal usage? Or is it an ad hoc table that&#039;s only populated on demand?

I&#039;d like to have this data, but if MediaWiki will generate it for me, it seems silly to upload the pagelinks table.</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>I hadn&#8217;t found this before I started this project myself. However, the approach I took was to install MediaWiki, which in turn creates the database (and also lets you render the dump). Currently I&#8217;m at 4.7 million pages after 1 day, and I&#8217;m cursing that I didn&#8217;t turn off binary logging and DISABLE the page and text indexes.</p>
<p>I&#8217;m not sure whether this would make it faster, but turning off indexes usually does.<br />
In general, the binary log is OK in MySQL (the docs claim only 1% overhead), but we&#8217;re loading 12-13 GB of text, so this creates 12-13 GB worth of logs.</p>
<p>My question is: what does the pagelinks table actually DO? MediaWiki seems to render everything just fine without this extraction. How would it generate this table in normal usage? Or is it an ad hoc table that&#8217;s only populated on demand?</p>
<p>I&#8217;d like to have this data, but if MediaWiki will generate it for me, it seems silly to upload the pagelinks table.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: arunxjacob</title>
		<link>http://arunxjacob.wordpress.com/2008/01/30/installing-wikipedia-part-4-of-n-wikipedia-metadata/#comment-42</link>
		<dc:creator>arunxjacob</dc:creator>
		<pubDate>Sun, 11 May 2008 06:06:39 +0000</pubDate>
		<guid isPermaLink="false">http://arunxjacob.wordpress.com/?p=39#comment-42</guid>
		<description>That&#039;s strange; we actually had issues getting it to run under Postgres. I wasn&#039;t the guy trying to do this, but he was bummed he couldn&#039;t get it working on &#039;a real database instead of that toy MySQL database!&#039; (his words, not mine; I&#039;m not religious about this stuff :)</description>
		<content:encoded><![CDATA[<p>That&#8217;s strange; we actually had issues getting it to run under Postgres. I wasn&#8217;t the guy trying to do this, but he was bummed he couldn&#8217;t get it working on &#8216;a real database instead of that toy MySQL database!&#8217; (his words, not mine; I&#8217;m not religious about this stuff <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: geoffsyndicate</title>
		<link>http://arunxjacob.wordpress.com/2008/01/30/installing-wikipedia-part-4-of-n-wikipedia-metadata/#comment-41</link>
		<dc:creator>geoffsyndicate</dc:creator>
		<pubDate>Sun, 04 May 2008 04:58:42 +0000</pubDate>
		<guid isPermaLink="false">http://arunxjacob.wordpress.com/?p=39#comment-41</guid>
		<description>Although I found this series of posts to be very useful for first-time Wikipedia importers (thank you for posting this BTW), I eventually gave up on MySQL.  It turns out that MySQL was crashing pretty severely when I tried to import the categorylinks data, so I have written a conversion script and imported the whole lot into PostgreSQL (without any problems).

There you go.  MySQL == bad, PostgreSQL == good.</description>
		<content:encoded><![CDATA[<p>Although I found this series of posts to be very useful for first-time Wikipedia importers (thank you for posting this BTW), I eventually gave up on MySQL.  It turns out that MySQL was crashing pretty severely when I tried to import the categorylinks data, so I have written a conversion script and imported the whole lot into PostgreSQL (without any problems).</p>
<p>There you go.  MySQL == bad, PostgreSQL == good.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: geoffsyndicate</title>
		<link>http://arunxjacob.wordpress.com/2008/01/30/installing-wikipedia-part-4-of-n-wikipedia-metadata/#comment-40</link>
		<dc:creator>geoffsyndicate</dc:creator>
		<pubDate>Thu, 01 May 2008 22:43:29 +0000</pubDate>
		<guid isPermaLink="false">http://arunxjacob.wordpress.com/?p=39#comment-40</guid>
		<description>I&#039;m on a MacBook Pro, 2.4 GHz with 4 gigs of RAM. I&#039;m thinking about cancelling it and removing the locks from the SQL so I can monitor its progress. Thoughts?</description>
		<content:encoded><![CDATA[<p>I&#8217;m on a MacBook Pro, 2.4 GHz with 4 gigs of RAM. I&#8217;m thinking about cancelling it and removing the locks from the SQL so I can monitor its progress. Thoughts?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: arunxjacob</title>
		<link>http://arunxjacob.wordpress.com/2008/01/30/installing-wikipedia-part-4-of-n-wikipedia-metadata/#comment-39</link>
		<dc:creator>arunxjacob</dc:creator>
		<pubDate>Thu, 01 May 2008 21:32:01 +0000</pubDate>
		<guid isPermaLink="false">http://arunxjacob.wordpress.com/?p=39#comment-39</guid>
		<description>It didn&#039;t take more than an hour. What kind of hardware are you running on? I was running this on a quad-core 64-bit server.</description>
		<content:encoded><![CDATA[<p>It didn&#8217;t take more than an hour. What kind of hardware are you running on? I was running this on a quad-core 64-bit server.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: geoffsyndicate</title>
		<link>http://arunxjacob.wordpress.com/2008/01/30/installing-wikipedia-part-4-of-n-wikipedia-metadata/#comment-38</link>
		<dc:creator>geoffsyndicate</dc:creator>
		<pubDate>Thu, 01 May 2008 21:23:00 +0000</pubDate>
		<guid isPermaLink="false">http://arunxjacob.wordpress.com/?p=39#comment-38</guid>
		<description>Hi,

How long did it take to load the category links and page links?  The category links script locks the table, so I don&#039;t know of any way to tell how far through it is.  It&#039;s been going for a while now...</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>How long did it take to load the category links and page links?  The category links script locks the table, so I don&#8217;t know of any way to tell how far through it is.  It&#8217;s been going for a while now&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: indigene</title>
		<link>http://arunxjacob.wordpress.com/2008/01/30/installing-wikipedia-part-4-of-n-wikipedia-metadata/#comment-37</link>
		<dc:creator>indigene</dc:creator>
		<pubDate>Mon, 21 Apr 2008 13:27:29 +0000</pubDate>
		<guid isPermaLink="false">http://arunxjacob.wordpress.com/?p=39#comment-37</guid>
		<description>Arun

I read with great interest about your travails getting a local Wikipedia up and running. Phew!

I am building a web application using the MediaWiki platform on LAMP, and I am currently doing the DB design with MySQL Workbench.

I have imported the mediawiki schema you mention in your article: 

http://www.mediawiki.org/wiki/Manual:Database_layout  

using this schema xml file:
http://files.nickj.org/MediaWiki/mediawiki-dbdesigner-schema-data.xml

which also generated this cute layout pic: 

http://upload.wikimedia.org/wikipedia/commons/4/41/Mediawiki-database-schema.png

But I would love to look at the actual layout of the Wikipedia DB. Could you generate a DB schema file from your local Wikipedia and send it to me in a format (SQL/XML) that I can feed into Workbench?

I am indigene2007 on gmail

Thanks in advance,
Warm Regards</description>
		<content:encoded><![CDATA[<p>Arun</p>
<p>I read with great interest about your travails getting a local Wikipedia up and running. Phew!</p>
<p>I am building a web application using the MediaWiki platform on LAMP, and I am currently doing the DB design with MySQL Workbench.</p>
<p>I have imported the mediawiki schema you mention in your article: </p>
<p><a href="http://www.mediawiki.org/wiki/Manual:Database_layout" rel="nofollow">http://www.mediawiki.org/wiki/Manual:Database_layout</a>  </p>
<p>using this schema xml file:<br />
 <a href="http://files.nickj.org/MediaWiki/mediawiki-dbdesigner-schema-data.xml" rel="nofollow">http://files.nickj.org/MediaWiki/mediawiki-dbdesigner-schema-data.xml</a> </p>
<p>which also generated this cute layout pic: </p>
<p><a href="http://upload.wikimedia.org/wikipedia/commons/4/41/Mediawiki-database-schema.png" rel="nofollow">http://upload.wikimedia.org/wikipedia/commons/4/41/Mediawiki-database-schema.png</a></p>
<p>But I would love to look at the actual layout of the Wikipedia DB. Could you generate a DB schema file from your local Wikipedia and send it to me in a format (SQL/XML) that I can feed into Workbench?</p>
<p>I am indigene2007 on gmail</p>
<p>Thanks in advance,<br />
Warm Regards</p>
]]></content:encoded>
	</item>
</channel>
</rss>
