I have been trying to do some development on MediaWiki search. It turns out that setting up an environment is not so simple. I
decided to document the process in case I need to do it again, and for other people's benefit.
MediaWiki Development Environment Setup On Labs
To start off, there is a bunch of technology being used in Labs that is a little intimidating.
So the tutorial (geared at developers) should start by explaining what is what, for example:
what is PUPPET?
what is BASTION?
what is an INSTANCE (an instance of what)?
Next I tried to access both existing and non-existing instances - but it was not successful :-(...
Instance creation - what are the options during instance creation? (Even my helpers seemed confused about which cache to use for my use case.)
block diagrams would reduce my panic level almost as much as a panic button :-)
for me a page with
a block diagram (Apache, PHP, MediaWiki, cache, search extension) on one machine + a script of how to realise it would be great (a rough sketch of what I mean follows below).
also a diagram of the real search (sub)cluster's setup, and how to set it up as an instance, would be interesting in a week or two.
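In the meantime, a rough sketch of that single-machine stack on an Ubuntu instance (the package names are my assumption; the search extension and the cache still have to be wired in by hand):
sudo apt-get install apache2 php5 php5-mysql php-apc mysql-server
# unpack a MediaWiki tarball (or check out trunk) under /var/www and run the web installer
# then enable the search extension and the cache settings in LocalSettings.php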
some issues which should be in the Labs documentation are things like:
where and how to get extensions?
where to find a dump, how to get it into the instance, how to import it, and how to track the import? (a sketch follows this list)
wget it to where?
run what command?
how to check its progress (the wiki's stats page vs. a console)?
how to back up an instance.
how to set up Java, Ant, Maven ....
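As a sketch of the dump answers I was after (this assumes the usual dumps.wikimedia.org layout and a MediaWiki checkout in w/, as used later on this page):
cd /mnt                                   # the big volume, not /tmp
wget https://dumps.wikimedia.org/simplewiki/latest/simplewiki-latest-pages-articles.xml.bz2
bzip2 -d simplewiki-latest-pages-articles.xml.bz2
php w/maintenance/importDump.php simplewiki-latest-pages-articles.xml   # prints a progress count as it goes
php w/maintenance/rebuildrecentchanges.php                              # run afterwards to rebuild recent changes
php w/maintenance/initSiteStats.php --update                            # so the wiki's stats page reflects the import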
another thing is that even though I used to work in security startups for 4 years, using SSH tunneling is now vague to me,
and getting into the instance is really difficult to get right (an example tunnel follows).
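For the record, this is the kind of tunnel I mean; the instance and bastion names are placeholders for whatever you actually created:
ssh -L 8080:my-instance:80 myuser@bastion.wmflabs.org
# keep that session open, then browse to http://localhost:8080/ on your own machine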
what is a security group - is it like port forwarding on my router??? It could use an introduction like
"if you don't set up a security group, all the SSH tunnels you set up won't work since (... the port will be blocked - or the real reason)".
to set up the security group go to ... "Manage Security group list" and add rules like ...
also the Manage Security group list itself is bare and could give/refer to some sample setups.
I found the instance console really helpful, but even after a couple of tips about testing from within the instance I got nowhere. So diagnostic tips which are obvious to an op are great for noobs, e.g.:
use the Instance > Console Output if you cannot access port XXX; if it says ... refused, you need to set up a security group ...
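For example, the kind of checks that would help, run from inside the instance (assuming the web server is supposed to be on port 80):
curl -I http://localhost/          # works here but not from outside => the port is blocked, check the security group
sudo netstat -tlnp | grep ':80'    # shows whether anything is listening on that port at all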
I'm also worried to no end that setting up and working with an instance of several machines like the real search cluster (which is Mount Improbable for me at this time) would be Mission Impossible once virtualization is added in.
MediaWiki Development Environment Setup On Ubuntu
What follows is the raw shell history from the Ubuntu instance: freeing disk space, moving the simplewiki dump to /mnt, running maintenance/update.php, and importing the auxiliary SQL tables.
w
df
ls
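# clean up /tmp: delete the extracted XML dumps to free disk space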
cd /tmp
ls
rm simplewiki-latest-pages-meta-current.xml
df
rm simplewiki-latest-pages-articles.xml
df
7z x simplewiki-latest-pages-meta-history.xml.7z
ls
rm *xml
df
7z
ls
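# move the compressed history dump to the larger /mnt volume instead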
cp simplewiki-latest-pages-meta-history.xml.7z /mnt
cd /mnt
df
ls
mkdir petrb
mkdir extract
mv simplewiki-latest-pages-meta-history.xml.7z extract/
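# run the MediaWiki schema/maintenance update in the background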
nohup php w/maintenance/update.php &
df
top
df
ls
df
top
df
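# enable extensions in LocalSettings.php and re-run update.php until it completes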
cd w
vi LocalSettings.php
ls extensions/
vi LocalSettings.php
cd ..
nohup php w/maintenance/update.php &
php w/maintenance/update.php
vi w/LocalSettings.php
php w/maintenance/update.php
vi w/LocalSettings.php
php w/maintenance/update.php
df
vi w/LocalSettings.php
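# download the auxiliary SQL table dumps (category, page_props, interwiki, iwlinks) and import into MySQL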
cd /tmp
wget download:simplewiki/latest/simplewiki-latest-category.sql.gz
wget download:simplewiki/latest/simplewiki-latest-page_props.sql.gz
ls
wget download:simplewiki/latest/simplewiki-latest-interwiki.sql.gz
gzip -d *
wget download:simplewiki/latest/simplewiki-latest-iwlinks.sql.gz
ls
mysql --password=puppet data < simplewiki-latest-interwiki.sql
vi /var/www/w/LocalSettings.php
top
df
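# check whether Tomcat is installed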
whereis tomcat6
exit
ls
aptitude
df
ls /home
ls
df
MediaWiki Development Environment Setup On Windows
MediaWiki is one of the oldest and slowest web application frameworks / content management systems. Users are rarely aware of this issue due to extensive use of hardware and a sophisticated caching strategy.
The good news is that it is possible to speed things up.
PHP accelerator - eAccelerator (skip to APC)
To enable eAccelerator, edit php\php.ini and uncomment the line zend_extension = "\xampp\php\ext\php_eaccelerator.dll" (i.e. remove the leading ";").
However, the binary is not available in the XAMPP distribution and needs to be downloaded separately. This is no easy task. You need to check what type of PHP installation you have:
what is the PHP version?
is your build thread-safe?
which version of Visual Studio was it built with?
The answers are available from the PHP info page linked from the XAMPP main page.
However, MediaWiki is best configured with APC and not eAccelerator.
As before, XAMPP does not bundle php_apc.dll. I searched the forums and came up with http://downloads.php.net/pierre/
Of the various distributions there, I was able to use php_apc-20110109-5.3-vc9-x86.zip.
To enable APC, edit php\php.ini and add:
zend_extension = "\xampp\php\ext\php_apc.dll"
Next, update MediaWiki's LocalSettings.php to use APC by adding the object-cache setting.
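Presumably something like the standard object-cache setting (a sketch; check it against your MediaWiki version):
$wgMainCacheType = CACHE_ACCEL;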
For a development MediaWiki installation it is necessary to (periodically) get the latest version of MW and extensions from Subversion. Since my project is a Java-based extension, I used the following setup:
set up an Eclipse workspace
add one PHP project for MediaWikiTrunk (from Subversion)
add one PHP project for MediaWikiExtensions (from Subversion)
check out, using svn+ssh, a Java project for development
check out, using svn+ssh, a Java project for making patches
add to Apache's httpd.conf the <Directory> section and the Alias (URL mapping) for the MediaWiki checkout:
<Directory "d:/ws/MediaWikiTrunk">
Order allow,deny
Allow from all
</Directory>
Alias /mwt "d:/ws/MediaWikiTrunk"
or
open D:\xampp\apache\conf\extra\httpd-vhosts.conf
Un-comment line 19 (NameVirtualHost *:80).
<VirtualHost *:80>
DocumentRoot d:/ws/MediaWiki/core
ServerName transitcalculator.localhost
<Directory d:/ws/MediaWiki/core>
Order allow,deny
Allow from all
</Directory>
</VirtualHost>
Open your hosts file (C:\Windows\System32\drivers\etc\hosts).
Add 127.0.0.1 MW #MediaWiki to the end of the file
get the GNU diff utilities [2] and edit LocalSettings.php, adding:
# Path to the GNU diff3 utility. Used for conflict resolution.
$wgDiff = 'C:/Server/xampp/htdocs/MW/bin/GnuWin32/bin/diff.exe';
$wgDiff3 = 'C:/Server/xampp/htdocs/MW/bin/GnuWin32/bin/diff3.exe';
the alternative is to do nothing; diff would still be available, but slower.
It turns out that dumping a MediaWiki via the dumphtml command-line extension is not compatible with some of the other extensions.
It is prudent to turn such extensions off for making static dumps.
The worst culprit is the syntax highlighting extension, which is not important in the main Wiktionary pages, but useful when developing scripts in user namespace.
Once imported, there appeared to be a crash. This was due to database timeouts. Some were caused by simultaneous DB imports of required SQL tables; however, even when these were done, the timeouts persisted.
The solution involved five days of diagnostics and various attempts at patching things up. The actual solution came from:
Turning on traces and diagnostics.
Trying to dump pages via the dumpHtml extension.
Enabling Zend eAccelerator seems to have reduced the problem. Once done, timeouts no longer occurred on most pages.
resolution: installed Zend eAccelerator from [4] and the timeouts have stopped.
Increasing the database timeout from 30 seconds to 60 (see the php.ini sketch below).
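Assuming the 30-second limit is PHP's execution limit (the fatal error quoted below points at PHP rather than MySQL), the change goes in php\php.ini:
max_execution_time = 60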
Running the rebuild script should restore the DB to fully functional status. However, this script runs three tasks, each taking an order of magnitude longer than its predecessor. The recreation of links seems to be impractical for a large project. Also, there is no indication of progress.
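I take the rebuild script to be the stock maintenance one, which chains three tasks (text index, recent changes, links):
php maintenance/rebuildall.php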
I wanted to have a fully functioning version of Wiktionary at this point (perhaps without the pictures) to allow a static dump which could be used to make an offline version.
Then the wiki decided to crash. It gives the error: "Fatal error: Maximum execution time of 30 seconds exceeded in D:\xampp\htdocs\mediawiki\includes\db\DatabaseMysql.php on line 23"
It turns out that the problem is not a crash but a slow response on many pages:
random link works
pages generated by random link work too
Switching to a second empty db with different table suffix works fine too