Nevyn's Blog

Thursday, 29 January 2009

To tweak, or not to tweak? (mysql)

Assuming you have a cluster of mysql servers serving your application, and you have configured them to share the load, yada yada, your next step is probably going to be asking 'how can I improve the design of database schema?'

Here are some links which cover that field (may I remind you to backup your database before starting, follow them at your own risk, and you cannot hold me responsible for any resulting issues!)

This one illustrates tweeking a Joomla CMS application on a VPS (shared hosting server):
http://www.joomlaperformance.com/articles/server_related/tweaking_mysql_server_23_16.html

This one discusses the differences between innodb and myisam db engines:
http://worthposting.wordpress.com/2008/06/22/innodb-or-myisam/

This appears to be the command to convert a myisam (the default) table to an innodb one, if you wanted to persue that, but beware the monolithic file may need more proactive maintenance and monitoring (for corruption etc, but not fragmentation issues). Again BACKUP!!

-- "ALTER TABLE foo ENGINE=InnoDB"

This one discusses locking, and mysql transactions - no transactions on myisam though! (hence you might need innodb for that)
http://forums.whirlpool.net.au/forum-replies-archive.cfm/498624.html

There's also the problem of race conditions: two select queries (two mysql sessions, webrowsers etc) return the same row, which the application wants to update. The first to 'accepted, the second to 'rejected' - row or table locking will prevent them from writing the new value at the same time, but what happens if both processes complete in lockstep, first 'accept' ing, then 'reject'ing the same row - if only one possible outcome was acceptable? (in other words, a row cannot be accepted after it has been rejected, and vice versa.) We could prevent accidental overwrites by increasing the criteria of the where clause:

UPDATE `orders`
SET `status`='accepted'
-- the first where clause identifies the row
WHERE `order_id`='".$this->DB->quotestring($order_id)."
-- and the second helps prevent 'logical' data collisions!
AND `status` != 'rejected'';

Lastly, this one illustrates a number of different ways one could design a schema
"including discussions on schema architecture, common data access patterns, and replication/scale-out guidelines" using web 'tagging' as an example.

http://forge.mysql.com/wiki/TagSchema

Labels: , , , , , , ,

Tuesday, 13 January 2009

MySQL multimaster cluster

I've been looking at ways of clustering my MySQL servers, to provide failover capabilities and also to distribute the load coming in from a farm of web servers.

There are many ways of loadbalancing, from DNS round-robin through Apache ServerMap directives and so on. Once you get it sorted, and have implemented some sort of session system so a visitor can be serverd by any web node - how do you split the load on the database servers yet maintain consistency?

If you do not have machines powerful enough to look at the MySQL Cluster methods (from 5.0 upwards) then you will probably have looked into replication. But now you have another problem.... the write-to-master, read-from-slave scenario.

OK, it isn't a problem by itself. But if you run an application like wordpress, for example, which keeps the server address in the database - you now have the choice of modifying the application to be aware of more than one server (the master for writes, the slaves for reads) -or- modify each database so it has a different value.
This could be very tricky in a replicated slave environment, as changes there will be overwritten again by the master, and lost?!

So, if you don't want to have a major headache rewriting your application, or customising your databases, have a look at multimaster replication.

For my example, it involves running a mysql server instance on each webserver - and setting the ip address in the database to 127.0.0.1 instead of using a separate real ip each time. This may not be applicable to you, but it means for me, I can run any number of nodes, each with their own dedicated MySQL datasource, which will be identical across all the nodes. That's it!

http://www.onlamp.com/pub/a/onlamp/2006/04/20/advanced-mysql-replication.html

* again thanks to Matt for pointing out that replication traffic could spiral exponentially, with every additional node, if packets are sent to unicast addresses. I will try and configure each node to send updates to the multicast address, so each SQL SELECT, UPDATE, and so on is replicated to the network only once.

Labels: , , ,

Friday, 9 January 2009

Make a redundant load balanced web or db server

OK, a clustered web server is not the same as a Beowulf (I never said it was).
However both projects can share practices such as diskless booting, BOOTP, DHCP (with fixed IPs for server MAC addresses) and so on. One thing I want to implement now is to have a service, such as mysqld or apache, on more than one node, with a load balancer (which also has a redundant node) serving requests on one incoming IP.

This can be acheived with UltraMonkey and heartbeat:

http://www.debianhelp.co.uk/ultramonkey.htm

Labels: , , , , , , , , , ,

Monday, 5 January 2009

Howto make your server highly available

Setting up a highly available NFS server (5 pages)
http://www.howtoforge.com/high_availability_nfs_drbd_heartbeat

Mirror Your Web Site With rsync (2 pages)
http://www.howtoforge.com/mirroring_with_rsync

How To Set Up A Loadbalanced High-Availability Apache Cluster (4 pages)
http://www.howtoforge.com/high_availability_loadbalanced_apache_cluster

How To Set Up A Load-Balanced MySQL Cluster (8 pages)
http://www.howtoforge.com/loadbalanced_mysql_cluster_debian

How To Set Up Database Replication In MySQL (2 pages)
http://www.howtoforge.com/mysql_database_replication

Labels: , , , , , , , , , ,

Friday, 2 January 2009

Convert MYSQL to UTF8 character set on Ubuntu 8.10

I used the following methods to convert my server to UTF-8, using standard Ubuntu packages, eg mysql-server from synaptix, nothing had to be recompiled.

They may not work in your environment. Use them at your own risk, and BACKUP everything first. (Preferably test before using it on any production server.) I post them only as a personal memory aid. If you use the information here and your server bursts into flames, you accept responsibility for that.

PuTTY /SSH:
In PuTTY 0.60 you can change the option "Window > Translation > Received data assumed to be in which character set:" to 'UTF-8'. After this all received data will be interpreted as UTF-8 and displayed correctly. (For anyone ssh'ing to the mysql server and using the mysql> prompt from there!)

Bash:
try adding something like this to your ~/.bash_profile
echo -ne '\e%G\e[?47h\e%G\e[?47l'
There's no point trying to work in UTF-8 if your computer thinks you are speaking ISO 8859-1!! This should even fix vi, so AltGr+4 correctly gives the Euro '€' and AltGr+` [backtick] gives the broken pipe '¦', instead of displaying nothing, or something odd like an accented A¦ pair. (or a dot in bash). Of course, Windows notepad will choke on the extended character, but something like notepad++ will work to display it.

MySQL:
A) Edit /etc/mysql/my.cnf, locate the section that starts [mysqld], add the lines below to configure your server, and then restart mysql. This should switch the server to UTF-8 'mode'.

[mysqld]
# utf8
init-connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_general_ci

B) # /etc/init.d/mysqld restart

afterwards you should find the following all correctly updated to utf-8
> Variable Session value Global value
> character set client utf8 latin1
> character set connection utf8 latin1
> character set database latin1 latin1
> character set results utf8 latin1
> character set server latin1 latin1
> character set system utf8 utf8
> collation connection utf8_general_ci latin1_swedish_ci
> collation database latin1_swedish_ci latin1_swedish_ci
> collation server latin1_swedish_ci latin1_swedish_ci

This pinched from https://lists.ubuntu.com/archives/ubuntu-users/2006-February/067367.html because it was so very useful =)

C) Now follow this guide: http://www.nicknettleton.com/zine/php/php-utf-8-cheatsheet which you can also find copied below. Note that I was able to convert my latin1 encoded database (containing a wordpress blog) and all its tables (one-by-one) into utf8 using the below commands whilst the server was 'hot' - but take care! (and backups!!). Put db/table names with hyphens into backticks, e.g. `wp-blog`.

--------
To support worldwide languages, you need to use UTF-8 encoding for your web pages, emails and application, rather than ISO 8859-1 or another common western encoding, since these don't support characters used in languages such as Japanese and Chinese.

Happily, UTF-8 is transparent to the core Latin characterset, so you won't need to convert all your data to start using UTF-8. But there are a number of other issues to deal with. In particular, because UTF-8 is a multibyte encoding, meaning one character can be represented by more one or more bytes. This causes trouble for PHP, because the language parses and processes strings based on bytes, not characters, and makes mincemeat multibyte strings - for example, by splitting characters 'in half', bodging up regular expressions, and rendering email unreadable.

There are a number of great articles online about UTF-8 and how it works - Joel Spolski's comes to mind - but very few about how to actually get it working with PHP and iron out all the bugs. So, here to save you the time we put in, is a quick cheatsheet and info about a few common issues.

1. Update your database tables to use UTF-8

CREATE DATABASE db_name
CHARACTER SET utf8
DEFAULT CHARACTER SET utf8
COLLATE utf8_general_ci
DEFAULT COLLATE utf8_general_ci
;

ALTER DATABASE db_name
CHARACTER SET utf8
DEFAULT CHARACTER SET utf8
COLLATE utf8_general_ci
DEFAULT COLLATE utf8_general_ci
;

ALTER TABLE tbl_name
DEFAULT CHARACTER SET utf8
COLLATE utf8_general_ci
;

2. Install the mbstring extension for PHP

Windows: download the dll if it's not in your PHP extensions folder, and uncomment the relevant line in your php.ini file: extension=php_mbstring.dll
Linux: yum install php-mbstring

3. Configure mbstring

Do this in php.ini, httpd.conf or .htaccess. (Remember to prepend these with 'php_value ' in httpd.conf or .htaccess.)

mbstring.language = Neutral ; Set default language to Neutral(UTF-8) (default)
mbstring.internal_encoding = UTF-8 ; Set default internal encoding to UTF-8
mbstring.encoding_translation = On ; HTTP input encoding translation is enabled
mbstring.http_input = auto ; Set HTTP input character set dectection to auto
mbstring.http_output = UTF-8 ; Set HTTP output encoding to UTF-8
mbstring.detect_order = auto ; Set default character encoding detection order to auto
mbstring.substitute_character = none ; Do not print invalid characters
default_charset = UTF-8 ; Default character set for auto content type header

4. Deal with non-multibyte-safe functions in PHP

The fast-and-loose way to do this is with the following php configuration:

mbstring.func_overload = 7 ; All non-multibyte-safe functions are overloaded with the mbstring alternatives

But there are problems with this. php.net has a warning about this potentially affecting the whole server. And even if this isn't an issue for you, mbstring can make a mess of binary strings.

So, a better route is to search your application code for the following functions, and replace them with mbstring's 'slot-in' alternatives:

mail() -> mb_send_mail()
strlen() -> mb_strlen()
strpos() -> mb_strpos()
strrpos() -> mb_strrpos()
substr() -> mb_substr()
strtolower() -> mb_strtolower()
strtoupper() -> mb_strtoupper()
substr_count() -> mb_substr_count()
ereg() -> mb_ereg()
eregi() -> mb_eregi()
ereg_replace() -> mb_ereg_replace()
eregi_replace() -> mb_eregi_replace()
split() -> mb_split()

5. Sort out HTML entities

The htmlentities() function doesn't work automatically with multibyte strings. To save time, you'll want to create a wrapper function and use this instead:

/**
* Encodes HTML safely for UTF-8. Use instead of htmlentities.
*
* @param string $var
* @return string
*/
function html_encode($var)
{
return htmlentities($var, ENT_QUOTES, 'UTF-8') ;
}

6. Check content-type headers

Check through your code for any text-based content-type headers, and append the UTF-8 charset, so the browser knows what it's working with:

header('Content-type: text/html; charset=UTF-8') ;

You should also repeat this at the top of HTML pages:
(replace square brackets with angle ones!!)

[meta http-equiv="Content-type" value="text/html; charset=UTF-8" /]

7. Update email scripts

Email can be tricky. You'll need to update the content-type for any emails and text-based mime parts to use UTF-8 encoding. You'll also need to alter the way in which headers are encoded to use UTF-8. mbstring provides a function mb_encode_mimeheader() to handle this for you, but it does make a mess of address lists: you'll need to encoding the name and address parts seperately, then compile them into an address list.

Be sure to encode the subject and other headers too - Korean speakers will tend to put Korean text for the subject.

9. Check binary files and strings

Finally, double check any binary files and strings handled by PHP, particularly uploads, downloads and encryption. In some cases it may be necessary to revert to ASCII just before a download or processing a binary string.

--------

MySQL clients:
Check the options in your mysql clients - should be switched to UTF8 mode also, or you might get your results in latin1, or something wierd like that.


Perhaps use tips from http://tlug.dnho.net/?q=node/276 if you are trying to mysqldump & restore from latin1 to utf8.

PHP:
Use the above to set your PHP / front end into UTF8 (see mbstrings section).

Labels: , , , , ,

Tuesday, 22 July 2008

Apache, MySQL and PHP (AMP Servers)

What is an AMP server? (also known as LAMP, WAMP, XAMPP...)
As the title suggests, Apache, MySQL and PHP (or sometimes PERL) running together make an AMP server. LAMP suggests that they are running on a Linux platform, WAMP denotes a Windows server and XAMPP is a package that can be used to setup an AMP server on Apple's OS X.

Apache is a web server
MySQL is a database engine
PHP & PERL are scripting languages typically used on web servers.

Whilst PERL (Practical Extraction and Reporting Language) is a very mature and powerful language, novice programmers may find it difficult to start with. PHP on the other hand, was intentionally written with the novice developer in mind, deliberately making it easy for novices to create web pages and applications. It has also matured, and supports Object Oriented Programming concepts, such as classes, Inheritance and so on, has a clean looking syntax resembling C/C++ (simmilar comments, and bracing styles).

It does upset some OO purists as in keeping with PHP's 'lets keep things simple for the webbie' approach meant it has done things in a way that might not be considered 'true'. For example, it is not a strongly typed language, and it recently changed its default behaviour for passing by value and passing by reference.

If none of that made any sense to you, don't worry!! You could start out with PHP writing things in a procedural style, and find that as your skills develop, you will discover that PHP can grow with you, and you can begin to adopt the OOP style.
(Object Oriented Programming, vs Procedural style.)

Now this post is deliberately short, and meant as an introduction to new users. There are other alternative databases, webservers, and scripting languages, and I am not suggesting they cannot be used instead. In fact, the concepts which apply to an AMP server could equally apply to IIS, PoorMan, Lighthttpd, PostgreSQL, Oracle, SQL server, and so on - feel free to experiment and find what works best for you! =)

The reason for covering AMP servers is that I use them myself, and they form the fundamental basis from which I create my sites and applications. When I cover the topics in more detail, I will most likely use AMP projects in the examples. By keeping to a common platform in this way, I only need to explain the basics once - and can focus on writing about the concepts, instead of the differences between all the competing technologies.

Labels: , , , , , , , , , , ,