grypto just another tech blog

3Jul/090

Tuning MySQL Performance with a neat little perl script

Searching randomly through Google for my next post topic, I came across a little MySQL Tuner perl script written by a RackSpace Senior Systems Engineer named Major Haden. Two things caught my eye.

1. It's always great to optimize things, especially MySQL.

2. Major is a seriously cool name.

Nothing about this script was lacking in coolness, so I decided to run it on my crappy little home FreeBSD development box.


fetch http://mysqltuner.com/mysqltuner.pl
chmod +x mysqltuner.pl
./mysqltuner.pl

The result?


>> MySQLTuner 1.0.0 - Major Hayden
>> Bug reports, feature requests, and downloads at http://mysqltuner.com/
>> Run with '--help' for additional options and output filtering
Please enter your MySQL administrative login:
Please enter your MySQL administrative password:

-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.1.35
[OK] Operating on 32-bit architecture with less than 2GB RAM

-------- Storage Engine Statistics -------------------------------------------
[--] Status: -Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 756M (Tables: 26)
[!!] InnoDB is enabled but isn't being used
[!!] Total fragmented tables: 2

-------- Performance Metrics -------------------------------------------------
[--] Up for: 6d 9h 42m 52s (5K q [0.010 qps], 23 conn, TX: 482K, RX: 5M)
[--] Reads / Writes: 58% / 42%
[--] Total buffers: 34.0M global + 2.7M per thread (151 max threads)
[OK] Maximum possible memory usage: 439.8M (43% of installed RAM)
[OK] Slow queries: 0% (0/5K)
[OK] Highest usage of available connections: 1% (2/151)
[OK] Key buffer size / total MyISAM indexes: 8.0M/395.4M
[OK] Key buffer hit rate: 98.5% (32K cached / 472 reads)
[!!] Query cache is disabled
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 3 sorts)
[OK] Temporary tables created on disk: 7% (3 on disk / 38 total)
[!!] Thread cache is disabled
[OK] Table cache hit rate: 87% (49 open / 56 opened)
[OK] Open file limit used: 0% (98/11K)
[OK] Table locks acquired immediately: 100% (5K immediate / 5K locks)
[!!] Connections aborted: 13%

-------- Recommendations -----------------------------------------------------
General recommendations:
Add skip-innodb to MySQL configuration to disable InnoDB
Run OPTIMIZE TABLE to defragment tables for better performance
Enable the slow query log to troubleshoot bad queries
Set thread_cache_size to 4 as a starting value
Your applications are not closing MySQL connections properly
Variables to adjust:
query_cache_size (>= 8M)
thread_cache_size (start at 4)

Clearly I haven't gone anywhere near my MySQL config.

27Jun/092

10 Simple, Elegant, and Free Wordpress Themes

1. stylized

stylized

2. curiouslygreen

curiouslygreen

3. standardissue

standardissue

4. nonzero

nonzero

5. emporium

emporium

6. splendid

splendid

7. unembellished

unembellished

8. reference

reference

9. shallowgrunge

shallowgrunge

10. plainoffice

plainoffice

27Jun/090

FreeBSD 8.0 – Code Freeze – Whats cooking? Why FreeBSD? Why are you using linux?

FreeBSD 8.0 – Code freeze [www.evilcoder.org]

In continued preparation to the FreeBSD 8.0 Release, the tree had been frozen by re@. This means that development had been slowed down a lot because everything needs to be reviewed by the Release Engineers first. This step will make sure that the system will get as stable as possible before a fork will be done to RELENG_8. Stay tuned!

What's cooking for FreeBSD 8? Plenty. Jails v2, Xen dom-U support, ULE 3.0, and DTrace will all likely make it into the release, not to mention all the other new features, stability and code improvements. What else is cooking in FreeBSD 8? Here is a very long post to pump you up.

FreeBSD - What's cooking for FreeBSD 8? [ivoras.sharanet.org]

Dizzy? Pull yourself together. From Wikipedia [en.wikipedia.org]:

As of 2009, FreeBSD 8.0 is the "bleeding edge" development version, called -CURRENT in FreeBSD development terminology. It will feature superpages [en.wikipedia.org], Xen [en.wikipedia.org] DomU support, network stack virtualization, stack-smashing protection [en.wikipedia.org], TTY layer rewrite, much improved ZFS support and, new USB stack, and rewritten NFS client/server introducing NFSv4. Inclusion of improved device mmap() extensions will allow the technical implementation of a 64-bit Nvidia display driver for the amd64 platform. FreeBSD 8.0 is planned to be released in the 3rd quarter of 2009.

Yahoo! has been using it for years. I have been using it for years. Why FreeBSD? IBM tells you why in 2005.
Why FreeBSD - A quick tour of the BSD alternative [www.ibm.com]

FreeBSD tells you why.
About FreeBSD's Internetworking [www.freebsd.org]

FreeBSD made you a slideshow about 7.0 explaining why.
Introducing FreeBSD 7.0 [people.freebsd.org]

Right now, you might be feeling anxious. You've been using some linux distro with a name you can't even pronounce or spell properly and a confusing package management system. You thought it was the best. Better than Windows right? You don't know anymore. It doesn't make sense. You're not sure if it ever did make sense. The room starts spinning. You realize this must have been a joke. A Penguin? Some guy named Linus? Ubuntu? Why didn't you see the clues? Is this the real life?

Why are you using linux?

27Jun/092

How to install Lighttpd, PHP 5, and MySQL 5.1 on FreeBSD 7.2

Installing MySQL

Before we start, make sure your ports are updated.

portsnap fetch
portsnap update

Right. Now we're ready.

cd /usr/ports/databases/mysql51-server
make install clean

This might take a while. Do something productive.

When that finishes we'll install the database, and make sure user and group mysql is the owner of the database folder.

/usr/local/bin/mysql_install_db
chown -R mysql /var/db/mysql/
chgrp -R mysql /var/db/mysql/

Next, we'll set MySQL to start at FreeBSD boot.

echo 'mysql_enable="YES"' >> /etc/rc.conf

Finally, we'll start MySQL and change the root password to keep out the hackers.

/usr/local/etc/rc.d/mysql-server start
/usr/local/bin/mysqladmin -u root password newpassword

If you did everything right, MySQL should now be running.

Installing Lighttpd

Assuming our ports are still up to date after that MySQL compile, we'll start the Lighttpd install.

cd /usr/ports/www/lighttpd
make install clean

When that's finished, we'll create the document root folder.

mkdir -p /usr/local/www/data

Now, create the access and error log files. Those files are:

/var/log/lighttpd.access.log
/var/log/lighttpd.error.log

Next, we'll give Lighttpd ownership of it's log files.

chown www:www /var/log/lighttpd.access.log
chown www:www /var/log/lighttpd.error.log

Now it's time to open up the Lighttpd configuration file. Great.

vi /usr/local/etc/lighttpd.conf

Uncomment the line "mod_fastcgi", in "server.modules".

Append the following code in the fastcgi module section.

fastcgi.server = ( ".php" =>
( "localhost" =>
(
"socket" => "/tmp/php-fastcgi.socket",
"bin-path" => "/usr/local/bin/php-cgi"
)
)
)

Finally, we'll set Lighttpd to start with FreeBSD.

echo 'lighttpd_enable="YES"' >> /etc/rc.conf

Before we start Lighttpd, Lets install PHP.

Installing PHP

Installing PHP is very easy. Ready? Ok.

cd /usr/ports/lang/php5
make install clean

Now we'll install some PHP extensions. Make sure you check the mysql and or mysqli boxes to be able to use MySQL in PHP.

cd /usr/ports/lang/php5-extensions
make config install clean

Next, copy php.ini-recommended to php.ini.

cp /usr/local/etc/php.ini-recommended /usr/local/etc/php.ini

With PHP installed, we can now start Lighttpd.

/usr/local/etc/rc.d/lighttpd start

It should work now, unless I missed a few steps. Create a phpinfo script in /usr/local/www/data and load it in your browser to test!

27Jun/091

How to install MySQL 5.1 on FreeBSD 7.2

I've had trouble in the past with installing MySQL on FreeBSD. This is the best method I've used that works.

Before we start, make sure your ports are updated.

portsnap fetch
portsnap update

Right. Now we're ready.

cd /usr/ports/databases/mysql51-server
make install clean

This might take a while. Do something productive.

When that finishes we'll install the database, and make sure user and group mysql is the owner of the database folder.

/usr/local/bin/mysql_install_db
chown -R mysql /var/db/mysql/
chgrp -R mysql /var/db/mysql/

Next, we'll set MySQL to start at FreeBSD boot.

echo 'mysql_enable="YES"' >> /etc/rc.conf

Finally, we'll start MySQL and change the root password to keep out the hackers.

/usr/local/etc/rc.d/mysql-server start
/usr/local/bin/mysqladmin -u root password newpassword

Full code:

cd /usr/ports/databases/mysql51-server
make install clean
/usr/local/bin/mysql_install_db
chown -R mysql /var/db/mysql/
chgrp -R mysql /var/db/mysql/
echo 'mysql_enable="YES"' >> /etc/rc.conf
/usr/local/etc/rc.d/mysql-server start
/usr/local/bin/mysqladmin -u root password newpassword

Tagged as: , 1 Comment
26Jun/092

Create a PHP web crawler or scraper in 5 minutes using php-CURL, and php-DOM

This post is in response to the articles floating around the internet named, for the most part, "Create a PHP web crawler or scraper in 5 minutes." Here we use PHP DOM and some more complex methods to create "infinitely extendable web crawler in under 5 minutes." Actually it depends on how fast you can read and copy the code.

The Framework

We start with a blank DOM_Crawler class:

class DOM_Crawler
{
 
}

Next, we'll create the methods to fetch, prepare, load into DOM, and collect links.

class DOM_Crawler
{
	protected $markup = '';
 
	protected $dom = null;
 
	protected $base_url = null;
 
	public function __construct ($url)
	{
 
	}
 
	protected function _load_markup ($url)
	{
 
	}
 
	protected function _prepare_markup ($content, $encod='')
	{
 
	}
 
	protected function _init_dom ()
	{
 
	}
 
	public function get ($type)
	{
 
	}
 
	protected function _get_links ()
	{
 
	}
 
	protected function _base_url ($url)
	{
 
	}
}

Fetching Content

As in the other examples, the constructor will accept a URL then pass it on to the method "_load_markup()." Php-curl is used here instead of file_get_contents() as file_get_contents() is deprecated in favor of using the CURL libraries. "_prepare_markup()" is then used to fix encoding issues before loading into Php-DOM. "_prepare_markup()" is explained in the comments of PHP: DOMDocument::loadHTML - Manual. "init_dom()" will initialize Php-DOM.

	public function __construct ($url)
	{
		$this->base_url = $this->_base_url($url);
		$this->markup = $this->_load_markup($url);
		$this->markup = $this->_prepare_markup($this->markup);
		$this->_init_dom();
	}
 
	protected function _load_markup ($url)
	{
		$ch = curl_init();
		$timeout = 10;
		curl_setopt ($ch, CURLOPT_URL, $url);
		curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
		curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
		$contents = curl_exec($ch);
		curl_close($ch);
		return $contents;
	}
 
	protected function _prepare_markup ($content, $encod='')
	{
		mb_detect_order("ASCII,UTF-8,ISO-8859-1,windows-1252,iso-8859-15");
		if (!empty($content))
		{
			if (empty($encod))
			{
				$encod = mb_detect_encoding($content);
			}
			$headpos = mb_strpos($content,'<head>');
			if ($headpos === false)
			{
				$headpos = mb_strpos($content,'<HEAD>');
			}
			if ($headpos !== false) {
					$headpos += 6;
					$content = mb_substr($content, 0, $headpos) 
. '<meta http-equiv="Content-Type" content="text/html; charset=' . $encod . '">' 
. mb_substr($content, $headpos);
			}
			$content = mb_convert_encoding($content, 'HTML-ENTITIES', $encod);
			return $content;
                }
	}
 
	protected function _init_dom ()
	{
		$this->dom = new DomDocument;
		$this->dom->loadHTML($this->markup);
		$this->dom->normalizeDocument();
	}

Crawling the Content for Data

From the other examples: "Our get() method will accept a $type string which essentially will simply be used to invoke another method actually doing the processing. As you can see below we construct the method name as a string, then make sure it is available so now developers can utilize this simply by invoking $crawl->get('images');."

The "_get_links()" will start by fetching all tags named 'a', then it uses DOMElement::getAttribute to return the actual URL from the "href" attribute. If any link lacks the "base" or "scheme, user, password, host, port," we'll fix it by adding $this->base_url. '#' links are simply not included in the array.

	public function get ($type)
	{
		$method = "_get_{$type}";
		if (method_exists($this, $method))
		{
			return call_user_method($method, $this);
		}
	}
 
	protected function _get_links ()
	{
		if (!empty($this->markup))
		{
			$anchors = $this->dom->getElementsByTagName('a');
			foreach ($anchors as $anchor)
			{
				$href = $anchor->getAttribute('href');
				if (mb_substr($href, 0, 1) == '/')
				{
					$links[] = $this->base_url . $href;
				}
				else if (mb_substr($href, 0, 1) != '#')
				{
					$links[] = $href;
				}
			}
			return $links;
		}
		return false;
	}
 
	protected function _base_url ($url)
	{
		$parsed_url = parse_url($url);
		$base_url = $parsed_url['scheme'] . '://';
		if (!empty($parsed_url['user']))
		{
			$base_url .= $parsed_url['user'];
			if (!empty($parsed_url['pass']))
			{
				$base_url .= ':' . $parsed_url['pass'];
			}
			$base_url .= '@';
		}
		$base_url .= $parsed_url['host'];
		if (!empty($parsed_url['port']))
		{
			$base_url .=  ':' . $parsed_url['port'];
		}
		return $base_url;
	}

Final Web Crawler Code

class DOM_Crawler {
 
	protected $markup = '';
 
	protected $dom = null;
 
	protected $base_url = null;
 
	public function __construct ($url)
	{
		$this->base_url = $this->_base_url($url);
		$this->markup = $this->_load_markup($url);
		$this->markup = $this->_prepare_markup($this->markup);
		$this->_init_dom();
	}
 
	protected function _load_markup ($url)
	{
		$ch = curl_init();
		$timeout = 10;
		curl_setopt ($ch, CURLOPT_URL, $url);
		curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
		curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
		$contents = curl_exec($ch);
		curl_close($ch);
		return $contents;
	}
 
	protected function _prepare_markup ($content, $encod='')
	{
		mb_detect_order("ASCII,UTF-8,ISO-8859-1,windows-1252,iso-8859-15");
		if (!empty($content))
		{
			if (empty($encod))
			{
				$encod = mb_detect_encoding($content);
			}
			$headpos = mb_strpos($content,'<head>');
			if ($headpos === false)
			{
				$headpos = mb_strpos($content,'<HEAD>');
			}
			if ($headpos !== false) {
					$headpos += 6;
					$content = mb_substr($content, 0, $headpos) 
. '<meta http-equiv="Content-Type" content="text/html; charset=' . $encod . '">' 
. mb_substr($content, $headpos);
			}
			$content = mb_convert_encoding($content, 'HTML-ENTITIES', $encod);
			return $content;
        }
	}
 
	protected function _init_dom ()
	{
		$this->dom = new DomDocument;
		$this->dom->loadHTML($this->markup);
		$this->dom->normalizeDocument();
	}
 
	public function get ($type)
	{
		$method = "_get_{$type}";
		if (method_exists($this, $method))
		{
			return call_user_method($method, $this);
		}
	}
 
	protected function _get_links ()
	{
		if (!empty($this->markup))
		{
			$anchors = $this->dom->getElementsByTagName('a');
			foreach ($anchors as $anchor)
			{
				$href = $anchor->getAttribute('href');
				if (mb_substr($href, 0, 1) == '/')
				{
					$links[] = $this->base_url . $href;
				}
				else if (mb_substr($href, 0, 1) != '#')
				{
					$links[] = $href;
				}
			}
			return $links;
		}
		return false;
	}
 
	protected function _base_url ($url)
	{
		$parsed_url = parse_url($url);
		$base_url = $parsed_url['scheme'] . '://';
		if (!empty($parsed_url['user']))
		{
			$base_url .= $parsed_url['user'];
			if (!empty($parsed_url['pass']))
			{
				$base_url .= ':' . $parsed_url['pass'];
			}
			$base_url .= '@';
		}
		$base_url .= $parsed_url['host'];
		if (!empty($parsed_url['port']))
		{
			$base_url .=  ':' . $parsed_url['port'];
		}
		return $base_url;
	}
}

Usage

$links[0] = 'http://www.reddit.com/';
for ($i = 0; count($links) < 100; $i++)
{
	$crawler = new DOM_Crawler($links[$i]);
	$links = _merge_unique($links, $crawler->get('links'));
}
print_r($links);
 
function _merge_unique ($arr, $arr2)
{
	foreach ($arr2 as $key => $value)
	{
		if (!in_array($value, $arr))
		{
			$arr[] = $value;
		}
	}
	return $arr;
}

Returns:

Array
(
    [0] => http://www.reddit.com/
    [1] => http://www.reddit.com/r/funny/
    [2] => http://www.reddit.com/r/gaming/
    [3] => http://www.reddit.com/r/pics/
    [4] => http://www.reddit.com/r/politics/
    [5] => http://www.reddit.com/r/programming/
    [6] => http://www.reddit.com/r/reddit.com/
    [7] => http://www.reddit.com/r/science/
    [8] => http://www.reddit.com/r/technology/
    [9] => http://www.reddit.com/r/worldnews/
    [10] => http://www.reddit.com/r/WTF/
    .....
    [201] => http://www.reddit.com/help/faq
    [202] => http://www.reddit.com/help/reddiquette
    [203] => http://www.reddit.com/bookmarklets/
    [204] => http://www.reddit.com/buttons/
    [205] => http://www.reddit.com/code/
    [206] => http://www.reddit.com/socialite/
    [207] => http://www.reddit.com/widget/
    [208] => http://www.reddit.com/iphone/
    [209] => http://www.reddit.com/blog/
    [210] => http://justin.tv/reddit
    [211] => http://www.reddit.com/ad_inq/
    [212] => http://www.reddit.tv
    [213] => http://www.redditall.com
    [214] => http://www.baconbuzz.com
    [215] => http://reddit.destructoid.com
    [216] => http://www.thecutelist.com
    [217] => http://reddit.independent.co.uk
    [218] => http://www.redditgadgetguide.com
    [219] => http://www.weheartgossip.com
    [220] => http://www.idealistnews.com
    [221] => http://www.wired.com
    [222] => http://www.arstechnica.com
    [223] => http://www.style.com
    [224] => http://www.epicurious.com
    [225] => http://www.concierge.com
    [226] => http://reddit.com/help/useragreement
    [227] => http://reddit.com/help/privacypolicy
    [228] => http://www.reddit.com/feedback
)