Table based word replace in MySQL

The REPLACE() function in MySQL works quite well but I recently came across a situation where I wanted only replace matching whole words. Not only that but I also wanted to conduct a table based dictionary search and replace for all matching words. To answer my needs, I wrote a simple MySQL Function. The idea is to pass a phrase to the function and the function in turn would search and replace all occurrences of target words from the dictionary with new words.

First build and populate the dictionary table:

CREATE TABLE YOURDATABASE.dictionary (
`wordID` int(11) NOT NULL AUTO_INCREMENT
,`strBeforeWord` varchar(45) NOT NULL
,`strAfterWord` varchar(45) NOT NULL
PRIMARY KEY (`wordID`)
) ENGINE=MyISAM;

INSERT INTO YOURDATABASE.dictionary(strBeforeWord,strAfterWord)
VALUES ('quick','slow')
,('brown','green')
,('fox','turtle')
,('jumped','climbed')
,('over','under');

Now we build the function.

DELIMITER $$

CREATE FUNCTION YOURDATABASE.ReplaceWords(OrigValue VARCHAR(255)) RETURNS varchar(255) CHARSET latin1
  BEGIN
    DECLARE done int DEFAULT(0);
    DECLARE BeforeValue varchar(255);
    DECLARE AfterValue varchar(255);
    DECLARE ReturnValue varchar(255);
    DECLARE ReplaceList CURSOR FOR SELECT strBeforeWord, strAfterWord FROM YOURDATABASE.dictionary;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
    OPEN ReplaceList;
    SET ReturnValue = OrigValue;
    REPEAT
        IF NOT done THEN
            FETCH ReplaceList INTO BeforeValue, AfterValue;
            SET ReturnValue = REPLACE(REPLACE(ReturnValue,' ','|'), CONCAT('|',TRIM(BeforeValue),'|'), CONCAT(' ',AfterValue,' '));
        END IF;
        UNTIL done
    END REPEAT;
    CLOSE ReplaceList;
    SET ReturnValue = TRIM(REPLACE(ReturnValue,'|',' '));
    RETURN ReturnValue;
END$$

There you have it. To call this function simply run something like this:

SELECT YOURDATABASE.ReplaceWords('The quick brown fox jumped over the fence');

The resulting replaced statement will look like this:

The slow green turtle climbed under the fence

There are a few things that I might do to make this a bit more efficient such as add some sort of WHERE statement to the ReplaceList cursor to cut down the rows required to loop. I might also come up with a more complicated word delimiter, currently the pipe ‘|’ in the above example.

Enjoy!

Avoid MySQL “Copying to tmp table”

I was recently asked to look into some MySQL performance issues. As expected, they conducted a simple package install and never tinkered with the default configuration. This post does not dig into the multitude of configuration settings with the exception of the tmp_table_size variable. If you have large tables (millions of rows) you will invariably end up wanting to execute a query conducting some sort of DISTINCT, ORDER BY, or GROUP BY function. When this happens, your 10 second query turns into a 5 minute query. This happens because MySQL goes from using a memory table (fast), to converting the memory table to a MyISAM table on disk (slow).

Before we dive into the tmp_table_size setting, check to be sure MySQL is using a temporary table in the first place. Do this by using EXPLAIN on your query. The last column Extra, will say something like “Using temporary; …”.

EXPLAIN SELECT DISTINCT element
FROM database.table
GROUP BY element;

Depending on your MySQL configuration, the use of temporary tables can be a major bottleneck. To increase the temp table space size execute the following SET statement before running your SELECT.

-- set the temp memory size to 512MB
SET tmp_table_size = 512000;

-- now run your SELECT
SELECT DISTINCT account
FROM database.largetable
WHERE account LIKE 'themanbehind%'
ORDER BY account;

This will help keep the processing, sorting, and grouping in memory (fast) and off of disk (slow). Of course the increase is only valid during your connection. If you start a new connection, you have to re-establish the new tmp_table_size value again. If you find yourself constantly increasing this, you may want to think about increasing the default size in your my.conf file.

As with everything, your results may vary so I encourage you to explore different sizes until you find the sweet spot.

Forcing Case Senstivie GROUP BY in MySQL

Yesterday I posted about adding case sensitivity to your LIKE statements. Today I want to discuss adding case sensitivity to your GROUP BY statements.

As with the MySQL LIKE statement, the GROUP BY is case insensitive and can sometimes cause you to pull your hair out when you have table entries with the same named instance, but different. An of this would be something like an asset table.

ID Element
1 ten01
2 ten02
3 Ten01
4 ten04
5 TEN01
6 TEN02

Our objective is to count the number of distinctly similar elements. Because each element is not unique a simple GROUP BY statement wont work, the results would be a simple table dump of all of the rows.

To accomplish our goals, we are going to use the MySQL HEX() string function. The SELECT statement uses the HEX function within the group by and in our case we only want the first three characters compared so in combination with HEX we will use the LEFT() function as well.

SELECT DISTINCT LEFT(Element,3) AS Element
  ,COUNT(Element) AS intCount
FROM database.table
GROUP BY HEX(LEFT(Element,3));

The results of this SELECT return a distinct count of each distinct element.

Element intCount
TEN 2
Ten 1
ten 3

As you can see we now have an accurate count of distinct Element names. There are plenty of other functions similar to HEX such as MD5 however HEX is the fastest and easiest to implement.

Good luck and happy querying!

Forcing Case Senstivie LIKE in MySQL

I like LIKE, its great it will give you all kinds of records returned such as A = a. However from time to time you may want to force some case sensitivity. Using MySQL you can force case sensitivity several ways.

The easiest way is to conduct a BINARY LIKE search.

-- this returns 1 (true)
SELECT 'TheManBehindTheCode' LIKE 'themanbehind%';

-- this returns 0 (false)
SELECT 'TheManBehindTheCode' LIKE BINARY 'themanbehind%';

Other methods involve using COLLATE operator with some sort of latin1_general_cs or latin1_bin collation. For this to work, you have to make sure you have the various COLLATION character sets installed.

SELECT COLLATION(VERSION());

Here are a few simple examples, using the COLLATE function:

SELECT 'TheManBehindTheCode' COLLATE latin1_general_cs LIKE 'themanbehind%';
SELECT 'TheManBehindTheCode' LIKE 'themanbehind%' COLLATE latin1_general_cs
SELECT 'TheManBehindTheCode' COLLATE latin1_bin LIKE 'a%'
SELECT 'TheManBehindTheCode' LIKE 'a%' COLLATE latin1_bin

If you find yourself always having to convert your LIKE lookups to binary, you should consider declaring the search column with case sensitivity or binary collation.

CREATE TABLE [database].[table name](
  [column name] CHAR(20) CHARACTER SET utf8 COLLATE utf8_bin
);

So, next time your trying to separate the ‘TheManBehind%’ from ‘themanbehind%’ using LIKE don’t forget the cool tricks you just learned here at TheManBehindTheCode.com.

MySQL’s Error 127 when reading table

As we all know, MySQL tables are the most critical component of the database that allows you to save data in an organized form. As a result, you can easily access and get effective query results. But sometimes MySQL table behaves unexpected behavior and give results as error messages in MySQL error log. The log, mysqld.log is most commonly found, on Linux systems, in the /var/log directory.

Recently I’ve been building a slowly changing dimension when I started noticing a number of my processes started failing with an error message of:

[ERROR] Got error 127 when reading table

When this error starts populating your logs, you will be unable to access any information from your MySQL table. The fix is easy, simply run a REPAIR TABLE on the affected table and your issue should be solved.

mysql>REPAIR TABLE <database>.<tablename>

But simply fixing the table wasn’t enough. I wanted to know what caused the table corruption in the first place. After a bit of research it looks like there are a number of possible reasons.

  • MySQL process is killed unexpectedly
  • Hardware issues
  • MySQL code or MyISAM/InnoDB code is suffering from bugs
  • An external program and server is modifying the same table at the same time
  • Unexpected system shutdown

In my case I am using a MyISAM table with concurrent writes enabled. At any given time I had 10 – 15 processes attempting to read and write 1000+ rows to the table every second. The MySQL database could handle this for about 10 – 15 minutes and then it would mark the table as crashed, causing all of my processing to fail. My fix was to cut back on the processing and stop abusing the table so much. Table corruptions solved.

“server gone away” with Workbench 5.2.29

If your using MySQL Workbench 5.2.29 CE and you start getting the following error message around 30 seconds after your query starts.

Error Code: 2013  Lost connection to MySQL server during query
Error Code: 2006  MySQL server has gone away

And you look at the server itself there are no message in the logs indicating there were any issues. Don’t be fooled if you run the same SQL statement a second time because the server will have cached the results from the first execution that seems to have failed.

After a little searching I found a very helpful link. Evidently this is a known bug identified here: Bug 57449

Interestingly enough the fix is to simply download and replace this file found in your install directory. For me on my Windows 7 machine the file can be found here:

C:\Program Files (x86)\MySQL\MySQL Workbench 5.2 CE\modules\data

The downloaded XML file, by default, increased the old Read Timeout from 30 seconds to 600 seconds.

For the changes to take effect restart MySQL Workbench and run the same long query. Granted, if your waiting for longer than 30 seconds, I would suggest reviewing your query and table structure, 30 seconds is a long time to wait.

GROUP_CONCAT() MySQL Function

GROUP_CONCAT() function is used to concatenate column values into a single string. It is very useful if you would otherwise perform a lookup of many row and then concatenate them on the client end.

For example if you query:

mysql> SELECT strApplication FROM tblApplications WHERE strAppType = 'desktop';

It outputs:

strApplication
Outlook
Excel
Word
PowerPoint

To concatenate the values into a single string, you query:

mysql> SELECT GROUP_CONCAT(strAppliation) As strApplication
> FROM tblApplications
> WHERE strAppType = 'desktop';

Then the output will be:

strApplication
Outlook, Excel, Word,PowerPoint

You can also use some format of GROUP_CONCAT(). Like

  • SELECT GROUP_CONCAT( strApplication SEPARATOR ‘-’ )… It will use ‘-’ instead of ‘,’
  • SELECT GROUP_CONCAT( strApplication ORDER BY strApplication DESC )… To change the order and shorting output

One thing to remember: GROUP_CONCAT() ignores NULL values.

Create a Calendar table in MySQL

Make a calendar table
You need a calendar table for joins to datetime data in other tables:

create table calendar ( dt datetime primary key );

An elegant method of generating any desired number of sequential values, posted by Giuseppe Maxia on his blog, is …

* Create three dummy rows in a View.
* Cross join them to make 10 dummy rows.
* Cross join those to make 100, 1,000 or however many you need.

So to give the calendar table a million rows at one-hour intervals starting on 1 Jan 1970:

create or replace view v3 as select 1 n union all select 1 union all select 1;
create or replace view v as select 1 n from v3 a, v3 b union all select 1;
set @n = 0;
drop table if exists calendar;
create table calendar(dt datetime primary key);
insert into calendar
select cast('1970-1-1 00:00:00' + interval @n:=@n+1 hour as datetime) as dt
from v a, v b, v c, v d, v e, v; 

If you prefer to do it one query without user variables, or if your MySQL version does not support Views, make a general-purpose utility table of integers 0 through 9. We keep such utility objects in a sys database and make it available to all users:

create table sys.ints(i tinyint);
insert into sys.ints values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

From sys.ints you can generate any desired sequence of dates or datetimes. For example to report monthly sales totals for 2009 from the table sales(date d, amount decimal(10,2)), write:

select a.yearmonth, sum(ifnull(b.amount,0)) as total
from (
select concat( '2009-', lpad((u.i + t.i * 10), 2, 0)) as yearmonth
from sys.ints u, sys.ints t
where (u.i + t.i * 10) between 1 and 12
) a
left join sales b on a.yearmonth=date_format(b.d,'%Y-%m')
group by a.yearmonth;

Generate a thousand daily dates starting today:

SELECT CURDATE() + INTERVAL t.i*100 + u.i*10 + v.i DAY AS Date
FROM sys.ints AS t
JOIN sys.ints AS u
JOIN sys.ints AS v
WHERE ( t.i*100 + u.i*10 + v.i ) < 1000
ORDER BY Date;

This can be done down to the level of seconds:

DROP TABLE IF EXISTS SECS;
CREATE TABLE SECS
SELECT SEC_TO_TIME( t.i*10000 + u.i*1000 + v.i*100 + w.i*10 + x.i ) AS sec
FROM sys.ints AS t
JOIN sys.ints AS u
JOIN sys.ints AS v
JOIN sys.ints AS w
JOIN sys.ints AS x
WHERE SEC_TO_TIME( t.i*10000 + u.i*1000 + v.i*100 + w.i*10 + x.i ) < '24:00:00'
ORDER BY sec;

You can use such a query as a View, as an inline derived table, or as input to a CREATE TABLE statement.

A slightly more elaborate method, giving the calendar table an auto-increment key that can also be used as a surrogate for datetime interval calculations:

CREATE TABLE calendar (
id INT AUTO_INCREMENT PRIMARY KEY,
date DATE,
UNIQUE days (date)
);

Calculate the number of days needed in the calendar, eg

SELECT DATEDIFF('2011-12-31','2011-01-01');   # 7670, or 21*365 plus 5

Find a table with that many rows, 7670 in this case. Add a row to the calendar table for every day in the range:

INSERT INTO calendar (id)
SELECT NULL FROM [name of table with 7670  rows] LIMIT 4018;

Populate the date column by incrementing the starting date:

UPDATE calendar SET date = ADDDATE('2011-01-01',id);

The calendar table now has one row for each day from 2011-01-01 through 2011-12-31. Keep the auto_increment ID column for quick day counts in the range, or drop the column if you don’t need that.

To make the calendar table a diary, make the period one leap year, add month, day and text columns, update month and day values with MONTH(date) and DAYOFMONTH(date) respectively, and if the diary is to be used from year to year, drop the date field.

To automate all this, write a stored procedure, for example:

CREATE TABLE times (
date_hour DATETIME,
KEY ( date_hour )
);
DROP PROCEDURE IF EXISTS timespopulate;
DELIMITER |
CREATE PROCEDURE timespopulate( startdate DATETIME, num INT )
BEGIN
DECLARE ctr INT DEFAULT 0;
WHILE ctr < num DO
BEGIN
INSERT INTO times VALUES ( DATE_ADD( startdate, INTERVAL ctr HOUR) );
SET ctr = ctr + 1;
END;
END WHILE;
END;
|
DELIMITER ;
CALL timespopulate( '2011-1-1, 31*24 );

Or, you can have the sproc do your counting:

DROP PROCEDURE IF EXISTS calendar;
DELIMITER |
CREATE PROCEDURE calendar( pstart datetime, pstop datetime, pminutes int )
DETERMINISTIC
BEGIN
DECLARE thisdate datetime;
DROP TABLE IF EXISTS cal;
CREATE TABLE cal( dt datetime );
SET thisdate=pstart;
INSERT INTO cal VALUES(pstart);
WHILE thisdate < pstop DO
SET thisdate = adddate( thisdate, INTERVAL pminutes MINUTE );
INSERT INTO cal VALUES( thisdate );
END WHILE;
END |
DELIMITER ;
-- make cal for 2011, 20-min intervals:
CALL calendar('2011-1-1 00:00:00', '2011-2-1 00:00:00', 20); 

FIX MyISAM MYD file not found (Errcode:2)

I know I’ve previously mention how solid and stable MySQL MyISAM tables are and this post is no different. On occasion an OPTIMIZE or REPAIR TABLE might fail or get interrupted causing the error “MYD file not found (Errcode:2)“. Common symptoms are:

The following error keeps streaming in the mysql.log

110327 14:04:16 [ERROR] /usr/libexec/mysqld: Table './schema/tablename' is marked as crashed and should be repaired
110327 14:04:16 [ERROR] Couldn't repair table: schema.tablename

When attempting to conduct a OPTIMIZE TABLE you get something like this:

mysql> OPTIMIZE TABLE schema.table;
Incorrect key file for table 'tablename'; try to repair it
1 row in set, 2 warnings (0.25 sec)

When attempting to conduct a REPAIR TABLE schema.table you get something like this:

mysql> REPAIR TABLE schema.tablename;
File './schema/tablename.MYD' not found (Errcode: 2) |
1 row in set, 2 warnings (0.25 sec)

The Fix:

Look for the file /mysql/schema/tablename.TMD. Simply rename this file to tablename.MYD. If for some reason there is already an MYD file rename the TMD file to BKP. Once that’s done, run a:

REPAIR TABLE schema.tablename;

If that doesn’t work you may have to simply drop the table and restore from a backup.

Good luck and happy coding.

How to repair MySQL MyISAM tables.

MyISAM tables are arguably the most solid table type available to MySQL at the moment, there are close competitors of course but when having to combine stability, speed, concurrency, and recoverability together I feel MyISAM take the cake. Talking about how great MyISAM tables are really doesn’t lend much value to this post so lets talk about on the rare occasion a MyISAM table becomes corrupt and requires some sort of repair.

In Process using REPAIR TABLE

There are two tools that are used to repair a MyISAM table. The first, REPAIR TABLE, is nothing more than a command run within the MySQL environment. The common syntax is:

REPAIR TABLE db_name.tbl_name [QUICK] [EXTENDED];

An example of this is:

REPAIR TABLE dbTMBTC.tblComments;

The MySQL Manual for the REPAIR TABLE syntax is a great resource for more information about this command.

As a general rule, I don’t use QUICK or EXTENDED for performance and stability reasons. Using the QUICK option is good when you are scripting the use of a REPAIR TABLE. The EXTENDED option should be used if a normal repair fails. Be prepaired for your overall system performance to suffer and the total duration to be extended by over double the normal time. Depending on the size of your table and the size of the MyISAM Sort Buffer Size a repair on a large table should take 10 to 30 minutes.

As repairs are executing its a good idea to keep tabs on progress and that can easily be done by opening a second MySQL connection and running

SHOW FULL PROCESSLIST;

If you see your repair running with a status of REPAIR WITH KEYCACHE and you dont have three or four days to wait for the table to be fixed, go ahead and KILL the MySQL thread running the repair. Don’t worry about messing something up, remember the table is already as crashed or corrupt! What is happening is the repair process is using your disk to read and write the table pages while repairing. Its a detrimental to performance. Luckily you can increase the myisam_sort_buffer_size at the connection level. The following command will set the sort buffer size to 2GB.

mysql> SET myisam_sort_buffer_size=20240000;

I’ve successfully repaired tables that have an overall size 2 – 3 times this size without having the process switch over to the keycache method.

Once you’ve increased the sort buffer size, run your repair again. When its done the output will look something like

+------------------+--------+----------+---------------------------+
| Table            | Op     | Msg_type | Msg_text                  |
+------------------+--------+----------+---------------------------+
| db_name.tbl_name | repair | warning  | Number of rows changed ...|
| db_name.tbl_name | repair | status   | OK                        |
+------------------+--------+----------+---------------------------+
2 rows in set (13.66 sec)

Command Line using myisamchk

Up to this point we’ve covered the method for repairing a MyISAM table while the MySQL server is running. The next method is useful when you can bring down the server in order to dedicate processing power to your repair.

The command myisamchk does many things such as provide general table information, checking for corruption, and table repairs. The MySQL Manual has a full breakdown of the tool and its various functions.

$ myisamchk --recover db_name.tbl_name;

Just like the previous section, if this is a large table you may want to increase the sort_buffer_size variable to something like 2GB.

$ myisamchk --recover --sort_buffer_size=20240000 db_name.tbl_name;

If, for what ever reason, the –recover option fails to fix the problem you can use the –safe-recover option. This, like the EXTENDED option will take considerably longer to complete so use as a last resort.

Depending on your constraints, thankfully MySQL offers several methods for repairing a table with little to no loss in data.