Monday, September 10, 2012

How to obtain the “LES” (Last Executed Statement) from an Optimized Core Dump?

How to obtain the “LES” (Last Executed Statement) from an Optimized Core Dump?:
Ever ran into a situation where you saw “some important variable you really needed to know about=<optimized out>” while debugging? Let’s look at an example:
[Roel@qaserver master-data]$ gdb /percona-server/Percona-Server-5.5.25a-rel27.1-285.Linux.x86_64/bin/mysqld ./core.3200 
[...]
(gdb) bt
[...]
#20 handle_select (thd=0x33acd30, lex=0x33ae900, result=0x7f3e840058d0, setup_tables_done_option=1073741824)
at /percona-server/5.5/Percona-Server-5.5.25a-rel27.1/sql/sql_select.cc:312
#21 mysql_execute_command (thd=thd@entry=0x33acd30)
at /percona-server/5.5/Percona-Server-5.5.25a-rel27.1/sql/sql_parse.cc:3138
#22 mysql_parse (thd=thd@entry=0x33acd30, rawbuf=<optimized out>, length=72, parser_state=parser_state@entry=0x7f3ed013f810)
at /percona-server/5.5/Percona-Server-5.5.25a-rel27.1/sql/sql_parse.cc:5809
#23 dispatch_command (command=COM_QUERY, thd=0x33acd30, packet=<optimized out>, packet_length=72)
at /percona-server/5.5/Percona-Server-5.5.25a-rel27.1/sql/sql_parse.cc:1060
[...]

It happens to all of us. This “issue” is seen when using optimized (release) binaries: debug symbols have been stripped out. In non-optimized binaries, the query would (in most cases) have shown directly in the backtrace. Sidenote: “in most cases”: sometimes you have to swap threads (or use: ‘thread apply all bt’ or ‘thread <nr>’ then ’bt’) before you can view the actual crashing statement: gdb may have incorrectly analyzed which thread caused the crash. Inspecting the error log (which also contains a stack trace) may also help in such cases.
So… maybe you were testing a C program with three threads executing highly concurrent DML statements (where each thread executes one particular type of statement) and you are stuck as to which statement is causing your program to crash.. Or you just ran your latest and greatest RQG grammar and found this nifty crashing bug, only to find out you’ll be stuck with hours or even days of “grammar simplification”..
Not so! Read on…
If we did know the “LES” – or Last Executed Statement - in our RQG case for example, we could bring up the server with a copy of the test run’s datadir and try to re-execute the crashing statement to see if it crashes again – a plausible thing to happen. So, how do we make this happen if the query is not directly visible from the stack trace?
Let’s fire up gdb and see what we find. (We will assume the source code used for compiling is available at the same location as where it was at compilation time)
First we inspect the mysql_execute_command frame:
(gdb) frame 21 
#21 0x00000000005a30af in mysql_execute_command (thd=thd@entry=0x33acd30)
    at /percona-server/5.5/Percona-Server-5.5.25a-rel27.1/sql/sql_parse.cc:3138
3138  res= handle_select(thd, lex, sel_result, OPTION_SETUP_TABLES_DONE);
(gdb) list
3133          &lex->update_list,
3134         &lex->value_list,
3135          lex->duplicates,
3136          lex->ignore)))
3137       {
3138  res= handle_select(thd, lex, sel_result, OPTION_SETUP_TABLES_DONE);
3139         /*
3140           Invalidate the table in the query cache if something changed
3141           after unlocking when changes become visible.
3142           TODO: this is workaround. right way will be move invalidating in

Looking at line 3138 (or by inspecting the stack trace above), we may expect the query to be present in the thd variable. Looking further, we can see that thd is being referenced in all frames, and that it is passed to handle_select. Let’s see what it contains:
(gdb) p thd 
$1 = (THD *) 0x33acd30

Ok, so it’s a pointer to an array’s memory address. Using * before the address allows us to see the contents of the thd array:
(gdb) p *(THD *) 0x33acd30
$5 = {<Statement> = {<ilink> = {_vptr.ilink = 0xf4a670, prev = 0x7f3e9800d660, next = 0x33a5c10},
<Query_arena> = {_vptr.Query_arena = 0xf4a6a8, free_list = 0x7f3e84005f10, mem_root = 0x33afac0,
state = Query_arena::STMT_CONVENTIONAL_EXECUTION}, id = 0, query_strip_comments = {buffer = 0x0,
length = 0, buffer_length = 0}, mark_used_columns = MARK_COLUMNS_READ, name = {str = 0x0,
length = 0}, lex = 0x33ae900, query_string = {string = { str = 0x7f3e84004ba0
"INSERT INTO testdb_N . t1_temp1_N SELECT * FROM test.table10_int_autoinc", length = 72},
cs = 0x101e980}, db = 0x7f3e84004b70 "test", db_length = 4}, <Open_tables_state> = {
m_reprepare_observer = 0x0, open_tables = 0x7f3eac0085d0, temporary_tables = 0x7f3e84034a50,
LOCK_temporary_tables = {m_mutex = { mutex = [...]

Great! The query is showing in there: “INSERT INTO testdb_N . t1_temp1_N SELECT * FROM test.table10_int_autoinc”.
Instead of using a numeric memory address, we could also use:
(gdb) p *thd

Gdb knows the variable type, so we don’t need the typecast to “(THD *)”.
Or, better yet, now that we know where the query string is hiding (query_string > string > str), we can use:
(gdb) p thd->query_string.string.str

In the future.
Conclusion: the Last Executed Statement was found in an optimized core dump by checking the contents of the thd array in the mysql_execute_command frame. By simply examining “surrounding” variables we’ve been able to find the information we needed. You can try this technique yourself the next time you need to find out exactly which query is causing you trouble!

DIGITAL JUICE

No comments:

Post a Comment

Thank's!