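Preface

This article summarizes how the author used Trace32 Simulator to track down system stability problems on the sc770x project. Trace32 is an embedded debugging tool from Lauterbach; it supports many CPUs and RTOSes and is highly extensible through CMM scripts.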

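Using the tool

When the system crashes, the corresponding ass and mem files are usually saved. Open Trace32 Simulator, run the sc770x_simulator.cmm script, and load the axf symbol file that matches the running system version. The script then prompts for the mem address; according to the ass file saved at the time of the crash it is 0. Next, select the corresponding mem file. In most cases the call stack from thread_entry down to the crash site is displayed directly. If the complete call stack does not show up, it has to be derived by hand, as in the case below.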

Inspect _tx_thread_current_ptr at the time of the crash; tx_thread_name shows that the current task is T_MIDI.

_tx_thread_current_ptr = 0x01D8FADC -> (
  tx_thread_id = 1414025796,
  tx_run_count = 20,
  tx_stack_ptr = 0x027D6E88,
  tx_stack_start = 0x027D68B8,
  tx_stack_end = 0x027D7CB3,
  tx_stack_size = 5116,
  tx_time_slice = 0,
  tx_new_time_slice = 0,
  tx_ready_next = 0x01D8FADC,
  tx_ready_previous = 0x01D8FADC,
  tx_thread_name = 0x080467A8 -> "T_MIDI",
  tx_priority = 74,
  tx_state = 0,
  tx_delayed_suspend = 0,
  tx_suspending = 0,
  tx_preempt_threshold = 74,
  tx_priority_bit = (0, 0, 1024, 0, 0, 0, 0, 0),
  tx_thread_entry = 0x0035B4F7,
  tx_entry_parameter = 66011532,
  tx_thread_timer = (tx_remaining_ticks = 0, tx_re_initialize_ticks = 0, tx_timeout_function =
  tx_suspend_cleanup = 0x0,
  tx_suspend_control_block = 0x0,
  tx_suspended_next = 0x0,
  tx_suspended_previous = 0x0,
  tx_suspend_info = 0,
  tx_additional_suspend_info = 0x0,
  tx_suspend_option = 0,
  tx_suspend_status = 0,
  tx_created_next = 0x00C1BBA4,
  tx_created_previous = 0x01D8F884,
  tx_filex_ptr = 0x0,
  time = 0,
  tx_thread_stack_highest_ptr = 0x027D6CF8)

The stack of the current task starts at 0x027D68B8 and ends at 0x027D7CB3. Open a data.dump memory window on that address range.

tx_stack_start = 0x027D68B8,
tx_stack_end = 0x027D7CB3,
tx_stack_size = 5116,

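From the ass file, read the register values in SVC mode just before the crash, then take R13_SVC as the starting address and derive the frames according to the function-call stack layout.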

> SVC mode:
R13 = 0x027d6f70    R14  = 0xfffffffc
SPSR = 0x20000033

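Look for function addresses that start with 0x08 and use each as R14, with the address where that function's stack push ends as R13. Forward unwinding walks from tx_stack_end to R13_SVC; backward unwinding walks from R13_SVC to tx_stack_end. Following this method yields the complete call stack.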
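The manual scan described above can be sketched as a small host-side script. This is only an illustration: the stack bounds and R13_SVC are copied from the dumps above, while the `sample` words and the helper names are hypothetical, and the "starts with 0x08" test is simply the heuristic the text states.

```python
# Values copied from the thread control block and the SVC register dump.
STACK_START = 0x027D68B8
STACK_END = 0x027D7CB3
R13_SVC = 0x027D6F70


def in_stack(addr):
    """True if addr lies inside the task stack."""
    return STACK_START <= addr <= STACK_END


def looks_like_code(word):
    """Heuristic from the text: code addresses start with 0x08."""
    return (word >> 24) == 0x08


def candidate_return_addresses(words, base):
    """Return (stack address, value) pairs for words that may be a saved R14."""
    return [(base + 4 * i, w) for i, w in enumerate(words) if looks_like_code(w)]


# Hypothetical stack contents starting at R13_SVC, for illustration only.
sample = [0x00000000, 0x08046341, 0x027D6F90, 0x0035B4F7, 0x08123457]
hits = candidate_return_addresses(sample, R13_SVC)
for addr, value in hits:
    print(f"0x{addr:08X}: possible R14 = 0x{value:08X}")
```

Each hit is only a candidate; it still has to be checked against the symbol table and the frame size of the function it points into before it is accepted as a frame.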

-001|MIDI_Core_Calculation_dls_stereo(
    |  ?,
    |    sample_pool_ptr = 0x027D95D6,
    |    sample_pool_r_ptr = 0x027D9716,
    |    spn = 0)
    |  pContext = 0x027D9478
    |  VoiceChnPtr = 0x027DA52C
    |  pcm_count = 74
    |  temp_pcm_l = 0x027DBD94
    |  temp_pcm_r = 0x027DC014
    |  update = 1
    |
    |
-002|MIDI_Core_Generate_Samples(
    |  ?,
    |  ?)
    |  pContext = 0x027D9478
    |  samples_num = 640
    |  cur_samples = 160
    |
    |
-003|midi_decode(inline)
-003|midi_play(
    |  ?,
    |    midi_dataPtr = 0x027D7634)
    |  pContext = 0x027D9478
    |  midi_event = ((start_time = 0, event_type = 255, isReady = 0, sub_event_type = 47, pa
    |  midi_event_to_play = 0x027D7060
    |  bytes_processed = 1
    |  result = 0
    |  cur_midi_event = (start_time = 35, event_type = 144, isReady = 0, sub_event_type = 0,
    |  min_time = 18
    |  track_end_num = 1
    |
    |
-004|MIDI_PlayMidi(
    |  ?,
    |    mididata = 0x00CD5A58,
    |    file_length = 5656)
    |  pContext = 0x027D9478
    |  midi_track_data = (track = ((file_start = 22, length = 37, buffer_addr = 0x00CD5A6E,
    |  PPQN = 120
    |
    |
-005|MidiPlayFile(inline)
-005|MIDI_Thread_Entry(
    |  ?,
    |  ?)
    |  audio_obj = 0x0278E944
    |  sig_in_ptr = 0x0
    |  sig_out_ptr = 0x0
    |  ptMidiProcRes = 0x03F8FD80
    |  time_offset = 0
    |  samples_offset = 0
    |  result = 0
    |  midi_res_ptr = 0x03F8FD80
    |
    |
-006|ThreadEntry(
    |  ?)
    |  thread_entry = (entry = 0x08046340, argc = 41478468, argv = 0x0)
    |
    |
-007|tx_thread_shell_entry()
    |
    |
 ---|end of frame

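The call stack gives the scenario and the parameters that led to the crash (the parameter values are not necessarily accurate). If the derivation gets stuck, substitute trial parameter values, make a guess, and verify it backwards, so that the call stack can be pushed deeper until it reaches the crash site for further analysis.

Problem analysis

The assert message in the ass file usually gives a rough idea of why the system crashed.

For the most basic assert messages, the printed information points to the offending code, and the cause can be analyzed directly.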

File:nfc_drv_v1.c Line: 501 PASSERT(0) > NFC timeout happened!f=4,e=0

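Memory leaks

A leak usually occurs when memory is allocated with SCI_ALLOC but the matching SCI_FREE is never called.

ThreadX memory management defines two kinds of pool, block pools and byte pools. A byte pool is a memory pool of fixed total size that serves variable-size allocations and frees, but it suffers from fragmentation. Block pools are set up for different block sizes, each pool handing out fixed-size blocks, which makes allocation and release fast and avoids fragmentation, so they perform better than a byte pool.

Based on the allocation rules of the byte pool and the block pool, the author wrote the mocor_byte_pool_list.cmm and mocor_block_pool_list.cmm scripts to check whether the heap and pool memory areas have been allocated abnormally; they help when analyzing memory overwrite problems.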

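Locating with ArmLogel trace

Connect the phone to the ArmLogel tool and print the memory pool information and the allocation status through [SysInfo] -> [Memory Status] and [SysInfo] -> [Memory Allocated Status].
Run the application under test, then save the memory information again in the same way.
Repeat this several times and compare the saved files to see whether the differences follow a pattern. If there is a leak, the source can be found by comparing the Allocated memory info entries.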
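The comparison of saved snapshots can be automated. The sketch below assumes a simplified, hypothetical snapshot format with one live allocation per line written as `file(line)`; the real ArmLogel output format is not shown in this article, so the parser would need to be adapted. It flags allocation sites whose live count grows in every successive snapshot.

```python
from collections import Counter


def parse_snapshot(text):
    """Count live allocations per site; assumes one 'file(line)' entry per line."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if line:
            counts[line] += 1
    return counts


def growing_sites(snapshots):
    """Return {site: counts} for sites that strictly grow across all snapshots."""
    parsed = [parse_snapshot(s) for s in snapshots]
    sites = set().union(*parsed)
    result = {}
    for site in sites:
        series = [p.get(site, 0) for p in parsed]
        if len(series) > 1 and all(b > a for a, b in zip(series, series[1:])):
            result[site] = series
    return result


# Hypothetical snapshots taken before and after each test round.
snap1 = "app_a.c(10)\nui_b.c(20)\n"
snap2 = "app_a.c(10)\napp_a.c(10)\nui_b.c(20)\n"
snap3 = "app_a.c(10)\napp_a.c(10)\napp_a.c(10)\nui_b.c(20)\n"
suspects = growing_sites([snap1, snap2, snap3])
print(suspects)
```

A site that grows monotonically is only a suspect; caches and lazily created objects can show the same pattern for the first few rounds, which is why the text recommends repeating the test several times.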

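Locating with the Assert tool

When a memory problem occurs, check the ass file produced by the assert. Printing the assert information is in effect a check of the Pool and Heap memory at the moment the system entered the assert state, so whether the memory information is printed completely is a good indication of whether system memory is healthy.

After the assert, the pop-up interface offers the following:

Press 5 to print the usage of the memory pools (Pool) and the Heap. If many pools are exhausted (Avail_Num is 0), a memory leak is likely.

Press 4 to print the detailed allocation information of the memory pools. This shows which files allocate large amounts of memory; look for a pattern that indicates a leak.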

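Out-of-bounds memory access

An out-of-bounds write is caused by a programming error: the code uses more memory than it allocated and overwrites what lies beyond. A common case is a memcpy whose source data is too long, which overwrites the end_flag of the memory block. For this kind of problem, find the memory block just before the point where the overrun was detected and carefully analyze how that block is used to see whether it could have been accessed out of bounds.

block pool

For example, the following block pool overrun: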

-000|TXAS_SaveMainReg(
    |  ?,
    |    cur_lr = 0x0108D1C2,
    |    cur_pc = 0x80)
-001|TXAS_SystemAssert(
    |    exp = 0x0021229C,
    |    file = 0x00212224,
    |    line = 118,
    |    assert_info_ptr = 0x0)
    |  assert_mode = 1
    |  cur_sp = 36771216
    |  cur_lr = 3422259
    |  cur_pc = 7799016
    |  i = 0
-002|system_fatal_error_handler(
    |  ?,
    |  ?,
    |  ?,
    |  ?,
    |  ?)
-003|osa_fatal_error_handler_info(
    |    error_message_ptr = 0x0021229C,
    |    error_code = OSA_ERROR_BUFFMNGR_ISVALID_FAILED,
    |    os_error_code = 101,
    |    file = 0x00212224,
    |    line = 118)
-004|osa_validate_buff_footer(
    |    usr_buff = 0x03FD908C,
    |    curr_alloc_size = 1052)
-005|osa_int_release_buffer(
    |  ?,
    |  ?,
    |    dealloc_file = 0,
    |    dealloc_line = 1050)
    |  buff_hdr_p = 0x03FD906C
-006|SCI_Release_Buffer(
    |    buff_ptr = 0x03FD908C,
    |    entity_id = 3,
    |    dealloc_file = 0,
    |    dealloc_line = 1050)
    |  debug_buff_ptr = 0x03FD908C
-007|SCI_Free(
    |    memory_ptr = 0x03FD908C)
    |  free_ptr = 0x00345A95
-008|SendMultiPicByBt()
    |  pic_info = (filename = (68, 58, 92, 80, 104, 111, 116, 111, 115, 92, 68, 83, 67, 95
    |  send_file_info = 0x03FD9090
    |  list_ctrl_id = 2359329
    |  send_file_num = 2
    |  i = 3
    |  total_num = 3
-009|HandleShareItemsPiclistOptWinMsg(
    |    win_id = 2359302,
    |  ?,
    |  ?)
    |  title_str = (wstr_ptr = 0x0, wstr_len = 0)
    |  result = 1
    |  ctrl_id = 2359331
    |  list_ctrl_id = 2359329
    |  menu_id = 2359331
    |  group_id = 2359299
    |  kstring = (wstr_ptr = 0x0, wstr_len = 0)
-010|MMK_RunWinProc(
    |  ?,
    |    msg_id = 57345,
    |    param = 0x03E983C8)
-011|MMK_DispatchToHandle(
    |    handle = 1103560803,
    |    msg_id = 57345,
    |    param_ptr = 0x03E983C8)
    |  openwin_handle_result = 0
    |  old_handle = 16711680
-012|MMK_DispatchWinMSG(
    |    mmi_msg_ptr = 0x03FA1300)
-013|MMK_DispatchMSGQueue(
    |  ?)
-014|thread_entry_P_APP(
    |  ?,
    |  ?)
    |  receiveSignal = 0x0
    |  mmi_msg = 0x03FA1300
    |  ticks1 = 0
    |  ticks2 = 0
    |  is_log_on = 0
    |  time_period = 4294967295
    |  watchdog_ptr = 0x03ECF2EC
-015|ThreadEntry(
    |  ?)
    |  thread_entry = (entry = 0x0833AF73, argc = 0, argv = 0x0)
-016|tx_thread_shell_entry()
 ---|end of frame

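When the memory at 0x03FD908C is released, osa_validate_buff_footer checks the block and cannot find the end flag, so it asserts.

According to the memory allocation rules, the value at address 0x03FD94A8 should be 0xF2F2F2F2, but it is actually 0x003A0044. Analysis of the code shows that send_file_num and total_num in SendMultiPicByBt are not equal, which leads to the out-of-bounds write.

byte pool

Here is a byte pool overrun: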
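The footer address and the overwriting value can be cross-checked with a few lines of arithmetic. The sketch uses only numbers from the call stack above; the little-endian byte order and the idea of reading the word as UTF-16 text are assumptions made for illustration.

```python
import struct

USR_BUFF = 0x03FD908C          # usr_buff in osa_validate_buff_footer
ALLOC_SIZE = 1052              # curr_alloc_size in the same frame
EXPECTED_FOOTER = 0xF2F2F2F2   # end flag expected after the user data
FOUND = 0x003A0044             # value actually found at the footer address

# The footer sits directly behind the user buffer.
footer_addr = USR_BUFF + ALLOC_SIZE

# Assumption: little-endian target. Reinterpret the 32-bit word as raw bytes
# and try to read it as UTF-16 text.
as_text = struct.pack("<I", FOUND).decode("utf-16-le")

# First characters of pic_info.filename as listed in frame -008.
filename_prefix = "".join(
    chr(c) for c in (68, 58, 92, 80, 104, 111, 116, 111, 115, 92, 68, 83, 67, 95)
)

print(hex(footer_addr), repr(as_text), filename_prefix)
```

The footer address matches the 0x03FD94A8 quoted in the text, and the word that replaced the end flag reads as the wide characters `D:`, the same way the file name in pic_info begins. That is consistent with a wide-character path being written past the end of send_file_info, although the dump alone does not prove it.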

-000|tx_byte_pool_search_ex(
    |    pool_ptr = 0x00B68A14,
    |    memory_size = 552)
    |  interrupt_save = 536870912
    |  current_ptr = 0x80818081
    |  examine_blocks = 119
    |  alloc_ptr = 0x025F385C
    |  min_size = 32
    |  section_limit_space_addr = 0x0
    |  section_limit_space_size = 4294967295
    |  section_high_space_addr = 0x0
    |  section_high_space_size = 4294967295
    |  search_cnt = 484
-001|tx_byte_allocate(
    |    pool_ptr = 0x00B68A14,
    |    memory_ptr = 0x02320998,
    |  ?,
    |    wait_option = 2863311530)
    |  thread_ptr = 0x01CBAAD8
-002|txe_byte_allocate(
    |  ?,
    |  ?,
    |  ?,
    |  ?)
-003|SCI_MallocEx(
    |    size = 520,
    |    type = 1145324612,
    |    file = 0x08652448,
    |    line = 143)
    |  memory_ptr = 0x0
    |  num_free_buffs = 0
    |  cur_pool_inx = 14
    |  alloc_size = 549
    |  byte_mem_header_ptr = 0x0
-004|allocatePName(
    |    pName = 0x023210EC)
-005|SFS_CreateFileInternal(
    |    file_name = 0x023210EC,
    |    access_mode = 49,
    |    share_mode = 0,
    |    file_attri = 0)
    |  __func__ = (83, 70, 83, 95, 67, 114, 101, 97, 116, 101, 70, 105, 108, 101
    |  handle = 0
-006|DrmParseDCF(
    |  ?)
    |  ret_val = 0
-007|DrmCheckIsDRM(
    |    file_name_ptr = 0x023210EC)
    |  ret_val = 0
-008|DRM_CreateFile(
    |    file_name_ptr = 0x023210EC,
    |    access_mode = 49,
    |    share_mode = 0,
    |    file_attribute = 0)
    |  drm_handle_f = 2555905
    |  drm_file_ptr = 0x014D1904
    |  drm_io_handle = 0
-009|SFS_CreateFile(
    |  ?,
    |  ?,
    |  ?,
    |  ?)
-010|MMIFILE_CreateFile(
    |    file_name = 0x023210EC,
    |    access_mode = 49,
    |    share_mode = 0,
    |    file_attri = 0)
    |  handle = 0
-011|MMIPICVIEW_IsSend(
    |    is_sms = 0,
    |    file_data_ptr = 0x023210EC)
    |  result = 0
    |  file_info = (file_name = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    |  send_file_info = (filepath_name = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
-012|SendMultiPicByBt()
    |  pic_info = (filename = (69, 58, 92, 86, 105, 100, 101, 111, 92, 77, 79, 8
    |  send_file_info = 0x0268B9D0
    |  list_ctrl_id = 2359329
    |  send_file_num = 5
    |  i = 6
    |  total_num = 7
-013|HandleShareItemsPiclistOptWinMsg(
    |    win_id = 2359302,
    |  ?,
    |  ?)
    |  title_str = (wstr_ptr = 0x0, wstr_len = 0)
    |  result = 1
    |  ctrl_id = 2359331
    |  list_ctrl_id = 2359329
    |  menu_id = 2359331
    |  group_id = 2359299
    |  kstring = (wstr_ptr = 0x0, wstr_len = 0)
-014|MMK_RunWinProc(
    |  ?,
    |    msg_id = 57345,
    |    param = 0x03E9BB08)
-015|MMK_DispatchToHandle(
    |    handle = 2632974468,
    |    msg_id = 57345,
    |    param_ptr = 0x03E9BB08)
    |  openwin_handle_result = 0
    |  old_handle = 16711680
-016|MMK_DispatchWinMSG(
    |    mmi_msg_ptr = 0x03FA14E0)
-017|MMK_DispatchMSGQueue(
    |  ?)
-018|thread_entry_P_APP(
    |  ?,
    |  ?)
    |  receiveSignal = 0x0
    |  mmi_msg = 0x03FA14E0
    |  ticks1 = 0
    |  ticks2 = 0
    |  is_log_on = 0
    |  time_period = 4294967295
    |  watchdog_ptr = 0x03ECF298
-019|ThreadEntry(
    |  ?)
    |  thread_entry = (entry = 0x08650E7B, argc = 0, argv = 0x0)
-020|tx_thread_shell_entry()
 ---|end of frame

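Neither the code at the crash site nor the call stack shows a problem, yet the walk over the byte pool fails during allocation. This suggests that the memory area itself is corrupted, possibly overwritten. Check the byte pool memory with the mocor_byte_pool_list.cmm script: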

byte_static_heap:       0x0B68984
byte_dynamic_base_heap: 0x0B689CC
byte_dynamic_app_heap:  0x0B68A14
pool start address:     0x24FF4D4
pool end address:       0x3E878FC
pool addr: 0x24FF4D4
     size: 0x888

～～～
pool addr: 0x268B9AC
     size: 0x0A64
pool addr: 0x268C410
pool memory end: 0x56005C
pool count: 0x1E1

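The memory node at 0x268B9AC is intact, but the node at 0x268C410 is corrupted: the end_flag 0xAA is gone and the area contains invalid data.

Following the logic of tx_byte_pool_search_ex, this eventually causes an access to an invalid address: R5 = 0x80818081, and executing ldr r0,[r5,#04] triggers a 4-byte alignment exception (this matches the error in the assert file: Fault address :0x80818085).

Next, look at how the memory at 0x268B9AC was allocated:

Analysis of the allocation and usage code in mmipicview_wintab.c shows that SendMultiPicByBT writes out of bounds when it uses the memory allocated for send_file_info, which causes the problem above.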
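The numbers in the script output and the call stack line up, which can be verified with a short calculation. The sketch assumes that a byte pool node is followed directly by the next node (address plus size), which is what the two addresses in the dump suggest; the UTF-16 reading of the garbage value is likewise an assumption.

```python
import struct

NODE_ADDR = 0x268B9AC        # last intact node reported by the script
NODE_SIZE = 0x0A64           # its size
NEXT_NODE = 0x268C410        # node reported as corrupted
SEND_FILE_INFO = 0x0268B9D0  # send_file_info in SendMultiPicByBt (frame -012)
R5 = 0x80818081              # current_ptr loaded from the corrupted node
GARBAGE = 0x0056005C         # value printed as "pool memory end"

# Assumption: nodes are contiguous, so the next node starts at addr + size.
next_expected = NODE_ADDR + NODE_SIZE

# The buffer used by SendMultiPicByBt lives inside the intact node.
buffer_in_node = NODE_ADDR <= SEND_FILE_INFO < next_expected

# ldr r0,[r5,#04] reads from R5 + 4; a word load needs 4-byte alignment.
fault_addr = R5 + 4
is_aligned = fault_addr % 4 == 0

# Assumption: little-endian target; read the garbage word as UTF-16 text.
garbage_text = struct.pack("<I", GARBAGE).decode("utf-16-le")

print(hex(next_expected), buffer_in_node, hex(fault_addr), is_aligned, repr(garbage_text))
```

The corrupted node starts exactly where the node holding send_file_info ends, the computed fault address equals the one in the assert file, and the garbage value reads as the wide characters `\V`, which also occur in the `E:\Video\...` path stored in pic_info. All three observations fit an overrun of send_file_info with file name data.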

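Memory overwrite

Memory can be overwritten for many reasons. A null-pointer operation touches address 0 and is relatively easy to check: address 0 is usually the DSP code area, so BUSMonitor can be used to watch the code area and locate the source that overwrites the code segment.

Wild-pointer operations are harder. The pool_list.cmm scripts can check the integrity of the memory allocations, and dumped memory contents can be compared to find where the overwrite comes from.

Crash site: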

File:  tx_byta.c
Line:  240
ASSERT(current_ptr != (*((CHAR_PTR *) current_ptr)))

Derived call stack:

-000|TXAS_SaveMainReg(
    |  ?,
    |    cur_lr = 0x0108D1CE,
    |    cur_pc = 0x80)
-001|TXAS_SystemAssert(
    |    exp = 0x00215CAC,
    |    file = 0x00215C5C,
    |    line = 240,
    |    assert_info_ptr = 0x0)
    |  assert_mode = 1
    |  cur_sp = 38244696
    |  cur_lr = 3672209
    |  cur_pc = 8536932
    |  i = 12387240
-002|SCI_Assert(
    |  ?,
    |  ?,
    |  ?)
-003|tx_byte_pool_search_ex(
    |    pool_ptr = 0x00C35B3C,
    |    memory_size = 307776)
    |  interrupt_save = 536870912
    |  examine_blocks = 85
    |  alloc_ptr = 0x0
    |  min_size = 4294967295
    |  section_limit_space_addr = 0x0
    |  section_limit_space_size = 4294967295
    |  section_high_space_addr = 0x0
    |  section_high_space_size = 4294967295
    |  search_cnt = 150
-004|tx_byte_allocate(
    |    pool_ptr = 0x00C35B3C,
    |    memory_ptr = 0x02479238,
    |  ?,
    |    wait_option = 2863311530)
    |  thread_ptr = 0x01E10380
-005|txe_byte_allocate(
    |  ?,
    |  ?,
    |  ?,
    |  ?)
-006|SCI_MallocApp(
    |  ?,
    |    file = 0x0904406C,
    |    line = 5942)
    |  memory_ptr = 0x0
    |  num_free_buffs = 0
    |  cur_pool_inx = -1
    |  alloc_size = 307773
    |  byte_mem_header_ptr = 0x0
-007|BL_MallocEx(
    |  ?,
    |    file = 0x0904406C,
    |    line = 5942)
    |  index = 2
-008|MMI_BL_Malloc(
    |    id = 270,
    |    file = 0x0904406C,
    |    line = 5942)
-009|AllocTrans3DBuf(
    |    old_buf_pptr = 0x024792BC,
    |    new_buf_pptr = 0x024792B8)
    |  result = 0
-010|MMIDEFAULT_SaveOldMoveBuf(
    |    buf_type = SCR_EFFECT_BUF_TYPE_SLIDE_RIPPLE = SCR_EFFECT_BUF_TYPE_WIN_SWITCH)
    |  buf_ptr = 0x0265D100
    |  buf_width = 240
    |  old_buf_ptr = 0x0
    |  new_buf_ptr = 0x0
-011|HandleMenuOkKey(
    |    menu_ctrl_ptr = 0x03FE0420,
    |  ?)
    |  is_handled = 0
    |  is_grayed = 0
    |  is_exist_child = 1
    |  cur_item_top = 0
    |  base_ctrl_ptr = 0x03FE0420
    |  cur_item = (menu_id = 0, tip_id = 0, button_id = (0, 0, 0), text_str_id = 0, select_icon
    |  cur_node_ptr = 0x027DC338
    |  lcd_dev_info = 0x092B5920
    |  lcd_rect = (left = 0, top = 0, right = 239, bottom = 319)
-012|MenuHandleMsg(
    |  ?,
    |    msg_id = 64027,
    |    param = 0x02479404)
    |  result = 1
    |  menu_ctrl_ptr = 0x03FE0420
-013|VTLCTRL_HandleMsg(
    |    iguictrl_ptr = 0x03FE0420,
    |    msg_id = 64027,
    |    param_ptr = 0x02479404)
-014|MMK_RunCtrlProc(
    |  ?,
    |    msg_id = 64027,
    |    param = 0x02479404)
    |  me_ptr = 0x03FE0420
-015|MMK_DefaultProcessWinMsg(
    |  ?,
    |    msg_id = 64027,
    |    param = 0x02479404)
    |  result = 0
    |  ctrl_handle = 1288896624
-016|MMK_DispatchToHandle(
    |    handle = 1288765504,
    |    msg_id = 64027,
    |    param_ptr = 0x02479404)
    |  bResult = 0
    |  old_handle = 16711680
-017|MMK_DispMsgToFocusWin(
    |    msg_id = 64027,
    |    param_ptr = 0x02479404)
-018|MMK_DispMsgToWin(
    |    msg_id = 64027,
    |  ?)
    |  result = 0
-019|HandleMSGKbd(
    |    keys_status = 64000,
    |    key_code = 27)
    |  multi_key_tp_param = (is_slide = 0, pre_tp_point = (x = 0, y = 0), cur_tp_point = (x = 0
-020|MMK_DispatchMSGKbd(
    |  ?)
    |  keypress_ptr = 0x03EB1488
    |  key_code = 27
    |  is_long_press = 0
-021|MMK_DispatchExtSig(
    |  ?)
-022|thread_entry_P_APP(
    |  ?,
    |  ?)
    |  receiveSignal = 0x03EB1488
    |  mmi_msg = 0x03FA0600
    |  ticks1 = 0
    |  ticks2 = 0
    |  is_log_on = 0
    |  time_period = 4294967295
    |  watchdog_ptr = 0x03EBE978
-023|ThreadEntry(
    |  ?)
    |  thread_entry = (entry = 0x0800430F, argc = 0, argv = 0x0)
-024|tx_thread_shell_entry()
 ---|end of frame

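The call stack shows that the problem occurs during memory allocation, while traversing the memory nodes. Check with mocor_byte_pool_list.cmm:

Following the allocator's rules, the search for the next memory node from 0x027DD46C fails; the memory there has been overwritten with 0x21242124.

According to the assert file, the memory allocated by ctrlmenu.c spans 0x027DC2BC to 0x027DD46C, and the area starting at 0x027DD46C has been overwritten.

According to the log, this memory had been handed to the JPEG decoder shortly before the crash. The code shows that the Destroy issued by APP returned immediately and the memory was freed without waiting for the Set Event action (which signals that IMG_DEC has finished decoding).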

2265207-4     37131 [MMIPIC]:HandlePicListWinMsg msg_id =f023
2265207-5     37131 [IMG_DEC_Destroy +] handle = 0x85a0fd17
2265207-6     37131 _IMG_DEC_Get_Caller_Priority: T_P_APP, queue_name: Q_P_APP, priority: 76 
2265207-7     37131 _IMG_DEC_Get_Caller_Priority: T_IMG_DEC, queue_name: T_IMG_DEC_QUEUE, priority: 76 
2265207-8     37131 _IMG_DEC_SendMsg: sig_code = 2
2265207-9     37131 [IMG_DEC_Destroy -] handle = 0x85a0fd17
2265207-10    37131 GUIANIM_DestroyHandle:destroy handle=0x85a0fd17 result is 0!

  (byte_heap_hdr_struct)0x27dc2c4 = (
    pre = 0x03D47820,
    succ = 0x00C6E1FC,
    file_name = 0x08DB3DB4 -> "ctrlmenu.c",
    line = 861,
    size = 4493,
    block_num = 572061)

 > 
	 0x27dc2bc    0x27dd46c      0x11b0      ALLOC        ctrlmenu.c(861)
 > 
	 0x27dd46c    0x21242124     0x1ea64cb8  ALLOC        (556015908)
 >

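GUIANIM_DestroyHandle did not wait for the decoder to run JPEGDEC_DestoryHandle and stop the low-level decoding; it freed the memory early, so the decode in progress kept using the old memory. The T_IMG_DEC task information below confirms this: target_ptr = 0x027DC300 is still present.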

-000|tx_thread_suspend(
    |  ?)
    |
-001|tx_semaphore_get(
    |  ?,
    |  ?)
    |  thread_ptr = 0x01E16F50
    |
    |
-002|txe_semaphore_get(
    |  ?,
    |  ?)
    |
    |
-003|SCI_GetSemaphore(
    |  ?,
    |  ?)
    |  _sem_ptr = 0x03EBC074
    |
    |
-004|FreeLock()
    |  sem_count = 0
    |  susp_count = 0
    |  semap_ptr = 0x03EBC074
    |
    |
-005|JPEGDEC_DestoryHandle(
    |  ?,
    |    exit_type = IMG_DEC_EXIT_HALT)
    |  dec_info_ptr = 0x03F79100 -> (
    |    src_info = (src_ptr = 0x0, src_file_size = 14576, src_file_handle = 150
    |    frame_in_param = (
    |      handle = 66556160,
    |      target_ptr = 0x027DC300,
    |      target_buf_size = 11552,
    |      target_width = 76,
    |      target_height = 76,
    |      img_rect = (left = 0, top = 40, right = 239, bottom = 279),
    |      target_rect = (left = 0, top = 0, right = 75, bottom = 75),
    |      data_format = IMG_DEC_RGB565,
    |      frame_index = 0,
    |      is_dec_thumbnail = 0,
    |      is_exist_background = 0,
    |      padding1 = 0,
    |      padding2 = 0,
    |      alpha_buf_ptr = 0x0,
    |      alpha_buf_size = 0,
    |      write_data = 0x0,
    |      callback = 0x003AE0CB,
    |      app_param_ptr = 38240904,
    |      app_param_size = 12,
    |      quality = JINF_QUALITY_HIGH,
    |      target_buf_width = 0,
    |      target_buf_height = 0,
    |      img_dec_mode = IMG_DEC_TARGET_SIZE_RESIZABLE),
    |    frame_out_param = (is_decode_finished = 0, is_process_alpha = 0, paddin
    |    frame_extra_info = (priority = 77),
    |    dec_buf_ptr = 0x03D4783C,
    |    read_buf_ptr = 0x0,
    |    ret_val = IMG_DEC_RET_SUCCESS,
    |    jpeg_type = JINF_JPEG_TYPE_BASELINE)
    |
    |
-006|IMG_DEC_Destroy_Hal(
    |  ?,
    |    handle = 66556160,
    |    exit_type = IMG_DEC_EXIT_HALT)
    |
    |
-007|IMG_DEC_Remove_Command(
    |  ?)
    |  ret = IMG_DEC_RET_SUCCESS
    |  dec_handle_ptr = 0x03FE0EA0
    |  tmp_cmd_ptr = 0x03EBBFCC
    |  cur_cmd_ptr = 0x03ED1368
    |  next_cmd_ptr = 0x0
    |  pre_cmd_ptr = 0x03EBBFCC
    |
    |
-008|IMG_DEC_Task_Routine(
    |  ?,
    |  ?)
    |  command = 2
    |  param0 = 2241920279
    |  param2 = 0
    |  sig_ptr = 0x03EC8D34
    |  handle = 2241920279
    |
    |
-009|ThreadEntry(
    |  ?)
    |  thread_entry = (entry = 0x003AE7A9, argc = 0, argv = 0x0)
    |
    |
-010|tx_thread_shell_entry()
    |
    |
 ---|end of frame
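Whether the stale decode buffer can reach the damaged pool node is a matter of address arithmetic. The sketch below uses only values from the dumps above; the two bytes per pixel follow from data_format = IMG_DEC_RGB565.

```python
TARGET_PTR = 0x027DC300       # frame_in_param.target_ptr
TARGET_BUF_SIZE = 11552       # frame_in_param.target_buf_size
TARGET_WIDTH = 76             # frame_in_param.target_width
TARGET_HEIGHT = 76            # frame_in_param.target_height
BYTES_PER_PIXEL = 2           # RGB565
CTRLMENU_START = 0x027DC2BC   # block now owned by ctrlmenu.c(861)
CORRUPTED_NODE = 0x027DD46C   # node header overwritten with 0x21242124

# The buffer size is exactly one 76x76 RGB565 frame.
frame_bytes = TARGET_WIDTH * TARGET_HEIGHT * BYTES_PER_PIXEL

# Half-open range written by the decoder if it runs to completion.
buf_end = TARGET_PTR + TARGET_BUF_SIZE

# The old target buffer starts inside the block that ctrlmenu.c received later
# and extends past the header of the following node.
starts_in_ctrlmenu_block = CTRLMENU_START <= TARGET_PTR < CORRUPTED_NODE
covers_corrupted_node = TARGET_PTR <= CORRUPTED_NODE < buf_end

print(frame_bytes, hex(buf_end), starts_in_ctrlmenu_block, covers_corrupted_node)
```

The freed target buffer spans the address of the damaged node header, so a decoder that keeps writing pixels after the free would overwrite it, which fits the analysis in the text.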

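Studying the code reveals the cause:

The target_ptr buffer is allocated and freed by the application layer. When the application calls IMG_DEC_Remove_Handle, this only stops the sending of decode messages; the actual decoding continues. When decoding completes, JPEG_OutputData writes the decoded data to the memory that target_ptr points to, but the application has already freed it, so the write is an illegal memory access. The code flow needs to change: before calling JPEG_OutputData, check whether decoding of this picture was forcibly terminated, and call it only if it was not, so that decoded data is passed to the application only in that case.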
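The proposed guard can be illustrated with a minimal model. This is not the project's code: `DecodeJob`, `halted` and `finish_decode` are hypothetical names, and a real implementation would also have to synchronize the flag between the application task and the decode task.

```python
class DecodeJob:
    """Hypothetical stand-in for one decode request."""

    def __init__(self, target):
        self.target = target   # output buffer owned by the application
        self.halted = False    # set when the application destroys the handle


def finish_decode(job, pixels):
    """Deliver decoded data unless the job was forcibly terminated."""
    if job.halted:
        # The application may already have freed the buffer: do not write.
        return False
    job.target[:len(pixels)] = pixels
    return True


# Normal completion: the data reaches the application buffer.
live = DecodeJob(bytearray(4))
delivered = finish_decode(live, b"\x01\x02")

# Destroyed before completion: the buffer is left untouched.
stale = DecodeJob(bytearray(4))
stale.halted = True
skipped = finish_decode(stale, b"\x01\x02")
```

The point of the design is that the decoder, not the application, decides at the last moment whether the output buffer is still valid.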

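In addition, a JVM callback was being invoked in the image decode task, which corrupted the event state and caused the Destroy issued by APP to return immediately and free the memory.

Deadlock

Deadlocks are usually caused by improper use of mutexes and can make the UI unresponsive or trigger a watchdog reset.

Taking a watchdog reset as an example, the ass file contains the following: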

File:  watchdog.c
Line:  353
PASSERT(SCI_FALSE)
 > Task APP timeout

The corresponding call stack:

-000|TXAS_SaveMainReg(
    |  ?,
    |    cur_lr = 0x0108D180,
    |    cur_pc = 0x80)
-001|TXAS_SystemAssert(
    |    exp = 0x003ACD84 -> ,
    |    file = 0x003ACD78 -> ,
    |    line = 353,
    |    assert_info_ptr = 0x00C36215 -> "Task APP timeout")
    |  assert_mode = 1
    |  cur_sp = 31244224
    |  cur_lr = 3791281
    |  cur_pc = 8534840
    |  i = 12387240
-002|SCI_PAssert(
    |  ?,
    |  ?,
    |  ?,
    |  ?)
-003|CheckAllTask()
    |  list = 0x03EBEB2C
    |  curr_tick = 3562772
-004|DoIdle_DoCallback(
    |    param = 5000)
    |  i = 2
-005|DoIdle_Callback(
    |  ?)
    |  assert_mode = 44
    |  dischg = (warning_vol = 64244, shutdown_vol = 1011, deadline_vol = 65369,
-006|osa_timer_routine_wraper(
    |    usr_timer_id = 0x03F3FAF4)
-007|tx_timer_thread_entry(
    |  ?)
    |  timeout_function_backup = 0x003BFF59
    |  expired_timers = 0x03F3FAFC
    |  timeout_function = 0x003BFF59
-008|tx_thread_shell_entry()
 ---|end of frame

The reported cause is Task APP timeout. In the code, this is the watchdog registered in mmimain.c by void APP_Task(uint32 argc, void *argv), which was not fed in time.

watchdog_ptr = SWDG_RegTask("APP", 180000)

The call stack of APP_Task, shown below, is suspended while waiting for img_decoder_event.

-000|tx_thread_suspend(?)
    |
-001|tx_event_flags_get(?, ?, ?, ?, ?)
    |
    |
-002|txe_event_flags_get(?, ?, get_option = 1, ?, ?)
    |
    |
-003|SCI_GetEvent(?, ?, get_option = 1, ?)
    |
    |
-004|IMG_DEC_GetEvent(requested_flags = 4, ?)
    |
    |
-005|IMG_DEC_Remove_Handle(?)
    |
    |
-006|IMG_DEC_Destroy(handle = 2241919696)
    |
    |
-007|GUIANIM_DestroyHandle(?, img_handle = 2241919696, ?)
    |
    |
-008|HandleAnimLoseFocus(anim_ctrl_ptr = 0x03FDAFD8)
    |
    |
-009|AnimCtrlHandleMsg(?, ?, ?)
    |
    |
-010|VTLCTRL_HandleMsg(iguictrl_ptr = 0x03FDAFD8, msg_id = 61492, param_ptr = 0x
    |
    |
-011|MMK_RunCtrlProc(?, msg_id = 61492, param = 0x0)
    |
    |
-012|ControlTreeNodeHandleEvent(?, ?, ?)
    |
    |
-013|MMK_DispatchToAllTreeNode(?, func = 0x09076FEF, msg_id = 61492, param = 0x0
    |
    |
-014|MMK_DispatchToAllControl(?, msg_id = 61492, param = 0x0, state = 2)
    |
    |
-015|MMK_ProcSpecialWinMsg(win_handle = 89194548, ?, param = 0x0)
    |
    |
-016|MMK_DispatchToHandle(handle = 89194548, msg_id = 61475, param_ptr = 0x0)
    |
    |
-017|MMK_SendMsg(?, msg_id = 61475, param_ptr = 0x0)
    |
    |
-018|MMK_OpenWin(win_handle = 90570798, ?)
    |
    |
-019|AppletCreateWindow(?, is_win_table = 1)
    |
    |
-020|MMK_CreateWinTable(create_ptr = 0x02477A24)
    |
    |
-021|MMK_CreateWin(win_table_ptr = 0x082A4600, add_data_ptr = 0x0)
    |
    |
-022|HandlePicListWinMsg(?, ?, ?)
    |
    |
-023|MMK_RunWinProc(?, msg_id = 57345, param = 0x03E9BC88)
    |
    |
-024|MMK_DispatchToHandle(handle = 89194548, msg_id = 57345, param_ptr = 0x03E9B
    |
    |
-025|MMK_DispatchWinMSG(mmi_msg_ptr = 0x03FA06C0)
    |
    |
-026|MMK_DispatchMSGQueue(?)
    |
    |
-027|thread_entry_P_APP(?, ?)
    |
    |
-028|ThreadEntry(?)
    |
    |
-029|tx_thread_shell_entry()
    |
    |
 ---|end of frame

img_decoder_event should be set by the T_IMG_DEC task, but that task is itself suspended on the semaphore JPEG_FREE_RES_SEMAP, as the following call stack shows:

-000|tx_thread_suspend(
    |  ?)
    |
-001|tx_semaphore_get(
    |  ?,
    |  ?)
    |  thread_ptr = 0x01E15D10 -> (
    |    tx_thread_id = 1414025796,
    |    tx_run_count = 265,
    |    tx_stack_ptr = 0x026A7F38,
    |    tx_stack_start = 0x026A70F4,
    |    tx_stack_end = 0x026A80EF,
    |    tx_stack_size = 4092,
    |    tx_time_slice = 0,
    |    tx_new_time_slice = 0,
    |    tx_ready_next = 0x01E15D10,
    |    tx_ready_previous = 0x01E15D10,
    |    tx_thread_name = 0x003AD978 -> "T_IMG_DEC",
    |    tx_priority = 76,
    |    tx_state = 6,
    |    tx_delayed_suspend = 0,
    |    tx_suspending = 0,
    |    tx_preempt_threshold = 76,
    |    tx_priority_bit = (0, 0, 4096, 0, 0, 0, 0, 0),
    |    tx_thread_entry = 0x003B234F,
    |    tx_entry_parameter = 65588616,
    |    tx_thread_timer = (tx_remaining_ticks = 4294967295, tx_re_initialize_ti
    |    tx_suspend_cleanup = 0x002162D1,
    |    tx_suspend_control_block = 0x03EBC074,
    |    tx_suspended_next = 0x01E15D10,
    |    tx_suspended_previous = 0x01E15D10,
    |    tx_suspend_info = 1,
    |    tx_additional_suspend_info = 0x026A808C,
    |    tx_suspend_option = 1,
    |    tx_suspend_status = 0,
    |    tx_created_next = 0x01E15E3C,
    |    tx_created_previous = 0x01E146CC,
    |    tx_filex_ptr = 0x0,
    |    time = 0,
    |    tx_thread_stack_highest_ptr = 0x026A7B58)
    |
    |
-002|txe_semaphore_get(
    |  ?,
    |  ?)
    |
    |
-003|SCI_GetSemaphore(
    |  ?,
    |  ?)
    |  _sem_ptr = 0x03EBC074 -> (
    |    sem_id = (
    |      tx_semaphore_id = 1397050689,
    |      tx_semaphore_name = 0x003DDCB4 -> "JPEG_FREE_RES_SEMAP",
    |      tx_semaphore_count = 0,
    |      tx_semaphore_suspension_list = 0x01E15D10,
    |      tx_semaphore_suspended_count = 1,
    |      tx_semaphore_created_next = 0x03EBC218,
    |      tx_semaphore_created_previous = 0x03EBC020),
    |    sem_stat = 0x03E8CE48)
    |
    |
-004|FreeLock()
    |  sem_count = 0
    |  susp_count = 0
    |  semap_ptr = 0x03EBC074
    |
    |
-005|IMGJPEG_FreeRes()
    |  pContext = 0x01250CDC
    |  ret_value = 255
    |
    |
-006|JPEGDEC_DestoryHandle(
    |  ?,
    |    exit_type = IMG_DEC_EXIT_HALT)
    |  dec_info_ptr = 0x03F75488
    |
    |
-007|IMG_DEC_Destroy_Hal(
    |  ?,
    |    handle = 66540680,
    |    exit_type = IMG_DEC_EXIT_HALT)
    |
    |
-008|IMG_DEC_Remove_Command(
    |  ?)
    |  ret = IMG_DEC_RET_SUCCESS
    |  dec_handle_ptr = 0x03FDD4E0
    |  tmp_cmd_ptr = 0x03EBBFCC
    |  cur_cmd_ptr = 0x03EC0700
    |  next_cmd_ptr = 0x0
    |  pre_cmd_ptr = 0x03EBBFCC
    |
    |
-009|IMG_DEC_Task_Routine(
    |  ?,
    |  ?)
    |  command = 2
    |  param0 = 2241919696
    |  param2 = 0
    |  sig_ptr = 0x03EC0604
    |  handle = 2241919696
    |
    |
-010|ThreadEntry(
    |  ?)
    |  thread_entry = (entry = 0x003AD589, argc = 0, argv = 0x0)
    |
    |
-011|tx_thread_shell_entry()
    |
    |
 ---|end of frame

The ass file also shows that the T_IMG_DEC task is suspended on JPEG_FREE_RES_SEMAP.

> JPEG_FREE_RES_SEMAP 0
> Suspend Task_Name : T_IMG_DEC

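Continuing the code analysis from these call stacks leads to the following conclusion:
When an application window loses focus it sends a message asking T_IMG_DEC to run the destroy flow. T_IMG_DEC takes the lock and runs JPEGDEC_DestoryHandle, and JPEG_FREE_RES_SEMAP is released during that flow. T_JPEG_DECODER then leaves the suspended state, but because it was already destroyed earlier in JPEGDEC_DestoryHandle it cannot run. Neither task can proceed, which results in the timeout.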
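The dependency chain behind the timeout can be written down as a small wait-for table and followed mechanically. The task and object names come from the call stacks; treating T_JPEG_DECODER as the task expected to release JPEG_FREE_RES_SEMAP is one reading of the conclusion above and is an assumption, as are the table and function names.

```python
# Which object each blocked task is waiting on (from the call stacks).
WAITS_ON = {
    "APP": "img_decoder_event",
    "T_IMG_DEC": "JPEG_FREE_RES_SEMAP",
}

# Which task is expected to signal each object.
# Assumption: the semaphore would be released by T_JPEG_DECODER.
SIGNALLED_BY = {
    "img_decoder_event": "T_IMG_DEC",
    "JPEG_FREE_RES_SEMAP": "T_JPEG_DECODER",
}


def blocking_chain(task):
    """Follow task -> object -> signalling task until a task is not waiting."""
    chain = [task]
    seen = {task}
    while task in WAITS_ON:
        obj = WAITS_ON[task]
        task = SIGNALLED_BY[obj]
        chain += [obj, task]
        if task in seen:   # a cycle: classic deadlock
            break
        seen.add(task)
    return chain


chain = blocking_chain("APP")
print(" -> ".join(chain))
```

The chain ends at a task that is not waiting on anything in the table. If that task can no longer run, as the text concludes for T_JPEG_DECODER, every task earlier in the chain stays blocked and the APP watchdog eventually fires.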

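Task Queue Full

The symptom of a full message queue is the assert ASSERT: Error 0xb (The queue was full !). The direct cause is that the receiving task does not get to run, so its message queue fills up; the sending task then finds that it cannot send and reports the queue-full error.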

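Possible causes:
1. The task priority is too low, so it never gets to run.
2. The task cannot process messages for some reason (for example a deadlock or a semaphore); analyze the code logic.
3. Too much interrupt handling starves the task; use the TaskAnalyzer tool to find the interrupt source.
4. Interrupts are disabled for too long, which starves the task; keep interrupt-disabled sections as short as possible.
5. The message queue length is set incorrectly; increase the Queue Size.

The key to this analysis is first to find the task that cannot process messages and then to go through the causes one by one, including a check of the messages in that task's queue. Alternatively, enter command "6" in the Assert window to print the details of every task and look for the task whose available Queue count is 0; that task is the problem.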

-000|TXAS_SaveMainReg(
    |  ?,
    |    cur_lr = 0x0108D1CE,
    |    cur_pc = 0x80 -> 0)
-001|TXAS_SystemAssert(
    |    exp = 0x023482FC -> ,
    |    file = 0x00377A80 -> ,
    |    line = 810,
    |    assert_info_ptr = 0x0 -> NULL)
    |  assert_mode = 1
    |  cur_sp = 36994960
    |  cur_lr = 3666785
    |  cur_pc = 7830016
    |  i = 11666952
    |  tem_str = "RTOS/source/src_osa/c/threadx_os.c"
-002|SCI_Assert(
    |  ?,
    |  ?,
    |  ?)
-003|SCI_SendSignal(
    |    signal_ptr = 0x03EB67C8,
    |  ?)
    |  status = 11
-004|MMISRV_CAMERAROLL_Download_Thumbnail()
    |  sig_ptr = 0x03EB67C8 -> (
    |    sig = (SignalCode = 1, SignalSize = 20, Pre = 0x0, Suc = 0x0, Sender = 21),
    |    data_ptr = 0x0)
-005|HandlePicListWinMsg(
    |  ?,
    |  ?,
    |  ?)
    |  result = 1
    |  title_str = (wstr_ptr = 0x0, wstr_len = 0)
    |  query_win_id = 2359321
    |  mark_wstr = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    |  mark_num_str = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    |  mark_num_wstr = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
    |  data_info = (is_bitmap = 76, is_free_bitmap = 12, is_save_data = 185, data_ptr = 0xF
-006|MMK_RunWinProc(
    |  ?,
    |    msg_id = 53255,
    |    param = 0x026B9D14)
-007|MMK_DispatchToHandle(
    |    handle = 62717996,
    |    msg_id = 53255,
    |    param_ptr = 0x026B9D14)
    |  openwin_handle_result = 0
    |  old_handle = 16711680
-008|MMK_SendMsg(
    |  ?,
    |    msg_id = 53255,
    |    param_ptr = 0x026B9D14)
    |  result = 0
-009|MMIAPIPICVIEW_HandleCameraRollSig(
    |    msg_id = 53255,
    |    param = 0x026B9D14)
-010|HandlePsAndRefMsg(
    |  ?,
    |    msg_id = 53255,
    |    param = 0x026B9D14)
    |  result = 1
-011|DispatchSysSig(
    |    signal_ptr = 0x026B9D14)
    |  i = 131
    |  regapp_num = 245
-012|MMK_DispatchExtSig(
    |  ?)
-013|thread_entry_P_APP(
    |  ?,
    |  ?)
    |  receiveSignal = 0x026B9D14
    |  mmi_msg = 0x03FA1410
    |  ticks1 = 0
    |  ticks2 = 0
    |  is_log_on = 0
    |  time_period = 4294967295
    |  watchdog_ptr = 0x03EBE534
-014|ThreadEntry(
    |  ?)
    |  thread_entry = (entry = 0x08EFC94B, argc = 0, argv = 0x0)
-015|tx_thread_shell_entry()
 ---|end of frame

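Together with the code, this shows that MMISRV_CAMERAROLL_Download_Thumbnail() found the message queue of T_P_APP_CAMERAROLL_TASK full when it tried to send a message to that task.

The ASS file confirms that the T_P_APP_CAMERAROLL_TASK message queue is full: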

Task_ID  Name             Tcb_Addr    Current_PC  Queue_All  Queue_Avail
0xaf     T_P_APP_CAMERAR  0x01ceaf80  0x002127d0  20         0
TX_READY 206

The call stack of T_P_APP_CAMERAROLL_TASK is as follows:

-000|tx_thread_suspend(
    |  ?)
    |
-001|tx_queue_receive(
    |  ?,
    |  ?,
    |  ?)
    |  thread_ptr = 0x01CEAF88
    |
    |
-002|txe_queue_receive(
    |  ?,
    |  ?,
    |  ?)
    |
    |
-003|SCI_GetSignal(
    |  ?)
    |  item = 66974816
    |  thread_block = 0x01CEAF80
    |
    |
-004|CAMERAROLL_Task(
    |  ?,
    |  ?)
    |
    |
-005|ThreadEntry(
    |  ?)
    |  thread_entry = (entry = 0x08074DD5, argc = 0, argv = 0x0)
    |
    |
-006|tx_thread_shell_entry()
    |
    |
 ---|end of frame

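The TaskAnalyzer in-memory trace points show that T_P_APP and the high-priority network tasks are scheduled frequently, so the lower-priority T_P_APP_CAMERAROLL_TASK does not get scheduled. Analysis of the message types queued for T_P_APP_CAMERAROLL_TASK shows that T_P_APP keeps sending the following three messages over and over: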

0xAD14:HTTP_SIG_GET_CNF
0xAD18:HTTP_SIG_HEADER_IND
0xAD1A:HTTP_SIG_DATA_IND

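Root cause from the code: CAMERAROLL_TASK runs 12 concurrent HTTP requests, but its queue holds only 20 entries, which is not enough for this scenario. The best fix is to enlarge the queue and at the same time reduce the number of concurrent HTTP requests.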
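A rough capacity estimate shows why 20 entries are not enough. The sketch assumes that each concurrent HTTP request can have one message of each of the three listed types pending at the same time while the task is not scheduled; the real number may be higher if several data indications arrive per request.

```python
QUEUE_SIZE = 20           # Queue_All from the ASS file
CONCURRENT_HTTP = 12      # concurrent HTTP requests used by CAMERAROLL_TASK
MESSAGES_PER_REQUEST = 3  # GET_CNF, HEADER_IND, DATA_IND (assumed one of each)


def worst_case_pending(requests, per_request):
    """Messages that can pile up if the receiver does not run at all."""
    return requests * per_request


def max_safe_requests(queue_size, per_request):
    """Largest request count whose worst case still fits in the queue."""
    return queue_size // per_request


pending = worst_case_pending(CONCURRENT_HTTP, MESSAGES_PER_REQUEST)
safe = max_safe_requests(QUEUE_SIZE, MESSAGES_PER_REQUEST)
print(f"worst case {pending} messages for a queue of {QUEUE_SIZE}; "
      f"at most {safe} concurrent requests fit")
```

Under this assumption the worst case is well above the queue size, so either the queue has to grow or the concurrency has to drop, and the text recommends doing both.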

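Stack overflow

Possible causes of a stack overflow:
1. The stack is too small for heavy use of local variables; prefer dynamic heap memory for large data.
2. The call depth is too great, or the code is stuck in endless recursion.
3. The stack memory is corrupted; a memory overwrite may have damaged the stack data.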

-000|TXAS_SaveMainReg(
    |  ?,
    |    cur_lr = 0x0108D1CE,
    |    cur_pc = 0x80)
-001|TXAS_SystemAssert(
    |    exp = 0x0021873C,
    |    file = 0x00218710,
    |    line = 763,
    |    assert_info_ptr = 0x00C35F3D -> "Stack Overflow,thread:0x1e00dd8,sp overflow addr:0x2459928,thread ID:0x15,Tx Name:T_P_APP")
    |  assert_mode = 1
    |  cur_sp = 66875528
    |  cur_lr = 3778945
    |  cur_pc = 8533360
    |  i = 12387240
-002|SCI_PAssert(
    |  ?,
    |  ?,
    |  ?,
    |  ?)
-003|prod_thread_stack_overflow_handle(
    |  ?)
-004|tx_thread_stack_error_handler(
    |  ?)
    |  interrupt_save = 128
-005|tx_thread_suspend(
    |  ?)
    |  interrupt_save = 536870912
-006|tx_event_flags_get(
    |  ?,
    |  ?,
    |  ?,
    |  ?,
    |  ?)
    |  thread_ptr = 0x01E07050
-007|txe_event_flags_get(
    |  ?,
    |  ?,
    |    get_option = 1,
    |  ?,
    |  ?)
-008|SCI_GetEvent(
    |  ?,
    |  ?,
    |    get_option = 1,
    |  ?)
-009|mta_ex_trace_task(
    |  ?,
    |  ?)
    |  actual_flag = 2
    |  request_flag = 2
-010|ThreadEntry(
    |  ?)
    |  thread_entry = (entry = 0x003C76EB, argc = 0, argv = 0x0)
-011|tx_thread_shell_entry()
 ---|end of frame

According to the message in the call stack above, the T_P_APP task has overflowed its stack: tx_stack_ptr = 0x2459928 lies beyond the start address of the task's stack, tx_stack_start = 0x02462728.

  (TX_THREAD*)0x1E00DE0 = 0x01E00DE0 -> (
    tx_thread_id = 1414025796,
    tx_run_count = 3138736,
    tx_stack_ptr = 0x02459928,
    tx_stack_start = 0x02462728,
    tx_stack_end = 0x02469F23,
    tx_stack_size = 30716,
    tx_time_slice = 0,
    tx_new_time_slice = 0,
    tx_ready_next = 0x01E00DE0,
    tx_ready_previous = 0x01E00DE0,
    tx_thread_name = 0x00202A60 -> "T_P_APP",
    tx_priority = 76,
    tx_state = 0,
    tx_delayed_suspend = 0,
    tx_suspending = 0,
    tx_preempt_threshold = 76,
    tx_priority_bit = (0, 0, 4096, 0, 0, 0, 0, 0),
    tx_thread_entry = 0x003B1FF7,
    tx_entry_parameter = 65597768,
    tx_thread_timer = (tx_remaining_ticks = 0, tx_re_initialize_ticks = 0, tx_ti
    tx_suspend_cleanup = 0x0,
    tx_suspend_control_block = 0x01E00ED4,
    tx_suspended_next = 0x01E00DE0,
    tx_suspended_previous = 0x01E00DE0,
    tx_suspend_info = 4,
    tx_additional_suspend_info = 0x02469570,
    tx_suspend_option = 1,
    tx_suspend_status = 0,
    tx_created_next = 0x01E01290,
    tx_created_previous = 0x01DFFFD0,
    tx_filex_ptr = 0x0,
    time = 0,
    tx_thread_stack_highest_ptr = 0x02459928)
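
The size of the overflow can be read directly from these fields. The following is a minimal sketch that uses only the three addresses from the dump above; the `StackInfo` class and its method names are hypothetical helpers introduced for illustration, and the code assumes a downward-growing stack, as the dump suggests.

```python
from dataclasses import dataclass


@dataclass
class StackInfo:
    """Subset of the TX_THREAD fields, copied from the dump."""
    stack_ptr: int
    stack_start: int
    stack_end: int

    def size(self) -> int:
        # stack_end is the last valid byte, so the range is inclusive.
        return self.stack_end - self.stack_start + 1

    def overflow_bytes(self) -> int:
        # The stack grows downward; anything below stack_start is out of bounds.
        return max(0, self.stack_start - self.stack_ptr)


t_p_app = StackInfo(stack_ptr=0x02459928,
                    stack_start=0x02462728,
                    stack_end=0x02469F23)

print(t_p_app.size())                 # 30716, equal to tx_stack_size
print(hex(t_p_app.overflow_bytes()))  # 0x8e00
```

The stack pointer is 0x8E00 (36352) bytes below tx_stack_start, more than the whole 30716-byte stack, which points to runaway stack growth rather than a stack that was only slightly too small.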

Manually unwinding the call stack of the T_P_APP task shows that frames 000 to 003 repeat in an endless loop, which is what overflows the stack.

-000|HandlePicListWinMsg(
    |    win_id = 2359300,
    |    msg_id = 64004,
    |    param = 0xFA04)
    |  title_str = (wstr_ptr = 0x0, wstr_len = 0)
    |  query_win_id = 2359321
    |  mark_wstr = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    |  mark_num_str = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    |  mark_num_wstr = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    |
-001|MMK_RunWinProc(
    |  ?,
    |    msg_id = 64004,
    |    param = 0x0)
    |
    |                                                                           
-002|MMK_DispatchToHandle(
    |    handle = 3217293358,
    |    msg_id = 64004,
    |    param_ptr = 0x0)
    |  openwin_handle_result = 0
    |  old_handle = 3217293358
    |
    |                                                                           
-003|MMK_SendMsg(
    |  ?,
    |    msg_id = 64004,
    |    param_ptr = 0x0)
    |  result = 0
    |
    |                                                                           
-004|HandlePicListWinMsg(
    |    win_id = 2359300,
    |  ?,
    |  ?)
    |  result = 1
    |  title_str = (wstr_ptr = 0x0, wstr_len = 0)
    |  query_win_id = 2359321
    |  mark_wstr = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    |  mark_num_str = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    |  mark_num_wstr = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    |  data_info = (is_bitmap = 0, is_free_bitmap = 0, is_save_data = 0, data_pt
    |
    |                                                                           
-005|MMK_RunWinProc(
    |  ?,
    |    msg_id = 64004,
    |    param = 0x02469E64)
    |
    |                                                                           
-006|MMK_DispatchToHandle(
    |    handle = 3217293358,
    |    msg_id = 64004,
    |    param_ptr = 0x02469E64)
    |  openwin_handle_result = 0
    |  old_handle = 16711680
    |
    |                                                                           
-007|MMK_DispMsgToFocusWin(
    |    msg_id = 64004,
    |    param_ptr = 0x02469E64)
    |
    |                                                                           
-008|MMK_DispMsgToWin(
    |    msg_id = 64004,
    |  ?)
    |  result = 0
    |
    |                                                                           
-009|HandleMSGKbd(
    |    keys_status = 64000,
    |    key_code = 4)
    |  multi_key_tp_param = (is_slide = 0, pre_tp_point = (x = 0, y = 0), cur_tp
    |
    |                                                                           
-010|MMK_DispatchMSGKbd(
    |  ?)
    |  keypress_ptr = 0x03EAE9C8
    |  key_code = 4
    |  is_long_press = 0
    |
    |                                                                           
-011|MMK_DispatchExtSig(
    |  ?)
    |
    |                                                                           
-012|thread_entry_P_APP(
    |  ?,
    |  ?)
    |  receiveSignal = 0x03EAE9C8
    |  mmi_msg = 0x03FA0650
    |  ticks1 = 0
    |  ticks2 = 0
    |  is_log_on = 0
    |  time_period = 4294967295
    |  watchdog_ptr = 0x03EBE978
    |
    |                                                                           
-013|ThreadEntry(
    |  ?)
    |  thread_entry = (entry = 0x08004393, argc = 0, argv = 0x0)
    |
    |                                                                           
-014|tx_thread_shell_entry()
    |
    |                                                                           
 ---|end of frame
    |
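
A repeating group of frames like this can also be spotted mechanically. The sketch below is an illustration only and not part of Trace32: `find_recursion_cycle` is a hypothetical helper that takes the function names of a backtrace, innermost frame first, and returns the frames between two occurrences of the same function. The frame names are copied from the listing above. A repeated function is only a hint of recursion, so the result still has to be confirmed against the code.

```python
from typing import Dict, List, Optional


def find_recursion_cycle(frames: List[str]) -> Optional[List[str]]:
    """Return the frames from a function's first occurrence up to, but not
    including, its next occurrence; None if no function appears twice.

    frames[0] is the innermost frame (-000), as in the listing above.
    """
    first_seen: Dict[str, int] = {}
    for index, name in enumerate(frames):
        if name in first_seen:
            return frames[first_seen[name]:index]
        first_seen[name] = index
    return None


backtrace = [
    "HandlePicListWinMsg",    # -000
    "MMK_RunWinProc",         # -001
    "MMK_DispatchToHandle",   # -002
    "MMK_SendMsg",            # -003
    "HandlePicListWinMsg",    # -004
    "MMK_RunWinProc",         # -005
    "MMK_DispatchToHandle",   # -006
    "MMK_DispMsgToFocusWin",  # -007
]

print(find_recursion_cycle(backtrace))
```

For this backtrace the helper reports frames 000 to 003: HandlePicListWinMsg reaches MMK_SendMsg, which dispatches the same msg_id (64004) back to the same window handler.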

Summary

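MTBF and Monkey testing expose all kinds of problems, and analyzing and locating them takes a broad set of debugging techniques. Besides the usual methods, we also used the AMBA Bus Monitor to watch specific code regions, including boot, kernel, and dsp, in order to catch memory regions being overwritten.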

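1. MTBF testing measures the mean time between failures and reflects how reliable the product is over time.
2. Monkey stress testing verifies the stability of the product's software and hardware.
3. EUT Release build testing: by this stage most problems have converged, the remaining ones are hard to reproduce, and dedicated test campaigns may be scheduled to chase them.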