sc770x - Stability Issue Analysis

Preface

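This article summarizes how system stability issues were located with Trace32 Simulator on the sc770x project that the author took part in. Trace32 is an embedded debugging tool from Lauterbach. It supports many CPUs and RTOSes and is highly extensible through CMM scripts.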

Using the tool

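When the system crashes, the corresponding ass and mem files are normally saved. Open Trace32 Simulator, run the sc770x_simulator.cmm script, and load the axf symbol file that matches the firmware version in use. The script then prompts for the mem address; according to the ass file saved at the time of the crash, this is 0. Next, select the matching mem file. In most cases the call stack from thread_entry down to the crash site is shown directly. If the complete call stack is not shown, it has to be derived by hand, as in the following case.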

Inspect the value of _tx_thread_current_ptr at the time of the crash; tx_thread_name shows that the current task is T_MIDI.

_tx_thread_current_ptr = 0x01D8FADC -> (
tx_thread_id = 1414025796,
tx_run_count = 20,
tx_stack_ptr = 0x027D6E88,
tx_stack_start = 0x027D68B8,
tx_stack_end = 0x027D7CB3,
tx_stack_size = 5116,
tx_time_slice = 0,
tx_new_time_slice = 0,
tx_ready_next = 0x01D8FADC,
tx_ready_previous = 0x01D8FADC,
tx_thread_name = 0x080467A8 -> "T_MIDI",
tx_priority = 74,
tx_state = 0,
tx_delayed_suspend = 0,
tx_suspending = 0,
tx_preempt_threshold = 74,
tx_priority_bit = (0, 0, 1024, 0, 0, 0, 0, 0),
tx_thread_entry = 0x0035B4F7,
tx_entry_parameter = 66011532,
tx_thread_timer = (tx_remaining_ticks = 0, tx_re_initialize_ticks = 0, tx_timeout_function =
tx_suspend_cleanup = 0x0,
tx_suspend_control_block = 0x0,
tx_suspended_next = 0x0,
tx_suspended_previous = 0x0,
tx_suspend_info = 0,
tx_additional_suspend_info = 0x0,
tx_suspend_option = 0,
tx_suspend_status = 0,
tx_created_next = 0x00C1BBA4,
tx_created_previous = 0x01D8F884,
tx_filex_ptr = 0x0,
time = 0,
tx_thread_stack_highest_ptr = 0x027D6CF8)

The stack of the current task starts at 0x027D68B8 and ends at 0x027D7CB3. Open a data.dump memory window on that address range.

tx_stack_start = 0x027D68B8,
tx_stack_end = 0x027D7CB3,
tx_stack_size = 5116,

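From the ass file, read the register values in SVC mode just before the crash. Then take R13_SVC as the starting address and derive the frames according to the stack layout of function calls.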

> SVC mode:
R13 = 0x027d6f70 R14 = 0xfffffffc
SPSR = 0x20000033

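Look for function addresses that begin with 0x08 and treat each one as R14, with the address where that function's stack frame ends as R13. Forward derivation walks from tx_stack_end towards R13_SVC; backward derivation walks from R13_SVC towards tx_stack_end. Following this method yields the complete call stack.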

-001|MIDI_Core_Calculation_dls_stereo(
| ?,
| sample_pool_ptr = 0x027D95D6,
| sample_pool_r_ptr = 0x027D9716,
| spn = 0)
| pContext = 0x027D9478
| VoiceChnPtr = 0x027DA52C
| pcm_count = 74
| temp_pcm_l = 0x027DBD94
| temp_pcm_r = 0x027DC014
| update = 1
|
|
-002|MIDI_Core_Generate_Samples(
| ?,
| ?)
| pContext = 0x027D9478
| samples_num = 640
| cur_samples = 160
|
|
-003|midi_decode(inline)
-003|midi_play(
| ?,
| midi_dataPtr = 0x027D7634)
| pContext = 0x027D9478
| midi_event = ((start_time = 0, event_type = 255, isReady = 0, sub_event_type = 47, pa
| midi_event_to_play = 0x027D7060
| bytes_processed = 1
| result = 0
| cur_midi_event = (start_time = 35, event_type = 144, isReady = 0, sub_event_type = 0,
| min_time = 18
| track_end_num = 1
|
|
-004|MIDI_PlayMidi(
| ?,
| mididata = 0x00CD5A58,
| file_length = 5656)
| pContext = 0x027D9478
| midi_track_data = (track = ((file_start = 22, length = 37, buffer_addr = 0x00CD5A6E,
| PPQN = 120
|
|
-005|MidiPlayFile(inline)
-005|MIDI_Thread_Entry(
| ?,
| ?)
| audio_obj = 0x0278E944
| sig_in_ptr = 0x0
| sig_out_ptr = 0x0
| ptMidiProcRes = 0x03F8FD80
| time_offset = 0
| samples_offset = 0
| result = 0
| midi_res_ptr = 0x03F8FD80
|
|
-006|ThreadEntry(
| ?)
| thread_entry = (entry = 0x08046340, argc = 41478468, argv = 0x0)
|
|
-007|tx_thread_shell_entry()
|
|
---|end of frame

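From the call stack we can reconstruct the steps that led to the crash and the parameters involved (the parameters are not always accurate). If the derivation stalls, substitute trial values, make a guess, and then verify it backwards, so that the call stack can be pushed deeper until it reaches the crash site and the problem can be analyzed further.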
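Part of the manual derivation described above can be mechanized. The following is a minimal Python sketch, not the author's tooling. It assumes the task stack has been exported as a list of 32-bit words and that code lives in the range 0x08000000 to 0x08FFFFFF (the 0x08 prefix mentioned above). The function name and the sample words are hypothetical and are not taken from the dump.

```python
CODE_START = 0x08000000  # assumption: code addresses begin with 0x08
CODE_END = 0x09000000


def find_return_address_candidates(words, base_addr, sp, stack_end):
    """Scan stack words from sp up to stack_end for values inside the code range.

    words     -- 32-bit words of the dumped stack, words[0] is located at base_addr
    stack_end -- address of the last byte of the stack (like tx_stack_end)
    Returns a list of (stack_address, value) pairs that may be saved R14 values.
    """
    candidates = []
    addr = sp & ~0x3  # stack slots are word aligned
    while addr + 3 <= stack_end:
        value = words[(addr - base_addr) // 4]
        if CODE_START <= value < CODE_END:
            candidates.append((addr, value))
        addr += 4
    return candidates


# Hypothetical six-word stack starting at 0x1000 (made-up values).
sample_words = [0x00000000, 0x08012345, 0x027D7000, 0x0000004A, 0x080ABCDF, 0x12345678]
found = find_return_address_candidates(sample_words, 0x1000, 0x1000, 0x1017)
for addr, value in found:
    print(f"0x{addr:08X}: 0x{value:08X}")
```

Each candidate still has to be checked against the symbol table, because ordinary data can fall inside the code range by coincidence.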

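Problem analysis

The assert information in the ass file usually gives a rough idea of why the system crashed.

  • The most basic kind of assert message: the printed information identifies the offending code, so the cause can be analyzed directly.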
File:nfc_drv_v1.c Line:  501 PASSERT(0) > NFC timeout happened!f=4,e=0

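Memory leaks

A leak typically occurs when memory is allocated with SCI_ALLOC but the matching SCI_FREE is never called.

ThreadX memory management defines two kinds of pools: byte pools and block pools. A byte pool is a memory pool of fixed total size that supports variable-size allocation and release, at the cost of fragmentation. Block pools hand out fixed-size blocks, with pools configured for different block sizes; allocation and release are fast, there is no fragmentation, and they are more efficient than byte pools.

Based on the allocation rules of byte pools and block pools, the author wrote two scripts, mocor_byte_pool_list.cmm and mocor_block_pool_list.cmm, to check whether the heap and pool memory areas contain allocation anomalies. They help when analyzing problems such as memory overwrites.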

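  • Locating via ArmLogel trace
  1. Connect the phone to the ArmLogel tool and print the memory pool information and the allocation status through [SysInfo]->[Memory Status] and [SysInfo]->[Memory Allocated Status].

  2. Run the application under test, then save the memory information again in the same way.

  3. Repeat this several times and compare the memory information in the saved files to see whether a pattern emerges. If there is a leak, compare the Allocated memory info entries to find its source.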
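The comparison in step 3 can be scripted. The sketch below is an illustration only: it assumes each saved snapshot has already been reduced to a list of (file, line, size) tuples. The article does not show the ArmLogel export format, so parsing is left out, and the function name and sample entries are hypothetical.

```python
from collections import Counter


def allocation_growth(snapshots):
    """Return allocation sites whose live allocation count grows in every snapshot.

    snapshots -- list of snapshots, each a list of (file, line, size) tuples
    Returns {(file, line): [count per snapshot]} for strictly growing sites.
    """
    counts = [Counter((file, line) for file, line, _ in snap) for snap in snapshots]
    sites = set().union(*counts)
    growing = {}
    for site in sites:
        series = [count[site] for count in counts]
        # Require at least two snapshots and a strict increase between each pair.
        if len(series) > 1 and all(a < b for a, b in zip(series, series[1:])):
            growing[site] = series
    return growing


# Hypothetical snapshots taken before and after two test rounds.
snap1 = [("app_a.c", 10, 64), ("app_b.c", 20, 128)]
snap2 = [("app_a.c", 10, 64), ("app_a.c", 10, 64), ("app_b.c", 20, 128)]
snap3 = [("app_a.c", 10, 64)] * 3 + [("app_b.c", 20, 128)]
suspects = allocation_growth([snap1, snap2, snap3])
print(suspects)
```

A site that grows in every round is only a suspect; caches and lazily created objects can show the same pattern for a while.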

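  • Locating via the Assert tool

When a memory problem occurs, look at the ass file produced after the assert. Printing the assert information is in effect a check of the pool and heap memory at the moment the system entered the assert state. Whether the memory information is output completely and passes the integrity check largely shows whether system memory is healthy.

  1. After the assert, the following screen is shown:

[Screenshot: assert]

  2. Press 5 to print the usage of the memory pools and the heap. If many pools are exhausted (Avail_Num is 0), a memory leak is likely.

[Screenshot: memory]
[Screenshot: memory]

  3. Press 4 to print detailed allocation information for the memory pools. This shows which files allocate large amounts of memory; look for a pattern that indicates a leak.

[Screenshot: memory]

[Screenshot: memory]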

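Out-of-bounds writes

An out-of-bounds write is caused by a programming error: the memory actually used exceeds the range that was allocated. A common case is a memcpy whose source data is too long, which overwrites the end_flag of the memory block. For this kind of problem, find the memory block located just before the point of corruption and carefully analyze how it is used to see whether it can write past its end.

block pool

For example, the following block pool overrun: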

-000|TXAS_SaveMainReg(
| ?,
| cur_lr = 0x0108D1C2,
| cur_pc = 0x80)
-001|TXAS_SystemAssert(
| exp = 0x0021229C,
| file = 0x00212224,
| line = 118,
| assert_info_ptr = 0x0)
| assert_mode = 1
| cur_sp = 36771216
| cur_lr = 3422259
| cur_pc = 7799016
| i = 0
-002|system_fatal_error_handler(
| ?,
| ?,
| ?,
| ?,
| ?)
-003|osa_fatal_error_handler_info(
| error_message_ptr = 0x0021229C,
| error_code = OSA_ERROR_BUFFMNGR_ISVALID_FAILED,
| os_error_code = 101,
| file = 0x00212224,
| line = 118)
-004|osa_validate_buff_footer(
| usr_buff = 0x03FD908C,
| curr_alloc_size = 1052)
-005|osa_int_release_buffer(
| ?,
| ?,
| dealloc_file = 0,
| dealloc_line = 1050)
| buff_hdr_p = 0x03FD906C
-006|SCI_Release_Buffer(
| buff_ptr = 0x03FD908C,
| entity_id = 3,
| dealloc_file = 0,
| dealloc_line = 1050)
| debug_buff_ptr = 0x03FD908C
-007|SCI_Free(
| memory_ptr = 0x03FD908C)
| free_ptr = 0x00345A95
-008|SendMultiPicByBt()
| pic_info = (filename = (68, 58, 92, 80, 104, 111, 116, 111, 115, 92, 68, 83, 67, 95
| send_file_info = 0x03FD9090
| list_ctrl_id = 2359329
| send_file_num = 2
| i = 3
| total_num = 3
-009|HandleShareItemsPiclistOptWinMsg(
| win_id = 2359302,
| ?,
| ?)
| title_str = (wstr_ptr = 0x0, wstr_len = 0)
| result = 1
| ctrl_id = 2359331
| list_ctrl_id = 2359329
| menu_id = 2359331
| group_id = 2359299
| kstring = (wstr_ptr = 0x0, wstr_len = 0)
-010|MMK_RunWinProc(
| ?,
| msg_id = 57345,
| param = 0x03E983C8)
-011|MMK_DispatchToHandle(
| handle = 1103560803,
| msg_id = 57345,
| param_ptr = 0x03E983C8)
| openwin_handle_result = 0
| old_handle = 16711680
-012|MMK_DispatchWinMSG(
| mmi_msg_ptr = 0x03FA1300)
-013|MMK_DispatchMSGQueue(
| ?)
-014|thread_entry_P_APP(
| ?,
| ?)
| receiveSignal = 0x0
| mmi_msg = 0x03FA1300
| ticks1 = 0
| ticks2 = 0
| is_log_on = 0
| time_period = 4294967295
| watchdog_ptr = 0x03ECF2EC
-015|ThreadEntry(
| ?)
| thread_entry = (entry = 0x0833AF73, argc = 0, argv = 0x0)
-016|tx_thread_shell_entry()
---|end of frame

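When the memory at 0x03FD908C is released, osa_validate_buff_footer checks the block and cannot find the end flag, which triggers the assert.

[Screenshot: memory]

According to the ThreadX memory allocation rules, the word at 0x03FD94A8 should be 0xF2F2F2F2, but it is actually 0x003A0044. Analysis of the code shows that send_file_num and total_num in SendMultiPicByBt are not equal, which leads to the out-of-bounds write.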
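The footer check can be reproduced offline with a few lines of Python. This is a sketch under two assumptions: that the guard word sits at usr_buff + curr_alloc_size (which does give the address quoted above), and that the target is little-endian. The helper names are hypothetical.

```python
import struct

END_FLAG = 0xF2F2F2F2  # guard value named in the text


def footer_address(usr_buff, alloc_size):
    """Address of the guard word, assumed to follow the user area directly."""
    return usr_buff + alloc_size


def check_footer(word):
    """Return True if the guard word is intact."""
    return word == END_FLAG


addr = footer_address(0x03FD908C, 1052)  # usr_buff and curr_alloc_size from frame -004
observed = 0x003A0044                    # word actually found at that address
print(hex(addr), check_footer(observed))

# Reinterpret the corrupting word as little-endian UTF-16 text.
text = struct.pack("<I", observed).decode("utf-16-le")
print(repr(text))
```

Decoded this way, the word reads as the two characters "D:". That would be consistent with a wide-character file path such as pic_info.filename (which begins with 68, 58) having been written past the end of the buffer, although the dump alone does not prove it.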

byte pool

Another case is a byte pool overrun:

-000|tx_byte_pool_search_ex(
| pool_ptr = 0x00B68A14,
| memory_size = 552)
| interrupt_save = 536870912
| current_ptr = 0x80818081
| examine_blocks = 119
| alloc_ptr = 0x025F385C
| min_size = 32
| section_limit_space_addr = 0x0
| section_limit_space_size = 4294967295
| section_high_space_addr = 0x0
| section_high_space_size = 4294967295
| search_cnt = 484
-001|tx_byte_allocate(
| pool_ptr = 0x00B68A14,
| memory_ptr = 0x02320998,
| ?,
| wait_option = 2863311530)
| thread_ptr = 0x01CBAAD8
-002|txe_byte_allocate(
| ?,
| ?,
| ?,
| ?)
-003|SCI_MallocEx(
| size = 520,
| type = 1145324612,
| file = 0x08652448,
| line = 143)
| memory_ptr = 0x0
| num_free_buffs = 0
| cur_pool_inx = 14
| alloc_size = 549
| byte_mem_header_ptr = 0x0
-004|allocatePName(
| pName = 0x023210EC)
-005|SFS_CreateFileInternal(
| file_name = 0x023210EC,
| access_mode = 49,
| share_mode = 0,
| file_attri = 0)
| __func__ = (83, 70, 83, 95, 67, 114, 101, 97, 116, 101, 70, 105, 108, 101
| handle = 0
-006|DrmParseDCF(
| ?)
| ret_val = 0
-007|DrmCheckIsDRM(
| file_name_ptr = 0x023210EC)
| ret_val = 0
-008|DRM_CreateFile(
| file_name_ptr = 0x023210EC,
| access_mode = 49,
| share_mode = 0,
| file_attribute = 0)
| drm_handle_f = 2555905
| drm_file_ptr = 0x014D1904
| drm_io_handle = 0
-009|SFS_CreateFile(
| ?,
| ?,
| ?,
| ?)
-010|MMIFILE_CreateFile(
| file_name = 0x023210EC,
| access_mode = 49,
| share_mode = 0,
| file_attri = 0)
| handle = 0
-011|MMIPICVIEW_IsSend(
| is_sms = 0,
| file_data_ptr = 0x023210EC)
| result = 0
| file_info = (file_name = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
| send_file_info = (filepath_name = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
-012|SendMultiPicByBt()
| pic_info = (filename = (69, 58, 92, 86, 105, 100, 101, 111, 92, 77, 79, 8
| send_file_info = 0x0268B9D0
| list_ctrl_id = 2359329
| send_file_num = 5
| i = 6
| total_num = 7
-013|HandleShareItemsPiclistOptWinMsg(
| win_id = 2359302,
| ?,
| ?)
| title_str = (wstr_ptr = 0x0, wstr_len = 0)
| result = 1
| ctrl_id = 2359331
| list_ctrl_id = 2359329
| menu_id = 2359331
| group_id = 2359299
| kstring = (wstr_ptr = 0x0, wstr_len = 0)
-014|MMK_RunWinProc(
| ?,
| msg_id = 57345,
| param = 0x03E9BB08)
-015|MMK_DispatchToHandle(
| handle = 2632974468,
| msg_id = 57345,
| param_ptr = 0x03E9BB08)
| openwin_handle_result = 0
| old_handle = 16711680
-016|MMK_DispatchWinMSG(
| mmi_msg_ptr = 0x03FA14E0)
-017|MMK_DispatchMSGQueue(
| ?)
-018|thread_entry_P_APP(
| ?,
| ?)
| receiveSignal = 0x0
| mmi_msg = 0x03FA14E0
| ticks1 = 0
| ticks2 = 0
| is_log_on = 0
| time_period = 4294967295
| watchdog_ptr = 0x03ECF298
-019|ThreadEntry(
| ?)
| thread_entry = (entry = 0x08650E7B, argc = 0, argv = 0x0)
-020|tx_thread_shell_entry()
---|end of frame

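Neither the code at the crash site nor the call stack shows an obvious problem, yet the traversal of the byte pool fails during allocation. This suggests that the memory area itself is corrupted, probably overwritten. Check the byte pool memory with the mocor_byte_pool_list.cmm script: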

byte_static_heap:       0x0B68984
byte_dynamic_base_heap: 0x0B689CC
byte_dynamic_app_heap: 0x0B68A14
pool start address: 0x24FF4D4
pool end address: 0x3E878FC
pool addr: 0x24FF4D4
size: 0x888

~~~
pool addr: 0x268B9AC
size: 0x0A64
pool addr: 0x268C410
pool memory end: 0x56005C
pool count: 0x1E1

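The memory node at 0x268B9AC is intact, but the node at 0x268C410 is corrupted: the end_flag 0xAA is gone and the area is filled with invalid data.

Following the logic of tx_byte_pool_search_ex, the search eventually dereferences an invalid address: R5 = 0x80818081, and executing ldr r0,[r5,#04] raises a 4-byte alignment fault that crashes the system. This matches the error reported in the assert file: Fault address :0x80818085.

[Screenshot: memory]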
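The fault address follows directly from the register value and the load offset. A small Python check of the arithmetic (the helper name is hypothetical):

```python
def ldr_fault_address(base, offset, access_size=4):
    """Return (effective_address, is_aligned) for a load of access_size bytes."""
    address = (base + offset) & 0xFFFFFFFF
    return address, address % access_size == 0


# ldr r0,[r5,#04] with R5 = 0x80818081 as reported above
address, aligned = ldr_fault_address(0x80818081, 4)
print(hex(address), aligned)
```

The effective address is 0x80818085, which is not a multiple of 4, so a word load from it is misaligned.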

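The allocation record for address 0x268B9AC is as follows:

[Screenshot: memory]

Analysis of the allocation and usage code in mmipicview_wintab.c shows that the memory allocated for send_file_info in SendMultiPicByBt is written out of bounds, which causes the problem above.

Memory overwrites

Memory overwrites have many causes. Operations through a null pointer touch address 0 and are relatively easy to catch: address 0 is usually the DSP code area, so BUSMonitor can be used to watch the code area and trace the overwrite back to its source.

Wild pointer operations are harder. The pool_list.cmm script can check the integrity of the allocations, and memory dumps can be compared to find the source of the overwrite.

Crash scene: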

File:  tx_byta.c
Line: 240
ASSERT(current_ptr != (*((CHAR_PTR *) current_ptr)))

Derived call stack:

-000|TXAS_SaveMainReg(
| ?,
| cur_lr = 0x0108D1CE,
| cur_pc = 0x80)
-001|TXAS_SystemAssert(
| exp = 0x00215CAC,
| file = 0x00215C5C,
| line = 240,
| assert_info_ptr = 0x0)
| assert_mode = 1
| cur_sp = 38244696
| cur_lr = 3672209
| cur_pc = 8536932
| i = 12387240
-002|SCI_Assert(
| ?,
| ?,
| ?)
-003|tx_byte_pool_search_ex(
| pool_ptr = 0x00C35B3C,
| memory_size = 307776)
| interrupt_save = 536870912
| examine_blocks = 85
| alloc_ptr = 0x0
| min_size = 4294967295
| section_limit_space_addr = 0x0
| section_limit_space_size = 4294967295
| section_high_space_addr = 0x0
| section_high_space_size = 4294967295
| search_cnt = 150
-004|tx_byte_allocate(
| pool_ptr = 0x00C35B3C,
| memory_ptr = 0x02479238,
| ?,
| wait_option = 2863311530)
| thread_ptr = 0x01E10380
-005|txe_byte_allocate(
| ?,
| ?,
| ?,
| ?)
-006|SCI_MallocApp(
| ?,
| file = 0x0904406C,
| line = 5942)
| memory_ptr = 0x0
| num_free_buffs = 0
| cur_pool_inx = -1
| alloc_size = 307773
| byte_mem_header_ptr = 0x0
-007|BL_MallocEx(
| ?,
| file = 0x0904406C,
| line = 5942)
| index = 2
-008|MMI_BL_Malloc(
| id = 270,
| file = 0x0904406C,
| line = 5942)
-009|AllocTrans3DBuf(
| old_buf_pptr = 0x024792BC,
| new_buf_pptr = 0x024792B8)
| result = 0
-010|MMIDEFAULT_SaveOldMoveBuf(
| buf_type = SCR_EFFECT_BUF_TYPE_SLIDE_RIPPLE = SCR_EFFECT_BUF_TYPE_WIN_SWITCH)
| buf_ptr = 0x0265D100
| buf_width = 240
| old_buf_ptr = 0x0
| new_buf_ptr = 0x0
-011|HandleMenuOkKey(
| menu_ctrl_ptr = 0x03FE0420,
| ?)
| is_handled = 0
| is_grayed = 0
| is_exist_child = 1
| cur_item_top = 0
| base_ctrl_ptr = 0x03FE0420
| cur_item = (menu_id = 0, tip_id = 0, button_id = (0, 0, 0), text_str_id = 0, select_icon
| cur_node_ptr = 0x027DC338
| lcd_dev_info = 0x092B5920
| lcd_rect = (left = 0, top = 0, right = 239, bottom = 319)
-012|MenuHandleMsg(
| ?,
| msg_id = 64027,
| param = 0x02479404)
| result = 1
| menu_ctrl_ptr = 0x03FE0420
-013|VTLCTRL_HandleMsg(
| iguictrl_ptr = 0x03FE0420,
| msg_id = 64027,
| param_ptr = 0x02479404)
-014|MMK_RunCtrlProc(
| ?,
| msg_id = 64027,
| param = 0x02479404)
| me_ptr = 0x03FE0420
-015|MMK_DefaultProcessWinMsg(
| ?,
| msg_id = 64027,
| param = 0x02479404)
| result = 0
| ctrl_handle = 1288896624
-016|MMK_DispatchToHandle(
| handle = 1288765504,
| msg_id = 64027,
| param_ptr = 0x02479404)
| bResult = 0
| old_handle = 16711680
-017|MMK_DispMsgToFocusWin(
| msg_id = 64027,
| param_ptr = 0x02479404)
-018|MMK_DispMsgToWin(
| msg_id = 64027,
| ?)
| result = 0
-019|HandleMSGKbd(
| keys_status = 64000,
| key_code = 27)
| multi_key_tp_param = (is_slide = 0, pre_tp_point = (x = 0, y = 0), cur_tp_point = (x = 0
-020|MMK_DispatchMSGKbd(
| ?)
| keypress_ptr = 0x03EB1488
| key_code = 27
| is_long_press = 0
-021|MMK_DispatchExtSig(
| ?)
-022|thread_entry_P_APP(
| ?,
| ?)
| receiveSignal = 0x03EB1488
| mmi_msg = 0x03FA0600
| ticks1 = 0
| ticks2 = 0
| is_log_on = 0
| time_period = 4294967295
| watchdog_ptr = 0x03EBE978
-023|ThreadEntry(
| ?)
| thread_entry = (entry = 0x0800430F, argc = 0, argv = 0x0)
-024|tx_thread_shell_entry()
---|end of frame

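The call stack shows that the problem occurs during memory allocation, while the memory nodes are being traversed. Check with mocor_byte_pool_list.cmm:

[Screenshot: memory]

Following the allocator's rules, the failure occurs when looking for the next node after 0x027DD46C; the memory there has been overwritten with 0x21242124.

According to the corresponding assert file, the memory allocated by ctrlmenu.c spans 0x027DC2BC to 0x027DD46C, but the area starting at 0x027DD46C has been overwritten.

According to the log, this memory had previously been allocated to the JPEG decoder. The code shows that the Destroy issued by the APP returned immediately and the memory was freed, without waiting for the Set Event that signals the end of decoding in IMG_DEC.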

2265207-4     37131 [MMIPIC]:HandlePicListWinMsg msg_id =f023
2265207-5 37131 [IMG_DEC_Destroy +] handle = 0x85a0fd17
2265207-6 37131 _IMG_DEC_Get_Caller_Priority: T_P_APP, queue_name: Q_P_APP, priority: 76
2265207-7 37131 _IMG_DEC_Get_Caller_Priority: T_IMG_DEC, queue_name: T_IMG_DEC_QUEUE, priority: 76
2265207-8 37131 _IMG_DEC_SendMsg: sig_code = 2
2265207-9 37131 [IMG_DEC_Destroy -] handle = 0x85a0fd17
2265207-10 37131 GUIANIM_DestroyHandle:destroy handle=0x85a0fd17 result is 0!

(byte_heap_hdr_struct)0x27dc2c4 = (
pre = 0x03D47820,
succ = 0x00C6E1FC,
file_name = 0x08DB3DB4 -> "ctrlmenu.c",
line = 861,
size = 4493,
block_num = 572061)

>
0x27dc2bc 0x27dd46c 0x11b0 ALLOC ctrlmenu.c(861)
>
0x27dd46c 0x21242124 0x1ea64cb8 ALLOC (556015908)
>

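GUIANIM_DestroyHandle did not wait for the decoder to run JPEGDEC_DestoryHandle and stop the low-level decoding. It freed the memory early, so the ongoing decode kept using the old memory. The T_IMG_DEC task information below confirms this: target_ptr = 0x027DC300 is still present.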

-000|tx_thread_suspend(
| ?)
|
-001|tx_semaphore_get(
| ?,
| ?)
| thread_ptr = 0x01E16F50
|
|
-002|txe_semaphore_get(
| ?,
| ?)
|
|
-003|SCI_GetSemaphore(
| ?,
| ?)
| _sem_ptr = 0x03EBC074
|
|
-004|FreeLock()
| sem_count = 0
| susp_count = 0
| semap_ptr = 0x03EBC074
|
|
-005|JPEGDEC_DestoryHandle(
| ?,
| exit_type = IMG_DEC_EXIT_HALT)
| dec_info_ptr = 0x03F79100 -> (
| src_info = (src_ptr = 0x0, src_file_size = 14576, src_file_handle = 150
| frame_in_param = (
| handle = 66556160,
| target_ptr = 0x027DC300,
| target_buf_size = 11552,
| target_width = 76,
| target_height = 76,
| img_rect = (left = 0, top = 40, right = 239, bottom = 279),
| target_rect = (left = 0, top = 0, right = 75, bottom = 75),
| data_format = IMG_DEC_RGB565,
| frame_index = 0,
| is_dec_thumbnail = 0,
| is_exist_background = 0,
| padding1 = 0,
| padding2 = 0,
| alpha_buf_ptr = 0x0,
| alpha_buf_size = 0,
| write_data = 0x0,
| callback = 0x003AE0CB,
| app_param_ptr = 38240904,
| app_param_size = 12,
| quality = JINF_QUALITY_HIGH,
| target_buf_width = 0,
| target_buf_height = 0,
| img_dec_mode = IMG_DEC_TARGET_SIZE_RESIZABLE),
| frame_out_param = (is_decode_finished = 0, is_process_alpha = 0, paddin
| frame_extra_info = (priority = 77),
| dec_buf_ptr = 0x03D4783C,
| read_buf_ptr = 0x0,
| ret_val = IMG_DEC_RET_SUCCESS,
| jpeg_type = JINF_JPEG_TYPE_BASELINE)
|
|
-006|IMG_DEC_Destroy_Hal(
| ?,
| handle = 66556160,
| exit_type = IMG_DEC_EXIT_HALT)
|
|
-007|IMG_DEC_Remove_Command(
| ?)
| ret = IMG_DEC_RET_SUCCESS
| dec_handle_ptr = 0x03FE0EA0
| tmp_cmd_ptr = 0x03EBBFCC
| cur_cmd_ptr = 0x03ED1368
| next_cmd_ptr = 0x0
| pre_cmd_ptr = 0x03EBBFCC
|
|
-008|IMG_DEC_Task_Routine(
| ?,
| ?)
| command = 2
| param0 = 2241920279
| param2 = 0
| sig_ptr = 0x03EC8D34
| handle = 2241920279
|
|
-009|ThreadEntry(
| ?)
| thread_entry = (entry = 0x003AE7A9, argc = 0, argv = 0x0)
|
|
-010|tx_thread_shell_entry()
|
|
---|end of frame
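The addresses in the dumps can be cross-checked with simple range arithmetic. The Python sketch below only combines values quoted above (target_ptr, target_buf_size, and the ctrlmenu.c node boundaries); the helper name is hypothetical, and the result shows what a stale decoder could overwrite, not what it provably wrote.

```python
def ranges_overlap(start_a, size_a, start_b, size_b):
    """Return the overlapping (start, end) of two half-open ranges, or None."""
    start = max(start_a, start_b)
    end = min(start_a + size_a, start_b + size_b)
    return (start, end) if start < end else None


decode_target = (0x027DC300, 11552)                 # target_ptr, target_buf_size
menu_block = (0x027DC2BC, 0x027DD46C - 0x027DC2BC)  # node allocated by ctrlmenu.c(861)
next_header = 0x027DD46C                            # node whose header was overwritten

overlap = ranges_overlap(*decode_target, *menu_block)
hits_header = decode_target[0] <= next_header < decode_target[0] + decode_target[1]
print([hex(x) for x in overlap], hits_header)
```

The decoder's 11552-byte target buffer starts inside the block now owned by ctrlmenu.c and extends past 0x027DD46C, so a write of decoded pixels through the stale target_ptr would cover the next node header as well.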

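Studying the code reveals the following cause:

The target_ptr buffer is allocated and freed by the application layer. When the application calls IMG_DEC_Remove_Handle, only the sending of decode messages is stopped; the actual decoding continues. When decoding completes, JPEG_OutputData writes the decoded data to the memory pointed to by target_ptr, but the application has already freed that memory, so the write is an illegal memory access. The code flow needs to change: before calling JPEG_OutputData, check whether decoding of the image has been forcibly terminated, and call it only if it has not, so that decoded data is passed to the application only while the buffer is still valid.

In addition, the JVM callback was found to be invoked inside the image decoding task, which corrupted the event state and caused the Destroy issued by the APP to return immediately and free the memory.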
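The guard proposed above (check for forced termination before the output step runs) can be modelled in a few lines. This is a conceptual Python sketch, not the actual decoder code: the class and method names are hypothetical, and a lock stands in for whatever synchronization the real implementation uses.

```python
import threading


class DecodeJob:
    """Model of a decode job that writes into an application-owned buffer."""

    def __init__(self, target):
        self.target = target  # buffer owned by the application
        self._lock = threading.Lock()
        self._cancelled = False

    def cancel(self):
        """Called by the application before it frees the buffer."""
        with self._lock:
            self._cancelled = True
            self.target = None

    def output(self, data):
        """Write decoded data unless the job was cancelled; return True if written."""
        with self._lock:
            if self._cancelled:
                return False
            self.target[:len(data)] = data
            return True


buffer = bytearray(4)
job = DecodeJob(buffer)
first = job.output(b"\x01\x02")  # decoding still wanted: data is written
job.cancel()                     # application destroys the handle
second = job.output(b"\x03")     # late output is dropped instead of written
print(first, second, bytes(buffer))
```

Holding the same lock in cancel and output keeps the check and the write atomic with respect to the cancellation, so the application can free the buffer safely once cancel has returned.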

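Deadlocks

Deadlocks are generally caused by improper use of mutexes and may show up as an unresponsive UI or a watchdog reset.

Taking a watchdog reset as an example, the ass file contains the following: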

File:  watchdog.c
Line: 353
PASSERT(SCI_FALSE)
> Task APP timeout

The corresponding call stack:

-000|TXAS_SaveMainReg(
| ?,
| cur_lr = 0x0108D180,
| cur_pc = 0x80)
-001|TXAS_SystemAssert(
| exp = 0x003ACD84 -> ,
| file = 0x003ACD78 -> ,
| line = 353,
| assert_info_ptr = 0x00C36215 -> "Task APP timeout")
| assert_mode = 1
| cur_sp = 31244224
| cur_lr = 3791281
| cur_pc = 8534840
| i = 12387240
-002|SCI_PAssert(
| ?,
| ?,
| ?,
| ?)
-003|CheckAllTask()
| list = 0x03EBEB2C
| curr_tick = 3562772
-004|DoIdle_DoCallback(
| param = 5000)
| i = 2
-005|DoIdle_Callback(
| ?)
| assert_mode = 44
| dischg = (warning_vol = 64244, shutdown_vol = 1011, deadline_vol = 65369,
-006|osa_timer_routine_wraper(
| usr_timer_id = 0x03F3FAF4)
-007|tx_timer_thread_entry(
| ?)
| timeout_function_backup = 0x003BFF59
| expired_timers = 0x03F3FAFC
| timeout_function = 0x003BFF59
-008|tx_thread_shell_entry()
---|end of frame

The reported cause is Task APP timeout. The code analysis points to mmimain.c, where the watchdog registered by void APP_Task(uint32 argc, void *argv) was not fed in time.

watchdog_ptr = SWDG_RegTask("APP", 180000)

The call stack of APP_Task is shown below; the task is suspended while waiting for img_decoder_event.

-000|tx_thread_suspend(?)
|
-001|tx_event_flags_get(?, ?, ?, ?, ?)
|
|
-002|txe_event_flags_get(?, ?, get_option = 1, ?, ?)
|
|
-003|SCI_GetEvent(?, ?, get_option = 1, ?)
|
|
-004|IMG_DEC_GetEvent(requested_flags = 4, ?)
|
|
-005|IMG_DEC_Remove_Handle(?)
|
|
-006|IMG_DEC_Destroy(handle = 2241919696)
|
|
-007|GUIANIM_DestroyHandle(?, img_handle = 2241919696, ?)
|
|
-008|HandleAnimLoseFocus(anim_ctrl_ptr = 0x03FDAFD8)
|
|
-009|AnimCtrlHandleMsg(?, ?, ?)
|
|
-010|VTLCTRL_HandleMsg(iguictrl_ptr = 0x03FDAFD8, msg_id = 61492, param_ptr = 0x
|
|
-011|MMK_RunCtrlProc(?, msg_id = 61492, param = 0x0)
|
|
-012|ControlTreeNodeHandleEvent(?, ?, ?)
|
|
-013|MMK_DispatchToAllTreeNode(?, func = 0x09076FEF, msg_id = 61492, param = 0x0
|
|
-014|MMK_DispatchToAllControl(?, msg_id = 61492, param = 0x0, state = 2)
|
|
-015|MMK_ProcSpecialWinMsg(win_handle = 89194548, ?, param = 0x0)
|
|
-016|MMK_DispatchToHandle(handle = 89194548, msg_id = 61475, param_ptr = 0x0)
|
|
-017|MMK_SendMsg(?, msg_id = 61475, param_ptr = 0x0)
|
|
-018|MMK_OpenWin(win_handle = 90570798, ?)
|
|
-019|AppletCreateWindow(?, is_win_table = 1)
|
|
-020|MMK_CreateWinTable(create_ptr = 0x02477A24)
|
|
-021|MMK_CreateWin(win_table_ptr = 0x082A4600, add_data_ptr = 0x0)
|
|
-022|HandlePicListWinMsg(?, ?, ?)
|
|
-023|MMK_RunWinProc(?, msg_id = 57345, param = 0x03E9BC88)
|
|
-024|MMK_DispatchToHandle(handle = 89194548, msg_id = 57345, param_ptr = 0x03E9B
|
|
-025|MMK_DispatchWinMSG(mmi_msg_ptr = 0x03FA06C0)
|
|
-026|MMK_DispatchMSGQueue(?)
|
|
-027|thread_entry_P_APP(?, ?)
|
|
-028|ThreadEntry(?)
|
|
-029|tx_thread_shell_entry()
|
|
---|end of frame

img_decoder_event should be set in the T_IMG_DEC task, but that task is itself suspended on the semaphore JPEG_FREE_RES_SEMAP, as the following call stack shows:

-000|tx_thread_suspend(
| ?)
|
-001|tx_semaphore_get(
| ?,
| ?)
| thread_ptr = 0x01E15D10 -> (
| tx_thread_id = 1414025796,
| tx_run_count = 265,
| tx_stack_ptr = 0x026A7F38,
| tx_stack_start = 0x026A70F4,
| tx_stack_end = 0x026A80EF,
| tx_stack_size = 4092,
| tx_time_slice = 0,
| tx_new_time_slice = 0,
| tx_ready_next = 0x01E15D10,
| tx_ready_previous = 0x01E15D10,
| tx_thread_name = 0x003AD978 -> "T_IMG_DEC",
| tx_priority = 76,
| tx_state = 6,
| tx_delayed_suspend = 0,
| tx_suspending = 0,
| tx_preempt_threshold = 76,
| tx_priority_bit = (0, 0, 4096, 0, 0, 0, 0, 0),
| tx_thread_entry = 0x003B234F,
| tx_entry_parameter = 65588616,
| tx_thread_timer = (tx_remaining_ticks = 4294967295, tx_re_initialize_ti
| tx_suspend_cleanup = 0x002162D1,
| tx_suspend_control_block = 0x03EBC074,
| tx_suspended_next = 0x01E15D10,
| tx_suspended_previous = 0x01E15D10,
| tx_suspend_info = 1,
| tx_additional_suspend_info = 0x026A808C,
| tx_suspend_option = 1,
| tx_suspend_status = 0,
| tx_created_next = 0x01E15E3C,
| tx_created_previous = 0x01E146CC,
| tx_filex_ptr = 0x0,
| time = 0,
| tx_thread_stack_highest_ptr = 0x026A7B58)
|
|
-002|txe_semaphore_get(
| ?,
| ?)
|
|
-003|SCI_GetSemaphore(
| ?,
| ?)
| _sem_ptr = 0x03EBC074 -> (
| sem_id = (
| tx_semaphore_id = 1397050689,
| tx_semaphore_name = 0x003DDCB4 -> "JPEG_FREE_RES_SEMAP",
| tx_semaphore_count = 0,
| tx_semaphore_suspension_list = 0x01E15D10,
| tx_semaphore_suspended_count = 1,
| tx_semaphore_created_next = 0x03EBC218,
| tx_semaphore_created_previous = 0x03EBC020),
| sem_stat = 0x03E8CE48)
|
|
-004|FreeLock()
| sem_count = 0
| susp_count = 0
| semap_ptr = 0x03EBC074
|
|
-005|IMGJPEG_FreeRes()
| pContext = 0x01250CDC
| ret_value = 255
|
|
-006|JPEGDEC_DestoryHandle(
| ?,
| exit_type = IMG_DEC_EXIT_HALT)
| dec_info_ptr = 0x03F75488
|
|
-007|IMG_DEC_Destroy_Hal(
| ?,
| handle = 66540680,
| exit_type = IMG_DEC_EXIT_HALT)
|
|
-008|IMG_DEC_Remove_Command(
| ?)
| ret = IMG_DEC_RET_SUCCESS
| dec_handle_ptr = 0x03FDD4E0
| tmp_cmd_ptr = 0x03EBBFCC
| cur_cmd_ptr = 0x03EC0700
| next_cmd_ptr = 0x0
| pre_cmd_ptr = 0x03EBBFCC
|
|
-009|IMG_DEC_Task_Routine(
| ?,
| ?)
| command = 2
| param0 = 2241919696
| param2 = 0
| sig_ptr = 0x03EC0604
| handle = 2241919696
|
|
-010|ThreadEntry(
| ?)
| thread_entry = (entry = 0x003AD589, argc = 0, argv = 0x0)
|
|
-011|tx_thread_shell_entry()
|
|
---|end of frame

The ass file also shows that the T_IMG_DEC task is suspended on JPEG_FREE_RES_SEMAP.

> JPEG_FREE_RES_SEMAP                    0          
> Suspend Task_Name : T_IMG_DEC

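Continuing the code analysis from these call stacks leads to the following conclusion:
When the application window loses focus, it sends a message asking T_IMG_DEC to run the destroy flow. T_IMG_DEC takes the lock and runs the JPEGDEC_DestoryHandle destroy flow, during which JPEG_FREE_RES_SEMAP is supposed to be released. At that point T_JPEG_DECODER should leave its suspended state, but it was already destroyed earlier in JPEGDEC_DestoryHandle, so it cannot run. Neither task can make progress, which leads to the timeout.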
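The chain of waits described above can be written down as data and followed mechanically. In the Python sketch below, the waits_on mapping comes from the call stacks, while the provided_by mapping (which task is expected to set the event or release the semaphore) is an assumption based on the conclusion above. The function name is hypothetical.

```python
def find_root_blocker(task, waits_on, provided_by, runnable):
    """Follow 'task waits on a resource provided by another task' links.

    Returns (chain, reason): chain lists the tasks visited, reason is one of
    'cycle', 'provider cannot run' or 'not blocked'.
    """
    chain = [task]
    while True:
        resource = waits_on.get(chain[-1])
        if resource is None:
            return chain, "not blocked"
        provider = provided_by[resource]
        if provider in chain:
            return chain + [provider], "cycle"
        chain.append(provider)
        # A provider that is neither runnable nor waiting on anything is dead.
        if provider not in runnable and provider not in waits_on:
            return chain, "provider cannot run"


waits_on = {
    "T_P_APP": "img_decoder_event",
    "T_IMG_DEC": "JPEG_FREE_RES_SEMAP",
}
provided_by = {  # assumption taken from the analysis above
    "img_decoder_event": "T_IMG_DEC",
    "JPEG_FREE_RES_SEMAP": "T_JPEG_DECODER",
}
runnable = set()  # T_JPEG_DECODER has already been destroyed

chain, reason = find_root_blocker("T_P_APP", waits_on, provided_by, runnable)
print(" -> ".join(chain), "|", reason)
```

Strictly speaking this is a wait chain that ends at a task that can no longer run rather than a circular wait, but the effect on the watchdog is the same.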

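Task queue full

A full message queue shows up as the assert ASSERT: Error 0xb (The queue was full !). The direct cause is that the task receiving the messages does not get to run, so its queue fills up; the sending task then detects that it cannot send and reports the queue-full error.

Possible causes:
1. The task's priority is too low, so it never gets to run.
2. The task cannot process messages for some reason (for example a deadlock or a semaphore); analyze the code logic.
3. Too much interrupt handling starves the task; the TaskAnalyzer tool can be used to analyze where the interrupts come from.
4. Interrupts are disabled for too long, which starves the task; keep interrupt-disabled sections as short as possible.
5. The message queue length is set incorrectly; increase the Queue Size.

The key to analyzing this problem: first find the task that cannot process messages, then go through the causes one by one, including checking the messages in that task's queue. You can also enter command "6" in the Assert window to print the information for every task and look for the task whose available queue count is 0; that task is the problem.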

-000|TXAS_SaveMainReg(
| ?,
| cur_lr = 0x0108D1CE,
| cur_pc = 0x80 -> 0)
-001|TXAS_SystemAssert(
| exp = 0x023482FC -> ,
| file = 0x00377A80 -> ,
| line = 810,
| assert_info_ptr = 0x0 -> NULL)
| assert_mode = 1
| cur_sp = 36994960
| cur_lr = 3666785
| cur_pc = 7830016
| i = 11666952
| tem_str = "RTOS/source/src_osa/c/threadx_os.c"
-002|SCI_Assert(
| ?,
| ?,
| ?)
-003|SCI_SendSignal(
| signal_ptr = 0x03EB67C8,
| ?)
| status = 11
-004|MMISRV_CAMERAROLL_Download_Thumbnail()
| sig_ptr = 0x03EB67C8 -> (
| sig = (SignalCode = 1, SignalSize = 20, Pre = 0x0, Suc = 0x0, Sender = 21),
| data_ptr = 0x0)
-005|HandlePicListWinMsg(
| ?,
| ?,
| ?)
| result = 1
| title_str = (wstr_ptr = 0x0, wstr_len = 0)
| query_win_id = 2359321
| mark_wstr = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
| mark_num_str = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
| mark_num_wstr = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
| data_info = (is_bitmap = 76, is_free_bitmap = 12, is_save_data = 185, data_ptr = 0xF
-006|MMK_RunWinProc(
| ?,
| msg_id = 53255,
| param = 0x026B9D14)
-007|MMK_DispatchToHandle(
| handle = 62717996,
| msg_id = 53255,
| param_ptr = 0x026B9D14)
| openwin_handle_result = 0
| old_handle = 16711680
-008|MMK_SendMsg(
| ?,
| msg_id = 53255,
| param_ptr = 0x026B9D14)
| result = 0
-009|MMIAPIPICVIEW_HandleCameraRollSig(
| msg_id = 53255,
| param = 0x026B9D14)
-010|HandlePsAndRefMsg(
| ?,
| msg_id = 53255,
| param = 0x026B9D14)
| result = 1
-011|DispatchSysSig(
| signal_ptr = 0x026B9D14)
| i = 131
| regapp_num = 245
-012|MMK_DispatchExtSig(
| ?)
-013|thread_entry_P_APP(
| ?,
| ?)
| receiveSignal = 0x026B9D14
| mmi_msg = 0x03FA1410
| ticks1 = 0
| ticks2 = 0
| is_log_on = 0
| time_period = 4294967295
| watchdog_ptr = 0x03EBE534
-014|ThreadEntry(
| ?)
| thread_entry = (entry = 0x08EFC94B, argc = 0, argv = 0x0)
-015|tx_thread_shell_entry()
---|end of frame

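Together with the code, this shows that MMISRV_CAMERAROLL_Download_Thumbnail() found the message queue of T_P_APP_CAMERAROLL_TASK full when sending it a message.

The ASS file confirms that the message queue of T_P_APP_CAMERAROLL_TASK is full: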

Task_ID Name Tcb_Addr Current_PC  Queue_All Queue_Avail 
0xaf T_P_APP_CAMERAR 0x01ceaf80 0x002127d0 20 0
TX_READY 206
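Looking for a task with no queue slots left is easy to automate once the task table is available as text. The Python sketch below assumes six whitespace-separated columns as in the header above, and that the Tcb_Addr and Current_PC columns of the row read 0x01ceaf80 and 0x002127d0. The second data row and the function name are hypothetical.

```python
def tasks_with_full_queue(rows):
    """Return names of tasks whose Queue_Avail column is 0.

    rows -- lines with the columns
            Task_ID Name Tcb_Addr Current_PC Queue_All Queue_Avail
    """
    full = []
    for row in rows:
        fields = row.split()
        if len(fields) != 6:
            continue  # skip lines without six columns, such as the task state line
        try:
            queue_all, queue_avail = int(fields[4]), int(fields[5])
        except ValueError:
            continue  # header row: the queue columns are not numbers
        if queue_all > 0 and queue_avail == 0:
            full.append(fields[1])
    return full


table = [
    "Task_ID Name Tcb_Addr Current_PC Queue_All Queue_Avail",
    "0xaf T_P_APP_CAMERAR 0x01ceaf80 0x002127d0 20 0",
    "TX_READY 206",
    "0x01 T_EXAMPLE 0x01000000 0x00200000 40 31",  # hypothetical row
]
full_tasks = tasks_with_full_queue(table)
print(full_tasks)
```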

The call stack of T_P_APP_CAMERAROLL_TASK is as follows:

-000|tx_thread_suspend(
| ?)
|
-001|tx_queue_receive(
| ?,
| ?,
| ?)
| thread_ptr = 0x01CEAF88
|
|
-002|txe_queue_receive(
| ?,
| ?,
| ?)
|
|
-003|SCI_GetSignal(
| ?)
| item = 66974816
| thread_block = 0x01CEAF80
|
|
-004|CAMERAROLL_Task(
| ?,
| ?)
|
|
-005|ThreadEntry(
| ?)
| thread_entry = (entry = 0x08074DD5, argc = 0, argv = 0x0)
|
|
-006|tx_thread_shell_entry()
|
|
---|end of frame

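The TaskAnalyzer trace-point data shows that T_P_APP and the high-priority network tasks are scheduled frequently, so the lower-priority T_P_APP_CAMERAROLL_TASK does not get scheduled. Analysis of the message types queued for T_P_APP_CAMERAROLL_TASK shows that T_P_APP keeps sending the following three messages over and over: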

0xAD14:HTTP_SIG_GET_CNF
0xAD18:HTTP_SIG_HEADER_IND
0xAD1A:HTTP_SIG_DATA_IND

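[Screenshot: queue]

Root cause from the code: CAMERAROLL_TASK runs 12 concurrent HTTP requests, but its queue has only 20 entries, which is not enough for this scenario. The best fix is to increase the queue size and to reduce the number of concurrent HTTP requests at the same time.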
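A rough sizing check makes the mismatch concrete. The Python sketch below uses a deliberately simplified model that is an assumption, not something stated in the article: every concurrent HTTP request may have one of each of the three signal types listed above waiting in the queue at the same moment. The function names are hypothetical.

```python
def worst_case_pending(concurrent_requests, signals_per_request):
    """Upper bound on queued signals if every request has all of its signals pending."""
    return concurrent_requests * signals_per_request


def max_safe_requests(queue_size, signals_per_request):
    """Largest request count whose worst case still fits in the queue."""
    return queue_size // signals_per_request


pending = worst_case_pending(12, 3)  # 12 requests, 3 signal types
safe = max_safe_requests(20, 3)      # queue of 20 entries
print(pending, safe)
```

Under this model the worst case is 36 pending signals against 20 queue entries, and at most 6 concurrent requests would fit. This is why both directions of the fix help: a larger queue and fewer concurrent requests.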

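Stack overflow

Possible causes of a stack overflow:
1. The stack is too small for code that uses many local variables; prefer dynamically allocated heap memory.
2. The call depth is too large, or the code is stuck in endless recursion.
3. The stack memory is corrupted, for example by a memory overwrite.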

-000|TXAS_SaveMainReg(
| ?,
| cur_lr = 0x0108D1CE,
| cur_pc = 0x80)
-001|TXAS_SystemAssert(
| exp = 0x0021873C,
| file = 0x00218710,
| line = 763,
| assert_info_ptr = 0x00C35F3D -> "Stack Overflow,thread:0x1e00dd8,sp overflow addr:0x2459928,thread ID:0x15,Tx Name:T_P_APP")
| assert_mode = 1
| cur_sp = 66875528
| cur_lr = 3778945
| cur_pc = 8533360
| i = 12387240
-002|SCI_PAssert(
| ?,
| ?,
| ?,
| ?)
-003|prod_thread_stack_overflow_handle(
| ?)
-004|tx_thread_stack_error_handler(
| ?)
| interrupt_save = 128
-005|tx_thread_suspend(
| ?)
| interrupt_save = 536870912
-006|tx_event_flags_get(
| ?,
| ?,
| ?,
| ?,
| ?)
| thread_ptr = 0x01E07050
-007|txe_event_flags_get(
| ?,
| ?,
| get_option = 1,
| ?,
| ?)
-008|SCI_GetEvent(
| ?,
| ?,
| get_option = 1,
| ?)
-009|mta_ex_trace_task(
| ?,
| ?)
| actual_flag = 2
| request_flag = 2
-010|ThreadEntry(
| ?)
| thread_entry = (entry = 0x003C76EB, argc = 0, argv = 0x0)
-011|tx_thread_shell_entry()
---|end of frame

According to the call stack above, the T_P_APP task has overflowed its stack: tx_stack_ptr = 0x2459928 lies below the start of the task's stack, tx_stack_start = 0x02462728.

  (TX_THREAD*)0x1E00DE0 = 0x01E00DE0 -> (
tx_thread_id = 1414025796,
tx_run_count = 3138736,
tx_stack_ptr = 0x02459928,
tx_stack_start = 0x02462728,
tx_stack_end = 0x02469F23,
tx_stack_size = 30716,
tx_time_slice = 0,
tx_new_time_slice = 0,
tx_ready_next = 0x01E00DE0,
tx_ready_previous = 0x01E00DE0,
tx_thread_name = 0x00202A60 -> "T_P_APP",
tx_priority = 76,
tx_state = 0,
tx_delayed_suspend = 0,
tx_suspending = 0,
tx_preempt_threshold = 76,
tx_priority_bit = (0, 0, 4096, 0, 0, 0, 0, 0),
tx_thread_entry = 0x003B1FF7,
tx_entry_parameter = 65597768,
tx_thread_timer = (tx_remaining_ticks = 0, tx_re_initialize_ticks = 0, tx_ti
tx_suspend_cleanup = 0x0,
tx_suspend_control_block = 0x01E00ED4,
tx_suspended_next = 0x01E00DE0,
tx_suspended_previous = 0x01E00DE0,
tx_suspend_info = 4,
tx_additional_suspend_info = 0x02469570,
tx_suspend_option = 1,
tx_suspend_status = 0,
tx_created_next = 0x01E01290,
tx_created_previous = 0x01DFFFD0,
tx_filex_ptr = 0x0,
time = 0,
tx_thread_stack_highest_ptr = 0x02459928)
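The size of the overflow can be computed from the fields of the thread control block above. A short Python check, assuming a descending stack (the helper name is hypothetical):

```python
def stack_report(stack_start, stack_end, stack_ptr):
    """Return (stack size in bytes, bytes by which stack_ptr is below stack_start)."""
    size = stack_end - stack_start + 1
    overflow = max(0, stack_start - stack_ptr)
    return size, overflow


# tx_stack_start, tx_stack_end and tx_stack_ptr from the T_P_APP control block
size, overflow = stack_report(0x02462728, 0x02469F23, 0x02459928)
print(size, overflow)
```

The computed size matches tx_stack_size = 30716, and the stack pointer is 36352 bytes (0x8E00) below tx_stack_start, which is more than the whole stack again. An overshoot of that size points to runaway recursion rather than a stack that is only slightly too small.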

Deriving the call stack of T_P_APP by hand shows that frames 000 to 003 repeat endlessly, which causes the stack overflow.

-000|HandlePicListWinMsg(
| win_id = 2359300,
| msg_id = 64004,
| param = 0xFA04)
| title_str = (wstr_ptr = 0x0, wstr_len = 0)
| query_win_id = 2359321
| mark_wstr = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
| mark_num_str = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
| mark_num_wstr = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
-001|MMK_RunWinProc(
| ?,
| msg_id = 64004,
| param = 0x0)
|
|
-002|MMK_DispatchToHandle(
| handle = 3217293358,
| msg_id = 64004,
| param_ptr = 0x0)
| openwin_handle_result = 0
| old_handle = 3217293358
|
|
-003|MMK_SendMsg(
| ?,
| msg_id = 64004,
| param_ptr = 0x0)
| result = 0
|
|
-004|HandlePicListWinMsg(
| win_id = 2359300,
| ?,
| ?)
| result = 1
| title_str = (wstr_ptr = 0x0, wstr_len = 0)
| query_win_id = 2359321
| mark_wstr = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
| mark_num_str = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
| mark_num_wstr = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
| data_info = (is_bitmap = 0, is_free_bitmap = 0, is_save_data = 0, data_pt
|
|
-005|MMK_RunWinProc(
| ?,
| msg_id = 64004,
| param = 0x02469E64)
|
|
-006|MMK_DispatchToHandle(
| handle = 3217293358,
| msg_id = 64004,
| param_ptr = 0x02469E64)
| openwin_handle_result = 0
| old_handle = 16711680
|
|
-007|MMK_DispMsgToFocusWin(
| msg_id = 64004,
| param_ptr = 0x02469E64)
|
|
-008|MMK_DispMsgToWin(
| msg_id = 64004,
| ?)
| result = 0
|
|
-009|HandleMSGKbd(
| keys_status = 64000,
| key_code = 4)
| multi_key_tp_param = (is_slide = 0, pre_tp_point = (x = 0, y = 0), cur_tp
|
|
-010|MMK_DispatchMSGKbd(
| ?)
| keypress_ptr = 0x03EAE9C8
| key_code = 4
| is_long_press = 0
|
|
-011|MMK_DispatchExtSig(
| ?)
|
|
-012|thread_entry_P_APP(
| ?,
| ?)
| receiveSignal = 0x03EAE9C8
| mmi_msg = 0x03FA0650
| ticks1 = 0
| ticks2 = 0
| is_log_on = 0
| time_period = 4294967295
| watchdog_ptr = 0x03EBE978
|
|
-013|ThreadEntry(
| ?)
| thread_entry = (entry = 0x08004393, argc = 0, argv = 0x0)
|
|
-014|tx_thread_shell_entry()
|
|
---|end of frame
|
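Re-entry of the same function in a derived call stack can be spotted automatically. The Python sketch below takes the function names of the frames above, innermost first, and reports the first function that appears again deeper in the stack together with the frames in between. The helper name is hypothetical.

```python
def find_reentry(frames):
    """Return (start_index, cycle) for the first function that reappears deeper
    in the stack; cycle is the run of frames up to the reappearance.
    Returns None if no function occurs twice."""
    for i, name in enumerate(frames):
        try:
            j = frames.index(name, i + 1)
        except ValueError:
            continue  # this function appears only once
        return i, frames[i:j]
    return None


frames = [
    "HandlePicListWinMsg", "MMK_RunWinProc", "MMK_DispatchToHandle", "MMK_SendMsg",
    "HandlePicListWinMsg", "MMK_RunWinProc", "MMK_DispatchToHandle",
    "MMK_DispMsgToFocusWin", "MMK_DispMsgToWin", "HandleMSGKbd",
    "MMK_DispatchMSGKbd", "MMK_DispatchExtSig", "thread_entry_P_APP",
    "ThreadEntry", "tx_thread_shell_entry",
]
start, cycle = find_reentry(frames)
print(start, cycle)
```

Re-entry alone does not prove unbounded recursion. Here the suspicion is strengthened by the fact that the re-entered handler sends the same msg_id (64004) to the same window again.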

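Summary

MTBF and Monkey testing expose all kinds of problems, and a rich set of debugging methods is needed to analyze and locate them. Besides the usual techniques, we also use AMBA Bus Monitor to watch specific code areas, including boot, kernel and dsp, to catch cases where a memory area is overwritten.

1. MTBF testing determines the mean time between failures and reflects how the product's quality holds up over time.
2. Monkey stress testing verifies the stability of the product's software and hardware.
3. EUT release testing: by this stage most problems have converged and the remaining ones are hard to reproduce, so various dedicated tests may be arranged.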