>>>>
What is debugging
To gain insight into what happens in the kernel, we have to read and understand the source code. But we should also take a close look at the system while it is running in order to find out what is going on inside the kernel or a kernel driver.
>>>
This activity is called debugging. When I fix a problem during a project, I can say that I am "debugging" the system.
>>>
Debugging is considered to be the most important skill for troubleshooting, so many BSP engineers want to improve their debugging skills. Every time I talk with BSP engineers, they say that they have a hard time debugging difficult issues.
>>>
Sometimes they contact me with questions like "How do I use the Trace32 simulator?" or "It is hard to understand the signature in the kernel log. Please help me out." Debugging is also the main topic for BSP engineers during coffee breaks. They have a lot of behind-the-scenes stories related to debugging.
>>>
Anyway, debugging can be categorized into two types. The first one is on-target debugging, and the second one is offline debugging.
>>> END OF PAGE
First, let's talk about on-target debugging. When we debug with a real sample device, it is called on-target debugging. On-target debugging is a basic way of debugging software issues, because we have access to most of the information from the device.
>>>
For this, the condition is that the issue must be easily reproducible. It also requires additional hardware such as a host PC, a USB cable, and JTAG.
>>>
There are various debug tools and mechanisms, chosen based on the failure symptom, scenario, OS, module, or core. The most common tool for BSP debugging is the serial log.
>>>
The debug messages can be printed through a serial console so that we can read the necessary logs. JTAG is preferred for Linux kernel-level debugging, and especially for the bootloader.
>>>
The second type of debugging is offline debugging. Offline debugging is an alternative approach to on-target debugging. It uses captured logs instead of an actual device.
>>>
Offline debugging is convenient and efficient, because it is easier to share logs than to share actual devices. When you need extra information, extra debug code can be added at any time to collect another set of logs. However, you have to add the appropriate debug code before the issue is reproduced.
>>>
In addition, symbol information is needed. For example, the vmlinux file has the debug symbols for the kernel. For an RTOS or a bootloader, an .elf format file has the symbol information.
>>>
Various types of logs can be used for offline debugging. A complete RAM dump is usually the most powerful, but you can also use other log files saved in the log file system.
>>>>>>>>>> NEXT PAGE
Now let me introduce several effective debugging tools and features.
>>>
They are the UART serial console, printk, dump_stack, ftrace, debugfs, the proc file system, and the magic SysRq key. These debugging features are provided by the Linux kernel.
>>>
We also have more powerful debugging tools such as RAM dump, TRACE32, and the Crash Utility. In order to use these debug features, we need extra support from the SoC vendor.
>>>
Please note that all of these debug features will be explained in the next video. I hope this video helps you. See you next time.
Bring-up
What do BSP and system engineers do in a project? The first thing BSP engineers do is bring up the target device. In this video, let's explore the procedure of the bring-up task.
>>>
The bring-up task can be categorized into two parts. The first is source bring-up, and the second is troubleshooting with the real target device.
>>>
First of all, let's talk about source bring-up. The first thing we have to do is get the source code from the SoC vendor. If your chipset is Qualcomm Snapdragon, you should get the source from Qualcomm.
>>>
The source code from the SoC vendor is called the BSP, which stands for board support package. When SoC vendors release their BSP code, they provide us with git repository information so that we can pull the source code. So we have to be familiar with git commands and git utilities.
>>>
Sometimes the SoC vendor specifies a certain compiler version, or customizes the compiler for further optimization. In this case, we have to install that compiler on our Linux machine, such as Ubuntu or Debian.
>>>
After we install the compiler, what should we do? The next thing to do is build the source code. Many times, the build fails with an error message. In this case, troubleshooting is required to fix the build error.
>>>
SoC vendors sometimes provide documents which describe the build commands or the git repository information needed to get the BSP code. So please contact the SoC vendor to get this information.
>>>
The second thing to do is build the source code with the compiler. After the source compiles without error, it is time to analyze the build system.
>>>
Understanding the build system means getting to know how each directory is compiled by looking at the makefiles or configuration files. We also have to check how the total image is generated.
>>>
What should we do to analyze the build system? First, we have to find the makefile which specifies the features and configuration of the system. Sometimes we have to get approval from the architect who maintains the whole system.
Until now, we have used the build command recommended by the SoC vendor. To write our own build script, it is necessary to know the build architecture.
>>>
Many times we have to customize the device tree and the Linux kernel drivers. So we end up adding new LGE features to the makefile.
>>>
Now we have obtained the BSP source code and built it successfully. What is the next step for source bring-up?
>>>
The last thing to do is check the flash and debugging tools. The SoC vendor provides flash tools (e.g. QDLoader), which make it possible for us to flash several images. These images include the kernel boot image, the system image, the bootloader image, and so on.
>>>
Sometimes the SoC vendor gives us a document which describes how to use the flash tool. If you do not have such a document, you should contact the SoC vendor to find out how to use the flash tool.
>>>
So far, I have talked about the source bring-up process.
If you are developing on a reused platform, which means the project has already been developed at the headquarters, you might skip this step.
>>>
But if your project uses a new chipset, you have to go through these steps.
>>>>>>>>>>>>>>>>>>> NEXT PAGE
The next step in bringing up the target device is troubleshooting.
Hardware engineers will bring the new target device to your desk. Now it is time to bring up the target device. What is the first thing to do?
>>>
First, you have to flash the images to the target device using the flash tools provided by the SoC vendor. When the download is complete, connect the device to the UART console to monitor what is going on inside the target device.
Then supply power to the target device, with the charger cable connected to it. You will see the UART console display the boot messages, which show the activity inside the system.
>>>>>>>>>>>>>>>>> NEXT PAGE
As for the bring-up task, there are several things I would like to talk about. Before we start the bring-up task, we have to prepare for it fully. No matter how much effort we make to prepare 100% for the bring-up task, we can run into an unexpected issue that we have not seen before.
>>>
For example, a crash might occur in the bootloader. In this case, we might have to use the Trace32 in-circuit debugger to debug the bootloader.
>>>
I think it is too late to learn how to use the Trace32 in-circuit debugger when we are about to troubleshoot a booting issue. We have to know how to use the Trace32 debugger before we start the bring-up task.
>>>
Another thing we have to do in advance is understand the boot sequence 100%. Then we can identify what is wrong in the system during the boot-up process.
The second thing to keep in mind is that we might be under a lot of stress during the bring-up task. If an issue such as a crash or lock-up occurs during the bring-up task, the project manager will come to your desk to find out what is happening in your system.
>>>
If you cannot explain the symptom properly, the project manager gets frustrated. Sometimes they put a lot of stress on you, because a delay in the bring-up task has an impact on the project schedule. The reason is that the middleware software engineers and application engineers cannot even start their work.
As for me, I worked overtime for more than 10 days until the bring-up task was completed successfully. If you realize that you do not even understand an issue such as a crash or lock-up, you might be under even more stress.
>>>
In this case, you cannot even start analyzing the issue. If you end up in this situation, please report it accurately to the project manager. The most important thing is clear communication.
The project manager might bring in other software engineers and get them involved in the issue. The project manager can also do more to get support from other departments, including the HQ department or the SoC vendor.
>>>
If you do not tell the project manager what is happening in the project, it might cause a disaster. The project manager might have a plan B for the project.
Thanks for watching this. I hope to see you in the next material.
Bootloader
What is a bootloader? If we refer to Wikipedia, we can easily find the definition of a bootloader.
>>>
A bootloader, also spelled as boot loader or called boot manager and bootstrap loader, is a computer program that is responsible for booting a computer.
>>>
If we push the power button, power is supplied to the system.
In this case, the first software that starts to execute is the bootloader.
>>>
What is the purpose of the bootloader? The role of the bootloader can be categorized into two parts.
>>>
First, the bootloader is designed to initialize basic hardware units such as DDR, eMMC and several peripherals. Many driver initialization routines are implemented in the bootloader.
>>>
Second, the bootloader loads the kernel image of the OS, such as the Linux kernel or an RTOS, from the storage device, and then starts it running on the system.
>>>
As you know, the main responsibility of a BSP engineer is bringing up the target device, so we have to know the bootloader in more detail, since the bootloader is the first thing executed whenever we supply power to the system.
>>>
Here, I would like to talk about the types of bootloader. The first type is the on-chip bootloader, and the second one is the programmable bootloader.
>>>
Now let's explore the on-chip bootloader. The on-chip bootloader is already flashed in the chipset. Since the chipset vendor flashed the on-chip bootloader, we do not need to research it.
>>>
What is the main role of the on-chip bootloader? The on-chip bootloader reads the 1st programmable bootloader image from the storage device and executes it out of reset.
>>>
If enough power is not supplied to the storage device, or a storage device such as eMMC or UFS is not working properly, the on-chip bootloader is not able to load the 1st programmable bootloader from eMMC or UFS.
>>>
Sometimes the programmable bootloader image is not flashed properly. In this case, the on-chip bootloader cannot load the 1st programmable bootloader image from the storage device. This will cause a no-booting issue.
>>>
Please be aware of this when you are troubleshooting a no-booting issue.
A well-known on-chip bootloader is the PBL, which stands for primary bootloader, designed by Qualcomm. The PBL is the on-chip bootloader of Qualcomm SoCs. If you refer to the boot sequence document provided by QCOM, you can find further information.
>>>
Now that we have introduced the on-chip bootloader, let's move on to the second type of bootloader, which is the 1st programmable bootloader.
>>>
The 1st programmable bootloader is the bootloader we can customize. SBL and XBL are the names of the 1st programmable bootloaders of Qualcomm. The preloader is the 1st programmable bootloader of MediaTek.
When we attempt to debug the bootloader, which tools do we have to use? The most common tool for debugging the bootloader is the UART console. If you connect the device to the UART console after configuring the UART baud rate properly, it is easy to follow the activity of the bootloader.
>>>>>>>>>>>>>>>> NEXT PAGE
Now let me tell you several subjects we have to learn to improve our bring-up ability.
>>>
First, we have to understand the boot architecture, which describes how the system boots when we supply power to it. The boot architecture is also called the boot sequence, which has the same meaning.
>>>
The boot architecture is totally dependent on the SoC vendor. That is, the boot architecture of Qualcomm is different from that of MediaTek, and the boot architecture of an NVIDIA SoC is different again. Since the boot architecture is designed by the SoC vendor, we encourage you to contact the SoC vendor to get the document.
>>>
Second, we have to understand the build architecture. Many times we customize the build scripts and makefiles by adding LGE configurations or features. For this, an understanding of the build architecture is necessary.
Third, we should learn about the physical partition layout, which is designed by the SoC vendor. Sometimes we have to modify the partition table according to the product specification.
Fourth, it is essential to know how to use the download tools. Sometimes the SoC vendor provides an emergency download tool.
The last thing to do is learn how to debug the system. During the bring-up task, we might run into crash or freeze issues. In this case, it is necessary for us to monitor what is executed inside the system.
Thanks for watching this. I hope to see you in the next material.
UART serial console
The simplest and most common way to connect to almost any embedded device is the serial UART port. Almost every embedded target device provides this port. That means you can find initialization code for the UART port in the bootloader and in the RTOS or kernel.
You can see messages from the UART port, which is called the UART console.
The UART serial console is the simplest and most common way to debug the target device because it is simple to configure.
>>>
First, open a console application such as PuTTY and configure the proper baud rate for the UART serial console.
>>>
Then connect the target device with the UART cable. That is the only thing you have to do to perform UART console debugging. You will see the necessary messages on the UART console.
>>>
There are 3 things I would like to share about using the UART console.
>>>
First, make sure that the UART initialization code is working properly with the correct baud rate. If you cannot see messages from the bootloader or Linux kernel on the UART console, please review your bootloader or kernel device tree code to make sure the UART initialization code is working.
>>>
Normally, the UART console driver is enabled by default, but in some cases it is not. And make sure you configure the correct baud rate for the UART console. For more information, you can contact the headquarters BSP engineers or the SoC vendor's software engineers.
>>>
Second, during the development stage you might enable the UART serial console. But make sure to disable the UART console routine in the bootloader and the Linux kernel device tree for the production stage, so that the UART console is not enabled in the release software package.
>>>
Third, the UART console might result in performance overhead. If you notice any decrease in performance, you may disable the UART console. A flood of UART logs might cause unexpected issues such as a watchdog reset or a UI freeze.
>>>>>>>>>>>>>>>>>>>>>>>>>>> NEXT PAGE
Now I will introduce a good tip for the UART serial console. When you bring up a Linux kernel driver, you might run into a hang early in the kernel boot process. In this case, you cannot see any kernel boot log from the device.
>>>
In this case, you can enable the early console property (earlycon) of the device tree in your kernel code to initialize the console driver at the earliest possible point.
>>>
With this property enabled, you might be able to see the kernel boot log on the UART console.
Please refer to the patch listed below. As you can see, it is modified to enable the early console feature.
I hope this video helps you debug the target device.
printk
Every BSP engineer knows printk, which is the fundamental debug feature in Linux device drivers and the Linux kernel. printk() is one of the most widely known functions in the Linux kernel.
>>>
It's the most basic way of tracing and debugging. printk() is the kernel's print function, and it behaves almost the same as the C library printf() function.
>>>
As the name implies, printk() is simply a print function. So printk() can be called from just about anywhere in the kernel at any time. We can use printk() in process context or interrupt context, in any code.
>>>
I will show you sample code for printk(). As you can see, printk() is called inside the init_module() function. When init_module() is called, the "Hello world 1." message is printed in the kernel log.
>>>>>>>>>>>>>>>> next page
Compared to printf, printk provides several extra features. The first is the log level. We can specify the log level of each printk() call, and the kernel displays messages depending on their log level. With a low console log level, the kernel shows fewer messages.
>>>
There are 8 log levels for printk, as you can see in the header file. Log level 0 is the emergency log level and log level 7 is the debug log level.
>>>
With log level 7, you can see more messages than with log level 6, 5, 4, 3, 2, 1 or 0. The default log level is 4, which stands for KERN_WARNING.
>>>
When you bring up a device, please make sure to specify log level 7, which stands for KERN_DEBUG, because you need as many messages as possible for bring-up.
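The eight levels can be summarized in a small shell helper. This is only a sketch: the function name loglevel_name is made up for illustration, while the names it prints are the KERN_* macros from the kernel header.

```shell
#!/bin/sh
# loglevel_name is a hypothetical helper (not a kernel or system tool).
# It maps a printk log level number to the matching KERN_* macro name.
loglevel_name() {
    case "$1" in
        0) echo "KERN_EMERG" ;;
        1) echo "KERN_ALERT" ;;
        2) echo "KERN_CRIT" ;;
        3) echo "KERN_ERR" ;;
        4) echo "KERN_WARNING" ;;
        5) echo "KERN_NOTICE" ;;
        6) echo "KERN_INFO" ;;
        7) echo "KERN_DEBUG" ;;
        *) echo "unknown" ; return 1 ;;
    esac
}

# Print the whole table, from most severe (0) to least severe (7).
for level in 0 1 2 3 4 5 6 7; do
    echo "$level $(loglevel_name "$level")"
done
```

The lower the number, the more severe the message.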
>>>>>>
To find out the log level of your system, you can simply cat /proc/sys/kernel/printk using the command below.
>>>
The result shows the current, default, minimum and boot-time-default log levels.
>>>
If you would like to change the current log level, you can simply write the desired level to /proc/sys/kernel/printk using echo. For example, you can use the command below to change the log level to 5.
>>>
There is another way to change the log level: the 'dmesg' command. If you use dmesg with the option shown below, you can change the log level to 5.
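The read and write steps can be sketched as two small shell helpers. The helper names and the optional file argument are assumptions added for illustration, so that you can try them on a saved copy of the file; on a real target the default path /proc/sys/kernel/printk is used and writing needs root.

```shell
#!/bin/sh
# show_printk_levels and set_console_loglevel are hypothetical helper names.

# Print the four values of the printk file with labels.
# The optional argument points at a saved copy instead of the live file.
show_printk_levels() {
    file="${1:-/proc/sys/kernel/printk}"
    read -r current default minimum boot_default < "$file"
    echo "current=$current default=$default minimum=$minimum boot_default=$boot_default"
}

# Write a new current console log level (needs root on a real target).
set_console_loglevel() {
    level="$1"
    file="${2:-/proc/sys/kernel/printk}"
    echo "$level" > "$file"
}
```

On the target you would call show_printk_levels first and then set_console_loglevel 5. Running dmesg -n 5 changes the console log level in the same way.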
>>>>>>>>>>>>>>>>> NEXT PAGE
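If you want to configure the default log level during kernel boot-up, you have to modify the device tree. If you refer to the code below, you can see that "loglevel=8" is specified in bootargs. With loglevel=8, every message up to KERN_DEBUG (level 7) reaches the console, because the console prints messages whose level is numerically lower than the console log level.
>>>
When you bring up a device, please set the highest console log level using this patch set.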
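After the device boots, you may want to confirm that the boot argument really reached the kernel. The following is a minimal sketch; cmdline_loglevel is a hypothetical helper name, and it assumes the boot arguments are separated by spaces as they appear in /proc/cmdline.

```shell
#!/bin/sh
# cmdline_loglevel is a hypothetical helper name.
# It prints the value of the loglevel= boot argument, or nothing if it is absent.
# The optional argument points at a saved copy of /proc/cmdline.
cmdline_loglevel() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^loglevel=//p'
}
```

Calling cmdline_loglevel without an argument on the target reads the live kernel command line.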
>>>>>>>>>>>>>>>>> NEXT PAGE
Next, let's explore how to increase the size of the kernel log buffer. When printk() is called, the kernel keeps the messages in a circular buffer.
The size of the circular buffer is configurable using the kernel config CONFIG_LOG_BUF_SHIFT. The buffer size is 2 to the power of CONFIG_LOG_BUF_SHIFT bytes.
Here CONFIG_LOG_BUF_SHIFT is configured as 14 by default, which means 16 KB. If you change this config to 17, the log buffer is increased to 128 KB.
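The relation between the shift value and the buffer size is simple arithmetic. As a quick sketch (log_buf_bytes is a made-up helper name, not a kernel tool):

```shell
#!/bin/sh
# log_buf_bytes is a hypothetical helper.
# Buffer size in bytes = 2 to the power of CONFIG_LOG_BUF_SHIFT.
log_buf_bytes() {
    echo $((1 << $1))
}

log_buf_bytes 14   # prints 16384  (16 KB)
log_buf_bytes 17   # prints 131072 (128 KB)
```

Each step of 1 in the shift value doubles the buffer.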
printk: several tips
This time, I'll show you how to use printk to see the symbol information of a function. If you specify %pS in a kernel message, the function address is printed as a symbol.
>>>
As you can see, '+' shows code that is added to the original code.
The function arguments on lines 09-10 are as follows:
>>>
__func__ indicates the name of the function that is currently being executed. __func__ is provided by the compiler.
>>>
The __LINE__ macro indicates the line of code that is currently being executed. __builtin_return_address(0) is a built-in function which tells us the return address in the caller function.
>>>
All of these are provided by the GCC compiler. This code shows where the kernel code is running and the address of the caller function.
>>>
Look at the leftmost code in line 09 and you'll see %pS. %pS converts the address specified by the argument to a symbol and prints it.
printk: warning
Even though printk() is a common debug feature, there is one thing we should be careful about: you must check how often the printk() call is executed.
>>>
Let's assume that you add printk() to the __schedule() function. You may expect to see a message in the kernel log whenever __schedule() is called.
>>>
However, __schedule() is called very often, which means the printk() you added will be called just as often. As a result, this may cause a freeze or lock-up.
>>>
What is the reason for this? When printk() is called, it outputs to the console. This may take several milliseconds per write.
>>>
If printk() is called 100 times per second, the console driver is kept busy handling the printk messages, and several background threads have to handle this as well.
>>>
To sum up, __schedule() is called faster than the console driver can flush the print buffer.
>>>
If you want to debug a high-volume area such as the timer interrupt, the scheduler, or the network, please avoid using printk(). In such cases, I would suggest using another method of debugging such as ftrace.
>>>
The kernel also provides another API called printk_ratelimit().
By default, printk_ratelimit() allows only one message every 5 seconds.
>>>
When the console is flooded with messages, printk_ratelimit() returns 0, which means the output should be skipped.
>>>
If you refer to the relevant commit, you will find that printk_ratelimited() is added.
dump_stack()
In this video, let me introduce another kernel debug feature, the dump_stack() function.
>>>
When you use printk() in kernel code, you might also want to see the stack trace in the kernel log when a particular function is called. In this situation, you can use dump_stack().
>>>
dump_stack() is simple to use, similar to printk(). If you just add a dump_stack() call to the code, you can see the stack trace in the kernel log. When you use dump_stack(), please include the "linux/kernel.h" header file at the top of the code.
>>>
Let's take a look at the declaration of the dump_stack() function.
asmlinkage __visible void dump_stack(void);
>>>
It takes no arguments and returns nothing, which means you can simply place a dump_stack() call anywhere in the kernel source code.
>>>
This time, let me show you how to use dump_stack(). Please refer to the patch code.
>>>
As you can see, dump_stack() is added inside rpi_kernel_debug_stat_set(). When rpi_kernel_debug_stat_set() is called, dump_stack() is executed.
>>>
After you apply the patch above, build and install the kernel image. After the system boots again, you can use the command below. This command triggers the call to rpi_kernel_debug_stat_set().
>>>
Then, if you take a look at the kernel log using dmesg, you will find that the stack trace is printed in the kernel log.
>>>
In many kernel drivers, dump_stack() is called inside the routine for an exceptional case. Here is an example.
>>>
Inside the scsi_queue_work() function, you can see a dump_stack() call. dump_stack() is mainly used as a way to warn BSP engineers that something has gone wrong in the driver.
>>>
Ftrace: Introduction
Now, let's talk about ftrace. When I started working as a BSP engineer, most of the time I just looked at the kernel log for debugging. Sometimes I looked into the RAM dump when a crash occurred in a device driver.
>>>
Later, when I was engaged in more complex issues such as crashes or performance problems, I realized that the kernel log and RAM dump are not enough to debug difficult issues.
>>>
At that time I wanted a debug feature which provides more detailed information about the inside of the kernel. Finally I found the debug feature that I was looking for. That is ftrace.
>>>
Ftrace is a debugging tool for understanding what is going on inside the Linux kernel. It is one of the most popular tracing tools among kernel developers as well as BSP engineers.
>>>
When I talk with brilliant kernel developers, many of them say how great ftrace is. Let me introduce ftrace in one sentence: ftrace is a great way to learn more about the internal workings of the Linux kernel.
>>>
If you refer to the link, you will find more information about ftrace. So what can we do with ftrace?
With ftrace, there are a number of ways to monitor the Linux kernel, since the ftrace tracing utility has many different features that assist in tracking down Linux kernel problems. Let me introduce 3 ways to use ftrace.
>>>
First, we are able to read debugging information about the execution of the Linux kernel, including interrupt handling, scheduling, workqueues, system calls and signals.
>>>
Once you understand the meaning of ftrace messages, I can say that you are familiar with the Linux kernel. Second, you can view the stack trace without modifying kernel code.
>>>
Third, every ftrace message gives us fundamental information such as the context, the CPU core number and so on. From ftrace messages we can deduce many things.
>>>
As I told you, ftrace is a powerful feature because we can see the stack trace without modifying kernel code. The only thing you have to do is configure the ftrace options before starting ftrace.
>>>
Another strong point of ftrace is that it results in less overhead than printk and dump_stack, especially with the nop tracer, because ftrace data is stored in a ring buffer in binary format.
>>>
Please note that some of the tracers have a noticeable overhead when the tracer is configured into the kernel. For example, let's assume that you are using the function tracer.
>>>
If all of the available functions are traced, it can cause overhead. But if you select particular functions for the function tracer, it will not cause a huge overhead.
>>>
Ftrace is a very powerful tool and easy to configure. No extra tools are necessary. So for kernel developers, ftrace has become the main tool for debugging the Linux kernel.
>>>
But there is a challenge for software engineers when using ftrace: it takes time to understand ftrace messages.
>>>
Since ftrace output is text-based, it is hard to understand the meaning of ftrace messages at first.
Ftrace: How to set up ftrace
In the last session, ftrace was introduced. Now that you have listened to the last material, you might realize that ftrace is a powerful tool. So now it is time to use ftrace.
Ftrace is controlled through the debugfs file system, which is typically mounted at /sys/kernel/debug. When ftrace is configured, it creates its own directory called tracing within the debugfs file system.
>>>
First, let me tell you how to turn on ftrace. Please be aware that the file tracing_on is used to enable or disable recording of data into the ring buffer.
>>>
When tracing starts, the ftrace ring buffer records information. For this, we should write 1 into 'tracing_on'. You can refer to the first command.
>>>
If you run the second command, it disables the ftrace ring buffer from recording, meaning that we can turn off tracing.
>>>
If you want to trace what is happening when you run a specific test under a specific scenario, please write 1 or 0 into 'tracing_on'. Please check the current value of 'tracing_on' with cat before you start, because the default can differ from system to system.
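The two commands can be wrapped in small shell helpers, as a rough sketch. The helper names are made up for illustration, and TRACING_DIR is an assumed variable that defaults to the usual debugfs location; writing to tracing_on needs root on the target.

```shell
#!/bin/sh
# TRACING_DIR can be overridden; the default is the usual debugfs location.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# ftrace_start, ftrace_stop and ftrace_status are hypothetical helper names.
ftrace_start()  { echo 1 > "$TRACING_DIR/tracing_on"; }   # start recording
ftrace_stop()   { echo 0 > "$TRACING_DIR/tracing_on"; }   # stop recording
ftrace_status() { cat "$TRACING_DIR/tracing_on"; }        # prints 1 or 0
```

A typical session is ftrace_start, run the test scenario, then ftrace_stop, so that the ring buffer holds only the part you are interested in.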
>>>
ftrace also provides two functions that work well inside the kernel: tracing_on() and tracing_off(). These two act just like echoing "1" or "0" respectively into the tracing_on file.
>>>
If you want to start tracing, you can call tracing_on() in any kernel function. This acts like echoing 1 into the tracing_on file.
>>>
Under a certain condition, you can stop tracing by adding a call to tracing_off(). Tracing stops when the call to tracing_off() is made.
>>>
Now let's check the available tracers. To find out which tracers are available, simply cat the available_tracers file in the tracing directory. The output shows that function_graph, function and nop are the available tracers.
>>>
To enable the function tracer, just echo "function" into the current_tracer file. You can select any tracer among the available tracers using echo.
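As a minimal sketch of these two steps, the helper below checks a tracer name against available_tracers before selecting it. The helper names and the TRACING_DIR variable are assumptions for illustration; the file names available_tracers and current_tracer are the ones from the tracing directory.

```shell
#!/bin/sh
# TRACING_DIR can be overridden; the default is the usual debugfs location.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# list_tracers and select_tracer are hypothetical helper names.
list_tracers() {
    cat "$TRACING_DIR/available_tracers"
}

select_tracer() {
    tracer="$1"
    # Refuse names that the running kernel does not offer.
    if ! tr ' ' '\n' < "$TRACING_DIR/available_tracers" | grep -qxF "$tracer"; then
        echo "tracer '$tracer' is not available" >&2
        return 1
    fi
    echo "$tracer" > "$TRACING_DIR/current_tracer"
}
```

For example, select_tracer function enables the function tracer and select_tracer nop switches back to no tracer.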
>>>
If you move to the events directory, you can find the list of events. Each directory represents a subsystem of the Linux kernel.
>>>>
Now let's talk about how to enable events in the scheduler subsystem. After you move to 'events/sched', 21 events related to the scheduler subsystem are shown.
>>>
All of the entries except enable and filter are ftrace events, such as sched_switch, sched_wakeup and sched_process_fork.
>>>
How do we enable an event? If you want to enable the sched_wakeup event, you can use the command below.
echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
>>>
Sometimes you do not want to enable sched_wakeup. In this case, you can disable the sched_wakeup event using the command below.
echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
If you want to enable all events in the scheduler subsystem, please use the command below.
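Each subsystem directory has its own enable file, so a whole group of events can be switched with one write. Here is a small sketch; set_event_group is a made-up helper name and TRACING_DIR is an assumed variable that defaults to the usual debugfs location.

```shell
#!/bin/sh
# TRACING_DIR can be overridden; the default is the usual debugfs location.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# set_event_group is a hypothetical helper name.
# Usage: set_event_group <subsystem> <1|0>
# Writing to events/<subsystem>/enable switches every event of that subsystem.
set_event_group() {
    subsystem="$1"
    value="$2"
    echo "$value" > "$TRACING_DIR/events/$subsystem/enable"
}
```

On the target, set_event_group sched 1 enables all scheduler events and set_event_group sched 0 disables them again.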
>>>
Let's move on to the next page.
>>>
There are a number of ftrace events. Among them, the sched events such as sched_switch and sched_wakeup are the most widely used. I strongly encourage you to enable these events when using ftrace.
>>>
Other widely used ftrace events are the interrupt handling events. Kernel developers usually enable the irq_handler_entry and irq_handler_exit events when they debug the kernel with ftrace.
>>>
This is because it is important to know which interrupt has been triggered in a specific scenario.
>>>
The most important thing when you use ftrace is to understand ftrace messages. Here, I will show you how to read an ftrace message.
>>>
As the figure indicates, every ftrace message follows the same pattern.
First, you will notice the name of the process. In this figure, chromium-browse is the name of the process, and 1436 is its PID.
This means that the chromium-browse process with PID 1436 was running when this message was printed.
>>>
Next you can see the CPU field, which tells us the logical CPU core number. In this figure, the message was recorded on CPU core 2.
>>>
The timestamp is displayed as 9445.131875, which corresponds to the timestamp of the kernel log.
>>>
You can see sched_switch, which is the name of the event. sched_switch shows the activity of task scheduling. In this message, the chromium-browse process is switched out and the kworker/2:3 process is scheduled in.
>>>
This means that the new process that will start executing is kworker/2:3.
Another piece of information that every ftrace message shows is the context signature. The leftmost character of this field indicates whether local interrupts are disabled.
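To make the field layout concrete, here is a rough sketch that splits one ftrace line into its header fields. The helper name parse_ftrace_header is made up, the sample line is rebuilt from the values mentioned in the text with the event details shortened, and the parsing assumes that the task name contains no spaces.

```shell
#!/bin/sh
# parse_ftrace_header is a hypothetical helper. It splits one line of ftrace
# output into task name, PID, CPU number, timestamp and event name.
parse_ftrace_header() {
    awk '{
        # Field 1 is "<task>-<pid>"; the PID follows the last hyphen.
        n = split($1, parts, "-")
        pid = parts[n]
        comm = substr($1, 1, length($1) - length(pid) - 1)
        # Field 2 is the CPU number in brackets, for example "[002]".
        cpu = substr($2, 2, length($2) - 2) + 0
        # Field 3 is the context field (irqs-off and so on); it is skipped here.
        # Field 4 is the timestamp and field 5 the event name; both end with ":".
        ts = $4; sub(/:$/, "", ts)
        ev = $5; sub(/:$/, "", ev)
        printf "comm=%s pid=%s cpu=%d ts=%s event=%s\n", comm, pid, cpu, ts, ev
    }'
}

# Sample line rebuilt from the values in the text (event details shortened).
echo "chromium-browse-1436 [002] d... 9445.131875: sched_switch: prev_comm=chromium-browse ==> next_comm=kworker/2:3" | parse_ftrace_header
# prints: comm=chromium-browse pid=1436 cpu=2 ts=9445.131875 event=sched_switch
```

On a target you could pipe the trace file of the tracing directory through the same helper to get one summary line per message.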
>>>>>>>>>>more
Here let me show a sample of ftrace messages. The sched_wakeup, sched_stat_runtime and sched_switch events are shown in these messages.
>>>
Each message shows its own activity. Let me show how to interpret these 3 ftrace events.
>>>
Line 1 shows a sched_wakeup event. This event shows that the specified process wakes up. That means kworker/1:3 has been woken up.
>>>
Line 2 shows a sched_stat_runtime event. This event shows the process status with vruntime and runtime. Please note that vruntime and runtime are important parameters for scheduling.
Ftrace: function tracer
Whenever you want to see the message in the kernel log, you can use printk() at any kernel code. Ftrace introduces a new form of printk() called trace_printk().
>>>
If you are running the function tracer, I am sure that you are going to see detailed information that you have not seen before. From now on, let me introduce the way to use function tracer.
>>>
First thing you have to keep in mind when using function trace is to limit functions you see. That means if you do not limit functions using function tracer, ftrace will trace the debugging information of every kernel function.
>>>
Ftrace provides a way to limit what functions you see. set_ftrace_filter exist that let you limit what functions are traced:
>>>
When any function is listed in the set_ftrace_filter, only those functions will be traced. This will help the performance of the system when the trace is active since tracing every function will cause a large overhead.
>>>
So if you specify particular functions using set_ftrace_filter, only the functions you specified will be traced.
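The filtering steps above can be sketched as follows. This is only a sketch under my assumptions: tracefs is mounted at /sys/kernel/tracing, and trace_only is a helper name I made up. The function name in the usage line is just an example.

```shell
# Assumption: tracefs mount point (older kernels: /sys/kernel/debug/tracing).
TRACEFS="${TRACEFS:-/sys/kernel/tracing}"

# Trace only the functions given as arguments.
# trace_only is a hypothetical helper name.
trace_only() {
    echo 0 > "$TRACEFS/tracing_on" &&             # stop tracing while reconfiguring
    echo "$*" > "$TRACEFS/set_ftrace_filter" &&   # space-separated function names
    echo function > "$TRACEFS/current_tracer" &&  # select the function tracer
    echo 1 > "$TRACEFS/tracing_on"                # start tracing
}

# Usage on the target device (as root):
#   trace_only scsi_queue_work
#   cat "$TRACEFS/trace"
```

I set the filter before selecting the function tracer so that ftrace never starts with every kernel function enabled.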
>>>
ftrace: trace_printk() is your friend
You can use trace_printk() just like printk(). And you can use trace_printk() in any context (interrupt context, softirq context, and scheduler code). What is nice about trace_printk() is that it does not output to the console, so it results in less overhead.
>>>
Please be aware that writing into the ring buffer with trace_printk() takes only around a tenth of a microsecond. If you use printk(), especially when writing to a serial console, it may take several milliseconds per write.
>>>
We can take advantage of trace_printk() in terms of performance. That is the main reason why we can record the most sensitive areas of the kernel, including interrupt context, the scheduler core and the network subsystem.
DebugFS
The kernel provides the debugfs interface for debugging, which lets you enable logs at runtime. It is useful for device drivers that have many logs, because enabling all of them statically makes the system slow.
Hence, the kernel provides debugfs so that logs can be enabled only while users want to check a particular scenario.
There are two things you have to do when you want to use debugfs. First, CONFIG_DEBUG_FS must be enabled in the kernel config. This makes the compiler build the code needed for debugfs. After the kernel image is generated with this configuration, please make sure to flash it to the target device.
Next, you should run the following command to mount debugfs:
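A minimal sketch of the mount step is shown here. I am assuming the conventional mount point /sys/kernel/debug; debugfs_mounted and mount_debugfs are helper names I made up for this example.

```shell
# Return success if debugfs already appears in the mount table.
# The mount table path is a parameter only so that the check can be
# tried on a sample file; it defaults to /proc/mounts.
debugfs_mounted() {
    awk '$3 == "debugfs" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

# Mount debugfs at its conventional location if it is not mounted yet.
# This needs root permission.
mount_debugfs() {
    debugfs_mounted || mount -t debugfs none /sys/kernel/debug
}

# Usage on the target device (as root):
#   mount_debugfs
```

Many distributions already mount debugfs at boot, which is why the sketch checks the mount table first.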
After debugfs is successfully mounted, you are going to see the following directory.
Let me introduce the sample debugfs code. This is a very simple code block, as you can see. The debugfs_create_file() function is called inside the initialization function of the device driver.
After this code is compiled successfully and the kernel is running on the target, you are going to find that this file is generated under the /sys/kernel/debug directory. If you would like to change the value of raspbian_debug_state, which is declared inside the sample device driver, you can run the following command at runtime.
If you use the command below, you are going to see that the value is updated to 1000.
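The runtime update can be sketched as follows. Please note that the file path here is hypothetical: the real path is whatever name the sample driver passed to debugfs_create_file(). I am also assuming that the driver implements both a write handler and a read handler for the file. DEBUG_FILE and set_debug_state are names I made up.

```shell
# Hypothetical path: replace it with the file your driver actually creates.
DEBUG_FILE="${DEBUG_FILE:-/sys/kernel/debug/rpi_debug/val}"

# Write a new value to the debugfs file, then read it back.
# set_debug_state is a hypothetical helper name.
set_debug_state() {
    echo "$1" > "$DEBUG_FILE" && cat "$DEBUG_FILE"
}

# Usage on the target device (as root):
#   set_debug_state 1000
```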
Let's assume that you are analyzing the rpi_kernel_debug_stat_set() function defined in the sample debugfs code. In this case, you can add dump_stack() inside rpi_kernel_debug_stat_set().
dump_stack()
One of the useful techniques in debugging is to print the call stack trace. The Linux kernel provides the dump_stack() function for this. Calling dump_stack() prints the stack trace at that point. If you add a dump_stack() call, please make sure to include the required header file.
dump_stack() is also used as a way to report a problem inside a subsystem, which means that there is a logical error in the kernel driver or kernel subsystem. Please refer to the sample code below.
As you can see, dump_stack() is called inside the exceptional-case routine of scsi_queue_work. This aims to notify kernel developers that something went wrong in the driver.
>>>>
>>>>>>
「이파란님」
* 4:54 ~ 18:18: Why learn the Linux kernel
* 18:18 ~ 34:15: Loading Trusted Firmware with QEMU
* 49:40 ~ 34:15: Loading GDB with QEMU + debug features
「김동현」
* 4:54 ~ 18:18: Self-introduction
* 18:18 ~ 34:15: Why you should learn the Linux kernel + related SW stacks worth learning
* 49:40 ~ 34:15: Troubleshooting from a system software perspective
>>>>>>>>>>>>
Many BSP engineers cannot go home because of endless crash issues.
Right now I can see several BSP engineers pointing fingers at each other, saying
"This is not my issue".
I hope this cmm script will lend a hand to all BSP engineers
debugging ramdumps with TRACE32 in the office.
* How to run script
1. Run cmm script in the TRACE32 command prompt.
$ do backtrace_all_process.cmm
2. When the popup appears, select the directory
where the ramdump and vmlinux are located.
3. After the cmm script finishes, backtrace_callstacks_all_process.txt is
generated with the call stacks.
Happy Debugging!
BR,
Austin Kim
>>>>
Ftrace
Now, let's talk about ftrace. When I started working as a BSP engineer, most of the time I just looked at the kernel log for debugging. Sometimes I looked into the ramdump when a crash occurred in a device driver.
>>>
Later, when I was engaged in more complex issues like crashes or performance problems, I realized that the kernel log and ramdump are not enough to debug difficult issues.
>>>
At that time I wanted a debug feature that provides more detailed information about what happens inside the kernel. Finally I found the debug feature I was looking for: ftrace.
>>>
Ftrace is a debugging tool for understanding what is going on inside the Linux kernel. It is one of the most popular tracing tools among kernel developers as well as BSP engineers.
>>>
When I talk with brilliant kernel developers, many of them say how great ftrace is. Here let me introduce ftrace in one sentence: ftrace is a great way to learn more about the internal workings of the Linux kernel.
>>>
If you refer to the link, you are going to find more information about ftrace. So what can we do with ftrace?
With ftrace, there are a number of ways to monitor the Linux kernel, since the ftrace tracing utility has many different features that assist in tracking down Linux kernel problems. Let me introduce 3 ways to use ftrace.
>>>
First, we are able to read debugging information about the execution of the Linux kernel, including interrupt handling, scheduling, workqueues, system calls and signals.
>>>
Once you understand the meaning of ftrace messages, I can say that you are familiar with the Linux kernel. Second, you can view the stack trace without modifying kernel code.
>>>
Third, every ftrace message gives us fundamental information such as the context and the CPU core number. From an ftrace message we can deduce many things.
>>>
As I told you, ftrace is a powerful feature because we can see the stack trace without modifying kernel code. The only thing you have to do is configure the ftrace options before starting ftrace.
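As a minimal sketch of that configuration, the function tracer's func_stack_trace option records a call stack each time a traced function is entered. I am assuming that tracefs is mounted at /sys/kernel/tracing; trace_stack_of is a helper name I made up, and the function name in the usage line is only an example.

```shell
# Assumption: tracefs mount point (older kernels: /sys/kernel/debug/tracing).
TRACEFS="${TRACEFS:-/sys/kernel/tracing}"

# Record a call stack every time the given function is entered.
# trace_stack_of is a hypothetical helper name.
trace_stack_of() {
    echo 0 > "$TRACEFS/tracing_on" &&
    echo "$1" > "$TRACEFS/set_ftrace_filter" &&       # filter first: stack traces are expensive
    echo function > "$TRACEFS/current_tracer" &&      # func_stack_trace belongs to this tracer
    echo 1 > "$TRACEFS/options/func_stack_trace" &&   # dump the stack at each traced call
    echo 1 > "$TRACEFS/tracing_on"
}

# Usage on the target device (as root):
#   trace_stack_of scsi_queue_work
#   cat "$TRACEFS/trace"
```

Limiting the filter to one function matters here, because recording a stack trace for every kernel function can make the system very slow.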
>>>
Another strong point of ftrace is that it causes less overhead than printk and dump_stack, especially with the nop tracer, because ftrace data is stored in a ring buffer in binary format.
>>>
Please note that some of the tracers have a noticeable overhead. For example, let's assume that you are using the function tracer.
>>>
If every function listed in available_filter_functions is traced (that is, set_ftrace_filter is left empty), it can cause significant overhead. But if you set particular functions for the function tracer, it will not cause huge overhead.
>>>
Ftrace is a very powerful tool and easy to configure. No extra tools are necessary. So for kernel developers, ftrace has become the main tool for debugging the Linux kernel.
>>>
But there is a challenge for software engineers when using ftrace. The first challenge is that it takes time to understand ftrace messages.
>>>
Since ftrace is based on text-format, it is hard to understand meaning of ftrace
>>>>