https://www.toutiao.com/i6595365358301872643/
Preface
I believe many of you have run into an OutOfMemoryError. Compared with common business exceptions (array index out of bounds, null pointers, and so on), problems like this are much harder to locate and solve.
This article walks through how a recent online memory overflow was located and fixed, hoping to offer ideas and help to readers who run into similar problems.
The analysis proceeds in four steps: symptom -> investigation -> locating -> solution.
Symptom
Recently, one of our production applications kept blowing up with memory overflows, and as traffic grew the failures became more and more frequent.
The program's business logic is very simple: it consumes data from Kafka and persists it in batches.
The pattern was that the more messages Kafka delivered, the sooner the exception appeared. Because we had other things to deal with at the time, we could only ask operations to restart the application and set up monitoring of heap memory and GC.
Restarting is a great quick fix, but it does nothing to solve the underlying problem.
Investigation
So we tried to work out where the problem was from the memory data and GC logs that operations had collected earlier.
[Figure: GC monitoring data collected from production]
On several instances the Full GC count had even reached the hundreds, and the accumulated GC time was frighteningly high.
This indicates that the application's memory usage is definitely problematic: there are many useless objects that can never be reclaimed.
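For reference, the Full GC counts and accumulated collection times that show up in such monitoring can also be read from inside the JVM through the standard GarbageCollectorMXBean API; a minimal sketch (the collector names printed depend on which GC algorithm the JVM is running):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    public static void main(String[] args) {
        // One bean per collector, e.g. "PS Scavenge" (young gen) and
        // "PS MarkSweep" (old gen / Full GC) with the Parallel collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```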
Locating the problem
The memory dump file from production is huge, running to dozens of gigabytes, largely because our heap is configured so big.
Analyzing it with MAT would therefore take a long time.
So we wanted to see whether the problem could be reproduced locally instead.
To reproduce it as quickly as possible, I set the local application's maximum heap to 150M.
Then I mocked the Kafka consumption so that a while loop kept generating data continuously.
At the same time, once the application started, VisualVM was used to monitor memory usage and GC in real time.
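A rough sketch of that local setup, with made-up class and method names (the real project's consumer and persistence code will differ): start the JVM with something like -Xmx150m, replace the Kafka poll loop with a while loop that keeps generating messages, and attach VisualVM to the process:

```java
/**
 * Hypothetical local reproduction harness.
 * Run with e.g. -Xmx150m and attach VisualVM to watch heap usage and GC.
 */
public class OomReproducer {

    public static void main(String[] args) {
        long produced = 0;
        while (true) {
            // Mock the Kafka consumer: keep generating messages instead of polling the broker.
            String message = "mock-message-" + produced++;
            handle(message);
        }
    }

    // Stand-in for the real business logic that batches and persists the data.
    private static void handle(String message) {
        // no-op in this sketch
    }
}
```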
After running for ten minutes or so, memory showed no problem at all. As the graph shows, the memory allocated was effectively reclaimed at every GC, so the problem did not recur.
[Figure: VisualVM memory and GC monitoring]
At the same time, the console also started printing OutOfMemoryError, so the problem was reproduced.
Solution
Judging from the behaviour so far, there are many objects in memory that are strongly referenced and therefore can never be reclaimed.
So we wanted to see which objects were taking up so much memory. VisualVM's HeapDump feature can immediately dump the current application's heap.
[Figure: heap dump opened in VisualVM]
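As an aside, the same kind of .hprof dump can also be triggered programmatically through the JDK's HotSpotDiagnosticMXBean, which is handy when the interesting moment is hard to catch by hand in VisualVM; a small sketch (the file name is arbitrary, and the target file must not already exist):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumper {

    public static void dump(String file) throws Exception {
        HotSpotDiagnosticMXBean mxBean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // live = true dumps only reachable objects, which is what matters here.
        mxBean.dumpHeap(file, true);
    }

    public static void main(String[] args) throws Exception {
        dump("app-heap.hprof");
    }
}
```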
I set the queue size to 8 and wrote 10 items numbered 0 to 9; when item 8 is written it overwrites the slot that previously held 0, and so on (similar to how HashMap positions its entries).
So on production, assuming our queue size is 1024, all 1024 slots will eventually be filled with objects as the system runs, and each slot holds an object of size 700!
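That wrap-around is just modulo positioning over a fixed array. A minimal illustration of the indexing, under the assumption that the size is a power of two (which Disruptor requires) so that `sequence & (size - 1)` equals `sequence % size`:

```java
public class RingIndexDemo {

    public static void main(String[] args) {
        int bufferSize = 8;                        // must be a power of two
        Object[] slots = new Object[bufferSize];

        for (long sequence = 0; sequence < 10; sequence++) {
            // Equivalent to sequence % bufferSize: sequence 8 lands back on slot 0, 9 on slot 1.
            int index = (int) (sequence & (bufferSize - 1));
            slots[index] = "event-" + sequence;    // the reference previously held in the slot is replaced
        }
    }
}
```

Because the array keeps strong references to whatever was last written into each slot, an entry only becomes eligible for GC once its slot is overwritten; a very large ring buffer therefore keeps a correspondingly large number of objects alive.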
So we checked the Disruptor RingBuffer configuration on production, and the size turned out to be 1024 * 1024.
That order of magnitude is frightening.
To verify whether this was the problem, I changed the value locally to 2, a minimal size.
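A hedged sketch of what that local change looks like when constructing the Disruptor (the event and handler types here are made up for illustration; the project's real classes will differ). The ring buffer size passed to the constructor must be a power of two, so 2 is about the smallest useful value:

```java
import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class SmallRingBufferDemo {

    // Hypothetical event type standing in for the real business event.
    static class MessageEvent {
        String payload;
    }

    public static void main(String[] args) {
        int ringBufferSize = 2; // must be a power of two; 2 instead of the production 1024 * 1024

        Disruptor<MessageEvent> disruptor = new Disruptor<>(
                MessageEvent::new, ringBufferSize, DaemonThreadFactory.INSTANCE);

        // Hypothetical consumer standing in for the batch-persistence handler.
        disruptor.handleEventsWith((EventHandler<MessageEvent>) (event, sequence, endOfBatch) ->
                System.out.println("consumed " + event.payload));

        RingBuffer<MessageEvent> ringBuffer = disruptor.start();

        for (long i = 0; i < 10; i++) {
            final long n = i; // captured by the translator lambda
            ringBuffer.publishEvent((event, sequence) -> event.payload = "msg-" + n);
        }

        disruptor.shutdown(); // wait for the handler to drain the remaining events
    }
}
```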
With the same 128M of heap, data was again fed in continuously through the mocked Kafka consumer. The monitoring looked like this: