We tend to use fluentd (td-agent) to keep the access log of the service persistently. Fluentd is one of the most reliable and flexible middleware to collect various kinds of application logging. We can quickly broaden the target of logging by using many types of plugins maintained by the community.
As is often the case with the middleware libraries, we have found unexpected behavior of fluentd due to our lack of full knowledge of the library. Our Rails application often loses the application log!
This article will illustrate why that problem happens and caveats when we use fluent-logger in Rails application.
Problem
Our Rails application has a controller that collects its access log.
require 'fluent-logger'
class OurController < ApplicationController
before_action :setup_fluent_logger
before_action :access_log
def setup_fluent_logger
@log = Fluent::Logger::FluentLogger.new(
nil,
host: 'localhost',
port: 24_224
)
end
def access_log
record = ...
@log.post("test.access", record)
end
end
Initially, I thought recreating the fluent-logger instance whenever it gets access is safe to dump the log. But that’s not true at the end of the day. This controller fails to post logs when it gets many accesses from clients.
Cause
To detect the cause, I have tried to write a script to reproduce the issue.
require 'singleton'
require 'fluent-logger'
class OurController
include Singleton
def initialize
@counter = 0
setup_logger
end
def setup_logger
@logger = Fluent::Logger::FluentLogger.new(
nil,
host: 'localhost',
port: 24_224
)
end
def post(v)
@logger.post('debug.test', {
a: @counter,
v: v
})
@counter += 1
end
end
puts "Start..."
def post(i)
controller = OurController.instance
controller.setup_logger
controller.post(i)
end
thread_num = 256
# Multi Thread
threads = (0..thread_num).map do |i|
Thread.new {
post(i)
}
end
threads.each do |t|
puts "Waiting for #{t}"
t.join
end
When I ran the program with thread_num = 256
, it fails to dump the log due to connection failure to the local td-agent, but it did not with thread_num = 200
. There is an implicit threshold between 200 and 256, I thought. Here is what I have found.
➜ launchctl limit
cpu unlimited unlimited
filesize unlimited unlimited
data unlimited unlimited
stack 8388608 67104768
core 0 unlimited
rss unlimited unlimited
memlock unlimited unlimited
maxproc 2784 4176
maxfiles 256 unlimited
It hit the max number of file descriptors in the system. The Too many open files
error was thrown by td-agent around the time.
2020-11-11 12:06:17 +0900 [warn]: #0 [forward_input] thread exited by unexpected error plugin=Fluent::Plugin::ForwardInput title=:event_loop error_class=Errno::EMFILE error="Too many open files - accept(2)"
#<Thread:0x00007fea449aa020@event_loop@/Users/sasaki/.rbenv/versions/2.5.4/lib/ruby/gems/2.5.0/gems/fluentd-1.11.5/lib/fluent/plugin_helper/thread.rb:70 run> terminated with exception (report_on_exception is true):
...
/Users/sasaki/.rbenv/versions/2.5.4/lib/ruby/2.5.0/socket.rb:1313:in `__accept_nonblock': Too many open files - accept(2) (Errno::EMFILE)
Initializing the fluent-logger object opens a new file descriptor, which caused the Too many open files
error. It looks like an event library Cool.io creates a file descriptor when it starts the event loop.
def start
super
# event loop does not run here, so mutex lock is not required
thread_create :event_loop do
begin
default_watcher = DefaultWatcher.new
event_loop_attach(default_watcher)
@_event_loop_running = true
@_event_loop.run(@_event_loop_run_timeout) # this method blocks
ensure
@_event_loop_running = false
end
end
end
Since we cannot have any privilege to modify the Cool.io codebase quickly, what we can do to the maximum is avoiding the recreation of fluent-logger instance. In this case, the setup_fluent_logger
method looks as follows. Note that we use ||=
to initialize @log
instead of =
assignment.
class OurController < ApplicationController
def setup_fluent_logger
@log ||= Fluent::Logger::FluentLogger.new(
nil,
host: 'localhost',
port: 24_224
)
end
end
In general, we cannot guarantee the order of the call of the method of before_action
. We should have made @log
thread-safe. I would stress that that problem does not devote to fluentd or Rails itself. Our practice of programming causes that. I have gotten another lesson here.