Symbol vs. String in Ruby
For the longest time, I've postponed digging deeper to find the differences.
I would tell myself I'd eventually find the time to look into it, but there were always other things that felt more important than figuring out this seemingly minute detail.
But as I kept pounding the keyboard, I'd eventually arrive at a crossroads.
Should I use a string here? Or a symbol?
I can't tell you how many times I've just meh-ed over that thought.
But that feeling of uncertainty ends now. If you're curious, as I was, about which one to choose, or why there are both available, or which one is better, then keep reading as I will explain everything you need to know.
Why does Ruby have both symbols and strings?
Even though symbols and strings are used interchangeably in day-to-day code, mostly because developers don't know the difference, they should not be.
Symbols were made to be used as identifiers (i.e., variables, method names, constant names). In contrast, strings were made to be used as data. And they were both optimized for their respective goals.
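You can see this split in how Ruby itself is written: anywhere you name something, you reach for a symbol, while the contents you pass around live in strings. Here's a small, made-up example:
class User
  attr_accessor :name          # :name identifies the methods to define
  def greeting
    "Hello, #{name}!"          # the text itself is data, so it's a string
  end
end
user = User.new
user.name = "Alice"
user.respond_to?(:greeting)    # => true (methods are looked up by symbol)
user.greeting                  # => "Hello, Alice!"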
If you think about it, most of the data that comes into your application, via an HTML form, a JSON response, or what have you, will be strings.
So, the fact that strings were built to be used as data makes a lot of sense.
You can't get a symbol from the outside world. You can only get a symbol if you convert a string to a symbol.
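For example (the params hash below is just a stand-in for data parsed from a request), the conversion has to be explicit:
params = { "name" => "Alice" }       # data from the outside arrives as strings
"name".to_sym                        # => :name  (string to symbol)
:name.to_s                           # => "name" (and back again)
params.transform_keys(&:to_sym)      # => {:name=>"Alice"}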
What is the difference between a symbol and a string?
A string, in Ruby, is a mutable series of characters or bytes.
Symbols, on the other hand, are immutable values, just like the integer 2 is a value.
Mutability is the ability of an object to change: in the case of a string, you can add characters to it or remove characters from it. Immutable, conversely, means that once the object is created, it can never be changed.
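A quick irb session makes the difference concrete (assuming no frozen_string_literal magic comment, which we'll get to below):
s = "foo"
s << "bar"             # strings are mutable, so this appends in place
s                      # => "foobar"
"foo".frozen?          # => false (a plain string literal can still change)
:foo.frozen?           # => true  (symbols are always frozen)
:foo.respond_to?(:<<)  # => false (there is no way to modify a symbol)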
Because symbols are immutable, Ruby doesn't have to allocate more memory for the same symbol. It knows that once it puts the value in memory, it will never be changed, so it can reuse it.
You can easily see this by looking at their object IDs.
# Symbols (same id)
:foo_bar.object_id # => 2386588
:foo_bar.object_id # => 2386588
# Strings (different ids)
"foo_bar".object_id # => 1020
"foo_bar".object_id # => 1040
Which one is faster?
Obviously, since symbols are immutable values, working with them requires less memory. And comparing two symbols boils down to comparing their object IDs, whereas comparing two strings means comparing their contents character by character, so lookups with symbol keys are faster as well.
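If you want to see the symbol-versus-string difference for yourself, here is a minimal benchmark sketch in the same style as the ones further down (it uses the benchmark-ips gem; the exact numbers will depend on your machine and Ruby version):
require 'benchmark/ips'
STRING_HASH = { "foo" => "bar" }
SYMBOL_HASH = { foo: "bar" }
Benchmark.ips do |x|
  x.report("string key") { STRING_HASH["foo"] }
  x.report("symbol key") { SYMBOL_HASH[:foo] }
  x.compare! # prints which report was faster and by how much
end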
So… the answer is symbols. Right?
Well, there's one more thing to take into consideration.
And that is…
Frozen strings
Starting with Ruby 2.1, frozen string literals are deduplicated. When you call .freeze on a string literal, Ruby doesn't need to allocate new memory for a second, identical literal; it just reuses the first one.
"foo_bar".freeze.object_id # => 1060
"foo_bar".freeze.object_id # => 1060
"foo_bar".object_id # => 1100
"foo_bar".object_id # => 1120
But starting with Ruby 2.3, you also have the option of putting a magic comment at the top of your file, which will freeze all the string literals in that file.
# frozen_string_literal: true
puts "foo".object_id
puts "foo".object_id
If you put the code above in a file and execute it with ruby ./yourfile.rb, you will see the same number printed twice. That means both literals resolve to the same object ID (technically, it's just one string object being reused) even though you're not calling .freeze on them.
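You can also confirm that the magic comment is doing the freezing for you by asking the literal directly:
# frozen_string_literal: true
puts "foo".frozen? # prints "true" without an explicit call to .freeze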
So, that obviously helps with memory allocations, but what about lookup speed? Is there a difference between the two when it comes to identifier lookup?
Let's look at a few benchmarks (originally posted in this Gist, but copied here for convenience) and re-run them with Ruby 3.0.
Accessing a hash
require 'benchmark/ips'
FOO = "foo".freeze
HASH = {"foo" => "bar"}
Benchmark.ips do |x|
  x.report("constant") { HASH[FOO] }
  x.report("regular") { HASH["foo"] }
end
constant 15.669M (± 3.5%) i/s - 79.519M in 5.082575s
regular 15.882M (± 1.9%) i/s - 80.402M in 5.064430s
Making a hash
require 'benchmark/ips'
FOO = "foo".freeze
Benchmark.ips do |x|
  x.report("constant") { {FOO => "bar"} }
  x.report("regular") { {"foo" => "bar"} }
end
constant 8.123M (± 1.2%) i/s - 41.199M in 5.072646s
regular 8.189M (± 1.9%) i/s - 41.190M in 5.031931s
Comparisons
require 'benchmark/ips'
FOO = "foo".freeze
Benchmark.ips do |x|
  x.report("constant") { "foo" == FOO }
  x.report("regular") { "foo" == "foo" }
end
constant 12.790M (± 1.4%) i/s - 65.094M in 5.090450s
regular 9.835M (± 1.0%) i/s - 50.155M in 5.100092s
Passing in an argument to a regular method
require 'benchmark/ips'
FOO = "foo".freeze
Benchmark.ips do |x|
  x.report("constant") { "foo".gsub(FOO, "") }
  x.report("regular") { "foo".gsub("foo", "") }
end
constant 1.472M (± 1.3%) i/s - 7.478M in 5.079855s
regular 1.416M (± 1.8%) i/s - 7.203M in 5.086869s
As you can see from these benchmarks, the differences are negligible: the frozen constant only pulls ahead noticeably in the plain comparison case, and even there the gap is modest.
When to use symbols and when to use strings?
It's best to use them as they were intended whenever possible (i.e., use symbols when you want to create an identifier, and use strings for data).
Or to quote the late Jim Weirich…
“If the contents (the sequence of characters) of the object are important, use a string. If the identity of the object is important, use a symbol.” — Jim Weirich (Ruby Cookbook)
Why use symbols as hash keys in Ruby?
Historically, there was a performance benefit of using symbols over strings, but in recent versions of Ruby, that benefit is no longer significant.
But even today, with all the performance improvements to strings, most Ruby developers still use symbols over strings for hash keys.
They read better, and it means using symbols for what they were meant to be: identifiers. A hash key identifies a value; it isn't data in itself.
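As a small illustration, symbol keys also come with their own shorthand hash syntax, which is a big part of why they read better:
# String keys: fine for external data, but noisier to write
user = { "name" => "Alice", "admin" => true }
user["name"]  # => "Alice"
# Symbol keys: the shorthand reads like labeled fields
user = { name: "Alice", admin: true }
user[:name]   # => "Alice"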
Conclusion
I hope you learned a thing or two about the difference between symbols and strings and that you are better equipped to answer the question the next time it pops into your head.