Difference between revisions of "Unicode and Locale"

From Noah.org
Revision as of 16:04, 28 September 2008


Unicode and the shell

When you start Bash, your LANG environment variable is usually already set for you. On many systems the setting originates from the /etc/environment file.

Run the `locale` command and notice that it first prints LANG and then a bunch of LC_ variables. The LC_ variables may or may not be set in your environment. Any of them that is not set automatically takes on the same value as LANG. The exception is LC_ALL: if it is set, it overrides LANG and all of the other LC_ variables.

locale
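A minimal sketch of the fallback rule described above, assuming a POSIX shell and a `locale` command that prints one `NAME=value` line per category. The variable names `collate_from_lang` and `collate_from_all` are made up for this illustration. The "C" and "POSIX" locales are used because they should exist on any system.

```shell
# Start from a clean slate so only the variables set below are in play.
unset LC_ALL LC_COLLATE

# LC_COLLATE is unset, so it takes its value from LANG.
collate_from_lang=$(LANG=C locale | grep '^LC_COLLATE=')

# LC_ALL is set, so it wins over LANG for every category.
collate_from_all=$(LANG=C LC_ALL=POSIX locale | grep '^LC_COLLATE=')

echo "$collate_from_lang"
echo "$collate_from_all"
```

Many `locale` implementations put double quotes around a value that was inherited rather than set directly, which is a quick way to tell the two cases apart.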

Set locale for just a single command

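Many commands behave differently depending on the locale. For example, `grep` may interpret range expressions like [a-z] differently depending on the locale, which can cause subtle problems with regular expressions. Most system administration scripts are better off running in the C locale. Note that LANG=C only takes effect for categories that are not already set through an LC_ variable; if LC_ALL is set in your environment, set LC_ALL=C for the command instead.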

LANG=C grep 'Search Text' filename
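To see what the C locale guarantees, here is a small sketch that feeds three lines to `grep` and forces the C locale with LC_ALL so that nothing in the environment can override it. The sample input and the variable name `matches` are made up for this illustration. In the C locale the range [a-z] covers exactly the lowercase ASCII letters; in other locales the result may or may not differ, depending on your grep and C library versions.

```shell
# LC_ALL=C applies only to the grep command on this line.
# The uppercase "B" is not in [a-z] in the C locale, so it is filtered out.
matches=$(printf 'a\nB\nc\n' | LC_ALL=C grep '[a-z]')

echo "$matches"
```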

Fix just the sorting order with collate

Some shell commands such as `ls` use a very annoying sort order when LANG=en_US.UTF-8: upper and lower case are interleaved, and leading punctuation such as the dot in dotfiles is largely ignored. You can change just the collate order without changing any of the other ways the locale is used. For example, run the following commands and notice the difference. The "C" collate order sorts by byte value, which is what people might remember and love from the old ASCII days.

LC_COLLATE="C" ls -la ~
LC_COLLATE="en_US.UTF-8" ls -la ~
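The same effect is easier to see on a small, fixed input than on a home directory. This is a sketch using `sort`, which honors LC_COLLATE the same way; the sample input and the variable name `sorted` are made up for this illustration. LC_ALL is unset first because it would otherwise override LC_COLLATE.

```shell
# Make sure LC_ALL cannot override the LC_COLLATE setting below.
unset LC_ALL

# In the C collate order all uppercase ASCII letters sort before lowercase ones.
sorted=$(printf 'b\nA\na\nB\n' | LC_COLLATE=C sort)

echo "$sorted"
```

With LC_COLLATE=C this prints A, B, a, b, one per line. Swapping in LC_COLLATE="en_US.UTF-8" (if that locale is installed) will typically group each letter's upper and lower case together instead.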

You can make the collate setting permanent by putting this in your ~/.bashrc:

export LC_COLLATE="C"