Unicode and Locale

From Noah.org

Revision as of 19:49, 22 September 2008


Unicode and the shell

When you start Bash, your LANG environment variable is usually already set for you. The setting typically originates from a system file such as /etc/environment.

Run the `locale` command and notice that it first prints LANG and then a list of LC_ variables (LC_CTYPE, LC_COLLATE, and so on). Each LC_ variable may or may not be set in your environment; any that is not set takes its value from LANG. LC_ALL, if set, overrides both LANG and every individual LC_ variable.

locale
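The fallback rule can be checked directly. This is a minimal sketch that assumes GNU/Linux with glibc's `locale` command, which prints inherited values in double quotes and explicitly set ones without quotes. `effective_collate` is a hypothetical helper name: it runs `locale` in an otherwise empty environment containing only the VAR=value pairs you pass, and prints the LC_COLLATE line.

```shell
#!/bin/sh
# Sketch, assuming glibc's `locale` output format.
# effective_collate (hypothetical helper): run `locale` with a clean
# environment holding only the given VAR=value pairs, show LC_COLLATE.
effective_collate() {
    env -i PATH="$PATH" "$@" locale | grep '^LC_COLLATE='
}

# LC_COLLATE unset: inherited from LANG, so glibc quotes it.
effective_collate LANG=C                            # LC_COLLATE="C"
# LC_COLLATE set explicitly: printed without quotes.
effective_collate LANG=C LC_COLLATE=POSIX           # LC_COLLATE=POSIX
# LC_ALL set: it overrides the explicit LC_COLLATE.
effective_collate LANG=C LC_COLLATE=POSIX LC_ALL=C  # LC_COLLATE="C"
```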

Setting locale for just a single command

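Many commands behave differently depending on the locale. For example, `grep` may interpret a range expression like [a-z] according to the locale's collation order, so in some locales it can also match uppercase letters. This can cause surprising regular-expression matches. Most system administration scripts therefore prefer the C locale, where ranges follow plain byte (ASCII) order. Prefixing a command with a variable assignment sets that variable for that one command only: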

LANG=C grep 'Search Text' filename
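A small sketch of both points above. The LANG=en_US.UTF-8 line is a hypothetical interactive setting used only for illustration, and piped `printf` input stands in for a real file. It shows that the prefix does not change the calling shell's value, and that in the C locale [a-z] matches only lowercase letters.

```shell
#!/bin/sh
# Hypothetical interactive setting, for illustration only.
LANG=en_US.UTF-8
export LANG

# The prefix applies to this one command...
LANG=C sh -c 'echo "inside the command: LANG=$LANG"'  # inside the command: LANG=C
# ...and the shell's own value is untouched afterwards.
echo "back in the shell: LANG=$LANG"                  # back in the shell: LANG=en_US.UTF-8

# In the C locale ranges follow byte order, so only lowercase-initial
# lines match: this prints "apple" and "cherry".
printf '%s\n' apple Banana cherry | LC_ALL=C grep '^[a-z]'

# Caveat: LC_ALL overrides LANG, so if LC_ALL is already set in your
# environment, "LANG=C cmd" has no effect; "LC_ALL=C cmd" always does.
```

Using LC_ALL=C rather than LANG=C in scripts is the more defensive choice for exactly that reason.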

Changing just the collation (sort) order

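Some commands, such as `ls`, use a sort order that many people find annoying when LANG=en_US.UTF-8: case and leading punctuation are largely ignored, so dotfiles end up mixed in with everything else. You can change only the collation order (LC_COLLATE) without affecting the other locale categories. Run the following commands and compare the output. The "C" collation order sorts by byte value, so uppercase names come before lowercase ones and dotfiles are grouped together. This is the order most people remember and loved back in the old ASCII days.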

LC_COLLATE="C" ls -la ~
LC_COLLATE="en_US.UTF-8" ls -la ~
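Because a home directory's contents vary, the same comparison can be made reproducible with `sort`, which obeys the same LC_COLLATE setting. This is a sketch with made-up file names. The en_US.UTF-8 result depends on the locale data installed on your system, so no output is claimed for it.

```shell
#!/bin/sh
# LC_ALL would override LC_COLLATE, so make sure it is unset for this demo.
unset LC_ALL

# Made-up names, for illustration.
names='banana
Apple
.profile
cherry'

# Byte-value order: "." (0x2E) < "A" (0x41) < lowercase letters, giving
# .profile, Apple, banana, cherry.
printf '%s\n' "$names" | LC_COLLATE=C sort

# Locale-aware order (depends on installed en_US.UTF-8 locale data).
printf '%s\n' "$names" | LC_COLLATE=en_US.UTF-8 sort
```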

To make just the collation change permanent, put this in your ~/.bashrc:

export LC_COLLATE="C"
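Since LC_ALL overrides every LC_ variable, the export above is silently ignored if something else in your startup files exports LC_ALL. A sketch of a slightly more defensive ~/.bashrc fragment that warns in that case:

```shell
# ~/.bashrc fragment (sketch): byte-order collation only,
# leaving LANG and the other LC_ categories alone.
export LC_COLLATE="C"

# LC_ALL would override the setting above; warn if it is set.
if [ -n "${LC_ALL:-}" ]; then
    echo "warning: LC_ALL=$LC_ALL overrides LC_COLLATE" >&2
fi
```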