Difference between revisions of "Unicode and Locale"

From Noah.org

Revision as of 20:10, 21 April 2014


Unicode and the shell

When you start Bash, your LANG environment variable is usually already set for you. On many systems the setting originates from the /etc/environment file, though the exact file varies by distribution.

Run the `locale` command and notice that it first prints LANG and then a series of LC_ variables. The LC_ variables may or may not be set in your environment. Any LC_ variable that is not set takes on the value of LANG. If LC_ALL is set, it overrides LANG and every other LC_ variable.

locale
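
The fallback described above can be sketched as a small shell function. This is only an illustration: `effective_locale` is a hypothetical helper name, not a standard command, and it assumes the usual POSIX precedence of LC_ALL first, then the specific LC_ variable, then LANG.

```shell
# Hypothetical helper: print the value a locale category would effectively
# take, checking LC_ALL, then the category's own variable, then LANG.
effective_locale() {
    category=$1
    # Read the variable whose name is stored in $category (e.g. LC_COLLATE).
    eval "specific=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$specific" ]; then
        printf '%s\n' "$specific"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumption: with nothing set, most systems fall back to the C locale.
        printf '%s\n' "C"
    fi
}

# Show which locale currently governs sorting in this shell.
effective_locale LC_COLLATE
```

The result should agree with the LC_COLLATE line that `locale` prints, apart from any quoting that `locale` adds.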

Set locale for just a single command

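Many commands behave differently depending on the locale. For example, `grep` may interpret range expressions like [a-z] differently depending on the locale, which can make regular expressions match in surprising ways. Most system administration scripts are better off running in the C locale. Note that LANG=C affects only the categories that are not set through their own LC_ variables; if any of those are set in your environment, use LC_ALL=C in place of LANG=C to override them all.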

LANG=C grep 'Search Text' filename
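
As a small illustration of the C locale making a range expression predictable, the sketch below keeps only the lines that start with a lowercase ASCII letter. `lowercase_initial_lines` is a hypothetical helper name. LC_ALL is used rather than LANG so that the setting applies even when other LC_ variables are present in the environment.

```shell
# Hypothetical helper: keep only the lines on stdin that begin with a
# lowercase ASCII letter. LC_ALL=C applies to this one grep invocation only.
lowercase_initial_lines() {
    LC_ALL=C grep '^[a-z]'
}

printf '%s\n' apple Banana cherry | lowercase_initial_lines
# prints:
# apple
# cherry
```

In other locales the same range may also match uppercase or accented letters, depending on the libc and grep versions in use.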

Fix the sorting order by setting LC_COLLATE=C

If you make one change to your default locale it should probably be this one.

Some shell commands such as `ls -la` use a very annoying sort order when LANG=en_US.UTF-8. You can change just the collation order without changing any of the other ways the locale is used. For example, run the following commands and notice the difference. The C collation order is what some people might remember and love from the old ASCII (ANSI_X3.4-1968) days. Basically, names are sorted by the byte value of each character, so dotfiles come first and uppercase letters sort before lowercase ones.

LC_COLLATE="C" ls -la ~
LC_COLLATE="en_US.UTF-8" ls -la ~
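
The same difference can be seen without touching your home directory by sorting a short list of names. This is a minimal sketch: `sort_bytewise` is a hypothetical helper name, and it unsets LC_ALL inside a subshell because LC_ALL, when set, would override LC_COLLATE.

```shell
# Hypothetical helper: sort stdin using the C collation order.
# The subshell keeps the unset of LC_ALL from leaking into the caller.
sort_bytewise() {
    ( unset LC_ALL; LC_COLLATE=C sort )
}

printf '%s\n' banana Apple .hidden apple Banana | sort_bytewise
# prints:
# .hidden
# Apple
# Banana
# apple
# banana
```

Running the same list through plain `sort` under en_US.UTF-8 will typically interleave the upper and lowercase names instead, provided that locale is installed.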

You can make just the collation setting permanent by putting this in your ~/.bashrc:

export LC_COLLATE=C