Difference between revisions of "Unicode and Locale"
m (moved Unicode to Unicode and Locale)
Revision as of 20:10, 21 April 2014
Unicode and the shell
When you start Bash, your LANG environment variable is usually already set for you. The setting typically originates from the /etc/environment file.
Run the `locale` command and notice that it first prints LANG and then a bunch of LC_ variables. The LC_ variables may or may not be set in your environment. If any of them is not set then it automatically takes on the same value as the LANG variable.
locale
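A minimal sketch of the fallback rule described above, assuming a glibc-style `locale` command. It runs `locale` in an otherwise empty environment (via `env -i`, with PATH passed through so the command can still be found) and compares an inherited LC_COLLATE with an explicitly set one. The variable names `implied` and `explicit` are only for illustration.

```shell
# Only LANG is set: LC_COLLATE is not in the environment, so it falls back to LANG.
implied=$(env -i PATH="$PATH" LANG=C locale | grep '^LC_COLLATE=')
echo "$implied"

# LC_COLLATE is set explicitly alongside LANG.
explicit=$(env -i PATH="$PATH" LANG=C LC_COLLATE=C locale | grep '^LC_COLLATE=')
echo "$explicit"
```

On glibc systems the first line is printed as `LC_COLLATE="C"` and the second as `LC_COLLATE=C`: the double quotes mark a value that was implied by LANG rather than set directly.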
Set locale for just a single command
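Many commands behave differently depending on the locale. For example, `grep` will interpret range expressions like [a-z] differently depending on the locale, which can cause surprising matches in regular expressions. Most system administration scripts are best run in the C locale. Note that LANG is only the fallback value: if LC_ALL or the relevant LC_ variable is already set in your environment, it takes precedence over LANG, and in that case LC_ALL=C is the more reliable override.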
LANG=C grep 'Search Text' filename
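As a small illustration of a per-command locale, the sketch below feeds two sample lines to `grep` and keeps only those made up entirely of characters in [a-z]. It uses LC_ALL rather than LANG as a deliberate assumption about the environment: LC_ALL overrides any LC_ variable that may already be exported. The sample input and the variable name `matches` are made up for the example.

```shell
# The assignment in front of grep applies to that one grep process only;
# the locale of the calling shell is left untouched.
matches=$(printf '%s\n' abc ABC | LC_ALL=C grep '^[a-z]*$')
echo "$matches"
```

In the C locale the range [a-z] covers only the lowercase ASCII letters, so only `abc` is printed.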
Fix the sorting order by setting LC_COLLATE=C
If you make one change to your default locale, it should probably be this one.
Some shell commands such as `ls -la` use a very annoying sort order when LANG=en_US.UTF-8: on glibc systems it largely ignores case and leading punctuation, so dotfiles end up interleaved with other names. You can change just the collation order without changing all the other ways the locale could be used. For example, run the following commands and notice the difference. The C collation order is what some people might remember and love from the old ASCII (ANSI_X3.4-1968) days. Names are compared byte by byte, so, for example, a leading dot sorts before digits, digits before uppercase letters, and uppercase letters before lowercase letters.
LC_COLLATE="C" ls -la ~
LC_COLLATE="en_US.UTF-8" ls -la ~
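The same effect can be seen without depending on the contents of your home directory. The sketch below sorts a few made-up names with `sort` under LC_COLLATE=C. It unsets LC_ALL inside a subshell first, on the assumption that LC_ALL might be exported in your environment, where it would otherwise override LC_COLLATE.

```shell
# Sort sample names by byte value; LC_ALL is unset only inside the subshell.
sorted=$(unset LC_ALL; printf '%s\n' b .hidden A a B | LC_COLLATE=C sort)
echo "$sorted"
```

This prints `.hidden`, `A`, `B`, `a`, `b`, one per line: the dot (byte 46) comes before the uppercase letters (65 to 90), which come before the lowercase letters (97 to 122).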
To set only the collation order permanently, put this in your ~/.bashrc:
export LC_COLLATE=C
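If the setting seems to have no effect, LC_ALL is the usual suspect, since it overrides every LC_ variable. The sketch below is one way to check which value wins for collation, following the standard precedence LC_ALL, then LC_COLLATE, then LANG. The function name `effective_collate` is hypothetical and not part of any tool.

```shell
# Sketch for ~/.bashrc: sort by byte value, keep every other locale category.
export LC_COLLATE=C

# Hypothetical helper: print the locale that is actually used for collation.
# The first non-empty value of LC_ALL, LC_COLLATE, LANG wins; POSIX is the default.
effective_collate() {
    printf '%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-POSIX}}}"
}
```

Running `effective_collate` in a new shell should print `C`. If it prints something else, LC_ALL is set somewhere in your startup files.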