
Sorting by Chinese Character Stroke Count and Pinyin Initials in SQL Server

Date: 2023-07-21 07:09:11

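I was building a real-estate website that already had a residential-community template table, with every community name entered in advance so that registered users could pick one. Then a sorting problem came up: when the names were loaded into a drop-down list, they were ordered only by primary key, ascending, which has nothing to do with the names, so finding one was tedious. How can the list be sorted by the pinyin initial of the first character of each name? The template table has no pinyin-initial column, and changing the table is not feasible: several systems use it, and the change would ripple through all of them. So I decided to approach the problem through collations and worked out a solution from the available references.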

Article address：/xuchangwei/

Author：XuChangwei

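Introduction to collations

What is a collation? Microsoft describes it this way: "In Microsoft SQL Server 2000, the physical storage of character strings is controlled by collations. A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared."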

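Running the following statement in Query Analyzer lists every collation that SQL Server supports.

select * from ::fn_helpcollations()

A collation name has two parts. The first part identifies the character set the collation supports. For example:

Chinese_PRC_CS_AI_WS

Here the first part, Chinese_PRC_, designates the Unicode sorting rules for Simplified Chinese as used in mainland China.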

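The second part is a series of suffixes with the following meanings:

_BIN: binary sort order

_CI (_CS): case sensitivity; CI is insensitive, CS is sensitive

_AI (_AS): accent sensitivity; AI is insensitive, AS is sensitive

_KI (_KS): kana-type sensitivity; KI is insensitive, KS is sensitive

_WI (_WS): width sensitivity; WI is insensitive, WS is sensitive

In actual collation names only _KS and _WS are written out. Leaving them off means kana-insensitive and width-insensitive, as in Chinese_PRC_CS_AI_WS, which carries no kana suffix.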

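Case-sensitive: select this option if comparisons should treat uppercase and lowercase letters as unequal.

Accent-sensitive: select this option if comparisons should treat accented and unaccented letters as unequal. With this option selected, letters carrying different accents are also treated as unequal.

Kana-sensitive: select this option if comparisons should treat Japanese katakana and hiragana syllables as unequal.

Width-sensitive: select this option if comparisons should treat half-width and full-width characters as unequal.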
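The effect of these suffixes can be checked directly by comparing the same pair of strings under collations that differ in a single suffix. The following is a minimal sketch, not part of the original article. The column aliases are illustrative, and it assumes a SQL Server instance on which the Chinese_PRC collations are installed.

```sql
-- Compare the same strings under collations that differ in one suffix.
-- Each column is 1 when the two values compare as equal, otherwise 0.
select
    -- case: 'abc' against 'ABC' under a CI and a CS collation
    case when N'abc' = N'ABC' collate Chinese_PRC_CI_AS then 1 else 0 end as ci_equal,
    case when N'abc' = N'ABC' collate Chinese_PRC_CS_AS then 1 else 0 end as cs_equal,
    -- width: half-width 'A' against full-width 'Ａ', without and with _WS
    case when N'A' = N'Ａ' collate Chinese_PRC_CI_AS then 1 else 0 end as width_insensitive_equal,
    case when N'A' = N'Ａ' collate Chinese_PRC_CI_AS_WS then 1 else 0 end as width_sensitive_equal
```

The insensitive variants are expected to report each pair as equal and the sensitive variants as unequal, which matches the option descriptions above.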

Applying collations

SQL Server provides a large number of Windows collations and SQL Server-specific collations, but developers often overlook them. In practice they are very useful.

Example 1: sort the contents of the name column by pinyin:

create table #t(id int, name nvarchar(20))
insert #t select 1, N'中'
union all select 2, N'国'
union all select 3, N'人'
union all select 4, N'阿'
-- the COLLATE clause applies to this sort only; the column itself is unchanged
select * from #t order by name collate Chinese_PRC_CS_AS_KS_WS
drop table #t
/* Result:
id          name
----------- --------------------
4           阿
2           国
3           人
1           中
*/

Example 2: sort the contents of the name column by stroke count, as is done for surnames:

create table #t(id int, name nvarchar(20))
insert #t select 1, N'三'
union all select 2, N'乙'
union all select 3, N'二'
union all select 4, N'一'
union all select 5, N'十'
-- the _Stroke_ collations order characters by stroke count
select * from #t order by name collate Chinese_PRC_Stroke_CS_AS_KS_WS
drop table #t
/* Result:
id          name
----------- --------------------
4           一
2           乙
3           二
5           十
1           三
*/
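Applied to the drop-down problem from the introduction, the same COLLATE clause can be attached to the query that loads the list, leaving the shared table untouched. The sketch below is illustrative only: the temporary table #community and its sample names are hypothetical stand-ins for the real community template table, whose schema the article does not show.

```sql
-- Hypothetical stand-in for the shared community template table.
create table #community(id int primary key, name nvarchar(50))
insert #community(id, name)
select 1, N'中山花园'
union all select 2, N'阳光小区'
union all select 3, N'碧水湾'
union all select 4, N'和平里'

-- The table definition is not modified; only the ORDER BY names a collation.
select id, name
from #community
order by name collate Chinese_PRC_CS_AS_KS_WS

drop table #community
```

With this sample data the rows should come back in pinyin order of the first character (b, h, y, z) rather than in primary-key order.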

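Extending collations in practice

SQL Server's Chinese collations can sort by pinyin, by stroke count, and so on. How can this capability be used to solve some of the harder problems with Chinese characters? Here is an example.

Using collation behavior to count strokes

Counting strokes takes some preparation. Unicode, which Windows uses for multilingual Chinese text, currently includes 20,902 Chinese characters. For the characters covered by Simplified Chinese GBK, the Unicode values start at 19968 (U+4E00).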

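First, generate all of these characters with SQL Server itself. No dictionary is needed; a simple SQL statement produces the code points (the cross join of syscolumns with itself serves only as a source of enough rows):

select top 20902 code=identity(int,19968,1) into #t from syscolumns a, syscolumns b

The next statement then returns every character, ordered by Unicode value:

select code, nchar(code) as cnword from #t order by code

Finally, a select statement sorts them by stroke count:

select code, nchar(code) as cnword
from #t
order by nchar(code) collate Chinese_PRC_Stroke_CS_AS_KS_WS, code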

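Result:

code        cnword
----------- ------
19968       一
20008       丨
20022       丶
20031       丿
20032       乀
20033       乁
20057       乙
20058       乚
20059       乛
20101       亅
19969       丁
..........

The result shows that for one-stroke characters, code runs in ascending order from 19968 to 20101. At the first two-stroke character, 丁, code is 19969: the sequence breaks and starts over. With this in hand, a SQL statement can easily pick out the first or last character of each stroke-count group. The statements below retrieve the last character of each group: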

create table #t1(id int identity, code int, cnword nvarchar(2))
insert #t1(code, cnword)
select code, nchar(code) as cnword from #t
order by nchar(code) collate Chinese_PRC_Stroke_CS_AS_KS_WS, code
-- a row closes its group when the next row's code is smaller, or when there is no next row
select a.cnword
from #t1 a
left join #t1 b on a.id=b.id-1 and a.code<b.code
where b.code is null
order by a.id

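This returns 36 characters, each one the last character of its stroke-count group under the Chinese_PRC_Stroke_CS_AS_KS_WS collation:

亅阝马风龙齐龟齿鸩龀龛龂龆龈龊龍龠龎龐龑龡龢龝齹龣龥齈龞麷鸞麣龖龗齾齉龘

That is, 亅 is the last of all one-stroke characters after sorting, 阝 is the last of all two-stroke characters, and so on.

However, after the 33rd character, 龗 (33 strokes), the stroke ordering becomes irregular and is no longer correct. That does not matter: only four characters have more strokes than 龗, so they can be added by hand: 齾 (35 strokes), 齉 (36 strokes), 靐 (39 strokes), and 龘 (64 strokes).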

Build the stroke-count table (tab_hzbh):

create table tab_hzbh(id int identity, cnword nchar(1))
-- insert the first 33 characters; the identity value then equals the stroke count
insert tab_hzbh(cnword)
select top 33 a.cnword
from #t1 a
left join #t1 b on a.id=b.id-1 and a.code<b.code
where b.code is null
order by a.id
-- then add the last four characters with explicit stroke counts
set identity_insert tab_hzbh on
insert tab_hzbh(id, cnword)
select 35, N'齾'
union all select 36, N'齉'
union all select 39, N'靐'
union all select 64, N'龘'
set identity_insert tab_hzbh off

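At this point the result is within reach. For example, to get the stroke count of 国:

declare @a nchar(1)
set @a = N'国'
-- the first group whose last character sorts at or after @a gives the stroke count
select top 1 id
from tab_hzbh
where cnword >= @a collate Chinese_PRC_Stroke_CS_AS_KS_WS
order by id

(Result: 国 has 8 strokes.)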

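All of the preparation above serves only to write the function below. The function does without the temporary and permanent tables built so far: to keep it general and easy to move between systems, the contents of tab_hzbh are written directly into the statement. It computes the total stroke count of a string of Chinese characters supplied by the user: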

create function fun_getbh(@str nvarchar(4000))
returns int
as
begin
    declare @word nchar(1), @n int
    set @n = 0
    while len(@str) > 0
    begin
        set @word = left(@str, 1)
        -- non-Chinese characters count as 0 strokes
        set @n = @n + (case when unicode(@word) between 19968 and 19968+20901
            then (select top 1 id from (
                select 1 as id, N'亅' as word
                union all select 2, N'阝'
                union all select 3, N'马'
                union all select 4, N'风'
                union all select 5, N'龙'
                union all select 6, N'齐'
                union all select 7, N'龟'
                union all select 8, N'齿'
                union all select 9, N'鸩'
                union all select 10, N'龀'
                union all select 11, N'龛'
                union all select 12, N'龂'
                union all select 13, N'龆'
                union all select 14, N'龈'
                union all select 15, N'龊'
                union all select 16, N'龍'
                union all select 17, N'龠'
                union all select 18, N'龎'
                union all select 19, N'龐'
                union all select 20, N'龑'
                union all select 21, N'龡'
                union all select 22, N'龢'
                union all select 23, N'龝'
                union all select 24, N'齹'
                union all select 25, N'龣'
                union all select 26, N'龥'
                union all select 27, N'齈'
                union all select 28, N'龞'
                union all select 29, N'麷'
                union all select 30, N'鸞'
                union all select 31, N'麣'
                union all select 32, N'龖'
                union all select 33, N'龗'
                union all select 35, N'齾'
                union all select 36, N'齉'
                union all select 39, N'靐'
                union all select 64, N'龘'
            ) t
            where word >= @word collate Chinese_PRC_Stroke_CS_AS_KS_WS
            order by id asc) else 0 end)
        -- drop the character just processed
        set @str = right(@str, len(@str) - 1)
    end
    return @n
end

-- Example call:
select dbo.fun_getbh(N'中华人民共和国'), dbo.fun_getbh(N'中華人民共和國')

Result: the stroke totals are 39 and 46 respectively; both simplified and traditional characters work.

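Of course, the characters and stroke counts inside the "union all" list can instead be stored in a permanent table, with a clustered index on the character column and the column's collation set to:

Chinese_PRC_Stroke_CS_AS_KS_WS

This is faster. On a Big5 operating system, the characters have to be generated separately, using the same method.

One point to remember: these characters were produced by SQL select statements. They were not typed in by hand, and still less were they looked up in a dictionary, because the Xinhua Dictionary differs from the Unicode character set after all, and dictionary lookups would give incorrect results.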
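The permanent-table variant mentioned above might look like the following sketch. The table name tab_hzbh_fixed, the index name, and the strokes column are hypothetical, and the block assumes that tab_hzbh has already been populated as described earlier, so that its id column holds the stroke count.

```sql
-- Hypothetical permanent table; the stroke collation is set on the column itself.
create table tab_hzbh_fixed (
    cnword nchar(1) collate Chinese_PRC_Stroke_CS_AS_KS_WS not null,
    strokes int not null
)
create unique clustered index ix_tab_hzbh_fixed on tab_hzbh_fixed(cnword)

-- Assumes tab_hzbh from the earlier steps exists (id = stroke count).
insert tab_hzbh_fixed(cnword, strokes)
select cnword, id from tab_hzbh

-- The column's collation governs the comparison, so no COLLATE clause is needed here.
declare @a nchar(1)
set @a = N'国'
select top 1 strokes
from tab_hzbh_fixed
where cnword >= @a
order by strokes
```

Ordering by strokes rather than by cnword keeps the lookup equivalent to the earlier "order by id" query, which matters for the four hand-added characters whose collation order the article describes as unreliable.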

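Using collation behavior to get pinyin initials

The same method used for stroke totals also yields a function that returns the pinyin initials of Chinese characters: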

create function fun_getpy(@str nvarchar(4000))
returns nvarchar(4000)
as
begin
    declare @word nchar(1), @py nvarchar(4000)
    set @py = N''
    while len(@str) > 0
    begin
        set @word = left(@str, 1)
        -- non-Chinese characters are returned unchanged
        set @py = @py + (case when unicode(@word) between 19968 and 19968+20901
            then (select top 1 py from (
                -- each row pairs an initial with the last character that sorts under it
                select N'a' as py, N'驁' as word
                union all select N'b', N'簿'
                union all select N'c', N'錯'
                union all select N'd', N'鵽'
                union all select N'e', N'樲'
                union all select N'f', N'鰒'
                union all select N'g', N'腂'
                union all select N'h', N'夻'
                union all select N'j', N'攈'
                union all select N'k', N'穒'
                union all select N'l', N'鱳'
                union all select N'm', N'旀'
                union all select N'n', N'桛'
                union all select N'o', N'漚'
                union all select N'p', N'曝'
                union all select N'q', N'囕'
                union all select N'r', N'鶸'
                union all select N's', N'蜶'
                union all select N't', N'籜'
                union all select N'w', N'鶩'
                union all select N'x', N'鑂'
                union all select N'y', N'韻'
                union all select N'z', N'咗'
            ) t
            where word >= @word collate Chinese_PRC_CS_AS_KS_WS
            order by py asc) else @word end)
        -- drop the character just processed
        set @str = right(@str, len(@str) - 1)
    end
    return @py
end

-- Example call:
select dbo.fun_getpy(N'中华人民共和国'), dbo.fun_getpy(N'中華人民共和國')

Both return: zhrmghg
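For the community drop-down described in the introduction, the function can supply a grouping letter next to each name without adding a column to the shared table. This is only a sketch: #community and its sample rows are hypothetical, and it assumes dbo.fun_getpy has been created in the current database.

```sql
-- Hypothetical stand-in for the shared community template table.
create table #community(id int primary key, name nvarchar(50))
insert #community(id, name)
select 1, N'中山花园'
union all select 2, N'阳光小区'
union all select 3, N'碧水湾'
union all select 4, N'和平里'

-- First letter of the pinyin initials, upper-cased, as a heading for the list.
select upper(left(dbo.fun_getpy(name), 1)) as initial, name
from #community
order by name collate Chinese_PRC_CS_AS_KS_WS

drop table #community
```

Because the function runs once per row, a very large list may be better served by caching the initials elsewhere; for a drop-down of community names the per-row cost is likely acceptable.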

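The same approach can be extended to a function that returns the full pinyin of each character, or even the tone as well, but full pinyin has far more categories. For full pinyin a lookup table is the better choice: searching twenty-odd thousand characters is fast, and a lookup table can take full advantage of the table's index.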
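A lookup-table approach along those lines could be sketched as follows. The table #hz_pinyin, its columns, and the three sample rows are hypothetical; a real table would hold one row for each of the roughly twenty thousand characters, and the article does not specify its layout.

```sql
-- Hypothetical lookup table: one row per character, keyed so lookups use the index.
create table #hz_pinyin (
    cnword nchar(1) collate Chinese_PRC_BIN not null primary key,
    pinyin nvarchar(10) not null
)
insert #hz_pinyin(cnword, pinyin)
select N'中', N'zhong'
union all select N'国', N'guo'
union all select N'人', N'ren'

declare @str nvarchar(100), @i int, @out nvarchar(1000)
select @str = N'中国人', @i = 1, @out = N''
while @i <= len(@str)
begin
    -- Characters missing from the table are passed through unchanged.
    select @out = @out
        + isnull((select pinyin from #hz_pinyin where cnword = substring(@str, @i, 1)),
                 substring(@str, @i, 1))
        + N' '
    set @i = @i + 1
end
select rtrim(@out) as full_pinyin

drop table #hz_pinyin
```

The binary collation on the key column is a design choice made for this sketch so that each character matches only itself; a table with one reading per character cannot represent characters with several readings, which a fuller design would need to address.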
