WHERE and ORDER BY Clauses in Cassandra CQL

Date: 2020-06-16 14:02:49

A PRIMARY KEY in Cassandra has two parts:

> Partition key

> Clustering key

PRIMARY KEY(partitionKey1,clusteringKey1,clusteringKey2)

or

PRIMARY KEY((partitionKey1,partitionKey2),clusteringKey1,clusteringKey2)

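The partition key determines which node your data is stored on. The clustering keys determine the order of the data within a partition key.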

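In CQL, the ORDER BY clause is really only used to reverse the sort direction defined by your clustering order. As for the columns themselves, you can only specify the columns defined in the CLUSTERING ORDER BY clause at table creation time (and in that exact order, with no skipping). So you cannot pick arbitrary columns to order your result set at query time.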

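Cassandra achieves its performance by using the clustering keys to sort your data on disk, so that ordered rows can be returned in a single read (no random reads). This is why you must take a query-based modeling approach with Cassandra (often duplicating your data into multiple query tables). Know your queries ahead of time, and build your tables to serve them.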
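
To make the idea concrete, here is a minimal Python sketch of a table whose rows are kept sorted by clustering key inside each partition. The `ToyTable` class and its methods are hypothetical names invented for illustration; this is a toy in-memory model, not how Cassandra is implemented, and it does not model upserts on a repeated primary key.

```python
from bisect import insort
from collections import defaultdict


class ToyTable:
    """Toy model: one list of rows per partition key, sorted by clustering key."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        # Rows are placed in clustering-key order at write time,
        # so no sorting is needed when they are read back.
        insort(self.partitions[partition_key], (clustering_key, value))

    def select(self, partition_key, descending=False):
        # A read is a sequential scan of one partition, optionally reversed.
        rows = self.partitions.get(partition_key, [])
        return list(reversed(rows)) if descending else list(rows)


table = ToyTable()
table.insert("IT", "E003", "Carol")
table.insert("HR", "E002", "Bob")
table.insert("IT", "E001", "Alice")

it_rows = table.select("IT")
it_rows_desc = table.select("IT", descending=True)
```

The only "ordering" choice left at read time in this model is the scan direction, which mirrors the role ORDER BY plays in the answer above.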

Select * from emp order by empno;

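First of all, you need a WHERE clause. Querying without one is fine if you are working with a relational database. With Cassandra, you should do your best to avoid unbound SELECT queries. Besides, Cassandra can only enforce a sort order within a partition, so querying without a WHERE clause will not return data in the order you want anyway.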

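Secondly, as mentioned above, you need to define clustering keys. If you want to order your result set by empno, then you must find another column to define as your partition key. Try something like this: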

CREATE TABLE emp_by_dept (
  empno text,
  dept text,
  name text,
  PRIMARY KEY (dept,empno)
) WITH CLUSTERING ORDER BY (empno ASC);

Now I can query employees by department, and they will be returned to me ordered by empno:

SELECT * FROM emp_by_dept WHERE dept='IT';

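But to be clear, you will not be able to query every row in your table and order it by a single column. The only way to get a meaningful order into your result sets is to first partition your data in a way that makes sense for your business case. Running an unbound SELECT will return all of your rows (assuming the query does not time out while trying to query every node in your cluster), but result set ordering can only be enforced within a partition. So you have to restrict by partition key for the ordering to make any sense.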
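
The difference between an unbound SELECT and a partition-restricted one can be sketched in Python. This is a rough model under stated assumptions: the `token` function uses MD5 as a stand-in hash and is not Cassandra's actual partitioner, and the employee rows and helper names are made up for illustration.

```python
import hashlib


def token(partition_key: str) -> int:
    # Stand-in for the partitioner's hash (an assumption for this sketch).
    return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)


# (dept, empno, name): dept is the partition key, empno the clustering key.
rows = [
    ("IT", "E004", "Dave"),
    ("HR", "E002", "Bob"),
    ("IT", "E001", "Alice"),
    ("HR", "E003", "Carol"),
]


def unbound_select(rows):
    # Partitions come back in token order; rows are sorted only inside each one.
    return sorted(rows, key=lambda r: (token(r[0]), r[1]))


def select_by_dept(rows, dept):
    # Restricting to one partition yields rows in clustering-key order.
    return sorted((r for r in rows if r[0] == dept), key=lambda r: r[1])


all_empnos = [r[1] for r in unbound_select(rows)]
it_empnos = [r[1] for r in select_by_dept(rows, "IT")]
```

With this data, `it_empnos` is ordered by empno, while `all_empnos` is ordered only within each department's block, whichever department happens to come first.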

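My apologies for the self-promotion, but last year I wrote an article for DataStax called We Shall Have Order!, in which I address how to solve these types of problems. Give it a read and see if it helps.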

Edit for additional questions:

From your answer I concluded 2 things about Cassandra:

(1) There is no way of getting a result set which is only order by a column that has been defined as Unique.

(2) When we define a PK (partition-key+clustering-key), then the results will always be order by Clustering columns within any fixed partition key (we must restrict to one partition-key value), that means there is no need of ORDER BY clause, since it cannot ever change the order of rows (the order in which rows are actually stored), i.e. Order By is useless.

1) All PRIMARY KEYs in Cassandra are unique. You cannot order your result set by your partition key. In my example, I order by empno (after partitioning by dept). – Aaron

2) Stopping short of saying that ORDER BY is useless, I'll say that its only real use is to switch your sort direction between ASC and DESC.

I created an index on “empno” column of “emp” table, it is still not allowing ORDER BY empno. So, what Indexes are for? are they only for searching records for specific value of index key?

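You cannot order a result set by an indexed column. Secondary indexes (unlike their relational counterparts) are really only useful for edge-case, analytics-based queries. They do not scale, so the general recommendation is not to use secondary indexes.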

Ok, that simply means that one table cannot be used for getting different result sets with different conditions and different sorting order.

Correct.

Hence for each new requirement, we need to create a new table.

It means if we have a billion rows in a table (say Sales table), and we need sum of sales (1) Product-wise, (2) Region-wise, then we will duplicate all those billion rows in 2 tables with one in clustering order of Product, the other in clustering order of Region, and even if we need to sum sales per Salesman_id, then we build a 3rd table, again putting all those billion rows? is it sensible?

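You are really going to have to be the judge of how sensible it is. But lack of query flexibility is a drawback of Cassandra. To get around it, you can keep creating query tables (i.e., trading disk for performance). But if it gets to a point where it becomes ungainly or difficult to manage, then it is time to think about whether Cassandra is really the right solution.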
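
As a rough illustration of the query-table trade-off, the Python sketch below writes every sale into two toy "tables", one partitioned by product and one by region. The names `sales_by_product`, `sales_by_region`, and `record_sale`, as well as the sample figures, are hypothetical and only meant to show the duplication pattern, not a real Cassandra client API.

```python
from collections import defaultdict

# One toy "query table" per access pattern: partition key -> list of rows.
sales_by_product = defaultdict(list)
sales_by_region = defaultdict(list)


def record_sale(product, region, amount):
    # The same fact is written once per query table (disk traded for read speed).
    sales_by_product[product].append((region, amount))
    sales_by_region[region].append((product, amount))


record_sale("widget", "EU", 10)
record_sale("widget", "US", 5)
record_sale("gadget", "EU", 7)

# Each question is answered by reading a single partition of the matching table.
widget_total = sum(amount for _, amount in sales_by_product["widget"])
eu_total = sum(amount for _, amount in sales_by_region["EU"])
```

A third access pattern, such as sales per salesman, would need a third structure and a third write per sale, which is exactly the cost the question is asking about.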

Edit 0321

Hi Aaron, you said above “Stopping short of saying that ORDER BY is useless, I’ll say that its only real use is to switch your sort direction between ASC and DESC.”

But I found even that is not correct. Cassandra only allows ORDER BY in the same direction as we define in the “CLUSTERING ORDER BY” clause of CREATE TABLE. If in that clause we define ASC, it allows only order by ASC, and vice versa.

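Without seeing an error message, it is hard to know what to tell you on that one. That said, I have heard of queries with ORDER BY failing when too many rows are stored in a partition.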

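ORDER BY also behaves a little oddly if you specify multiple columns to sort by. If I have two clustering columns defined, I can use ORDER BY on the first column indiscriminately. But as soon as I add the second column to the ORDER BY clause, my query only works if I specify both sort directions the same as in the CLUSTERING ORDER BY definition, or both different. If I mix and match, I get this: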

InvalidRequest: code=2200 [Invalid query] message="Unsupported order by relation"

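I think this has to do with how the data is stored on disk. Otherwise, Cassandra would have a lot more work to do in preparing result sets. Whereas if it requires everything to either match or mirror the direction(s) specified in the CLUSTERING ORDER BY, it can simply relay a sequential read from disk. So it is probably best to use only a single column in your ORDER BY clause, for more predictable results.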
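
The match-or-mirror behavior described above can be written down as a small Python check. This is only a model of the behavior reported in this answer, not Cassandra's actual validation code; the function name `order_by_supported` and the column names `c1` and `c2` are hypothetical.

```python
def order_by_supported(clustering_order, order_by):
    """Both arguments are lists of (column, "ASC" | "DESC") pairs."""
    if len(order_by) > len(clustering_order):
        return False
    matches = []
    for (col, direction), (c_col, c_dir) in zip(order_by, clustering_order):
        # Columns must follow the clustering definition in order, no skipping.
        if col != c_col:
            return False
        matches.append(direction == c_dir)
    # Either every direction matches the definition, or every one is reversed.
    return all(matches) or not any(matches)


clustering = [("c1", "ASC"), ("c2", "DESC")]

same = order_by_supported(clustering, [("c1", "ASC"), ("c2", "DESC")])
mirrored = order_by_supported(clustering, [("c1", "DESC"), ("c2", "ASC")])
mixed = order_by_supported(clustering, [("c1", "ASC"), ("c2", "ASC")])
first_only = order_by_supported(clustering, [("c1", "DESC")])
skipped = order_by_supported(clustering, [("c2", "DESC")])
```

In this model, `same`, `mirrored`, and `first_only` are accepted, while `mixed` and `skipped` are rejected, which corresponds to the cases where a forward or a reverse sequential scan can or cannot produce the requested order.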
