
Pandas Data Structures: Series and DataFrame

Date: 2020-06-01 01:25:42

This article covers section 3.2. The source code and data are available at the following link:

/download/Ahaha_biancheng/83338868

Contents

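3.2 Pandas Data Structures
  3.2.1 Series Objects
  3.2.2 Series Data Selection
    (1) Query
    (2) Modify
    (3) Add
    (4) Delete
    (5) Change the Index
  3.2.3 DataFrame Objects
    Creating a DataFrame
    Accessing DataFrame Data
    (1) Access
    (2) Add
    (3) Modify
    (4) Delete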

3.2 Pandas Data Structures

Structured data analysis is a mature set of processes and techniques. Relational databases are used to store structured data.

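pandas is a data analysis toolkit built on Python's NumPy library, and it is well suited to working with the tabular data found in relational databases.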

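◆ The Series data structure handles one-dimensional data
◆ The DataFrame data structure handles two-dimensional and higher-dimensional data
◆ Combines data from multiple sources and handles missing data
◆ Slices, aggregates, and summarizes data
◆ Supports data visualization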

import numpy as np
import pandas as pd
from pandas import DataFrame, Series

3.2.1 Series Objects

Creating a Series

Series([data, index, ...])
data: a Python list or a one-dimensional NumPy ndarray
index: a list; if omitted, the labels 0 to n-1 are generated automatically

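Example 3-1: Create a Series object named height that holds the heights of five basketball players. The values are the heights, and the index is the jersey numbers (numeric strings used as index labels).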

# data = np.array([187, 190, 185, 178, 185])
height = Series([187, 190, 185, 178, 185], index=['13', '14', '7', '2', '9'])  # index is optional, but it is best to supply it
height

13    187
14    190
7     185
2     178
9     185
dtype: int64

# Without an index argument, a default integer index is used
height2 = Series([187, 190, 185, 178, 185])
height2

0    187
1    190
2    185
3    178
4    185
dtype: int64

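A Series is similar to a dictionary: each pair of elements at the same position in the index and values arrays can be viewed as a key-value pair. When a Series is created from a dictionary, the dictionary keys become the index: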

height3 = Series({'13': 187, '14': 190})  # do not forget the curly braces
height3

13    187
14    190
dtype: int64
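The three ways of creating a Series described above can be compared side by side. This is a minimal sketch; the variable names by_list, by_default and by_dict are illustrative and do not come from the original article.

```python
import pandas as pd

# Explicit string labels (jersey numbers) as the index
by_list = pd.Series([187, 190, 185, 178, 185], index=['13', '14', '7', '2', '9'])

# No index given: pandas generates the labels 0..n-1
by_default = pd.Series([187, 190, 185, 178, 185])

# From a dict: keys become the index, values become the data
by_dict = pd.Series({'13': 187, '14': 190})

print(list(by_list.index))     # ['13', '14', '7', '2', '9']
print(list(by_default.index))  # [0, 1, 2, 3, 4]
print(by_dict['14'])           # 190
```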

3.2.2 Series Data Selection

(1) Query

# Look up a single value by index label
height['13']

187

# Look up several values by index label
height[['13', '2']]

13    187
2     178
dtype: int64

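# Look up by position (height[4] also worked in the pandas version used here,
# but newer versions deprecate integer keys as positions on a labeled Series)
height.iloc[4]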

185

# Slice by position
height[0:3]

13    187
14    190
7     185
dtype: int64

# Filter by condition
height[height.values >= 185]

13    187
14    190
7     185
9     185
dtype: int64

height = Series([187, 190, 185, 178, 185], index=['13', '14', '7', '2', '9'])
height

13    187
14    190
7     185
2     178
9     185
dtype: int64

height.values>=185

array([ True, True, True, False, True])

height[[ True, True, True, False, True]]

13    187
14    190
7     185
9     185
dtype: int64
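The query styles above (by label, by position, by slice, and by boolean mask) can be written with the explicit loc and iloc accessors, which makes it clear whether a key is a label or a position. The following is a small sketch of that idea rather than the article's own code; the variable names are illustrative.

```python
import pandas as pd

height = pd.Series([187, 190, 185, 178, 185], index=['13', '14', '7', '2', '9'])

by_label = height.loc['13']      # label-based lookup
by_position = height.iloc[4]     # position-based lookup (fifth element)
first_three = height.iloc[0:3]   # positions 0, 1 and 2
tall = height[height >= 185]     # boolean mask built from the Series itself

print(by_label, by_position)     # 187 185
print(list(first_three.index))   # ['13', '14', '7']
print(list(tall.index))          # ['13', '14', '7', '9']
```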

(2) Modify

Select first, then assign.

height['13'] = 180
height

13    180
14    190
7     185
2     178
9     185
dtype: int64

height[['13', '14']] = 180
height

13    180
14    180
7     185
2     178
9     185
dtype: int64

height[:] = 180
height

13    180
14    180
7     180
2     180
9     180
dtype: int64

(3) Add

append() does not add new data to a Series in place.

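The append() function concatenates two Series and returns a new Series, leaving the original unchanged. (Series.append was removed in pandas 2.0; pd.concat is the replacement there.)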

height.append({'3': 191})  # raises TypeError: a dict cannot be appended

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-34d946c9e2bd> in <module>
----> 1 height.append({'3':191}) # raises TypeError

E:\AnacondaInstall\lib\site-packages\pandas\core\series.py in append(self, to_append, ignore_index, verify_integrity)
   2579         else:
   2580             to_concat = [self, to_append]
-> 2581         return concat(
   2582             to_concat, ignore_index=ignore_index, verify_integrity=verify_integrity
   2583         )

E:\AnacondaInstall\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    269     ValueError: Indexes have overlapping values: ['a']
    270     """
--> 271     op = _Concatenator(
    272         objs,
    273         axis=axis,

E:\AnacondaInstall\lib\site-packages\pandas\core\reshape\concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    355                     "only Series and DataFrame objs are valid".format(typ=type(obj))
    356                 )
--> 357                 raise TypeError(msg)
    358
    359         # consolidate

TypeError: cannot concatenate object of type '<class 'dict'>'; only Series and DataFrame objs are valid

# First create a new Series named a
a = Series([191, 182], index=['3', '0'])
a

3    191
0    182
dtype: int64

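# append() concatenates two Series and returns a new Series
# (the output below was recorded while height still held its original values)
new = height.append(a)
new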

13    187
14    190
7     185
2     178
9     185
3     191
0     182
dtype: int64

# The original Series is unchanged
height

13    180
14    180
7     180
2     180
9     180
dtype: int64
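On pandas 2.0 and later, where Series.append no longer exists, the same concatenation can be written with pd.concat. The sketch below assumes the original height values and the Series a from above; it is one way to reproduce the behavior, not the article's own code.

```python
import pandas as pd

height = pd.Series([187, 190, 185, 178, 185], index=['13', '14', '7', '2', '9'])
a = pd.Series([191, 182], index=['3', '0'])

# pd.concat takes a list of Series and returns a new, longer Series
new = pd.concat([height, a])

print(list(new.index))  # ['13', '14', '7', '2', '9', '3', '0']
print(len(height))      # 5, the original Series is unchanged
```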

(4) Delete

# (the outputs in this subsection were recorded with height holding its original values)
height.drop(['13'])

14    190
7     185
2     178
9     185
dtype: int64

height.drop('14')

13    187
7     185
2     178
9     185
dtype: int64

# drop() does not change the original data
height

13    187
14    190
7     185
2     178
9     185
dtype: int64

# So save the result in a variable
new = height.drop(['13'])
new

14    190
7     185
2     178
9     185
dtype: int64
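The point that drop() returns a new Series and leaves the original untouched can be checked directly. A minimal sketch, assuming the original height values; the name remaining is illustrative.

```python
import pandas as pd

height = pd.Series([187, 190, 185, 178, 185], index=['13', '14', '7', '2', '9'])

# drop() accepts a single label or a list of labels and returns a new Series
remaining = height.drop(['13', '14'])

print(list(remaining.index))  # ['7', '2', '9']
print('13' in height.index)   # True, height itself still has all five entries
```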

(5) Change the Index

Simply assign a new list to the index attribute.

# (the output below was recorded while every value in height was 180)
height.index = [5, 6, 7, 8, 9]

height

5    180
6    180
7    180
8    180
9    180
dtype: int64

height[[5, 6]]

5    180
6    180
dtype: int64

# When the Series index is numeric, position-based access must go through iloc
height.iloc[0]

180
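Once the index holds integers, a bare integer key is treated as a label, which is why position-based access needs iloc. The sketch below uses the original height values rather than 180 everywhere so that the two access styles are easy to tell apart; that choice of data is an assumption made for illustration.

```python
import pandas as pd

height = pd.Series([187, 190, 185, 178, 185], index=['13', '14', '7', '2', '9'])
height.index = [5, 6, 7, 8, 9]   # replace the labels; the values stay the same

by_label = height.loc[5]         # label 5 is the first element
by_position = height.iloc[0]     # position 0 is also the first element
last = height.iloc[4]            # position 4 is the element labeled 9

print(by_label, by_position, last)  # 187 187 185
print(0 in height.index)            # False, 0 is a position here, not a label
```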

3.2.3 DataFrame Objects

A DataFrame consists of three parts: the values (values), the row index (index), and the column index (columns).

Creating a DataFrame:

DataFrame(data, index=[...], columns=[...])

* data: a list or a two-dimensional NumPy ndarray
* index, columns: lists; if omitted, the labels 0 to n-1 are generated automatically

data = np.array([[19, 170, 68], [20, 165, 65], [18, 175, 65]])
st = DataFrame(data, index=[11, 12, 13], columns=['age', 'height', 'weight'])
st
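Besides a two-dimensional array, a DataFrame can also be built from a dictionary that maps column names to lists. The sketch below builds the same table both ways; the dictionary form and the name st_from_dict are additions for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Same construction as above: 2-D array plus row and column labels
data = np.array([[19, 170, 68], [20, 165, 65], [18, 175, 65]])
st = pd.DataFrame(data, index=[11, 12, 13], columns=['age', 'height', 'weight'])

# Alternative: one dictionary entry per column
st_from_dict = pd.DataFrame(
    {'age': [19, 20, 18], 'height': [170, 165, 175], 'weight': [68, 65, 65]},
    index=[11, 12, 13],
)

print(st.shape)                                   # (3, 3)
print((st.values == st_from_dict.values).all())   # True
```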

Accessing DataFrame Data

(1) Access

st

# Select one column: df[[col]] (double brackets return a DataFrame)
st[['age']]

# Select several columns: df[[col1, col2]]
st[['age', 'height']]

# Select rows by slice: df[0:2]
st[0:2]
st.iloc[0:2, :]  # the ":" for the columns can be omitted

# Select a row by index label: df.loc[label]
st.loc[11]

age        19
height    170
weight     68
Name: 11, dtype: int32

# Select rows and columns by label: df.loc[index, column]
st.loc[[11, 13], ['age', 'height']]

st

# Select rows and columns by position: df.iloc[rows, cols]
st.iloc[[0, 1], [0, 1]]

# Select rows and columns by slice: df.iloc[0:2, 0:2]
st.iloc[0:2, 0:2]

# Filter rows with an expression: df.loc[bool_vec, cols], comparable to a database query
st.loc[st['age'] >= 19, ['height']]
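The access patterns above are collected in one runnable sketch below, so that the difference between label-based loc and position-based iloc can be checked in one place. The variable names on the left are illustrative.

```python
import numpy as np
import pandas as pd

data = np.array([[19, 170, 68], [20, 165, 65], [18, 175, 65]])
st = pd.DataFrame(data, index=[11, 12, 13], columns=['age', 'height', 'weight'])

age_series = st['age']                      # single brackets: a Series
age_frame = st[['age']]                     # double brackets: a one-column DataFrame
row_11 = st.loc[11]                         # row with label 11
by_label = st.loc[[11, 13], ['age', 'height']]
by_position = st.iloc[0:2, 0:2]             # first two rows, first two columns
filtered = st.loc[st['age'] >= 19, ['height']]

print(type(age_series).__name__, type(age_frame).__name__)  # Series DataFrame
print(list(by_label.index))     # [11, 13]
print(list(by_position.index))  # [11, 12]
print(list(filtered.index))     # [11, 12]
```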

(2) Add

A new column can be added to a DataFrame directly. New rows are added here by combining two DataFrame objects (see section 3.5).

st

st['expense'] = [1100, 1000, 900]  # if the column label does not exist, a new column is added; if it exists, its values are overwritten
st
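Section 3.5 is not part of this article, so the following is only a sketch of how a row might be added by combining two DataFrame objects, assuming pd.concat is the merging tool. The new student (label 14) and the values in new_row are made up for illustration.

```python
import numpy as np
import pandas as pd

data = np.array([[19, 170, 68], [20, 165, 65], [18, 175, 65]])
st = pd.DataFrame(data, index=[11, 12, 13], columns=['age', 'height', 'weight'])

# A one-row DataFrame with the same columns (hypothetical student 14)
new_row = pd.DataFrame([[21, 172, 70]], index=[14], columns=['age', 'height', 'weight'])

# Stacking the two frames row-wise returns a new DataFrame
st_more = pd.concat([st, new_row])

print(st_more.shape)  # (4, 3)
print(st.shape)       # (3, 3), the original is unchanged
```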

(3) Modify

# Select by label first, then assign
st['age'] = st['age'] + 1  # the column label already exists, so its values are modified
st

st['expense'] = 1200
st

# Select by label first, then assign a list
st['expense'] = [1300, 1400, 1500]
st

# Modify the first student's row (label 11) by assigning a list
st.loc[[11]] = [21, 180, 70, 20]
st

# Filter first, then assign
st.loc[st['expense'] < 800, 'expense'] = 800
st

st.loc[st['expense'] == 800, 'expense'] = 80
# Step 1: build the boolean array for the rows that meet the condition
mask = st['expense'] < 800
mask

11     True
12    False
13    False
Name: expense, dtype: bool

# Step 2: select the matching rows, pick the expense column, and assign
st.loc[mask, 'expense'] = 900
st
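The two-step pattern above (build a boolean mask, then assign through loc) can be tried on a small standalone table. In this sketch the expense figures are assumed values chosen so that exactly one row falls below the threshold.

```python
import pandas as pd

# Hypothetical data: only student 11 spends less than 800
st = pd.DataFrame({'age': [19, 20, 18], 'expense': [700, 1000, 900]}, index=[11, 12, 13])

# Step 1: boolean mask, one True/False per row
mask = st['expense'] < 800

# Step 2: assign only where the mask is True, and only in the expense column
st.loc[mask, 'expense'] = 800

print(mask.tolist())           # [True, False, False]
print(st['expense'].tolist())  # [800, 1000, 900]
```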

(4) Delete

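drop() does not modify the original object. To delete rows or columns from the original object directly, set the parameter inplace=True.
axis=0 refers to rows and axis=1 refers to columns (axis is pronounced /ˈaksəs/).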

# Delete a row
st.drop(11, axis=0)

# Delete a column
st.drop('age', axis=1)

# Delete several columns
st.drop(['height', 'age'], axis=1)

# The original data has not changed
st

# To change the original data, pass inplace=True
st.drop([13], axis=0, inplace=True)
st
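The difference between the default behavior of drop() and inplace=True can be verified with a short sketch. It starts from the three-column table created earlier (without the expense column); the names without_row and without_cols are illustrative.

```python
import numpy as np
import pandas as pd

data = np.array([[19, 170, 68], [20, 165, 65], [18, 175, 65]])
st = pd.DataFrame(data, index=[11, 12, 13], columns=['age', 'height', 'weight'])

without_row = st.drop(11, axis=0)                   # new DataFrame, row 11 removed
without_cols = st.drop(['height', 'age'], axis=1)   # new DataFrame, two columns removed
print(st.shape)                                     # (3, 3), st is still intact

st.drop([13], axis=0, inplace=True)                 # modifies st itself and returns None
print(list(st.index))                               # [11, 12]
```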
