    # Add a column: build column E as the sum of columns A and B
    >>> df['E'] = df['A'] + df['B']
    >>> df
                       A         B         C         D         E
    2023-01-01  0.427900 -1.317100 -0.348818 -2.344505 -0.889200
    2023-01-02  0.404621 -1.270579  2.182677 -0.334435 -0.865958
    2023-01-03  0.943640 -1.205705  0.038978  1.774918 -0.262065
    2023-01-04 -0.062885  0.207681 -0.756341 -0.267351  0.144796
    2023-01-05  1.522414 -1.371567  1.605561  0.264726  0.150847
    2023-01-06 -0.345359 -0.771864  0.391735 -1.446381 -1.117223
    # Delete columns
    >>> del df['E']
    >>> p = df.pop('D')
    >>> df
                       A         B         C
    2023-01-01  0.427900 -1.317100 -0.348818
    2023-01-02  0.404621 -1.270579  2.182677
    2023-01-03  0.943640 -1.205705  0.038978
    2023-01-04 -0.062885  0.207681 -0.756341
    2023-01-05  1.522414 -1.371567  1.605561
    2023-01-06 -0.345359 -0.771864  0.391735
    # Several column labels can be passed at once; drop() returns the result
    # with those columns removed and has no side effect on the original object
    >>> df.drop(columns=['B', 'C'])
                       A
    2023-01-01  0.427900
    2023-01-02  0.404621
    2023-01-03  0.943640
    2023-01-04 -0.062885
    2023-01-05  1.522414
    2023-01-06 -0.345359
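The three removal styles above differ in whether they change the frame in place. The following is a minimal sketch of that difference; the small integer frame is made up for illustration and stands in for the random data used above.

```python
import pandas as pd

# Illustrative data (an assumption, not the frame from the text above).
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]})

# Add a column computed element-wise from two existing columns.
df['E'] = df['A'] + df['B']

# del removes the column in place and returns nothing.
del df['E']

# pop() also removes the column in place, but hands it back as a Series.
popped = df.pop('C')

# drop() builds a new DataFrame; df itself keeps all of its columns.
trimmed = df.drop(columns=['B'])

print(list(df.columns))       # ['A', 'B']
print(list(trimmed.columns))  # ['A']
print(popped.tolist())        # [100, 200, 300]
```

To keep the result of drop(), assign it to a name, as done with `trimmed` here.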
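Indexing
The data in a DataFrame can be indexed along two dimensions, rows or columns, and there are several ways to do it.
Selecting columns by column label
Use df['col'] or df.col, as demonstrated above; either form returns a Series object.
Selecting and slicing rows by row label
Use df.loc['label']: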
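As a hedged illustration of the two column-access spellings, the sketch below uses a made-up frame whose third column is deliberately named `count`. That name is an assumption chosen to show one caveat: attribute access only reaches a column when the label is a valid Python identifier and does not collide with an existing DataFrame attribute or method.

```python
import pandas as pd

# Illustrative data; the column name 'count' is chosen on purpose.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'count': [5, 6]})

by_bracket = df['A']   # bracket access works for any column label
by_attr = df.A         # attribute access works for identifier-like labels

print(type(by_bracket).__name__)   # Series
print(by_bracket.equals(by_attr))  # True

# 'count' is also a DataFrame method, so df.count is the method, not the column.
print(callable(df.count))          # True
print(df['count'].tolist())        # [5, 6]
```

Bracket access is the safer default when column names are not known in advance.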
    >>> df
                       A         B         C         D
    2023-01-01 -1.465884 -2.307865  1.422948 -0.568402
    2023-01-02  0.623038  0.212868 -1.230473  1.132214
    2023-01-03  1.147305 -0.790069  0.470447  2.206363
    2023-01-04  0.186357 -0.255402 -1.256893  0.200379
    2023-01-05  0.462334  0.326236  0.592972 -0.261374
    2023-01-06  0.518177 -0.080145 -1.326405  1.150453
    >>> df.loc['2023-01-01']
    A   -1.465884
    B   -2.307865
    C    1.422948
    D   -0.568402
    Name: 2023-01-01 00:00:00, dtype: float64
    >>> df.loc['2023-01-01':'2023-01-03']
                       A         B         C         D
    2023-01-01 -1.465884 -2.307865  1.422948 -0.568402
    2023-01-02  0.623038  0.212868 -1.230473  1.132214
    2023-01-03  1.147305 -0.790069  0.470447  2.206363
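Note that the label slice in the session above includes its end label, unlike ordinary Python slicing. A minimal sketch of this behavior, assuming a small daily-indexed frame with made-up integer values:

```python
import pandas as pd

# Illustrative frame with a daily DatetimeIndex (values are made up).
dates = pd.date_range('2023-01-01', periods=6)
df = pd.DataFrame({'A': [0, 1, 2, 3, 4, 5],
                   'B': [10, 11, 12, 13, 14, 15]}, index=dates)

# A label slice with .loc includes BOTH endpoints.
block = df.loc['2023-01-01':'2023-01-03']

print(len(block))         # 3
print(block['A'].tolist())  # [0, 1, 2]
print(block.index[-1] == pd.Timestamp('2023-01-03'))  # True
```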
Selecting and slicing rows by integer position
Use df.iloc[loc]:
    >>> df.iloc[0]
    A   -1.465884
    B   -2.307865
    C    1.422948
    D   -0.568402
    Name: 2023-01-01 00:00:00, dtype: float64
    >>> df.iloc[0:3]
                       A         B         C         D
    2023-01-01 -1.465884 -2.307865  1.422948 -0.568402
    2023-01-02  0.623038  0.212868 -1.230473  1.132214
    2023-01-03  1.147305 -0.790069  0.470447  2.206363
    # or
    >>> df[0:3]
                       A         B         C         D
    2023-01-01 -1.465884 -2.307865  1.422948 -0.568402
    2023-01-02  0.623038  0.212868 -1.230473  1.132214
    2023-01-03  1.147305 -0.790069  0.470447  2.206363
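Selecting a single row returns a Series object, while selecting multiple rows with a slice returns a DataFrame object.
Operations
DataFrame objects support many kinds of operations, including transposition and various aggregations.
Transposing
The T attribute of a DataFrame object returns its transpose: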
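The return-type rule can be checked directly. The sketch below uses a made-up four-row frame; it also shows two related details that are not spelled out above: positional slices exclude their end position, and passing a one-element list keeps the result as a DataFrame.

```python
import pandas as pd

# Illustrative frame (values are made up).
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0],
                   'B': [5.0, 6.0, 7.0, 8.0]},
                  index=pd.date_range('2023-01-01', periods=4))

first = df.iloc[0]        # a single position -> Series
head = df.iloc[0:3]       # a slice -> DataFrame; position 3 is excluded
one_row = df.iloc[[0]]    # a list of positions -> DataFrame, even for one row

print(type(first).__name__)    # Series
print(type(head).__name__)     # DataFrame
print(len(head))               # 3
print(one_row.shape)           # (1, 2)
```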
    >>> df
                       A         B         C         D
    2023-01-01 -1.465884 -2.307865  1.422948 -0.568402
    2023-01-02  0.623038  0.212868 -1.230473  1.132214
    2023-01-03  1.147305 -0.790069  0.470447  2.206363
    2023-01-04  0.186357 -0.255402 -1.256893  0.200379
    2023-01-05  0.462334  0.326236  0.592972 -0.261374
    2023-01-06  0.518177 -0.080145 -1.326405  1.150453
    >>> df.T
       2023-01-01  2023-01-02  2023-01-03  2023-01-04  2023-01-05  2023-01-06
    A   -1.465884    0.623038    1.147305    0.186357    0.462334    0.518177
    B   -2.307865    0.212868   -0.790069   -0.255402    0.326236   -0.080145
    C    1.422948   -1.230473    0.470447   -1.256893    0.592972   -1.326405
    D   -0.568402    1.132214    2.206363    0.200379   -0.261374    1.150453
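In the transposed output above, the row labels have become column labels and vice versa. A short sketch of the same swap on a made-up 3x2 frame with string row labels (`r1` to `r3` are illustrative names):

```python
import pandas as pd

# Illustrative 3x2 frame (labels and values are made up).
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]},
                  index=['r1', 'r2', 'r3'])

t = df.T  # rows become columns and columns become rows

print(df.shape, t.shape)   # (3, 2) (2, 3)
print(list(t.index))       # ['A', 'B']
print(list(t.columns))     # ['r1', 'r2', 'r3']
print(t.loc['A', 'r2'])    # 2.0
```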
    >>> df = pd.DataFrame({'A':['x','y','z','y','z','x'], 'B':np.random.randn(6), 'C':np.random.randn(6)})
    >>> df
       A         B         C
    0  x -0.956121 -1.465992
    1  y  0.387068  0.140719
    2  z -0.711084 -1.684878
    3  y -0.000911  1.797802
    4  z -0.043587  0.594897
    5  x -0.118352  0.136071
    # Group first, then use sum() to compute the total of each group
    >>> df.groupby('A').sum()
              B         C
    A
    x -1.074473 -1.329921
    y  0.386157  1.938521
    z -0.754671 -1.089981
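Because the session above uses random numbers, its totals cannot be reproduced exactly. The following sketch repeats the same group-then-sum step on fixed integers (made-up values) so the arithmetic can be checked by hand.

```python
import pandas as pd

# Same grouping keys as above, but with fixed illustrative values.
df = pd.DataFrame({'A': ['x', 'y', 'z', 'y', 'z', 'x'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [10, 20, 30, 40, 50, 60]})

# Rows sharing a value in column A are summed together;
# the group keys become the index of the result.
totals = df.groupby('A').sum()

print(totals.loc['x', 'B'])   # 7   (1 + 6)
print(totals.loc['y', 'C'])   # 60  (20 + 40)
print(totals.loc['z', 'B'])   # 8   (3 + 5)
print(totals.index.name)      # A
```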
    >>> df
       A         B         C
    0  x  0.101737  0.133423
    1  y  0.149723 -0.082645
    2  z  1.973025  0.651403
    3  y  0.934236  0.107589
    4  z -0.653397 -1.817398
    5  x -0.039525 -0.654297
    >>> df.to_csv('test.csv', index_label='index')
    >>> pd.read_csv('test.csv', index_col=0)
           A         B         C
    index
    0      x  0.101737  0.133423
    1      y  0.149723 -0.082645
    2      z  1.973025  0.651403
    3      y  0.934236  0.107589
    4      z -0.653397 -1.817398
    5      x -0.039525 -0.654297
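The write-then-read round trip can also be tried without touching the file system. This sketch is an assumption-laden variant of the session above: it writes to an in-memory `io.StringIO` buffer instead of `test.csv`, and uses made-up values that are exactly representable as floats.

```python
import io

import pandas as pd

# Illustrative data (values are made up).
df = pd.DataFrame({'A': ['x', 'y', 'z'],
                   'B': [0.5, 1.25, -2.0]})

# Write to an in-memory buffer; index_label names the index column in the header.
buffer = io.StringIO()
df.to_csv(buffer, index_label='index')
print(buffer.getvalue().splitlines()[0])   # index,A,B

# Read it back, using the first CSV column as the index.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(restored.index.name)      # index
print(restored['A'].tolist())   # ['x', 'y', 'z']
print(restored['B'].tolist())   # [0.5, 1.25, -2.0]
```

Without index_col=0, the saved index would come back as an ordinary column named index, and a fresh default index would be added.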