pandas basics, Series

Wes McKinney created pandas in 2008. It was built so that Python, like R, makes it easy to inspect and manipulate data structures.

The basic data structures in pandas are:
- Series
- DataFrame
- Panel

1. Series

- A Series can be understood as a one-dimensional NumPy array: an array of values in which every element carries a label (the index).
- The data passed in can be an ndarray, a dictionary, or a scalar value.
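
To make the "labeled NumPy array" idea concrete, here is a minimal sketch. The values and the labels 'a', 'b', 'c' are made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three values with string labels.
s = pd.Series(np.array([10, 20, 30]), index=['a', 'b', 'c'])

print(s.values)   # the underlying NumPy array
print(s.index)    # the labels attached to the values
print(s['b'])     # access by label
print(s.iloc[1])  # access by position, as with a plain array
```

Both lookups return the same element; the label is simply an extra way to address it.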

######################### ndarray examples #########################
# 1.0 random values
import numpy as np
import pandas as pd
series = pd.Series(np.random.rand(5))
print(series)
Out:
0    0.245100
1    0.525060
2    0.142222
3    0.061228
4    0.046563
dtype: float64

# 2.0 month names as index labels (the integer dtype in the output depends on the platform)
import calendar as cal
month_names = [cal.month_name[i] for i in np.arange(1, 6)]
months = pd.Series(np.arange(1, 6), index=month_names)
print(months)
Out:
January     1
February    2
March       3
April       4
May         5
dtype: int32

######################### dictionary examples #########################
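# 1.0 dict keys become the index (older pandas sorts the keys, as in the output below; newer versions keep insertion order)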
money_dict = {'US': 'dollar', 'UK': 'pound', 'Germany': 'euro', 'Korea': 'won'}
series = pd.Series(money_dict)
print(series)
Out:
Germany      euro
Korea         won
UK          pound
US         dollar
dtype: object
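
As a small extension of the dictionary case, an explicit index can be passed together with the dict. The sketch below assumes the same money_dict as above; 'Japan' is a hypothetical label added only to show what happens when a label has no matching key.

```python
import pandas as pd

money_dict = {'US': 'dollar', 'UK': 'pound', 'Germany': 'euro', 'Korea': 'won'}

# Only the listed labels are kept, in the listed order.
# 'Japan' is a hypothetical label that is not a key in money_dict,
# so its value is filled with NaN.
selected = pd.Series(money_dict, index=['Korea', 'US', 'Japan'])
print(selected)
```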

######################### scalar examples #########################
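# a scalar alone gives a one-element Series
money = pd.Series('dollar')
print(money)
Out:
0    dollar
dtype: object

# with an index, the scalar is repeated for every label
money = pd.Series('dollar', index=['US', 'NZ', 'Canada'])
print(money)
Out:
US        dollar
NZ        dollar
Canada    dollar
dtype: object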

NumPy operations can be applied to a Series, provided it holds numeric values:
np.sqrt(series), np.std(series), np.mean(series), and so on.
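
A short sketch of these calls on a numeric Series. The values and labels are made up for illustration. Note that the string Series built in the dictionary and scalar examples would not work here, since the square root of 'dollar' is undefined.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with string labels.
values = pd.Series([1.0, 4.0, 9.0, 16.0], index=['a', 'b', 'c', 'd'])

roots = np.sqrt(values)   # element-wise; the result is a Series with the same index
print(roots)

print(np.mean(values))    # a single number summarizing the whole Series
print(np.std(values))
```

Element-wise functions such as np.sqrt keep the labels, while reductions such as np.mean collapse the Series to one value.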

2. DataFrame

- A DataFrame can be thought of as a table. Each column is a Series, so a DataFrame with several columns is like a collection of Series.
ex)
columns  nums  strs  bools  decs
index
0          11   cat   True   1.4
1          -6   hat  False   6.9
2          25   bat  False  -0.6
3           8   mat   True   3.7
4         -17   sat  False   18.
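
The example table can be built as a sketch like the one below, which also checks that pulling out one column gives back a Series. The column names and values are copied from the table; everything else is just one possible way to write it.

```python
import pandas as pd

# Column names and values copied from the example table.
df = pd.DataFrame({
    'nums': [11, -6, 25, 8, -17],
    'strs': ['cat', 'hat', 'bat', 'mat', 'sat'],
    'bools': [True, False, False, True, False],
    'decs': [1.4, 6.9, -0.6, 3.7, 18.],
})

col = df['nums']      # selecting a single column
print(type(col))      # a pandas Series
print(col)
print(df.dtypes)      # each column keeps its own dtype
```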

data = {'search': ['DFS', 'BFS', 'Binary Search', 'Linear', 'ShortestPath (Dijkstra)'],
        'sorting': ['Quicksort', 'Mergesort', 'Heapsort', 'Bubble Sort', 'Insertion Sort'],
        'machine learning': ['RandomForest', 'K Nearest Neighbor', 'Logistic Regression',
                             'K-Means Clustering', 'Linear Regression']}
df = pd.DataFrame(data)
df  # the index is the integers 0 to 4
df = pd.DataFrame(data, index=['algo1', 'algo2', 'algo3', 'algo4', 'algo5'])
df  # the index is now the given labels (older pandas sorts the columns by name, as shown below)

	machine learning	search	sorting
algo1	RandomForest	DFS	Quicksort
algo2	K Nearest Neighbor	BFS	Mergesort
algo3	Logistic Regression	Binary Search	Heapsort
algo4	K-Means Clustering	Linear	Bubble Sort
algo5	Linear Regression	ShortestPath (Dijkstra)	Insertion Sort

* Inserting a column into a DataFrame
df.insert(1, 'number', range(5))  # insert 'number' at column position 1

	machine learning	number	search	sorting
algo1	RandomForest	0	DFS	Quicksort
algo2	K Nearest Neighbor	1	BFS	Mergesort
algo3	Logistic Regression	2	Binary Search	Heapsort
algo4	K-Means Clustering	3	Linear	Bubble Sort
algo5	Linear Regression	4	ShortestPath (Dijkstra)	Insertion Sort
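
For comparison, the sketch below puts insert next to plain assignment on a cut-down, two-row version of the table. The 'rank' column is hypothetical, and the column order is fixed with columns= so that the result does not depend on the pandas version.

```python
import pandas as pd

# A two-row subset of the table, with the column order fixed explicitly.
df = pd.DataFrame({'search': ['DFS', 'BFS'],
                   'sorting': ['Quicksort', 'Mergesort']},
                  index=['algo1', 'algo2'],
                  columns=['search', 'sorting'])

# insert(loc, column, value) changes df in place and puts the new column at position loc.
df.insert(1, 'number', [0, 1])

# Plain assignment with [] always appends the new column at the end.
df['rank'] = [1, 2]   # 'rank' is a hypothetical column for illustration

print(list(df.columns))
```

Use insert when the position of the new column matters, and plain assignment when it does not.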

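3. Panel
- A Panel is a three-dimensional array. It is used far less often than Series or DataFrame, and because it is three-dimensional it is hard to inspect with print.
- Since it is reportedly rarely used, I will skip it here... (Panel was deprecated in pandas 0.20 and has been removed from later releases.)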
