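matplotlib.pyplot.scatter – Scatter plots

2020-03-10 / tau

Overview

scatter draws a scatter plot from pairs of x and y coordinates.

scatter(x, y, color/c=color, s=n, marker=marker, edgecolors=color): x and y are the coordinates of the points; a scalar draws a single point and an array draws several. color (or c) and edgecolors take matplotlib color specifications, and marker takes a matplotlib marker specification. s is the marker size (an area, in points squared).

Basic form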

import numpy as np
import numpy.random as rnd
import matplotlib.pyplot as plt

rnd.seed(0)
x = rnd.random(50)
y = rnd.random(50)

fig, ax = plt.subplots(figsize=(4.8, 3.6))

ax.scatter(x, y, s=40, marker='o', color='aquamarine', edgecolors='black')

ax.set_aspect('equal')

plt.show()
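
As a small check of the scalar-versus-array behavior described in the overview, the sketch below draws one point from scalars and several points from arrays, then reads the coordinates back with get_offsets(). The coordinates, colors, and the Agg backend are choices made for this illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption for this sketch); drop it and call plt.show() to view the figure
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4.8, 3.6))

# Scalars: a single point
single = ax.scatter(0.5, 0.5, s=80, marker='s', color='tomato')

# Arrays: one point per (x, y) pair
xs = np.array([0.1, 0.4, 0.9])
ys = np.array([0.2, 0.8, 0.3])
several = ax.scatter(xs, ys, s=40, marker='o', color='aquamarine', edgecolors='black')

# scatter returns a PathCollection; get_offsets() gives one (x, y) row per point
n_single = len(single.get_offsets())
n_several = len(several.get_offsets())
print(n_single, n_several)
```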

Multiple series

To plot several series, call scatter once per series.

import numpy as np
import numpy.random as rnd
import matplotlib.pyplot as plt

x1 = rnd.random(50) + 0.5
y1 = rnd.random(50) + 1

x2 = rnd.random(50) + 1
y2 = rnd.random(50) + 0.5

fig, ax = plt.subplots(figsize=(6.4, 4.8))

ax.scatter(x1, y1, marker='o', s=40, c='blue', alpha=0.5)
ax.scatter(x2, y2, marker='^', s=80, color='red', alpha=0.5)

ax.set_aspect('equal')

plt.show()
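
When there are more than a couple of series, the per-series calls can be driven from a list. The following is only a sketch: the series labels, offsets, and the use of label with ax.legend() are additions for illustration, not part of the example above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption for this sketch)
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical series settings: (label, x offset, y offset, marker, color)
series = [
    ("series 1", 0.5, 1.0, 'o', 'blue'),
    ("series 2", 1.0, 0.5, '^', 'red'),
]

fig, ax = plt.subplots(figsize=(6.4, 4.8))

for label, dx, dy, marker, color in series:
    x = rng.random(50) + dx
    y = rng.random(50) + dy
    # One scatter call per series; label is picked up by the legend
    ax.scatter(x, y, marker=marker, s=40, color=color, alpha=0.5, label=label)

ax.set_aspect('equal')
legend = ax.legend()
legend_labels = [t.get_text() for t in legend.get_texts()]
print(legend_labels)
```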

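The Iris dataset

2020-03-08 / tau

Overview

The Iris dataset describes iris species and their features: for three species of iris it provides many measurements of the petals and sepals.

This article summarizes how to use the iris data that ships with scikit-learn in Python.

Loading the data and its structure

In Python, the data can be loaded with load_iris() from scikit-learn's datasets module. The returned object is an instance of the Bunch class, but in normal use it can be handled like a dictionary.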

from sklearn.datasets import load_iris

iris_dataset = load_iris()

# Bunch behaves like a dict, so items() yields the key/value pairs directly
for key, value in iris_dataset.items():
    print("{}:\n{}\n".format(key, value))

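The object is structured like a dictionary and holds the array of features for 150 individual irises, the species code of each individual, the species names, and so on.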

data:
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 .....
 [6.5 3.  5.2 2. ]
 [6.2 3.4 5.4 2.3]
 [5.9 3.  5.1 1.8]]

target:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]

target_names:
['setosa' 'versicolor' 'virginica']

DESCR:
.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

    ============== ==== ==== ======= ===== ====================
                    Min  Max   Mean    SD   Class Correlation
    ============== ==== ==== ======= ===== ====================
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
    ============== ==== ==== ======= ===== ====================

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988

The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
from Fisher's paper. Note that it's the same as in R, but not as in the UCI
Machine Learning Repository, which has two wrong data points.

This is perhaps the best known database to be found in the
pattern recognition literature.  Fisher's paper is a classic in the field and
is referenced frequently to this day.  (See Duda & Hart, for example.)  The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant.  One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.

.. topic:: References

   - Fisher, R.A. "The use of multiple measurements in taxonomic problems"
     Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
     Mathematical Statistics" (John Wiley, NY, 1950).
   - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
     (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.
   - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
     Structure and Classification Rule for Recognition in Partially Exposed
     Environments".  IEEE Transactions on Pattern Analysis and Machine
     Intelligence, Vol. PAMI-2, No. 1, 67-71.
   - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE Transactions
     on Information Theory, May 1972, 431-433.
   - See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al"s AUTOCLASS II
     conceptual clustering system finds 3 classes in the data.
   - Many, many more ...

feature_names:
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

filename:
C:\Users\tomo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\datasets\data\iris.csv

The keys of the dataset are as follows.

from sklearn.datasets import load_iris

iris_dataset = load_iris()

print(iris_dataset.keys())

# dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])

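Data contents

`'data'`: the feature data

A dataset of 150 iris records with four features each. It is a 2-D array whose rows are the four features of each individual; the column indices (0, 1, 2, 3) correspond to the four features.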

'data': array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       .....
       [6.7, 3. , 5.2, 2.3],
       [6.3, 2.5, 5. , 1.9],
       [6.5, 3. , 5.2, 2. ],
       [6.2, 3.4, 5.4, 2.3],
       [5.9, 3. , 5.1, 1.8]])

`'target'`: codes for the iris species

An array of codes 0 to 2, one for each of the three species. It is a 1-D array with one entry for each of the 150 individuals.

'target': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

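`'target_names'`: the species names

The names of the three iris species. setosa (called "hiougi-ayame" in Japanese) has a somewhat subdued color and shape, while versicolor and virginica are hard for a layperson to tell apart.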

'target_names': array(['setosa', 'versicolor', 'virginica'], dtype='<U10'),

The species names map to the codes as follows.

setosa	0
versicolor	1
virginica	2
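
The mapping in this table can be applied in code by indexing target_names with the target array. The sketch below assumes scikit-learn is installed; the variable names species, codes, and counts are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# Fancy indexing: each integer code in 'target' picks the matching species name
species = iris_dataset['target_names'][iris_dataset['target']]

# Count how many individuals carry each code
codes, counts = np.unique(iris_dataset['target'], return_counts=True)
for code, count in zip(codes, counts):
    print(code, iris_dataset['target_names'][code], count)
```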

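`'feature_names'`: the feature names

This key is stored after DESCR. These are the features used to classify the iris species.

The names of four features, the length and width of the sepal and of the petal, are stored as strings that include the unit (cm).

'sepal length (cm)': sepal length
'sepal width (cm)': sepal width
'petal length (cm)': petal length
'petal width (cm)': petal width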

'feature_names': ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

The feature names map to the column indices as follows.

sepal length (cm)	0
sepal width (cm)	1
petal length (cm)	2
petal width (cm)	3

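`'filename'`: the file name

This key is also stored after DESCR and gives the location of the CSV file. The first line of that file holds the number of samples, the number of features, and the class names; it is followed by 150 rows, one per iris, each with four feature columns and one species column. The file contains nothing corresponding to feature_names or DESCR.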

'filename': 'C:...lib\\site-packages\\sklearn\\datasets\\data\\iris.csv'

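`'DESCR'`: description of the dataset

A description of the dataset. Passing it to print(), as in print(iris_dataset['DESCR']), displays it with its formatting.

There are 150 records (50 in each of three classes).
The attributes are four numeric attributes plus the class (species).
→ It is not obvious what "predictive" means here or why "class" is singular; presumably the four measurements are the attributes used to predict the class, and the class is a single target attribute.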

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

    ============== ==== ==== ======= ===== ====================
                    Min  Max   Mean    SD   Class Correlation
    ============== ==== ==== ======= ===== ====================
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
    ============== ==== ==== ======= ===== ====================

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988

The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
from Fisher's paper. Note that it's the same as in R, but not as in the UCI
Machine Learning Repository, which has two wrong data points.

This is perhaps the best known database to be found in the
pattern recognition literature.  Fisher's paper is a classic in the field and
is referenced frequently to this day.  (See Duda & Hart, for example.)  The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant.  One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.

.. topic:: References

   - Fisher, R.A. "The use of multiple measurements in taxonomic problems"
     Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
     Mathematical Statistics" (John Wiley, NY, 1950).
   - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
     (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.
   - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
     Structure and Classification Rule for Recognition in Partially Exposed
     Environments".  IEEE Transactions on Pattern Analysis and Machine
     Intelligence, Vol. PAMI-2, No. 1, 67-71.
   - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE Transactions
     on Information Theory, May 1972, 431-433.
   - See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al"s AUTOCLASS II
     conceptual clustering system finds 3 classes in the data.
   - Many, many more ...

Using the data

How to access the data

There are two ways to retrieve each item from the iris dataset.

Look it up with the dictionary key (e.g. iris_dataset['DESCR'])
Use the key string as an attribute (e.g. iris_dataset.DESCR)
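
The two access styles listed above can be compared directly. This is a minimal sketch assuming scikit-learn is installed; by_key, by_attr, and same are names introduced here.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_dataset = load_iris()

by_key = iris_dataset['data']    # dictionary-style access
by_attr = iris_dataset.data      # attribute-style access

# Both styles should give the same 150 x 4 feature array
same = np.array_equal(by_key, by_attr)
print(same, by_key.shape)
```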

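Getting the feature data for all records

'data' gives the four features of the 150 individuals as a 2-D array with 150 rows and 4 columns. The four features correspond to the four names in 'feature_names'.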

from sklearn.datasets import load_iris

iris_data = load_iris()

X = iris_data['data']

print(X)

# [[5.1 3.5 1.4 0.2]
#  [4.9 3.  1.4 0.2]
#  [4.7 3.2 1.3 0.2]
#  .....
#  [6.5 3.  5.2 2. ]
#  [6.2 3.4 5.4 2.3]
#  [5.9 3.  5.1 1.8]]

Getting the data for one feature only

To extract one feature for all individuals, index the array as X[:, n].

from sklearn.datasets import load_iris

iris_data = load_iris()

features = iris_data['feature_names']
X = iris_data['data']
n_feature = 2

feature = X[:, n_feature]

print("feature name : {}".format(features[n_feature]))
print("feature data :\n{}".format(feature))

# feature name : petal length (cm)
# feature data :
# [1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 1.5 1.6 1.4 1.1 1.2 1.5 1.3 1.4
#  .....
#  5.7 5.2 5.  5.2 5.4 5.1]

Extracting the data for one class only

How to extract only the records of a particular class (here, a species). This uses boolean-mask indexing on an ndarray.

from sklearn.datasets import load_iris

iris_data = load_iris()

targets = iris_data['target_names']
features = iris_data['feature_names']
X = iris_data['data']
y = iris_data['target']

n_class = 1
# Boolean mask: keep only the rows whose target code equals n_class
data_class = X[y == n_class]

print("data for class {}:\n{}".format(targets[n_class], data_class))

# data for class versicolor:
# [[7.  3.2 4.7 1.4]
#  [6.4 3.2 4.5 1.5]
#  [6.9 3.1 4.9 1.5]
#  .....
#  [6.2 2.9 4.3 1.3]
#  [5.1 2.5 3.  1.1]
#  [5.7 2.8 4.1 1.3]]
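
Row selection by class and column selection by feature can be combined in a single index. The sketch below, which assumes scikit-learn is installed, computes the mean petal length per species; the choice of feature and the means list are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_data = load_iris()
X = iris_data['data']
y = iris_data['target']
targets = iris_data['target_names']

n_feature = 2  # column index of 'petal length (cm)'

means = []
for n_class, name in enumerate(targets):
    # Rows of one class (boolean mask) and one feature column (integer index)
    values = X[y == n_class, n_feature]
    means.append(values.mean())
    print("{}: {} values, mean {:.2f}".format(name, values.size, means[-1]))
```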

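Handling a 2-D grid of Axes all at once

2020-03-08 / tau

Creating Axes in several rows and columns with subplots() gives a 2-D array of Axes objects (Axes added one at a time with add_subplot() can be collected into the same kind of array).

When the same processing should be applied to all of them (for example setting axis limits or legends, or matching aspect ratios), writing a nested loop every time is tedious.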

import numpy as np
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2)

x = np.linspace(-np.pi, np.pi)
n = 1
for row in axs:
    for ax in row:
        ax.plot(x, np.sin(n*x), label="n={}".format(n))
        n += 1
        ax.set_ylim(-1.2, 1.8)
        ax.legend(loc='upper left')

plt.show()

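The alternative is to convert the grid to a 1-D sequence and handle it in a single loop.

One conversion is to pull the Axes out into a flat list, as follows.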

axs_1d = [ax for row in axs for ax in row]

Alternatively, reshape the 2-D array into 1-D as follows (I originally wrote something like reshape(1, -1)[0], but reshape(-1) turns out to be enough).

axs_1d = axs.reshape(-1)

The 1-D array axs_1d then gives access to every element of the 2-D axs.

import numpy as np
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2)

axs_1d = axs.reshape(-1)

x = np.linspace(-np.pi, np.pi)
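# start=1 so the panels show n=1..4, as in the nested-loop version (n=0 would plot a flat line)
for n, ax in enumerate(axs_1d, start=1):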
    ax.plot(x, np.sin(n*x), label="n={}".format(n))
    ax.set_ylim(-1.2, 1.8)
    ax.legend(loc='upper left')

plt.show()

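flatten() and ravel() can be used as well. flatten() returns a copy, but the copied array still refers to the same Axes objects, so the effect is the same.

Use enumerate when a counter value is applied to each plot, and zip when iterating in step with another list or similar.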
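
The zip case and the remark about flatten() can be illustrated together. In this sketch the functions and titles paired with each Axes are arbitrary choices, and the Agg backend is assumed so that no window is needed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption for this sketch)
import numpy as np
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2)

# flatten() copies the array, but its elements are still the same Axes objects
same_axes = axs.flatten()[0] is axs[0, 0]

x = np.linspace(-np.pi, np.pi)
funcs = [np.sin, np.cos, np.tanh, np.abs]   # illustrative choices
titles = ["sin", "cos", "tanh", "abs"]

# zip walks the flattened Axes in step with the other lists
for ax, func, title in zip(axs.ravel(), funcs, titles):
    ax.plot(x, func(x))
    ax.set_title(title)

shown_titles = [ax.get_title() for ax in axs.ravel()]
print(same_axes, shown_titles)
```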

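matplotlib.pyplot.hist – Histograms

2020-03-08 / tau

Overview

matplotlib.pyplot.hist draws a histogram of array data. Only the main parameters are covered here.

hist(X, bins, range, density, cumulative, histtype, rwidth, color, stacked): X is the histogram data, one 1-D array per histogram. The other parameters are explained below.

Histogram forms

Frequency distribution

Passing the data as a 1-D array draws its frequency distribution.

By default, however, the edges of the bins cannot be told apart; drawing them requires specifying edgecolor, which can also be written as ec.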

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

x = np.random.normal(0, 1, 100)

fig, axs = plt.subplots(1,2, figsize=(6.4, 2.4))

axs[0].hist(x)
axs[1].hist(x, edgecolor='k')

plt.show()

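Relative frequency distribution

Specifying density=True gives a relative frequency (probability density) distribution. The bin heights are scaled so that the total area of the bars, that is the sum of height times bin width, equals 1; the shape is the same as the frequency distribution.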

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

x = np.random.normal(0, 1, 100)

fig, axs = plt.subplots(1,2, figsize=(6.4, 2.4))

axs[0].hist(x, ec='k')
axs[1].hist(x, density=True, ec='k')

plt.show()
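
The normalization used by density=True can be checked from the return values of hist. The sketch below reuses the same random data; area and widths are names introduced here, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption for this sketch)
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
x = np.random.normal(0, 1, 100)

fig, ax = plt.subplots(figsize=(3.2, 2.4))
n, bins, patches = ax.hist(x, density=True, ec='k')

widths = np.diff(bins)       # width of each bin
area = np.sum(n * widths)    # total area of the bars
print(area)                  # close to 1
print(n.sum())               # the plain sum of heights is generally not 1
```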

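Cumulative distribution

Specifying cumulative=True draws the cumulative frequency distribution, or the cumulative relative frequency distribution when combined with density=True.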

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

x = np.random.normal(0, 1, 100)

fig, axs = plt.subplots(1,2, figsize=(6.4, 2.4))

axs[0].hist(x, cumulative=True, ec='k')
axs[1].hist(x, cumulative=True, density=True, ec='k')

plt.show()
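
As a rough check of what cumulative=True computes, the sketch below compares the returned bin values with a running sum of the ordinary counts from np.histogram. The comparison itself is an addition for illustration; the data are the same as above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption for this sketch)
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
x = np.random.normal(0, 1, 100)

# Ordinary counts for 10 equal-width bins between the data minimum and maximum
counts, edges = np.histogram(x, bins=10)

fig, axs = plt.subplots(1, 2, figsize=(6.4, 2.4))
n_cum, _, _ = axs[0].hist(x, cumulative=True, ec='k')
n_cum_density, _, _ = axs[1].hist(x, cumulative=True, density=True, ec='k')

# The cumulative histogram is the running sum of the counts
matches = np.allclose(n_cum, np.cumsum(counts))
print(matches, n_cum[-1], n_cum_density[-1])
```

The last bin reaches the number of samples for the count version and 1 for the density version.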

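Number of bins

bins sets the number of histogram bars (bins) or their edges. The default is bins=10.

bins=n: the number of bins as an integer.
bins=sequence: the bin edges as a list or similar.

"Bin" is an English word for a large container or stocker for shop goods or factory parts, and in British English it means a rubbish bin. It is apparently unrelated to the Japanese word bin (瓶, "bottle").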

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

x = np.random.normal(0, 1, 100)

fig, axs = plt.subplots(1, 2, figsize=(6.4, 2.4))

axs[0].hist(x, bins=5, ec='k')
axs[1].hist(x, bins=[-3, -2, -1, 0, 0.5, 1, 1.5, 2], ec='k')

plt.show()
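
The two forms of bins can be inspected without drawing anything by calling np.histogram, which hist relies on for the binning. This is only a sketch on the same data; the variable names are introduced here.

```python
import numpy as np

np.random.seed(0)
x = np.random.normal(0, 1, 100)

# bins=n: n equal-width bins between the data minimum and maximum (n + 1 edges)
counts_n, edges_n = np.histogram(x, bins=5)

# bins=sequence: the edges are used as given; values outside them are not counted
edges_seq = [-3, -2, -1, 0, 0.5, 1, 1.5, 2]
counts_seq, _ = np.histogram(x, bins=edges_seq)

print(len(edges_n), counts_n.sum())
print(len(counts_seq), counts_seq.sum())
```

With an explicit edge list the bins may have unequal widths, and samples beyond the last edge (here, above 2) drop out of the counts.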

Range

range sets the interval over which the bins are laid out. The interval between its lower and upper limits is divided into the number of bins given by bins.

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

x = np.random.normal(0, 1, 100)

fig, axs = plt.subplots(1, 3, figsize=(9.6, 2.4))

axs[0].hist(x, edgecolor='k')
axs[1].hist(x, range=(-3, 3), ec='k')
axs[2].hist(x, range=(-1.5, 1.5), ec='k')

for ax in axs:
    ax.set_xlim(-3, 3)
    ax.set_xticks(np.arange(-3, 4, 1))

plt.show()
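
One consequence of range worth seeing in numbers: samples outside the interval are left out of the histogram. The sketch below reads this off the return values; the Agg backend is assumed and the variable names are introduced here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption for this sketch)
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
x = np.random.normal(0, 1, 100)

fig, axs = plt.subplots(1, 2, figsize=(6.4, 2.4))
n_full, bins_full, _ = axs[0].hist(x, range=(-3, 3), ec='k')
n_part, bins_part, _ = axs[1].hist(x, range=(-1.5, 1.5), ec='k')

# The outermost bin edges coincide with the limits given by range
print(bins_part[0], bins_part[-1])
# Samples outside the range are not counted
print(n_full.sum(), n_part.sum())
```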

Color and edges

color sets the bin color, edgecolor/ec the edge color, and linewidth the edge width.

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

x = np.random.normal(0, 1, 100)

fig, axs = plt.subplots(1, 2, figsize=(6.4, 2.4))

axs[0].hist(x, color='lightseagreen', edgecolor='k')
axs[1].hist(x, color='lightcoral', ec='navy', linewidth=3)

plt.show()

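Bin width

rwidth sets the width of the bars relative to the bin width. By default (rwidth=None) a single histogram behaves like rwidth=1, with the bars filling their bins and touching one another.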

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

x = np.random.normal(0, 1, 100)

fig, axs = plt.subplots(1, 2, figsize=(6.4, 2.4))

axs[0].hist(x, rwidth=0.8, ec='k')
# Note: width (not rwidth) is forwarded to the bar patches as an absolute width in data units
axs[1].hist(x, bins=13, width=1.2, ec='k')

plt.show()

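Histogram type

histtype='type' sets the type of histogram.

bar: the usual histogram shape.
barstacked: with several histograms, stacks the values of the same bin on top of one another.
step: draws only the outline, without the boundaries between bins.
stepfilled: draws no boundaries between bins and fills the inside.

The example below illustrates 'bar', 'step', and 'stepfilled'.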

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

x = np.random.normal(0, 1, 100)

fig, axs = plt.subplots(1, 3, figsize=(9.6, 2.4))

axs[0].hist(x, histtype='bar', ec='k')
axs[1].hist(x, histtype='step', ec='k')
axs[2].hist(x, histtype='stepfilled', ec='k')

plt.show()

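Multiple histograms

Simple overlay

To overlay several histograms, call hist for each dataset on the same Axes.

With a plain overlay, earlier histograms are painted over by later ones, so specify the transparency with alpha to keep them visible.

Note, however, that with a plain overlay the bin edges of the individual histograms do not necessarily coincide.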

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

x1 = np.random.normal(-2, 1, 500)
x2 = np.random.normal(0, 1, 500)
x3 = np.random.normal(2, 1, 500)

fig, axs = plt.subplots(1,2, figsize=(6.4, 2.4))

axs[0].hist(x1, color='c', edgecolor='k')
axs[0].hist(x2, color='m', edgecolor='k')
axs[0].hist(x3, color='g', edgecolor='k')

axs[1].hist(x1, color='c', edgecolor='k', alpha=0.5)
axs[1].hist(x2, color='m', edgecolor='k', alpha=0.5)
axs[1].hist(x3, color='g', edgecolor='k', alpha=0.5)

plt.show()

Overlaying a histogram and a plot

See the separate article on this topic.

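Aligning bin edges

The bin edges of several histograms can be made to coincide in the following ways.

Specify the same range and bins=n for every histogram
Specify the same bins=sequence for every histogram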

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

x1 = np.random.normal(-2, 1, 500)
x2 = np.random.normal(0, 1, 500)
x3 = np.random.normal(2, 1, 500)

fig, axs = plt.subplots(1,2, figsize=(6.4, 2.4))

axs[0].hist(x1, range=(-4, 4), bins=20, color='c', ec='k', alpha=0.5)
axs[0].hist(x2, range=(-4, 4), bins=20, color='m', ec='k', alpha=0.5)
axs[0].hist(x3, range=(-4, 4), bins=20, color='g', ec='k', alpha=0.5)

bn = np.linspace(-4, 4, 21)
axs[1].hist(x1, bins=bn, color='c', ec='k', alpha=0.5)
axs[1].hist(x2, bins=bn, color='m', ec='k', alpha=0.5)
axs[1].hist(x3, bins=bn, color='g', ec='k', alpha=0.5)

plt.show()
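
If a fixed interval such as (-4, 4) is not known in advance, one option is to derive the shared edges from the data. The sketch below uses np.histogram_bin_edges on the concatenated datasets; this is a variation for illustration, not the approach of the example above, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption for this sketch)
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

x1 = np.random.normal(-2, 1, 500)
x2 = np.random.normal(0, 1, 500)
x3 = np.random.normal(2, 1, 500)

# Edges computed once from all the data, so every histogram shares them
edges = np.histogram_bin_edges(np.concatenate([x1, x2, x3]), bins=20)

fig, ax = plt.subplots(figsize=(3.2, 2.4))
returned_bins = []
for x, color in zip((x1, x2, x3), ('c', 'm', 'g')):
    n, bins, _ = ax.hist(x, bins=edges, color=color, ec='k', alpha=0.5)
    returned_bins.append(bins)

# Every call reports the same bin edges
aligned = all(np.array_equal(b, edges) for b in returned_bins)
print(len(edges), aligned)
```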

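Side by side and stacked

When several datasets are passed together (several 1-D datasets given as a list or 2-D array), the bars of each bin are by default drawn side by side.

With stacked=True or histtype='barstacked', the bars of the same bin are stacked instead.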

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

x1 = np.random.normal(-2, 1, 500)
x2 = np.random.normal(0, 1, 500)
x3 = np.random.normal(2, 1, 500)
X = [x1, x2, x3]

fig, axs = plt.subplots(1,3, figsize=(9.6, 2.4))

axs[0].hist(X, bins=20, ec='k')
axs[1].hist(X, bins=20, stacked=True, ec='k')
axs[2].hist(X, bins=20, histtype='barstacked', ec='k')

plt.show()

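Return values

n: the value of each bin (count or density)
bins: the bin edges
patches: the list of Patch objects used to draw the histogram

Single histogram

In the example below, n holds the values of the 10 bins. The output listed was produced with density=True, so the values are densities rather than raw counts.

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

x = np.random.normal(0, 1, 100)

fig, ax = plt.subplots(1, 1, figsize=(6.4, 2.4))

# density=True matches the values listed below (densities, not counts)
n, bins, patches = ax.hist(x, density=True)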

print(n)
print(bins)
print(patches)

plt.show()

# [0.02073508 0.10367541 0.14514557 0.26955606 0.35249639 0.37323147
#  0.33176131 0.2280859  0.14514557 0.10367541]
# [-2.55298982 -2.07071537 -1.58844093 -1.10616648 -0.62389204 -0.1416176
#   0.34065685  0.82293129  1.30520574  1.78748018  2.26975462]
# <a list of 10 Patch objects>
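
The three return values can be used together, for instance to find and highlight the tallest bin. This is a sketch only: the highlight color and the names tallest and centers are introduced here, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption for this sketch)
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
x = np.random.normal(0, 1, 100)

fig, ax = plt.subplots(1, 1, figsize=(6.4, 2.4))
n, bins, patches = ax.hist(x, ec='k')

# n: index of the bin with the largest count
tallest = int(np.argmax(n))

# patches: one drawable bar per bin, so the bar can be restyled after the fact
patches[tallest].set_facecolor('tomato')

# bins: edges, from which the bin centers follow
centers = (bins[:-1] + bins[1:]) / 2
print(tallest, n[tallest], centers[tallest])
```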

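Multiple histograms passed as a list

These are the return values when several datasets are passed as a list. n is a list of arrays, one for each of the three datasets.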

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

x1 = np.random.normal(-2, 1, 500)
x2 = np.random.normal(0, 1, 500)
x3 = np.random.normal(2, 1, 500)
X = [x1, x2, x3]

fig, ax = plt.subplots(1,1, figsize=(3.2, 2.4))

n, bins, patches = ax.hist(X, bins=20, ec='k')

print(n)
print(bins)
print(patches)

plt.show()

# [array([ 6., 10., 34., 59., 98., 92., 87., 62., 24., 20.,  8.,  0.,  0.,
#         0.,  0.,  0.,  0.,  0.,  0.,  0.]), array([ 0.,  0.,  0.,  2.,  5., 12., 33., 64., 89., 99., 89., 55., 33.,
#        12.,  6.,  1.,  0.,  0.,  0.,  0.]), array([  0.,   0.,   0.,   0.,   0.,   0.,   0.,   2.,   2.,  11.,  27.,
#         54.,  89.,  89., 106.,  66.,  34.,  12.,   6.,   2.])]
# [-4.77259276 -4.27541438 -3.778236   -3.28105763 -2.78387925 -2.28670087
#  -1.7895225  -1.29234412 -0.79516574 -0.29798737  0.19919101  0.69636938
#   1.19354776  1.69072614  2.18790451  2.68508289  3.18226127  3.67943964
#   4.17661802  4.6737964   5.17097477]
# <a list of 3 Lists of Patches objects>

Python3 – Removing duplicate elements from an array

2020-02-29 / tau

Removing duplicates from a list

Passing a list to set() removes the duplicates, so every element in the result is unique.

import random

a = [random.randint(0, 2) for n in range(20)]

print(a)
print(set(a))

# [2, 2, 2, 2, 1, 0, 1, 0, 0, 0, 0, 1, 1, 2, 1, 1, 0, 2, 2, 0]
# {0, 1, 2}

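The result is a set, however, so use list() to turn it back into a list.

The elements of the set() result appear in ascending order here, but a set is unordered: the iteration order depends on how the elements hash and is not guaranteed to be ascending. Use sorted() to get a sorted list (if the resulting list is not reused elsewhere, its sort() method works as well).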

import random

a = [random.randint(0, 2) for n in range(20)]

print(a)
print(sorted(list(set(a))))

# [2, 2, 2, 2, 1, 0, 1, 0, 0, 0, 0, 1, 1, 2, 1, 1, 0, 2, 2, 0]
# [0, 1, 2]
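
When the order of first appearance matters more than sorted order, dict.fromkeys is a common alternative. This is a minimal sketch and not part of the original post; the variable names are illustrative. It also shows that sorted() accepts the set directly, so the intermediate list() call is optional.

```python
a = [2, 2, 1, 0, 1, 0, 2]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops duplicates while keeping first-appearance order
unique_in_order = list(dict.fromkeys(a))
print(unique_in_order)

# sorted() accepts any iterable, including a set
unique_sorted = sorted(set(a))
print(unique_sorted)
```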

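Removing duplicates from an ndarray

set() also works when the original array is an ndarray, and the result is again a set. Passing that set directly to np.array(), however, does not produce an array of its elements: the set is kept as is, wrapped in a zero-dimensional object array.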

import numpy as np
import numpy.random as rnd

a = rnd.randint(0, 3, 20)

print(a)
print(set(a))
print(np.array(set(a)))

# [2 2 1 2 2 0 1 1 0 1 0 1 2 1 0 1 0 2 2 0]
# {0, 1, 2}
# {0, 1, 2}

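So the set has to be converted to a list with list() before the ndarray is built.

To guarantee ascending order, sort the result with np.sort(). If the array is not reused elsewhere, the ndarray's sort() method can be used instead.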

import numpy as np
import numpy.random as rnd

a = rnd.randint(0, 3, 20)

print(a)
print(np.sort(np.array(list(set(a)))))

# [2 1 2 0 0 0 2 2 2 1 0 1 0 2 0 0 2 0 2 2]
# [0 1 2]
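
NumPy also provides np.unique, which combines the de-duplication and the sort in one call. The sketch below is an addition to the original post, with an arbitrary sample array.

```python
import numpy as np

a = np.array([2, 1, 2, 0, 0, 0, 2, 2, 2, 1])

# np.unique returns the sorted unique elements as an ndarray
values = np.unique(a)
print(values)

# return_counts=True also reports how often each value occurs
values, counts = np.unique(a, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))
```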

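Use case: splitting data by class label

Suppose the feature values and class labels of many samples are given as arrays, and the data has to be processed class by class, for example to plot each class with a different marker shape or color.

The duplicates can be removed with set() and the classes handled in a loop, as follows.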

import numpy as np

X = np.array(['swallow', 'carp', 'dog', 'horse', 'hawk', 'bonito'])
y = np.array([1, 0, 3, 3, 1, 0])

for cls in set(y):
    print(X[y==cls])

# ['carp' 'bonito']
# ['swallow' 'hawk']
# ['dog' 'horse']
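
Because set iteration order is not guaranteed, a variant that loops over np.unique(y) may be preferable when the classes must be visited in ascending order. The following is a sketch under that assumption; the dictionary name groups is hypothetical.

```python
import numpy as np

X = np.array(['swallow', 'carp', 'dog', 'horse', 'hawk', 'bonito'])
y = np.array([1, 0, 3, 3, 1, 0])

# np.unique yields the class labels in ascending order,
# so the loop order does not depend on set iteration order
groups = {int(cls): X[y == cls].tolist() for cls in np.unique(y)}

for cls, members in groups.items():
    print(cls, members)
```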

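matplotlib.pyplot.quiver – Vector fields

2020-02-23 / tau

Overview

matplotlib.pyplot.quiver() visualizes a vector field. The basic parameters are as follows.

quiver(X, Y, U, V, [C]): X and Y are the start points of the vectors, U and V are their components, and C is an array of values that are mapped to colors through the colormap (for example, the magnitude of each vector).

Example: drawing single vectors

The following example draws vectors from a single start point and a single pair of components. The first call passes only the components (1, 2), so the arrow starts at the origin. By default the arrow scale is adjusted automatically to the plot area; here the parameters are set so that the arrows use the same scale as the plot coordinates.

The matplotlib documentation says the following under scale_units.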

“To plot vectors in the x-y plane, with u and v having the same units as x and y, use angles='xy', scale_units='xy', scale=1”

import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ax.quiver(1, 2, color='blue', angles='xy', scale_units='xy', scale=1)
ax.quiver(1, 2, 2, -1, color='red', angles='xy', scale_units='xy', scale=1)

ax.set_xlim(0, 3)
ax.set_ylim(0, 3)

ax.set_xticks(np.arange(0, 4, 1))
ax.set_yticks(np.arange(0, 4, 1))

ax.grid()
ax.set_aspect('equal')

plt.show()

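Example: drawing a vector field

The following example draws vectors whose components depend on their position in the xy-plane, much like the gradient of a function.

The start-point coordinates and the vector components are generated with meshgrid.

The first plot passes only the start points and the components: the arrows have a single color and their scale is adjusted automatically.

The second plot also specifies a scale and an array for coloring, so the arrows are colored through the colormap according to their magnitude.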

import numpy as np
import matplotlib.pyplot as plt

x = y = np.arange(-5, 6)
u = 2 * x
v = 3 * y

x, y = np.meshgrid(x, y)
u, v = np.meshgrid(u, v)

fig, axes = plt.subplots(1, 2, figsize=(12, 4.8))

lim = 8
for ax in axes:
    ax.set_xlim(-lim, lim)
    ax.set_ylim(-lim, lim)

    ax.set_xticks(np.arange(-lim, lim, 1))
    ax.set_yticks(np.arange(-lim, lim, 1))

    ax.grid()
    ax.set_aspect('equal')

C = np.sqrt(u * u + v * v)
axes[0].quiver(x, y, u, v)
axes[1].quiver(x, y, u, v, C, scale=100, cmap='Blues')

plt.show()
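
The arrays passed to quiver() can be inspected on their own. This sketch, added here for illustration, repeats the meshgrid step without plotting and shows that the magnitude array C can equivalently be computed with np.hypot.

```python
import numpy as np

x = y = np.arange(-5, 6)
u = 2 * x
v = 3 * y

x, y = np.meshgrid(x, y)
u, v = np.meshgrid(u, v)

# All four arrays share the same 2-D shape: (len(y), len(x))
print(x.shape, y.shape, u.shape, v.shape)

# np.hypot computes sqrt(u**2 + v**2) element-wise
C = np.hypot(u, v)
print(C.min(), C.max())
```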

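Axes.spines: configuring the axis lines

2020-02-22 / tau

Overview

The position and visibility of a plot's x-axis and y-axis lines are controlled through the spines property of the Axes object.

spines is a dictionary-like object keyed by 'bottom', 'top', 'left' and 'right'. The position of a spine is set with set_position() and its visibility with set_visible().

Selecting a spine

spines['bottom'] and spines['left'] are the bottom and left axis lines, which carry the tick values.

spines['top'] and spines['right'] are the top and right axis lines, which are drawn as plain lines.

Call set_position() or set_visible() on each spine to set its position or visibility.

Showing and hiding spines

set_visible(False) hides a spine.

The following example hides the top and right spines.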

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 1, 50)

fig , axes = plt.subplots(1, 2)

for ax in axes:
    ax.plot(x, 1 - x)

    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)

    ax.grid()
    ax.set_aspect('equal')

axes[1].spines['right'].set_visible(False)
axes[1].spines['top'].set_visible(False)

plt.show()
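
When several spines are hidden at once, a loop over the keys keeps the code short. The sketch below is an addition; it assumes the non-interactive Agg backend so that it runs without a display, and it uses get_visible() to confirm the result.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display is needed for this check
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [1, 0])

# Hide several spines in one loop
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# get_visible() reports the current state of each spine
for side in ("bottom", "top", "left", "right"):
    print(side, ax.spines[side].get_visible())
```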

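Placing the axes at zero or at the center

set_position('zero') places a spine at the zero position, and set_position('center') places it at the center of the plot area. These are likely to be used together with set_visible() in most cases.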

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 2, 50)

fig , axes = plt.subplots(1, 2)

for ax in axes:
    ax.plot(x, 1 - x)

    ax.set_xlim(-1, 2)
    ax.set_ylim(-1, 2)

    ax.set_xticks(np.arange(-1, 2.5, 0.5))
    ax.set_yticks(np.arange(-1, 2.5, 0.5))

    ax.grid()
    ax.set_aspect('equal')

    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)

axes[0].spines['bottom'].set_position('zero')
axes[0].spines['left'].set_position('zero')

axes[1].spines['bottom'].set_position('center')
axes[1].spines['left'].set_position('center')

plt.show()

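Specifying the spine position numerically

Pass a tuple of the form ('position type', amount) to set_position().

Position type	Amount
data	The x or y data value at which the spine is placed.
outward	Offset in points: a positive value moves the spine outward, away from the plot area, and a negative value moves it inward.
axes	Fraction of the height or width of the plot area.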

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 50)

fig , axes = plt.subplots(2, 2)

for row in axes:
    for ax in row:
        ax.plot(x, 10 - x)

        ax.set_xlim(0, 10)
        ax.set_ylim(0, 10)

        ax.set_xticks(np.arange(0, 12, 2))
        ax.set_yticks(np.arange(0, 12, 2))

        ax.grid()
        ax.set_aspect('equal')

ax = axes[0, 1]
ax.set_title('data')
ax.spines['bottom'].set_position(('data', 1))
ax.spines['left'].set_position(('data', 2))
ax.spines['top'].set_position(('data', 7))
ax.spines['right'].set_position(('data', 8))

ax = axes[1, 0]
ax.set_title('outward')
ax.spines['bottom'].set_position(('outward', 5))
ax.spines['left'].set_position(('outward', -10))
ax.spines['top'].set_position(('outward', 15))
ax.spines['right'].set_position(('outward', -20))

ax = axes[1, 1]
ax.set_title('axes')
ax.spines['bottom'].set_position(('axes', 0.1))
ax.spines['left'].set_position(('axes', 0.2))
ax.spines['top'].set_position(('axes', 0.7))
ax.spines['right'].set_position(('axes', 0.8))

fig.subplots_adjust(hspace=0.5)
plt.show()

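Python3 – Sorting collections

2020-02-10 / tau

Sorting lists

The `sort()` method is destructive

sort() is a list method that modifies the original list in place (a destructive operation). The method returns None.

To sort in descending order, pass reverse=True.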

lst = [3, 2, 1, 5, 4]

print(lst.sort())
print(lst)

lst.sort(reverse=True)
print(lst)

# None
# [1, 2, 3, 4, 5]
# [5, 4, 3, 2, 1]

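The `sorted()` function is non-destructive

sorted() returns a sorted copy of the list passed to it. The original list is not modified (a non-destructive operation).

Descending order is specified in the same way as for the sort() method.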

lst = [3, 2, 1, 5, 4]

print(sorted(lst))
print(lst)

print(sorted(lst, reverse=True))

print()

# [1, 2, 3, 4, 5]
# [3, 2, 1, 5, 4]
# [5, 4, 3, 2, 1]

Strings are sorted in lexicographic order

lst = ['ca', 'ba', 'aa', 'bb', 'ab']

print(sorted(lst))

print()

# ['aa', 'ab', 'ba', 'bb', 'ca']
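
Lexicographic order here means comparison by character code, so uppercase letters sort before lowercase ones. The following sketch is an addition to the original post; it shows the default order and a case-insensitive variant using the key parameter.

```python
words = ['b', 'A', 'c', 'B']

# Default comparison is by code point: uppercase sorts before lowercase
default_order = sorted(words)
print(default_order)

# key=str.lower compares case-insensitively; ties keep their original order
caseless_order = sorted(words, key=str.lower)
print(caseless_order)
```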

Notes on `ndarray`

`sorted()` does not return an `ndarray`

Passing an ndarray to sorted() does not raise an error, but the result is returned as a list, so it has to be converted back to an array.

import numpy as np

a = np.array([3, 2, 1, 5, 4])
print(sorted(a))
print(np.array(sorted(a)))

print()

# [1, 2, 3, 4, 5]
# [1 2 3 4 5]

`numpy.sort()` sorts an `ndarray` non-destructively

numpy.sort() returns a sorted copy of the ndarray passed to it and leaves the original unchanged. It behaves the way sorted() does for lists.

import numpy as np

a = np.array([3, 2, 1, 5, 4])

print(np.sort(a))
print(a)

# [1 2 3 4 5]
# [3 2 1 5 4]
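
np.sort() has no reverse argument, so descending order is usually obtained by reversing the sorted result. The sketch below is an addition; it also shows np.argsort, which returns the sorting indices instead of the sorted values.

```python
import numpy as np

a = np.array([3, 2, 1, 5, 4])

# Reverse the ascending result to get descending order
descending = np.sort(a)[::-1]
print(descending)

# np.argsort returns the indices that would sort the array
order = np.argsort(a)
print(order)
print(a[order])
```

Neither call modifies a.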

The `sort()` method of `ndarray` is destructive

The ndarray sort() method rewrites the contents of the original array. It behaves like the list sort() method, and its return value is None.

import numpy as np

a = np.array([3, 2, 1, 5, 4])
print(a.sort())
print(a)

# None
# [1 2 3 4 5]

Sorting dictionaries

To be added later.
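
As a placeholder, here is a minimal sketch of one common approach, assuming that sorting by key or by value is what is intended. The sample dictionary is hypothetical, and the approach relies on dictionaries keeping insertion order (Python 3.7+).

```python
ages = {'Jane': 34, 'Bill': 18, 'Lucy': 25, 'Amanda': 44}

# Sort by key: sorted() on the items compares the (key, value) tuples
by_key = dict(sorted(ages.items()))
print(by_key)

# Sort by value: use the second element of each item as the sort key
by_value = dict(sorted(ages.items(), key=lambda item: item[1]))
print(by_value)
```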

Python3 – Looping over several lists in parallel with zip

2020-02-10 / tau

To fetch and process the elements of two lists in parallel, use zip().

names = ['Jane', 'Bill', 'Lucy', 'Amanda']
ages = [34, 18, 25, 44]

for name, age in zip(names, ages):
    print("{} is {} years old.".format(name, age))

# Jane is 34 years old.
# Bill is 18 years old.
# Lucy is 25 years old.
# Amanda is 44 years old.

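zip() returns an iterator of tuples, each pairing up the corresponding elements of the collections passed as arguments. If the collections differ in length, the iterator stops at the length of the shortest one, and the remaining elements of the longer collections are ignored.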

lst1 = ['A', 'B', 'C']
lst3 = ['zero', 'one', 'two', 'three']
lst2 = [0, 1, 2, 3, 4]

z = zip(lst1, lst2, lst3)

for t in z:
    print(t)

# ('A', 0, 'zero')
# ('B', 1, 'one')
# ('C', 2, 'two')
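
When the trailing elements must not be dropped, itertools.zip_longest from the standard library pads the shorter inputs instead. This sketch is an addition to the original post; the fill value '-' is an arbitrary choice.

```python
from itertools import zip_longest

lst1 = ['A', 'B', 'C']
lst2 = [0, 1, 2, 3, 4]

# zip stops at the shortest input
print(list(zip(lst1, lst2)))

# zip_longest continues to the longest input and pads with fillvalue
padded = list(zip_longest(lst1, lst2, fillvalue='-'))
print(padded)
```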

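matplotlib.pyplot.contour/contourf – Contours

2020-02-09 / tau

contour: contour lines

matplotlib.pyplot.contour() draws contour lines (isolines) on a 2-D plane: curves that connect the points where z, computed from x and y, has the same value.

Specifying x, y and z

Passing only z

The following code passes only the z values, as a 2-D array. In this case the indices of the array z are used as the coordinates (row indices 0 to 20 become the y coordinates, column indices 0 to 16 the x coordinates).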

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 17)
y = np.linspace(-1.2, 1.2, 21)
z = np.array([[u*u + v*v for u in x] for v in y])

fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(111)
ax.contour(z)
ax.set_aspect('equal')
plt.show()

Passing 1-D x and y

The following code passes x and y as 1-D arrays. In this case the values of x and y are used as the coordinates (x from -1 to 1, y from -1.2 to 1.2).

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 17)
y = np.linspace(-1.2, 1.2, 21)
z = np.array([[u*u + v*v for u in x] for v in y])

fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(111)
ax.contour(x, y, z)
ax.set_aspect('equal')
plt.show()

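Using meshgrid

This is the most common approach. The 1-D arrays x and y are expanded into 2-D arrays with the numpy.meshgrid() function; these are used both to compute z and as arguments to contour(). The result is the same as above.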

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 17)
y = np.linspace(-1.2, 1.2, 21)
x, y = np.meshgrid(x, y)
z = x * x + y * y

fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(111)
ax.contour(x, y, z)
ax.set_aspect('equal')
plt.show()
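
That the nested comprehension and the meshgrid computation produce the same z can be checked directly, without plotting. The following sketch is an addition; the names xx, yy, z_loop and z_grid are chosen here for illustration.

```python
import numpy as np

x = np.linspace(-1, 1, 17)
y = np.linspace(-1.2, 1.2, 21)

# Nested comprehension: rows follow y, columns follow x
z_loop = np.array([[u*u + v*v for u in x] for v in y])

# meshgrid: both outputs have shape (len(y), len(x))
xx, yy = np.meshgrid(x, y)
z_grid = xx * xx + yy * yy

print(xx.shape, z_loop.shape)
print(np.allclose(z_loop, z_grid))
```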

Labels

Contour values can be displayed as labels. Save the object returned by contour() and pass it to clabel().

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 21)
y = np.linspace(-1, 1, 21)
z = np.array([[u*u + v*v for u in x] for v in y])

fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(111)
cntr = ax.contour(x, y, z)
ax.clabel(cntr)
ax.set_aspect('equal')
plt.show()

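Contour levels

The levels parameter specifies the contour levels. It can be given as a single number or as an array-like; the single-number form does not always produce the expected result.

Specifying a number

When a single integer is given, the documentation says the following.

“If an int n, use n data intervals; i.e. draw n+1 contour lines. The level heights are automatically chosen.”

That is, n intervals should produce n+1 contour lines.

Try running the following code.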

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 21)
y = np.linspace(-1, 1, 21)
x, y = np.meshgrid(x, y)
z1 = x + y
z2 = x + y + 2

fig = plt.figure(figsize=(12, 5))

ax1 = fig.add_subplot(121)
cntr = ax1.contour(x, y, z1, levels=5)
ax1.clabel(cntr)
ax1.set_aspect('equal')

ax2 = fig.add_subplot(122)
cntr = ax2.contour(x, y, z2, levels=5)
ax2.clabel(cntr)
ax2.set_aspect('equal')

plt.show()

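Although both plots specify levels=5, the number of contour lines differs depending on the relationship between the range and the function. According to the documentation there should be 6 contour lines for 5 intervals, but the left plot has 5 lines for 6 intervals and the right plot has 4 lines for 5 intervals.

In the right plot, z2 = 0 at the lower left and z2 = 4 at the upper right, so the count works out if the corner points are counted as contour levels. The left plot does not fit this explanation; whether this comes from loss of significance or from different behavior when zero is included is unclear. Since the level heights are chosen automatically, the levels actually used can be checked through the levels attribute of the object returned by contour().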
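
One way to investigate is to print the levels that matplotlib selected. The sketch below is an addition; it assumes the Agg backend so that no window is opened, and it deliberately shows no expected output because the chosen levels may differ between matplotlib versions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run without a display
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 21)
y = np.linspace(-1, 1, 21)
x, y = np.meshgrid(x, y)

fig, ax = plt.subplots()
for name, z in (("z1", x + y), ("z2", x + y + 2)):
    cntr = ax.contour(x, y, z, levels=5)
    # levels holds the level values matplotlib selected for this call
    print(name, cntr.levels)
```

Comparing the printed levels with the data range of each z shows which levels fall on the boundary of the range and which fall inside it.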

The behavior is also inconsistent with levels=1, as shown below.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-2, 2, 21)
y = np.linspace(-2, 2, 21)
x, y = np.meshgrid(x, y)
z1 = x * x + y * y - 1
z2 = x * x + y * y

fig = plt.figure(figsize=(12, 5))

ax1 = fig.add_subplot(121)
cntr = ax1.contour(x, y, z1, levels=1)
ax1.clabel(cntr)
ax1.set_aspect('equal')

ax2 = fig.add_subplot(122)
cntr = ax2.contour(x, y, z2, levels=1)
ax2.clabel(cntr)
ax2.set_aspect('equal')

plt.show()

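Specifying an array-like

When levels is given as a list or similar, contour lines are drawn at the values of its elements. The elements must be in increasing order (otherwise a runtime error is raised).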

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 21)
y = np.linspace(-1, 1, 21)
x, y = np.meshgrid(x, y)
z = x + y

fig = plt.figure(figsize=(6, 5))

ax = fig.add_subplot(111)
cntr = ax.contour(x, y, z, levels=[-2, -1, 0, 0.5, 1, 1.25, 1.5])
ax.clabel(cntr)
ax.set_aspect('equal')

plt.show()
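
For evenly spaced levels, the array can be generated with np.linspace instead of being typed out. This is a sketch added for illustration, assuming the Agg backend; the range -1.5 to 1.5 is an arbitrary choice that lies strictly inside the data range of z.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run without a display
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 21)
y = np.linspace(-1, 1, 21)
x, y = np.meshgrid(x, y)
z = x + y  # ranges from -2 to 2

# Seven evenly spaced levels strictly inside the data range
levels = np.linspace(-1.5, 1.5, 7)

fig, ax = plt.subplots()
cntr = ax.contour(x, y, z, levels=levels)

# The levels used are the ones passed in
print(cntr.levels)
```

Specifying the levels explicitly in this way avoids the uncertainty of the single-integer form.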

Line design

linewidths: line width

The line width is specified with linewidths. A single number applies to all contour lines; with an array-like, the widths are applied cyclically. Note that the parameter name ends in "s".

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 21)
y = np.linspace(-1, 1, 21)
x, y = np.meshgrid(x, y)
z = x * x + y * y - 1

fig = plt.figure(figsize=(12, 5))

ax1 = fig.add_subplot(121)
ax1.contour(x, y, z, linewidths=3.0)
ax1.set_aspect('equal')

ax2 = fig.add_subplot(122)
ax2.contour(x, y, z, linewidths=[1, 2, 3])
ax2.set_aspect('equal')

plt.show()

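linestyles: line style

linestyles specifies the line style by name. When several styles are given, they are applied cyclically.

When a single color is used for the lines, the contour lines for negative values are drawn dashed ('dashed'); this case is covered in the colors section that follows.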

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 21)
y = np.linspace(-1, 1, 21)
x, y = np.meshgrid(x, y)
z = x * x + y * y - 1

fig = plt.figure(figsize=(12, 5))

ax1 = fig.add_subplot(121)
ax1.contour(x, y, z, linestyles='dashdot')
ax1.set_aspect('equal')

ax2 = fig.add_subplot(122)
ax2.contour(x, y, z, linestyles=['solid', 'dashed', 'dashdot', 'dotted'])
ax2.set_aspect('equal')

plt.show()

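colors: line color

The color of the contour lines is specified with colors. An array-like is the standard form, but a single color can be given on its own, without wrapping it in a list, as a color-name string such as 'red' or 'blue'.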

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 21)
y = np.linspace(-1, 1, 21)
x, y = np.meshgrid(x, y)
z = x * x + y * y - 1

fig = plt.figure(figsize=(12, 5))

ax1 = fig.add_subplot(121)
ax1.contour(x, y, z, colors='blue')
ax1.set_aspect('equal')

ax2 = fig.add_subplot(122)
ax2.contour(x, y, z, colors=['blue', 'green', 'red'])
ax2.set_aspect('equal')

plt.show()

With a single color, the contour lines for negative values are dashed. The documentation explains:

“If linestyles is None, the default is ‘solid’ unless the lines are monochrome. In that case, negative contours will take their linestyle from rcParams["contour.negative_linestyle"] = 'dashed' setting.”
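
If solid lines are wanted for negative levels as well, two approaches suggest themselves. The following is a sketch and not part of the original post; it assumes the Agg backend and uses plt.rc_context to change the rcParams setting only temporarily.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run without a display
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 21)
y = np.linspace(-1, 1, 21)
x, y = np.meshgrid(x, y)
z = x * x + y * y - 1

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Option 1: pass linestyles explicitly for this call only
ax1.contour(x, y, z, colors='blue', linestyles='solid')

# Option 2: override the rcParams setting inside a context block
with plt.rc_context({'contour.negative_linestyle': 'solid'}):
    ax2.contour(x, y, z, colors='blue')
```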

cmap: colormap

A colormap can be applied to the line colors.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 21)
y = np.linspace(-1, 1, 21)
x, y = np.meshgrid(x, y)
z = x * x + y * y - 1

fig = plt.figure(figsize=(12, 5))

ax1 = fig.add_subplot(121)
ax1.contour(x, y, z, cmap='summer')
ax1.set_aspect('equal')

ax2 = fig.add_subplot(122)
ax2.contour(x, y, z, cmap='seismic')
ax2.set_aspect('equal')

plt.show()

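alpha: transparency

This is of limited use for line contours, but the alpha value of the lines can be given as a real number between 0 (fully transparent) and 1 (opaque).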

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 21)
y = np.linspace(-1, 1, 21)
x, y = np.meshgrid(x, y)
z1 = (x + 0.5)**2 + y * y - 1
z2 = (x - 0.5)**2 + y * y - 1

fig = plt.figure(figsize=(8, 5))

ax = fig.add_subplot(111)
ax.contour(x, y, z1, alpha=0.5)
ax.contour(x, y, z2, alpha=0.5)
ax.set_aspect('equal')

plt.show()

contourf: filled contour areas

contourf fills each area bounded by the contour lines with color.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 21)
y = np.linspace(-1, 1, 21)
x, y = np.meshgrid(x, y)
z = x * x + y * y - 1

fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(111)
ax.contourf(x, y, z)
ax.set_aspect('equal')
plt.show()

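Labels

Labeling a contourf plot is not a typical use, and labeling it the same way as a contour plot breaks the figure. Combining contour and contourf works well: draw the filled areas with contourf, then draw the lines with contour and label those.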

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 21)
y = np.linspace(-1, 1, 21)
x, y = np.meshgrid(x, y)
z = x * x + y * y - 1

fig = plt.figure(figsize=(12, 5))

ax1 = fig.add_subplot(121)
cntr = ax1.contourf(x, y, z)
ax1.clabel(cntr, colors='black')
ax1.set_aspect('equal')

ax2 = fig.add_subplot(122)
ax2.contourf(x, y, z)
cntr = ax2.contour(x, y, z)
ax2.clabel(cntr, colors='black')
ax2.set_aspect('equal')

plt.show()

Design

cmap: colormap

A colormap can be specified for coloring the contour areas.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 21)
y = np.linspace(-1, 1, 21)
x, y = np.meshgrid(x, y)
z = x * x + y * y - 1

fig = plt.figure(figsize=(12, 5))

ax1 = fig.add_subplot(121)
ax1.contourf(x, y, z, cmap='seismic')
ax1.set_aspect('equal')

ax2 = fig.add_subplot(122)
ax2.contourf(x, y, z, cmap='cividis')
ax2.set_aspect('equal')

plt.show()

alpha: transparency

Setting alpha below 1 makes the filled areas semi-transparent.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 1, 21)
y = np.linspace(-1, 1, 21)
x, y = np.meshgrid(x, y)
z1 = (x + 0.5)**2 + y * y - 1
z2 = (x - 0.5)**2 + y * y - 1

fig = plt.figure(figsize=(12, 5))

ax1 = fig.add_subplot(131)
ax1.contourf(x, y, z1, cmap='autumn', alpha=0.5)
ax1.set_aspect('equal')

ax2 = fig.add_subplot(132)
ax2.contourf(x, y, z2, cmap='winter', alpha=0.5)
ax2.set_aspect('equal')

ax3 = fig.add_subplot(133)
ax3.contourf(x, y, z1, cmap='autumn', alpha=0.5)
ax3.contourf(x, y, z2, cmap='winter', alpha=0.5)
ax3.set_aspect('equal')

plt.show()
