Normalizer

2020-10-08 / tau

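Overview

The Normalizer in the sklearn.preprocessing module rescales each sample's feature vector so that its norm is 1. Specifically, for each sample (row), each feature F_i is transformed into F_i^* by the following formula, where the sum runs over all features j of that sample.

(1) $\begin{equation*} {F_i}^* = \frac{F_i}{\left( \sum_j {|F_j|}^p \right) ^\frac{1}{p}} \end{equation*}$

The norm type is specified with the constructor argument norm. The default is 'l2' (p = 2 in (1)), and 'l1' (p = 1) and 'max' can also be specified. With 'max', the denominator is the largest absolute value among the sample's features.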

Normalizer(norm='l2')
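As a quick check of formula (1), the sketch below computes the three row-wise norms by hand with NumPy and compares the results with Normalizer. The toy array is made up for illustration. Its values are all positive, so the 'max' case does not depend on how negative entries are treated.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical toy data: two samples (rows) with two features (columns)
X = np.array([[3.0, 4.0],
              [1.0, 2.0]])

# Row-wise denominators computed by hand
l1 = np.abs(X).sum(axis=1, keepdims=True)          # p = 1
l2 = np.sqrt((X ** 2).sum(axis=1, keepdims=True))  # p = 2
mx = np.abs(X).max(axis=1, keepdims=True)          # 'max'

for norm, denom in [("l1", l1), ("l2", l2), ("max", mx)]:
    by_hand = X / denom
    by_sklearn = Normalizer(norm=norm).fit_transform(X)
    print(norm, np.allclose(by_hand, by_sklearn))
    print(by_sklearn)
```

For the first row, the L2 norm is 5, so the row becomes (0.6, 0.8).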

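Behavior

The behavior of Normalizer applied to two features, each following a different normal distribution, is shown below.

Unlike the scalers, this is not a similarity transformation (each sample is divided by its own norm), so the transformed histograms in the lower left differ in shape from the originals.

As for the spatial distribution, the default L2 norm maps every data point onto a circle of radius 1.

Zooming in on the transformed data shows every point lying on the circle of radius 1 centered at the origin.

The results for the other two options, the L1 norm and the max norm, are shown below. In each case the points lie on the unit "circle" of that norm (a diamond for L1, a square for max).

The code is as follows. Instead of determining the scaling parameters with fit() and then transforming with transform(), it uses fit_transform(), which runs both in sequence. (For Normalizer, fit() does not actually learn anything, since each sample is normalized independently.)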

import numpy as np
import numpy.random as rnd
import matplotlib.pyplot as plt
import matplotlib.patches as patch
from sklearn.preprocessing import Normalizer

rnd.seed(0)
x1 = rnd.normal(loc=1, scale=2, size=100)
x2 = rnd.normal(loc=5, scale=1, size=100)
X = np.hstack((x1.reshape(-1, 1), x2.reshape(-1, 1)))

X_trans = Normalizer().fit_transform(X)

fig1 = plt.figure(figsize=(9.6, 4.8))

ax1 = fig1.add_subplot(2, 2, 1)
ax2 = fig1.add_subplot(2, 2, 3)
ax3 = fig1.add_subplot(1, 2, 2)

ax1.hist(X[:, 0], ec='k', range=(-5, 10), bins=40, alpha=0.5)
ax1.hist(X[:, 1], ec='k', range=(-5, 10), bins=40, alpha=0.5)

ax2.hist(X_trans[:, 0], range=(-1.2, 1.2), bins=40, ec='k', alpha=0.5)
ax2.hist(X_trans[:, 1], range=(-1.2, 1.2), bins=40, ec='k', alpha=0.5)

ax3.scatter(X[:, 0], X[:, 1], ec='k', fc='w')
ax3.scatter(X_trans[:, 0], X_trans[:, 1], ec='k', fc='gray')
ax3.set_aspect('equal')
ax3.set_xlim(-5, 8)
ax3.set_ylim(-5, 8)

fig2, ax4 = plt.subplots()

ax4.scatter(X_trans[:, 0], X_trans[:, 1], ec='k', fc='gray')
ax4.set_aspect('equal')
ax4.set_xlim(-1.5, 1.5)
ax4.set_ylim(-1.5, 1.5)
ax4.grid()
ax4.spines['top'].set_visible(False)
ax4.spines['right'].set_visible(False)
ax4.spines['bottom'].set_position('zero')
ax4.spines['left'].set_position('zero')
circ = patch.Circle(xy=(0, 0), radius=1, ec='k', fill=False)
ax4.add_patch(circ)

X_trans_l1 = Normalizer('l1').fit_transform(X)
X_trans_max = Normalizer('max').fit_transform(X)

fig3, axes = plt.subplots(1, 2, figsize=(9.6, 4.8))

axes[0].scatter(X_trans_l1[:, 0],X_trans_l1[:, 1], ec='k', fc='gray')
axes[0].plot([0, 1, 0, -1, 0], [1, 0, -1, 0, 1], c='k')
axes[1].scatter(X_trans_max[:, 0],X_trans_max[:, 1], ec='k', fc='gray')
axes[1].plot([1, 1, -1, -1, 1], [1, -1, -1, 1, 1], c='k')

for ax in axes.reshape(-1):
    ax.set_aspect('equal')
    ax.set_xlim(-1.5, 1.5)
    ax.set_ylim(-1.5, 1.5)
    ax.grid()
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['bottom'].set_position('zero')
    ax.spines['left'].set_position('zero')

plt.show()

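Characteristics

Normalizer is used when only the direction of the feature vector matters. One example might be separating clusters that lie in particular ranges of directions in space, but for more abstract data it is hard to imagine a use. In fact, I could not find any cases among the top search results that apply Normalizer because of its meaning and the nature of the data. The scikit-learn user guide does mention text classification and clustering with the vector space model, where the dot product of two L2-normalized vectors equals their cosine similarity.

Note that the Normalizer transformation is irreversible, since each sample's original norm is discarded. Unlike the scalers, it has no inverse_transform().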
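The sketch below makes "only the direction matters" concrete with made-up vectors. Vectors that differ only in length are mapped to the same row, which is also why the transformation cannot be undone. After L2 normalization, a plain dot product gives the cosine similarity. That cosine check is an illustrative addition, not something the post measures.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical data: rows 0-2 point in the same direction with different lengths
v1 = np.array([1.0, 2.0, 2.0])
X = np.vstack([v1, 3 * v1, 0.5 * v1, np.array([2.0, 0.0, 1.0])])

X_unit = Normalizer().fit_transform(X)  # default norm='l2'

# The first three rows collapse onto one unit vector: the length information is gone
print(X_unit[:3])

# Dot product of L2-normalized rows equals the cosine similarity of the original rows
a, b = X[0], X[3]
cos_direct = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
cos_via_unit = X_unit[0] @ X_unit[3]
print(cos_direct, cos_via_unit)
```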

Iterators cannot be reused

2020-10-06 / tau

An object created as an iterator can be assigned to a variable and iterated over, but it cannot be used again as-is.

from itertools import repeat

rpt = repeat("Ha", 3)

for x in rpt:
    print(x, end="")
print()
# HaHaHa

for x in rpt:
    print(x)
print()
# nothing displayed

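An iterator is initialized when it is created and afterwards keeps the state it had at the moment iteration ended. Strictly speaking, then, reuse is not forbidden; the iterator simply cannot be run again from its initial state.

For example, the following code shows that the iterator can in fact be used again after the first loop finishes. As the output shows, however, the second loop starts from 5, not from 4, the value at which the first loop's exit test fired.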

from itertools import count

cnt = count(0)

for x in cnt:
    if x > 3: break
    print(x, end=" ")
print()
# 0 1 2 3 

for x in cnt:
    if x > 7: break
    print(x, end=" ")
print()
# 5 6 7

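This happens because the for statement takes each value out of the iterator with next() before running the loop body. The value 4 had already been consumed when the if test triggered break, so it is lost, and the next loop resumes at 5.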
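To see where the value 4 goes, the first loop can be unrolled with explicit next() calls. This sketch imitates what the for statement does internally.

```python
from itertools import count

cnt = count(0)

# Unroll the first for loop by hand: the for statement calls next() before the body runs
consumed = []
while True:
    x = next(cnt)        # the value is taken out of the iterator here...
    consumed.append(x)
    if x > 3:            # ...and only then tested, so 4 is already used up at break
        break

print(consumed)   # [0, 1, 2, 3, 4]
print(next(cnt))  # 5
```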

Because reusing an iterator can behave unexpectedly like this, it is better avoided.
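When the same sequence has to be traversed more than once, two common workarounds are to keep the data in a reusable iterable such as a list, or to create a new iterator each time. The sketch below shows both. make_counter is a hypothetical helper name.

```python
from itertools import count, islice, repeat

# A list is an iterable, not an iterator: each traversal gets a fresh iterator from it
words = ["Ha"] * 3
first = "".join(words)
second = "".join(words)
print(first, second)  # HaHaHa HaHaHa

# An iterator returns itself from iter(), so a second loop continues from the same state
rpt = repeat("Ha", 3)
print(iter(rpt) is rpt)  # True


def make_counter():
    """Hypothetical factory: returns a brand-new count iterator on every call."""
    return count(0)


print(list(islice(make_counter(), 4)))  # [0, 1, 2, 3]
print(list(islice(make_counter(), 4)))  # [0, 1, 2, 3]
```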

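preprocessor – Robustness to outliers

2020-10-04 / tau

Some of the algorithms used to preprocess data before feeding it to a machine learning model are easily affected by outliers.

For example, suppose we have data distributed as in the left figure below (500 random values from a normal distribution with mean 1 and variance 1). If 10 outliers with the value 20 are added to this data, the overall distribution looks like the right figure.

The results of transforming this data with MinMaxScaler, StandardScaler, and RobustScaler are shown below. For StandardScaler and RobustScaler, the outliers are not shown; only the range covering the original normal distribution is displayed.

First, with MinMaxScaler on the left, the range including the outliers becomes 0 to 1, so the bulk of the data (the normal distribution) is concentrated at small values near 0. As a result, the bulk of the data, which is what should actually drive learning accuracy, may not be spread out enough to be separated.

With StandardScaler in the middle and RobustScaler on the right, the shape of the bulk is not much different from the original normal distribution, which shows that they are robust.

Next, the number of outliers is increased from 10 to 20 and the same three transformations are applied.

With MinMaxScaler on the left, the range is determined only by the outlier value, regardless of how many outliers there are, so the original distribution is still squeezed near 0.

With StandardScaler in the middle, the bulk is compressed into a narrower range than with 10 outliers, because the additional outliers inflate the standard deviation. The histogram therefore looks somewhat different.

With RobustScaler on the right, the shape of the original distribution has hardly changed.

From the above, the three transformers have at least the following characteristics: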

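MinMaxScaler: outliers can narrow the range occupied by the data we actually want to analyze
StandardScaler: less affected by outliers, but the bulk of the distribution is still somewhat affected, depending on the size and frequency of the outliers
RobustScaler: unless the outliers are extremely numerous, it robustly preserves the characteristics of the original data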
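These tendencies can also be checked numerically without plotting. The sketch below measures how wide the original 500 points end up after scaling, using their interquartile range in the transformed space, with 0, 10 and 20 outliers. It uses NumPy's default_rng instead of the post's rnd.seed(0), so the numbers will not match the figures exactly. bulk_spread is a hypothetical helper.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler

# Same setup as the post (500 values from N(1, 1)) but with NumPy's Generator API,
# so the exact numbers differ from the figures above
rng = np.random.default_rng(0)
x = rng.normal(loc=1, scale=1, size=500)


def bulk_spread(scaler, n_outliers):
    """Hypothetical helper: fit `scaler` on x plus n_outliers values of 20 and return
    the interquartile range of the original 500 points after scaling."""
    data = np.append(x, [20.0] * n_outliers).reshape(-1, 1)
    scaled = scaler.fit_transform(data).ravel()[:len(x)]  # outliers were appended at the end
    q1, q3 = np.percentile(scaled, [25, 75])
    return q3 - q1


results = {}
for name, cls in [("MinMaxScaler", MinMaxScaler),
                  ("StandardScaler", StandardScaler),
                  ("RobustScaler", RobustScaler)]:
    results[name] = [bulk_spread(cls(), n) for n in (0, 10, 20)]
    print(name, np.round(results[name], 3))
```

A smaller value means the bulk is squeezed into a narrower band. MinMaxScaler should drop sharply once outliers appear and then stay the same from 10 to 20 outliers. StandardScaler should keep shrinking as outliers are added, and RobustScaler should stay close to 1.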

The code for the plots above is as follows.

import numpy as np
import numpy.random as rnd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import RobustScaler

rnd.seed(0)
x = rnd.normal(loc=1, scale=1, size=500)
x1 = np.append(x, [20] * 10)
x2 = np.append(x, [20] * 20)

scaler = MinMaxScaler()
x1_scaled_by_minmax = scaler.fit_transform(x1.reshape(-1, 1))
x2_scaled_by_minmax = scaler.fit_transform(x2.reshape(-1, 1))

scaler = StandardScaler()
x1_scaled_by_standard = scaler.fit_transform(x1.reshape(-1, 1))
x2_scaled_by_standard = scaler.fit_transform(x2.reshape(-1, 1))

scaler = RobustScaler()
x1_scaled_by_robust = scaler.fit_transform(x1.reshape(-1, 1))
x2_scaled_by_robust = scaler.fit_transform(x2.reshape(-1, 1))

fig0, axes = plt.subplots(1, 2, figsize=(12.8, 4.8))
axes[0].hist(x1, ec='k', bins=10, range=(-2, 4))
axes[1].hist(x1, ec='k', bins=40)

fig1, axes = plt.subplots(1, 3, figsize=(18.6, 4.8))

ax = axes[0]
ax.hist(x1_scaled_by_minmax, ec='k', bins=40)
ax.set_title("MinMaxScaler")

ax = axes[1]
ax.hist(x1_scaled_by_standard, ec='k', bins=10, range=(-1.5, 1))
ax.set_title("StandardScaler")

ax = axes[2]
ax.hist(x1_scaled_by_robust, ec='k', bins=10, range=(-2.5, 2.5))
ax.set_title("RobustScaler")

fig2, axes = plt.subplots(1, 3, figsize=(18.6, 4.8))

ax = axes[0]
ax.hist(x2_scaled_by_minmax, ec='k', bins=40)
ax.set_title("MinMaxScaler")

ax = axes[1]
ax.hist(x2_scaled_by_standard, ec='k', bins=10, range=(-1.5, 1))
ax.set_title("StandardScaler")

ax = axes[2]
ax.hist(x2_scaled_by_robust, ec='k', bins=10, range=(-2.5, 2.5))
ax.set_title("RobustScaler")

plt.show()

RobustScaler

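2020-10-04 / tau

Overview

RobustScaler in the sklearn.preprocessing module standardizes each feature using its median (med_i), first quartile (q_1i), and third quartile (q_3i).

(1) $\begin{equation*} {F_i}^* = \frac{F_i - med_i}{q_{3i} - q_{1i}} \end{equation*}$

Behavior

The behavior of RobustScaler applied to two features, each following a different normal distribution, is shown below. Features with different magnitudes and ranges end up with roughly the same spread around the origin after the transformation.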

import numpy as np
import numpy.random as rnd
import matplotlib.pyplot as plt
from sklearn.preprocessing import RobustScaler

rnd.seed(0)
x1 = rnd.normal(loc=2, scale=3, size=100)
x2 = rnd.normal(loc=7, scale=1, size=100)
X = np.hstack((x1.reshape(-1, 1), x2.reshape(-1, 1)))

scaler = RobustScaler()
X_transformed = scaler.fit_transform(X)

fig = plt.figure(figsize=(9.6, 4.8))

ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 3)
ax3 = fig.add_subplot(1, 2, 2)

ax1.hist(X[:, 0], ec='k', range=(-5, 10), bins=40, alpha=0.5, label="Feature 0")
ax1.hist(X[:, 1], ec='k', range=(-5, 10), bins=40, alpha=0.5, label="Feature 1")
ax1.legend(loc='upper left')

ax2.hist(X_transformed[:, 0], range=(-3, 3), bins=40, ec='k', alpha=0.5,
    label="Feature 0")
ax2.hist(X_transformed[:, 1], range=(-3, 3), bins=40, ec='k', alpha=0.5,
    label="Feature 1")
ax2.legend(loc='upper left')

ax3.scatter(X[:, 0], X[:, 1], ec='k', fc='w', label="before transformation")
ax3.scatter(X_transformed[:, 0], X_transformed[:, 1], ec='k', fc='gray',
    label="after transformation")
ax3.set_aspect('equal')
ax3.set_xlim(-7, 10)
ax3.set_ylim(-7, 10)
ax3.set_xlabel("Feature 0")
ax3.set_ylabel("Feature 1")
ax3.legend()

plt.show()

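Let's check how RobustScaler computes its result on simple data. The example below applies RobustScaler to 8 values, which simulates 8 samples with a single feature.

Among the parameters stored in the instance, center_ is the median of the feature and scale_ is the third quartile minus the first quartile. You can confirm that each feature is standardized with these values.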

import numpy as np
from sklearn.preprocessing import RobustScaler

x = np.array([2, 3, 4, 5, 6, 8, 10, 12])
print(np.percentile(x, q=[0, 25, 50, 75, 100]))

scaler = RobustScaler()
x_transformed = scaler.fit_transform(x.reshape(-1, 1))
print(x_transformed.reshape(-1))
print("centers:{}".format(scaler.center_))
print("scales :{}".format(scaler.scale_))

# [ 2.    3.75  5.5   8.5  12.  ]
# [-0.73684211 -0.52631579 -0.31578947 -0.10526316  0.10526316  0.52631579
#   0.94736842  1.36842105]
# centers:[5.5]
# scales :[4.75]
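The sketch below reproduces formula (1) by hand with np.percentile on the same 8 values and compares the result with RobustScaler. It also prints the mean, to show that center_ is the median rather than the mean.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

x = np.array([2, 3, 4, 5, 6, 8, 10, 12])

# Formula (1) by hand: subtract the median, divide by the interquartile range
q1, med, q3 = np.percentile(x, [25, 50, 75])
by_hand = (x - med) / (q3 - q1)

scaler = RobustScaler().fit(x.reshape(-1, 1))
print(med, q3 - q1)   # 5.5 4.75
print(np.allclose(by_hand, scaler.transform(x.reshape(-1, 1)).ravel()))  # True
print(x.mean())       # 6.25, which differs from center_
```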

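Characteristics

RobustScaler is robust to outliers, more so than StandardScaler, because a small number of extreme values hardly moves the median and the quartiles.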

StandardScaler

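2020-10-04 / tau

Overview

StandardScaler in the sklearn.preprocessing module standardizes each feature using its sample mean and sample variance.

Specifically, from the sample mean (m_i) and sample variance (v_i) of feature F_i, each F_i is transformed into F_i^* by the following formula.

(1) $\begin{equation*} {F_i}^* = \frac{F_i -m_i}{\sqrt{v_i}} \end{equation*}$

Behavior

The behavior of StandardScaler applied to two features, each following a different normal distribution, is shown below. Features with different magnitudes and ranges end up with roughly the same spread around the origin after the transformation.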

import numpy as np
import numpy.random as rnd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

rnd.seed(0)
x1 = rnd.normal(loc=2, scale=3, size=100)
x2 = rnd.normal(loc=7, scale=1, size=100)
X = np.hstack((x1.reshape(-1, 1), x2.reshape(-1, 1)))

scaler = StandardScaler()
X_transformed = scaler.fit_transform(X)

fig = plt.figure(figsize=(9.6, 4.8))
fig.subplots_adjust(wspace=0.3)

ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 3)
ax3 = fig.add_subplot(1, 2, 2)

ax1.hist(X[:, 0], ec='k', range=(-5, 10), bins=40, alpha=0.5, label="Feature 0")
ax1.hist(X[:, 1], ec='k', range=(-5, 10), bins=40, alpha=0.5, label="Feature 1")
ax1.legend(loc='upper left')

ax2.hist(X_transformed[:, 0], range=(-3, 3), bins=40, ec='k', alpha=0.5,
    label="Feature 0")
ax2.hist(X_transformed[:, 1], range=(-3, 3), bins=40, ec='k', alpha=0.5,
    label="Feature 1")
ax2.legend(loc='upper left')

ax3.scatter(X[:, 0], X[:, 1], ec='k', fc='w', label="before transformation")
ax3.scatter(X_transformed[:, 0], X_transformed[:, 1], ec='k', fc='gray',
    label="after transformation")
ax3.set_aspect('equal')
ax3.set_xlim(-10, 10)
ax3.set_ylim(-10, 10)
ax3.set_xlabel("Feature 0")
ax3.set_ylabel("Feature 1")
ax3.legend()

plt.show()

Let's check how StandardScaler computes its result on simple data. The example below applies StandardScaler to 5 values, which simulates 5 samples with a single feature.

import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([1, 2, 3, 4, 5])

scaler = StandardScaler()
x_transformed = scaler.fit_transform(x.reshape(-1, 1))
print(x_transformed.reshape(-1))

print("mean_ :{}".format(scaler.mean_))
print("var_  :{}".format(scaler.var_))
print("scale_:{}".format(scaler.scale_))

# [-1.41421356 -0.70710678  0.          0.70710678  1.41421356]
# mean_ :[3.]
# var_  :[2.]
# scale_:[1.41421356]

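Among the parameters stored in the instance, mean_ is the sample mean of the feature and var_ is the sample variance (divided by n, not the unbiased variance divided by n - 1). scale_ is the square root of var_.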
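The difference between the two variances is easy to check with np.var and its ddof argument. This sketch uses the same 5 values as above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([1, 2, 3, 4, 5], dtype=float)
scaler = StandardScaler().fit(x.reshape(-1, 1))

print(np.var(x))           # 2.0: divide by n (ddof=0), matches var_
print(np.var(x, ddof=1))   # 2.5: divide by n - 1 (unbiased)
print(scaler.var_)         # [2.]
```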

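You can confirm by calculation that each feature is standardized by the following formula.

(2) $\begin{equation*} {F_i}^* = \frac{F_i - \rm{mean\_}}{\rm{scale\_}} = \frac{F_i - \rm{mean\_}}{\sqrt{\rm{var\_}}} \end{equation*}$

Characteristics

StandardScaler is relatively robust to outliers compared with MinMaxScaler, although outliers still shift the mean and inflate the standard deviation.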

MinMaxScaler

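2020-10-04 / tau

Overview

MinMaxScaler in the sklearn.preprocessing module transforms each feature so that it fits within the range 0 to 1. Specifically, it takes the minimum (min_i) and maximum (max_i) of feature F_i in the data used for fitting, and transforms each F_i into F_i^* by the following formula.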

(1) $\begin{equation*} {F_i}^* = \frac{F_i - min_i}{max_i - min_i} \end{equation*}$
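A small worked example of the formula follows. min_i and max_i are fixed when fit() is called, so data transformed later can fall outside [0, 1] if it lies outside the fitted range. The arrays are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

x_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0]).reshape(-1, 1)   # hypothetical data
scaler = MinMaxScaler().fit(x_train)

print(scaler.data_min_, scaler.data_max_)   # [1.] [5.]
print(scaler.transform(x_train).ravel())    # values 0, 0.25, 0.5, 0.75, 1

# New data outside the fitted range [1, 5] maps outside [0, 1]
x_new = np.array([[0.0], [7.0]])
print(scaler.transform(x_new).ravel())      # values -0.25 and 1.5
```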

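Behavior

The behavior of MinMaxScaler applied to two features, each following a different normal distribution, is shown below. Features with different magnitudes and ranges all fit between 0 and 1 after the transformation.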

import numpy as np
import numpy.random as rnd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler

rnd.seed(0)
x1 = rnd.normal(loc=1, scale=1, size=100)
x2 = rnd.normal(loc=3, scale=0.5, size=100)
X = np.hstack((x1.reshape(-1, 1), x2.reshape(-1, 1)))

scaler = MinMaxScaler()
X_transformed = scaler.fit_transform(X)

fig = plt.figure(figsize=(9.6, 4.8))

ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 3)
ax3 = fig.add_subplot(1, 2, 2)

ax1.hist(X[:, 0], ec='k', range=(-2, 5), bins=40, alpha=0.5, label="Feature 0")
ax1.hist(X[:, 1], ec='k', range=(-2, 5), bins=40, alpha=0.5, label="Feature 1")
ax1.legend(loc='upper left')

ax2.hist(X_transformed[:, 0], range=(-0.2, 1.2), bins=40, ec='k', alpha=0.5,
    label="Feature 0")
ax2.hist(X_transformed[:, 1], range=(-0.2, 1.2), bins=40, ec='k', alpha=0.5,
    label="Feature 1")
ax2.legend(loc='upper left')

ax3.scatter(X[:, 0], X[:, 1], ec='k', fc='w', label="before transformation")
ax3.scatter(X_transformed[:, 0], X_transformed[:, 1], ec='k', fc='gray',
    label="after transformation")
ax3.set_aspect('equal')
ax3.set_xlim(-2, 5)
ax3.set_ylim(-2, 5)
ax3.set_xlabel("Feature 0")
ax3.set_ylabel("Feature 1")
ax3.legend()

plt.show()

Characteristics

MinMaxScaler is a simple method, but an outlier far away from the rest of the data can distort the scaling of the bulk of the data.

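matplotlib.pyplot – Combining plots in a non-grid layout

2020-10-04 / tau

Normally, when you specify plot areas with pyplot.subplots() or Figure.add_subplot(), you get an m-row, n-column grid of plot areas.

Sometimes, however, you want a different arrangement. For example, you might want two plot areas on the first row and a single full-width plot on the second row. Or you might want one plot spanning both rows in the first column, with two plot areas stacked vertically in the second column.

One way to do this is to change the grid specification itself in each Figure.add_subplot() call, according to the plot area you want to add.

The following example places two plots on the first row and a single full-width plot area on the second row. Here add_subplot(2, 1, 2) treats the figure as a 2-row, 1-column grid, so its second cell covers the entire bottom half.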

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 100)

fig = plt.figure()

ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 1, 2)

plt.show()

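The next example places one plot area spanning both rows in the first column and two plot areas stacked vertically in the second column. Here add_subplot(1, 2, 1) treats the figure as a 1-row, 2-column grid, so its first cell covers the entire left half.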

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 100)

fig = plt.figure()

ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 4)

plt.show()
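Another way to get the same layouts, not used in the original post, is to create a GridSpec with Figure.add_gridspec() (available in matplotlib 3.0 and later) and pass slices of it to add_subplot(). A sketch reproducing both layouts follows. The Agg backend line is only there so it also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; remove to use plt.show()
import matplotlib.pyplot as plt

# Layout 1: two plots on the top row, one full-width plot on the bottom row
fig1 = plt.figure()
gs1 = fig1.add_gridspec(2, 2)
ax1 = fig1.add_subplot(gs1[0, 0])
ax2 = fig1.add_subplot(gs1[0, 1])
ax3 = fig1.add_subplot(gs1[1, :])    # slice across both columns

# Layout 2: one plot spanning both rows on the left, two stacked plots on the right
fig2 = plt.figure()
gs2 = fig2.add_gridspec(2, 2)
bx1 = fig2.add_subplot(gs2[:, 0])    # slice across both rows
bx2 = fig2.add_subplot(gs2[0, 1])
bx3 = fig2.add_subplot(gs2[1, 1])
```

Slicing the grid keeps all cells on one common grid. This makes it easier to later adjust spacing with the GridSpec parameters.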

Formatting ndarray output – printoptions

2020-10-02 / tau

Overview

When printing an array, I often get the formatting wrong, as in this example:

import numpy as np

a = np.array([0.0123, 1.2345, 12.3456789])
print("{:.3f}".format(a))
# TypeError: unsupported format string passed to numpy.ndarray.__format__

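An ndarray does not accept a format specification such as .3f in format(). To specify the format of each element when printing an array, use NumPy's set_printoptions instead of the format method.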
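set_printoptions changes a global setting. If that is not wanted, there are alternatives, and the sketch below shows three of them. np.printoptions requires NumPy 1.15 or later.

```python
import numpy as np

a = np.array([0.0123, 1.2345, 12.3456789])

# 1) Format each element with Python's format and join the strings yourself
joined = " ".join("{:.3f}".format(v) for v in a)
print(joined)

# 2) np.array2string accepts a formatter without changing global settings
print(np.array2string(a, formatter={'float_kind': "{:.3f}".format}))

# 3) np.printoptions is a context manager: the settings apply only inside the with block
before = np.get_printoptions()["precision"]
with np.printoptions(precision=3, suppress=True):
    inside = str(a)
    print(a)
print(np.get_printoptions()["precision"] == before)  # True: the previous setting is restored
```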

`get_printoptions()`

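The list of array print options is available from numpy.get_printoptions(), which returns the options as a dictionary. The output below is from the NumPy version used at the time of writing; newer versions may list additional keys.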

import numpy as np

options = np.get_printoptions()

for k, v in options.items():
    print("{:<9}: {}".format(k, v))

# edgeitems: 3
# threshold: 1000
# floatmode: maxprec
# precision: 8
# suppress : False
# linewidth: 75
# nanstr   : nan
# infstr   : inf
# sign     : -
# formatter: None
# legacy   : False

`set_printoptions()`

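To set these options individually, pass keys and values to the numpy.set_printoptions() function.

numpy.set_printoptions([key]=[value])

Some options that are likely to be used often are summarized below.

Summarized display

`threshold` and `edgeitems`

When the total number of elements in the array exceeds the value given by threshold, the output is summarized (abbreviated).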

np.set_printoptions(threshold=20)

print(np.arange(20))
# [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]

print(np.arange(21))
# [ 0  1  2 ... 18 19 20]

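edgeitems specifies how many items (columns or rows) are shown at the beginning and at the end of each dimension when the output is summarized.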

np.set_printoptions(edgeitems=5)

print(np.arange(40))

# [ 0  1  2  3  4 ... 35 36 37 38 39]

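With threshold=0, summarization is always in effect, and any dimension longer than 2 × edgeitems is abbreviated. With edgeitems=3 (the default), that means any dimension longer than 6.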

np.set_printoptions(threshold=0, edgeitems=3)

print(np.arange(6))
# [0 1 2 3 4 5]

print(np.arange(7))
# [0 1 2 ... 4 5 6]
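The 2 × edgeitems rule can be checked with a different value. The sketch below uses edgeitems=2 and passes the options directly to np.array2string, which accepts threshold and edgeitems as arguments, so the global settings stay untouched.

```python
import numpy as np

# With edgeitems=2, a dimension is shortened only when it is longer than 4
print(np.array2string(np.arange(4), threshold=0, edgeitems=2))  # [0 1 2 3]
print(np.array2string(np.arange(5), threshold=0, edgeitems=2))  # [0 1 ... 3 4]
```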

Rows of a 2-D array are abbreviated under the same condition.

print(np.arange(36).reshape(6, 6))

# [[ 0  1  2  3  4  5]
#  [ 6  7  8  9 10 11]
#  [12 13 14 15 16 17]
#  [18 19 20 21 22 23]
#  [24 25 26 27 28 29]
#  [30 31 32 33 34 35]]

print(np.arange(49).reshape(7, 7))

# [[ 0  1  2 ...  4  5  6]
#  [ 7  8  9 ... 11 12 13]
#  [14 15 16 ... 18 19 20]
#  ...
#  [28 29 30 ... 32 33 34]
#  [35 36 37 ... 39 40 41]
#  [42 43 44 ... 46 47 48]]

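Number format

`suppress`

By default, scientific notation is used if the array contains very small values (the smallest nonzero absolute value is below 1e-4) or values of very different magnitudes (the ratio of the largest to the smallest absolute value exceeds 1000). Once that happens, all elements are printed in scientific notation.

Specifying suppress=True forces fixed-point notation.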

import numpy as np

a = np.array([0.0000123, 0.123, 12.3])

print(a)
# [1.23e-05 1.23e-01 1.23e+01]

np.set_printoptions(suppress=True)

print(a)
# [ 0.0000123  0.123     12.3      ]
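The ratio rule can be checked with made-up arrays. This sketch uses np.array2string, whose suppress_small argument corresponds to suppress in set_printoptions, so the global options are not changed.

```python
import numpy as np

# suppress=False (default): scientific notation if the smallest nonzero |value| < 1e-4
# or max|value| / min|value| > 1000
print(np.array2string(np.array([1.0, 999.0])))    # ratio 999: fixed-point
print(np.array2string(np.array([1.0, 1001.0])))   # ratio 1001: scientific
print(np.array2string(np.array([0.00001, 0.5])))  # smallest value below 1e-4: scientific

# suppress_small=True forces fixed-point notation, like suppress=True
print(np.array2string(np.array([1.0, 1001.0]), suppress_small=True))
```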

`precision`

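precision sets the number of digits. In fixed-point notation it is the number of digits after the decimal point; in scientific notation it is the number of digits after the decimal point of the mantissa.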

import numpy as np

a = np.array([x / 7 for x in [0.1, 1, 10, 100]])
print(a)
# [ 0.01428571  0.14285714  1.42857143 14.28571429]

np.set_printoptions(precision=3)
print(a)
# [ 0.014  0.143  1.429 14.286]

b = np.array([x / 7 for x in [0.01, 1, 10, 100]])
print(b)
# [1.429e-03 1.429e-01 1.429e+00 1.429e+01]

`floatmode`

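floatmode selects one of several predefined formats by keyword.

The behavior of each keyword is checked with the following arrays. No element of array a has more digits than the precision setting. Array b has an element with more digits than precision, so with the default precision=8 its display is rounded.

import numpy as np

a = np.array([0.1, 0.123, 0.123456])
b = np.array([0.1, 0.12345, 0.123456789])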
print("default      :{}".format(a))
print("             :{}".format(b))

# default      :[0.1      0.123    0.123456]
#              :[0.1        0.12345    0.12345679]

`maxprec`

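This is the default setting. Each element is printed with at most precision digits after the decimal point, or fewer if that is enough to represent it uniquely. In both arrays, the width is aligned to the last element, which has the most digits, and no zero padding is added. Since this is the default, the result is the same as above.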

np.set_printoptions(floatmode='maxprec')
print("maxprec      :{}".format(a))
print("             :{}".format(b))

# maxprec      :[0.1      0.123    0.123456]
#              :[0.1        0.12345    0.12345679]

`maxprec_equal`

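maxprec did not pad with zeros. With maxprec_equal, all elements are given the same number of digits as the most precise one and padded with zeros. (The meaning of "equal" is vague; I wish it had been called something like maxprec_zero.)

np.set_printoptions(floatmode='maxprec_equal')
print("maxprec_equal:{}".format(a))
print("             :{}".format(b))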

# maxprec_equal:[0.100000 0.123000 0.123456]
#              :[0.10000000 0.12345000 0.12345679]

`fixed`

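All elements use exactly precision digits, and those that need fewer are padded with zeros. In the example below, all elements of both arrays are shown with 8 decimal places, padded with zeros.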

np.set_printoptions(floatmode='fixed')
print("fixed        :{}".format(a))
print("             :{}".format(b))

# fixed        :[0.10000000 0.12300000 0.12345600]
#              :[0.10000000 0.12345000 0.12345679]

`unique`

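precision is ignored. Each element keeps as many digits as it needs to be represented uniquely, and the width is aligned to the element with the most digits. Note that the last element of array b is not rounded.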

np.set_printoptions(floatmode='unique')
print("unique       :{}".format(a))
print("             :{}".format(b))

# unique       :[0.1      0.123    0.123456]
#              :[0.1         0.12345     0.123456789]

`formatter`

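formatter sets an arbitrary format by passing a function for each type, such as the format method of a format string. It is passed as follows:

formatter={'type name' : "{:format spec}".format }

Type names include 'int' and 'float', and strings can be targeted with 'numpystr'.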
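np.array2string accepts the same formatter dictionary. This makes it convenient for trying out other type names without touching the global settings. The sketch below uses 'int' and 'numpystr'; the format choices are illustrative.

```python
import numpy as np

# 'int' formatter: zero-padded integers
ints = np.array2string(np.array([1, 2, 3]), formatter={'int': "{:03d}".format})
print(ints)  # [001 002 003]

# 'numpystr' formatter: applied to string elements (numpy.str_ / numpy.bytes_)
strs = np.array2string(np.array(["a", "bc"]), formatter={'numpystr': lambda s: s.upper()})
print(strs)
```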

import numpy as np

a = np.array([0.0123, 1.2345, 12.3456789])

np.set_printoptions(formatter={'float' : "{:10.5f}".format})
print(a)

np.set_printoptions(formatter={'float' : "{:15.7e}".format})
print(a)

# [   0.01230    1.23450   12.34568]
# [  1.2300000e-02   1.2345000e+00   1.2345679e+01]

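Ruby – Classes

2020-10-02 / tau

Basic form

The basic form of a class is as follows:

A class definition starts with class and ends with end
A method starts with def and ends with end
The initialization method (constructor) is 'initialize()'
Properties (instance variables) are prefixed with '@' and are typically defined in initialize()
An instance is created with [class name].new([arguments])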

class MyClass
  def initialize(val)
    @property = val
  end

  def method(val)
    return @property + val
  end
end

instance = MyClass.new(2)
puts instance.method(5)

# 7

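initialize() – the constructor

The initialization to perform when an instance is created is written in initialize() (the constructor). When an instance is created, initialize() is called internally and performs the initialization it describes (initialize() is covered in more detail in a separate post).

Methods

Instance methods are defined with def...end. A method that takes no arguments can be called by its name alone, without () (this also applies to new).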

class Rectangle
  def initialize(side1, side2)
    @side1 = side1
    @side2 = side2
  end

  def area()
    return @side1 * @side2
  end

  def perimeter()
    return (@side1 + @side2) * 2
  end
end

rect = Rectangle.new(3, 5)
puts rect.perimeter()
puts rect.area

Accessing instance variables

Instance variables are encapsulated (hidden). Trying to read or set them from outside the object results in an error.

class MyClass
  def initialize()
    @a = 0
  end
end

instance = MyClass.new

puts instance.@a
# ERROR
# syntax error, unexpected tIVAR

instance.a = 1
# ERROR
# in `<main>': undefined method `a=' for #<MyClass:0x00000000050d53e0 @a=0> (NoMethodError)

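There are two ways to access instance variables: using accessor methods (attr_reader, attr_writer, attr_accessor) or defining getter/setter methods yourself.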
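A minimal sketch of both approaches, assuming "accessor methods" refers to attr_accessor and its relatives; the class names Counter and ManualCounter are hypothetical. Both classes end up with the same a / a= interface, so the caller cannot tell them apart.

```ruby
# Approach 1: accessor methods generated by attr_accessor
class Counter
  attr_accessor :a    # defines both a (getter) and a= (setter)

  def initialize()
    @a = 0
  end
end

# Approach 2: hand-written getter and setter
class ManualCounter
  def initialize()
    @a = 0
  end

  # getter: returns the instance variable
  def a
    @a
  end

  # setter: the trailing = lets callers write obj.a = value
  def a=(val)
    @a = val
  end
end

c1 = Counter.new
c1.a = 1
puts c1.a

c2 = ManualCounter.new
c2.a = 2
puts c2.a

# 1
# 2
```

attr_accessor is the usual choice; a hand-written setter is useful when the assignment needs validation or conversion.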

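Class variables and class methods

A class variable is a variable shared by every instance created from the class.

A class method is a method defined at the class level, called on the class itself rather than on an instance.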
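A minimal sketch of both ideas together, using a hypothetical Member class: the class variable @@count (prefixed with @@) is updated by every instance, and the class method, defined with def self., reads it through the class.

```ruby
class Member
  @@count = 0          # class variable shared by every instance

  def initialize(name)
    @name = name       # instance variable: one per instance
    @@count += 1       # every new instance updates the shared counter
  end

  # class method: defined on the class itself via self.
  def self.count
    @@count
  end

  # instance methods can also read the class variable
  def summary
    "#{@name} (one of #{@@count})"
  end
end

alice = Member.new("Alice")
bob = Member.new("Bob")
puts Member.count
puts alice.summary

# 2
# Alice (one of 2)
```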

Class inheritance

Inheritance is declared with '<'. For details on class inheritance, see here.

class ParentClass
  def initialize()
    @parent_property = "parent property"
  end

  def parent_method()
    puts "in parent"
  end
end

class ChildClass < ParentClass
  def child_method()
    puts "in child"
    puts "#{@parent_property} accessible"
  end
end

child = ChildClass.new
child.parent_method
child.child_method

# in parent
# in child
# parent property accessible
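In the example above, ChildClass defines no initialize of its own, so ChildClass.new runs the inherited ParentClass#initialize, and that is why @parent_property is available in child_method. A minimal sketch that checks this relationship directly (the calls superclass, is_a? and instance_variables are standard Ruby; the classes are trimmed copies of the ones above):

```ruby
class ParentClass
  def initialize()
    @parent_property = "parent property"
  end
end

class ChildClass < ParentClass
end

child = ChildClass.new
puts ChildClass.superclass               # the class named after <
puts child.is_a?(ParentClass)            # a child instance is also a parent instance
puts child.instance_variables.inspect    # ParentClass#initialize ran for the child

# ParentClass
# true
# [:@parent_property]
```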

Ruby – On class inheritance

2020-10-02 / tau

Standard usage

A child class can use the parent class's methods and can also define methods of its own.

class Vehicle
  def activate()
    puts "始動しました"
  end

  def deactivate()
    puts "機能停止しました"
  end
end

class Car < Vehicle
  def run()
    puts "走行します"
  end

  def stop()
    puts "停車します"
  end
end

class Airplane < Vehicle
  def fly()
    puts "飛行します"
  end

  def make_landing()
    puts "着陸します"
  end
end

volkswagen = Car.new
volkswagen.activate
volkswagen.run
volkswagen.stop
volkswagen.deactivate

# 始動しました
# 走行します
# 停車します
# 機能停止しました

a300 = Airplane.new
a300.activate
a300.fly
a300.make_landing
a300.deactivate

# 始動しました
# 飛行します
# 着陸します
# 機能停止しました

Overriding a parent class's method in a child class is also common.

class MobileBanking
  def authenticate()
    puts "パスワード認証"
  end
end

class SaferBanking < MobileBanking
  def authenticate()
    puts "生体認証->認証番号確認"
  end
end

bank_system = MobileBanking.new
bank_system.authenticate

# パスワード認証

safer_system = SaferBanking.new
safer_system.authenticate

# 生体認証->認証番号確認

super

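Calling the parent class's method

When overriding a method, super calls the parent class's method of the same name.

In the following example, the child class's method overrides the parent class's method and, inside it, calls the parent's method with super. As a result, the parent class's method runs first, followed by the processing defined in the child class's method.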

class MobileBanking
  def authenticate()
    puts "パスワード認証"
  end
end

class SaferBanking < MobileBanking
  def authenticate()
    super
    puts "第二暗唱番号認証"
  end
end

safer_system = SaferBanking.new
safer_system.authenticate

# パスワード認証
# 第二暗唱番号認証
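One detail worth knowing: a bare super passes the current method's arguments on to the parent automatically, while super() with empty parentheses passes none. A minimal sketch with hypothetical Greeter classes:

```ruby
class Greeter
  def greet(name = "nobody")
    puts "Hello, #{name}"
  end
end

class LoudGreeter < Greeter
  def greet(name)
    super            # bare super: forwards name automatically
    super()          # empty parentheses: calls the parent with no arguments
    super("Ruby")    # explicit arguments
  end
end

LoudGreeter.new.greet("Alice")

# Hello, Alice
# Hello, nobody
# Hello, Ruby
```

In the authenticate example the method takes no arguments, so super and super() behave the same there.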

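super inside initialize

super can also be used in initialize(). The flow in the example below is as follows.

When the child-class (Bird) instance bird is created, Bird's constructor runs
Bird's constructor calls the constructor of the parent class Creature with super, which sets @num_legs (the property defined by the parent class) to 2
Bird's constructor then sets @num_wings (the property defined by the child class) to 2
bird.form() calls the child class's form() method
Bird's form() calls the parent class Creature's form() with super, which prints @num_legs
Bird's form() then prints @num_wings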

class Creature
  def initialize(num_legs)
    @num_legs = num_legs
  end

  def form()
    puts "#{@num_legs} legs"
  end
end

class Bird < Creature
  def initialize()
    super(2)
    @num_wings = 2
  end

  def form()
    super
    puts "#{@num_wings} wings"
  end
end

bird = Bird.new
bird.form

# 2 legs
# 2 wings

public/protected/private

Their behavior differs in some respects from C++ and Java. See here for details.
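A minimal sketch of the best-known difference, with hypothetical Account classes: in Ruby a subclass can call a private method it inherits (C++ and Java do not allow a subclass to use the parent's private members), while protected methods may be called with an explicit receiver only from inside methods of the same class family. Calling either one from outside raises NoMethodError.

```ruby
class Account
  def initialize(balance)
    @balance = balance
  end

  # public; calls the protected balance on another Account object
  def richer_than?(other)
    balance > other.balance
  end

  protected

  def balance
    @balance
  end

  private

  def secret_code
    "1234"
  end
end

class SavingsAccount < Account
  def show_code
    secret_code        # inherited private method, called without a receiver
  end
end

a = Account.new(100)
b = SavingsAccount.new(50)
puts a.richer_than?(b)
puts b.show_code

begin
  a.balance            # protected: not callable from outside the class
rescue NoMethodError
  puts "balance: NoMethodError (protected)"
end

begin
  a.secret_code        # private: not callable with an explicit receiver
rescue NoMethodError
  puts "secret_code: NoMethodError (private)"
end

# true
# 1234
# balance: NoMethodError (protected)
# secret_code: NoMethodError (private)
```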
