clintk.utils.outliers module

Scripts to remove outliers and NA values from the different tables.

Intended for values that were mistyped.

class clintk.utils.outliers.OutlierRemover(dic_path, inplace=True)

Bases: object

Removes outliers and replaces them with the value given in the dictionary at dic_path or, when no value is given, with the column mean.

Parameters:
  • dic_path (str) – path to the dictionary containing the outlier information
  • inplace (bool, default=True) – True to perform the transformation in place, False to perform it on a copy of the dataframe
fit(X, y=None)
transform(X, y=None)
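The fit/transform pair suggests a scikit-learn-style transformer. The sketch below is a hypothetical stand-in (the class name OutlierRemoverSketch is invented here), not the clintk implementation. It assumes dic_path is a JSON file mapping each column name to [lbound, ubound] or [lbound, ubound, impute_value]; the file format is not documented above. As one possible design choice, it learns each fallback mean in fit from the in-range training values, so transform only applies what was learned.

```python
import json

import pandas as pd


class OutlierRemoverSketch:
    """Hypothetical scikit-learn-style transformer mirroring OutlierRemover.

    Assumption: dic_path is a JSON file mapping column name to
    [lbound, ubound] or [lbound, ubound, impute_value].
    """

    def __init__(self, dic_path, inplace=True):
        self.dic_path = dic_path
        self.inplace = inplace

    @staticmethod
    def _wrong(values, lbound, ubound):
        # Missing, non-numeric (coerced to NaN) or out-of-range entries are wrong.
        return values.isna() | (values < lbound) | (values > ubound)

    def fit(self, X, y=None):
        with open(self.dic_path) as f:
            self.rules_ = json.load(f)
        # Design choice (assumption): learn each fallback value on the
        # training data so that transform() never looks at new data's mean.
        self.fill_ = {}
        for col, rule in self.rules_.items():
            if len(rule) > 2:
                self.fill_[col] = rule[2]
            else:
                values = pd.to_numeric(X[col], errors="coerce")
                in_range = ~self._wrong(values, rule[0], rule[1])
                self.fill_[col] = values[in_range].mean()
        return self

    def transform(self, X, y=None):
        # Work on X itself when inplace=True, otherwise on a deep copy.
        df = X if self.inplace else X.copy()
        for col, rule in self.rules_.items():
            values = pd.to_numeric(df[col], errors="coerce")
            wrong = self._wrong(values, rule[0], rule[1])
            df[col] = values.mask(wrong, self.fill_[col])
        return df
```

Fitting on training data and transforming new data with the same object keeps the imputed values stable across datasets; whether clintk computes the mean at fit time or at transform time is not stated in this reference.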
clintk.utils.outliers.impute_col(X, lbound, ubound, impute)

Imputes missing and mistyped values in one column of the dataframe.

Parameters:
  • X (iterable, array-like) – column in which to impute missing and mistyped values
  name of the column
  • lbound (float) – lower bound for normal values
  • ubound (float) – upper bound for normal values
  • impute (float or None) – if a float is given, outliers are replaced by that value; if None, they are replaced by the column mean
Returns: the column with its wrong values imputed according to impute
Return type: pd.Series
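A minimal sketch of the behaviour described for impute_col, under stated assumptions: impute_col_sketch is a hypothetical name, non-numeric entries are treated as mistyped by coercing them to NaN with pandas.to_numeric, and the fallback mean is computed over the in-range values only (the reference does not say whether outliers are included in the mean).

```python
import pandas as pd


def impute_col_sketch(X, lbound, ubound, impute):
    """Hypothetical sketch of impute_col's documented behaviour."""
    # Coerce to numbers: mistyped entries such as "x" become NaN.
    col = pd.to_numeric(pd.Series(X), errors="coerce")
    # A value is wrong if it is missing or lies outside [lbound, ubound].
    wrong = col.isna() | (col < lbound) | (col > ubound)
    # Assumption: the mean is taken over the in-range values only.
    fill = col[~wrong].mean() if impute is None else impute
    # Series.mask replaces entries where the condition is True.
    return col.mask(wrong, fill)
```

For example, impute_col_sketch([1.0, 2.0, 100.0, float("nan"), "x"], 0, 10, None) replaces the last three entries with 1.5, the mean of the two in-range values.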
clintk.utils.outliers.impute_df(df, dic_path, inplace)

Cleans missing and mistyped values from df.

Parameters:
  • df (pd.DataFrame)
  • dic_path (str) – path to the dictionary containing the names of the columns to clean, the lower and upper limits outside which a point is considered an outlier, and an optional third value used for imputation
  • inplace (bool) – if True, performs the transformation in place
Returns: the cleaned dataframe
Return type: pd.DataFrame
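The dictionary described for dic_path holds, per column, a lower limit, an upper limit and an optional imputing value. Assuming it is stored as JSON (the actual on-disk format is not documented here), impute_df could look roughly like the hypothetical impute_df_sketch below, which unpacks each rule and falls back to the in-range column mean when no third value is given.

```python
import json

import pandas as pd


def impute_df_sketch(df, dic_path, inplace):
    """Hypothetical sketch of impute_df.

    Assumption: dic_path is a JSON file such as
    {"age": [0, 120], "weight": [20, 300, 70.0]}, i.e. lower limit,
    upper limit and an optional imputing value per column.
    """
    with open(dic_path) as f:
        rules = json.load(f)
    out = df if inplace else df.copy()
    for col, (lbound, ubound, *rest) in rules.items():
        values = pd.to_numeric(out[col], errors="coerce")
        wrong = values.isna() | (values < lbound) | (values > ubound)
        # The optional third value wins; otherwise use the in-range mean.
        fill = rest[0] if rest else values[~wrong].mean()
        out[col] = values.mask(wrong, fill)
    return out
```

With inplace=False the input dataframe is left untouched and a cleaned copy is returned; with inplace=True the same object is modified and returned.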