clintk.utils.outliers module

Utilities to remove outliers and NA values from the different tables.

Intended for values that were mistyped.

class clintk.utils.outliers.OutlierRemover(dic_path, inplace=True)

Removes outliers and replaces them with the value given in dic_path, or with the column mean when no value is given.

Parameters:

dic_path (str) – path to the dictionary containing outlier information

inplace (bool, default=True) – True to perform the transformation in place, False to perform it on a copy of the dataframe

clintk.utils.outliers.impute_col(X, lbound, ubound, impute)

Imputes missing and mistyped values in one column of the dataframe.

Parameters:

X (iterable, array-like) – column whose missing values are to be imputed

name of the column

lbound (float) – lower bound for normal values

ubound (float) – upper bound for normal values

impute (float or None) – if a float is given, outliers are replaced by that value; if None, the column mean is used instead

Returns:	the column, with its wrong values imputed according to the chosen strategy
Return type:	pd.Series
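
The behaviour described above can be illustrated with a minimal sketch. This is not the clintk implementation: impute_col_sketch is a hypothetical name, and the sketch assumes that the bounds are inclusive and that the mean is computed over the valid (non-missing, in-range) values only.

```python
import numpy as np
import pandas as pd


def impute_col_sketch(X, lbound, ubound, impute=None):
    """Hypothetical re-implementation of the documented behaviour."""
    col = pd.Series(X, dtype="float64")
    # A value is "wrong" if it is missing or lies outside [lbound, ubound]
    # (inclusive bounds are an assumption).
    wrong = col.isna() | (col < lbound) | (col > ubound)
    # Assumption: the mean is taken over the valid values only.
    fill = col[~wrong].mean() if impute is None else impute
    # Replace the wrong values and leave the others untouched.
    return col.mask(wrong, fill)


ages = [34.0, np.nan, 420.0, 51.0]
print(impute_col_sketch(ages, 0, 120).tolist())             # mean imputation
print(impute_col_sketch(ages, 0, 120, impute=-1).tolist())  # fixed value
```

With impute=None the missing entry and the mistyped 420.0 are both replaced by the mean of the two valid ages; with impute=-1 they are replaced by -1.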

clintk.utils.outliers.impute_df(df, dic_path, inplace)

Cleans missing and mistyped values from the dataframe.

Parameters:

df (pd.DataFrame)

dic_path (str) – path to the file containing the names of the columns to clean, the lower and upper limits beyond which a point is considered an outlier, and an optional third value used as the imputing value

inplace (bool) – if True, performs the transformation in place
Returns:
Return type:	pd.DataFrame
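
The following is a rough sketch of how such a cleaning pass could work. The format of the file behind dic_path is not specified here, so the sketch assumes a hypothetical CSV layout with one row per column (name, lower bound, upper bound, optional imputing value). impute_df_sketch is an illustrative name, not the library function.

```python
import csv
import os
import tempfile

import pandas as pd


def impute_df_sketch(df, dic_path, inplace=True):
    """Hypothetical sketch; the real dictionary file format may differ."""
    out = df if inplace else df.copy()
    with open(dic_path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            name, lbound, ubound = row[0], float(row[1]), float(row[2])
            # The fourth field (imputing value) is optional.
            impute = float(row[3]) if len(row) > 3 and row[3] != "" else None
            col = out[name]
            wrong = col.isna() | (col < lbound) | (col > ubound)
            # Assumption: fall back to the mean of the valid values.
            fill = col[~wrong].mean() if impute is None else impute
            out[name] = col.mask(wrong, fill)
    return out


# Write a small dictionary file in the assumed layout.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("age,0,120\nweight,20,300,70\n")

df = pd.DataFrame({
    "age": [34.0, None, 420.0, 51.0],
    "weight": [80.0, 7000.0, 65.0, None],
})
clean = impute_df_sketch(df, tmp.name, inplace=False)
os.remove(tmp.name)
print(clean)
```

Because inplace=False is passed, df keeps its original values and only the returned copy is cleaned.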