clintk.utils.outliers module
Scripts to remove outliers and NA values from the different tables.
Intended for values that were mistyped.
class clintk.utils.outliers.OutlierRemover(dic_path, inplace=True)
Bases: object
Removes outliers and replaces them with the value given in dic_path, or with the column mean.
Parameters: - dic_path (str) – path to the dictionary containing outlier information
 - inplace (bool, default=True) – True to perform the transformation in place, False to do it on a copy of the dataframe
 
clintk.utils.outliers.impute_col(X, lbound, ubound, impute)
Imputes the missing and mistyped values of one column of the dataframe.
Parameters: - X (iterable, array-like) – column in which to impute missing and mistyped values
 - lbound (float) – lower bound for normal values
 - ubound (float) – upper bound for normal values
 - impute (float or None) – if a float is given, outliers are replaced by that value; if None, they are replaced by the column mean
 
Returns: the column, with its wrong values imputed according to the chosen strategy
Return type: pd.Series
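The behaviour described for impute_col can be sketched as follows. This is a minimal illustration, not the clintk source: the name impute_col_sketch is hypothetical, and it assumes that the bounds are inclusive, that mistyped entries are those that cannot be parsed as numbers, and that the mean is taken over in-range values only.

```python
import pandas as pd


def impute_col_sketch(X, lbound, ubound, impute=None):
    """Hypothetical stand-in for impute_col (assumptions listed above)."""
    # Entries that cannot be parsed as numbers (mistyped or missing) become NaN.
    col = pd.to_numeric(pd.Series(X), errors="coerce")
    # NaN compares as False, so missing values are flagged along with outliers.
    valid = col.between(lbound, ubound)
    # Assumption: when impute is None, use the mean of the in-range values.
    fill = col[valid].mean() if impute is None else impute
    # Keep valid entries, replace everything else with the fill value.
    return col.where(valid, fill)


raw = ["36.6", "37.1", "abc", None, "366"]
cleaned = impute_col_sketch(raw, lbound=30.0, ubound=45.0)
```

With these inputs, the last three entries (a mistyped string, a missing value and an out-of-range number) are all replaced by the mean of the first two.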
clintk.utils.outliers.impute_df(df, dic_path, inplace)
Cleans the dataframe of missing and mistyped values.
Parameters: - df (pd.DataFrame)
 - dic_path (str) – path to the dictionary listing the columns to clean, the lower and upper limits outside which a point is considered an outlier, and an optional third value used as the imputing value
 - inplace (bool) – if True, performs the transformation in place
 
Return type: pd.DataFrame
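As a rough illustration of the column-by-column cleaning described for impute_df, the sketch below takes the bounds as an in-memory dict instead of reading them from dic_path, because the on-disk format of that dictionary is not documented here. The name impute_df_sketch and the layout of bounds (column name mapped to (lbound, ubound) or (lbound, ubound, impute)) are assumptions made for the example.

```python
import pandas as pd


def impute_df_sketch(df, bounds, inplace=False):
    """Hypothetical stand-in for impute_df, using an in-memory bounds dict."""
    # Work on the caller's dataframe or on a copy, depending on inplace.
    out = df if inplace else df.copy()
    for name, spec in bounds.items():
        lbound, ubound = spec[0], spec[1]
        # The third value is optional; None means "use the column mean".
        impute = spec[2] if len(spec) > 2 else None
        col = pd.to_numeric(out[name], errors="coerce")
        valid = col.between(lbound, ubound)
        fill = col[valid].mean() if impute is None else impute
        out[name] = col.where(valid, fill)
    return out


df = pd.DataFrame({"temp": [36.6, 37.1, 366.0], "age": [42.0, -1.0, 57.0]})
bounds = {"temp": (30.0, 45.0), "age": (0.0, 120.0, 50.0)}
clean = impute_df_sketch(df, bounds, inplace=False)
```

Here the out-of-range temperature is replaced by the mean of the two valid readings, the negative age is replaced by the explicit value 50.0, and df itself is left untouched because inplace is False.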