clintk.utils.outliers module
Utilities to remove outliers and NA values from the different tables.
Intended for values that were mistyped.
class clintk.utils.outliers.OutlierRemover(dic_path, inplace=True)
Bases: object
Removes outliers and replaces them with the value given in dic_path or, failing that, with the column mean.
Parameters: - dic_path (str) – path to the dictionary containing outlier information
- inplace (bool, default=True) – if True, perform the transformation in place; if False, perform it on a copy of the dataframe
clintk.utils.outliers.impute_col(X, lbound, ubound, impute)
Imputes missing and mistyped values in one column of the dataframe.
Parameters: - X (iterable, array-like) – column in which to impute missing values
- lbound (float) – lower bound for normal values
- ubound (float) – upper bound for normal values
- impute (float or None) – if a float is given, outliers are replaced by that value; if None, the column mean is used instead
Returns: the column, with its missing and out-of-range values imputed according to the chosen strategy
Return type: pd.Series
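The behaviour described for impute_col can be sketched roughly as follows. This is an illustrative re-implementation written against pandas, not the clintk source: the name `impute_col_sketch` is hypothetical, and two details are assumptions that the documentation above does not settle, namely that the bounds are inclusive and that the fallback mean is computed over the in-range values only.

```python
import pandas as pd


def impute_col_sketch(X, lbound, ubound, impute=None):
    """Hypothetical stand-in for impute_col; not the library's own code."""
    # Entries that cannot be parsed as numbers (mistyped values) become NaN.
    col = pd.to_numeric(pd.Series(X), errors="coerce")
    # NaN compares as False here, so missing values are flagged as well.
    valid = col.between(lbound, ubound)
    # Assumption: the mean used as fallback ignores the flagged entries.
    fill = col[valid].mean() if impute is None else impute
    # Keep valid entries, replace everything else by the fill value.
    return col.where(valid, fill)


raw = [36.6, 37.1, None, 370.0, "n/a", 38.0]
cleaned = impute_col_sketch(raw, lbound=30.0, ubound=45.0)
print(cleaned)
```

Passing `impute=0.0` instead would replace the three flagged entries by `0.0` rather than by the mean of the remaining ones.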
clintk.utils.outliers.impute_df(df, dic_path, inplace)
Cleans the dataframe of missing/mistyped values.
Parameters: - df (pd.DataFrame)
- dic_path (str) – path to the dictionary listing the names of the columns to clean, the upper/lower limits beyond which a point is considered an outlier, and an optional third value used as the imputing value
- inplace (bool) – if True, performs the transformation in place
Return type: pd.DataFrame
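A minimal sketch of the per-column cleaning that impute_df describes is given below. The on-disk format of the dictionary behind dic_path is not documented here, so the sketch takes an already loaded mapping instead of a path. The function name `impute_df_sketch`, the `bounds` argument, its `(lbound, ubound[, impute])` layout and the example column names are all assumptions made for illustration, not the library's API.

```python
import pandas as pd


def impute_df_sketch(df, bounds, inplace=True):
    """Hypothetical stand-in for impute_df, taking a mapping instead of a path.

    bounds maps a column name to (lbound, ubound) or (lbound, ubound, impute).
    """
    out = df if inplace else df.copy()
    for name, spec in bounds.items():
        lbound, ubound = spec[0], spec[1]
        # The third value is optional; without it the column mean is used.
        impute = spec[2] if len(spec) > 2 else None
        col = pd.to_numeric(out[name], errors="coerce")
        valid = col.between(lbound, ubound)  # NaN counts as invalid
        # Assumption: the mean is taken over the in-range values only.
        fill = col[valid].mean() if impute is None else impute
        out[name] = col.where(valid, fill)
    return out


df = pd.DataFrame({
    "age": [34, 41, 410, None],
    "weight_kg": [70.0, -1.0, 82.5, 64.0],
})
bounds = {"age": (0, 120), "weight_kg": (20, 300, 75.0)}
clean = impute_df_sketch(df, bounds, inplace=False)
print(clean)
```

With `inplace=False` the input frame is left untouched, which mirrors the copy semantics described for the inplace parameter.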