# 177-3. Experimental data processing
import pandas as pd

# Measurement result from each experiment
results = [5.2, 5.3, 5.1, 5.4, 5.3, 5.5]
df = pd.DataFrame(results, columns=['experiment'])

# Compute the standard deviation of the experimental results
std_dev = df['experiment'].std()
print(f"Standard deviation of the experimental results: {std_dev}")

# 179-5. Feature engineering
import pandas as pd
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Create a skewed feature
df_feature = pd.DataFrame({'D': np.random.exponential(scale=1, size=1000)})

# Compute the skewness of the original feature
original_skewness = df_feature['D'].skew()

# Transform the feature with PowerTransformer
pt = PowerTransformer()
df_transformed = pd.DataFrame(pt.fit_transform(df_feature), columns=['D_transformed'])

# Compute the skewness of the transformed feature
transformed_skewness = df_transformed['D_transformed'].skew()
print(f"Original skewness: {original_skewness:.2f}")
print(f"Transformed skewness: {transformed_skewness:.2f}", end='\n\n')

178-4. pandas.Series.std method
pandas.Series.std(axis=None, skipna=True, ddof=1, numeric_only=False, **kwargs)

Return sample standard deviation over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:
axis : {index (0)}
    For Series this parameter is unused and defaults to 0.
    Warning: The behavior of DataFrame.std with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).
skipna : bool, default True
    Exclude NA/null values. If an entire row/column is NA, the result will be NA.
ddof : int, default 1
    Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
numeric_only : bool, default False
    Include only float, int, boolean columns. Not implemented for Series.

Returns:
scalar or Series (if level specified)

Notes:
To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1).
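The effect of the ddof argument described above can be checked with a small sketch. It reuses the six measurements from the experimental-data example; the variable names are illustrative only and not part of the original article.

```python
import numpy as np
import pandas as pd

# Same six measurements as the experimental-data example
s = pd.Series([5.2, 5.3, 5.1, 5.4, 5.3, 5.5])

sample_std = s.std()              # default ddof=1: divisor is N - 1
population_std = s.std(ddof=0)    # ddof=0: divisor is N
numpy_std = np.std(s.to_numpy())  # numpy.std uses ddof=0 by default

print(f"sample std (ddof=1):     {sample_std:.6f}")
print(f"population std (ddof=0): {population_std:.6f}")
print(f"numpy.std default:       {numpy_std:.6f}")
```

With ddof=0 the pandas result should agree with numpy.std, and it is always slightly smaller than the default sample estimate because the divisor is larger.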
176-2-2. pandas.Series.skew method
pandas.Series.skew(axis=0, skipna=True, numeric_only=False, **kwargs)

Return unbiased skew over requested axis.
Normalized by N-1.

Parameters:
axis : {index (0)}
    Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.
    For DataFrames, specifying axis=None will apply the aggregation across both axes.
    New in version 2.0.0.
skipna : bool, default True
    Exclude NA/null values when computing the result.
numeric_only : bool, default False
    Include only float, int, boolean columns. Not implemented for Series.
**kwargs
    Additional keyword arguments to be passed to the function.

Returns:
scalar
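As a rough illustration of how skew behaves, the sketch below compares a symmetric series with a right-skewed one and shows the effect of skipna. The sample values are made up for this example.

```python
import numpy as np
import pandas as pd

# Symmetric data: skewness should be (numerically) zero
symmetric = pd.Series([1, 2, 3, 4, 5])
symmetric_skew = symmetric.skew()

# A long right tail gives a positive skewness
right_tailed = pd.Series([1, 1, 2, 2, 3, 10])
right_skew = right_tailed.skew()

# With skipna=False, a single NaN makes the result NaN
with_nan = pd.Series([1.0, 2.0, np.nan, 4.0, 10.0])
skew_skipna = with_nan.skew()               # NaN ignored
skew_strict = with_nan.skew(skipna=False)   # NaN propagates

print(symmetric_skew, right_skew, skew_skipna, skew_strict)
```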
176. pandas.Series.sem method
pandas.Series.sem(axis=None, skipna=True, ddof=1, numeric_only=False, **kwargs)

Return unbiased standard error of the mean over requested axis.
Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:
axis : {index (0)}
    For Series this parameter is unused and defaults to 0.
    Warning: The behavior of DataFrame.sem with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).
skipna : bool, default True
    Exclude NA/null values. If an entire row/column is NA, the result will be NA.
ddof : int, default 1
    Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
numeric_only : bool, default False
    Include only float, int, boolean columns. Not implemented for Series.

Returns:
scalar or Series (if level specified)
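The standard error of the mean is the standard deviation divided by the square root of the number of observations. A minimal sketch of that relationship, using the product-size measurements from the quality-control example in this article (the variable names are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Product-size measurements from the quality-control example
s = pd.Series([10.1, 10.2, 10.1, 10.3, 10.5, 10.2, 10.4])

sem_default = s.sem()                    # uses ddof=1
sem_manual = s.std() / np.sqrt(s.count())

sem_ddof0 = s.sem(ddof=0)                # uses ddof=0 in the variance
sem_ddof0_manual = s.std(ddof=0) / np.sqrt(s.count())

print(f"sem (ddof=1): {sem_default:.6f}")
print(f"sem (ddof=0): {sem_ddof0:.6f}")
```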
178-6-2. pandas.Series.rank method
pandas.Series.rank(axis=0, method='average', numeric_only=False, na_option='keep', ascending=True, pct=False)

Compute numerical data ranks (1 through n) along axis.
By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters:
axis : {0 or 'index', 1 or 'columns'}, default 0
    Index to direct ranking. For Series this parameter is unused and defaults to 0.
method : {'average', 'min', 'max', 'first', 'dense'}, default 'average'
    How to rank the group of records that have the same value (i.e. ties):
    average: average rank of the group
    min: lowest rank in the group
    max: highest rank in the group
    first: ranks assigned in order they appear in the array
    dense: like 'min', but rank always increases by 1 between groups.
numeric_only : bool, default False
    For DataFrame objects, rank only numeric columns if set to True.
    Changed in version 2.0.0: The default value of numeric_only is now False.
na_option : {'keep', 'top', 'bottom'}, default 'keep'
    How to rank NaN values:
    keep: assign NaN rank to NaN values
    top: assign lowest rank to NaN values
    bottom: assign highest rank to NaN values
ascending : bool, default True
    Whether or not the elements should be ranked in ascending order.
pct : bool, default False
    Whether or not to display the returned rankings in percentile form.

Returns:
same type as caller
    Return a Series or DataFrame with data ranks as values.
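The tie-breaking methods and na_option values listed above are easiest to compare side by side. The following is a small sketch on made-up data containing one tie and one missing value.

```python
import pandas as pd

# The two 20s are tied
s = pd.Series([10, 20, 20, 30])

# One column per tie-breaking method
methods = ["average", "min", "max", "first", "dense"]
ranks = pd.DataFrame({m: s.rank(method=m) for m in methods})
print(ranks)

# pct=True expresses each rank as a fraction of the number of values
pct_ranks = s.rank(pct=True)

# na_option decides where NaN values are placed
s_nan = pd.Series([10.0, None, 20.0])
na_keep = s_nan.rank(na_option="keep")      # NaN stays NaN
na_top = s_nan.rank(na_option="top")        # NaN gets the lowest rank
na_bottom = s_nan.rank(na_option="bottom")  # NaN gets the highest rank
```

For the tied values, 'average' assigns 2.5 to both, 'min' assigns 2, 'max' assigns 3, 'first' assigns 2 and 3 in order of appearance, and 'dense' assigns 2 with the next group ranked 3.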
# 180-4. Quality control
import pandas as pd

# Product sizes measured during production
measurements = [10.1, 10.2, 10.1, 10.3, 10.5, 10.2, 10.4]
df = pd.DataFrame(measurements, columns=['size'])

# Compute the standard deviation of the sizes
std_dev = df['size'].std()
print(f"Standard deviation of the product size: {std_dev}")

# 179-3. Experimental data processing
# Standard deviation of the experimental results: 0.14142135623730964

179-5. pandas.Series.sum method
pandas.Series.sum(axis=None, skipna=True, numeric_only=False, min_count=0, **kwargs)

Return the sum of the values over the requested axis.
This is equivalent to the method numpy.sum.

Parameters:
axis : {index (0)}
    Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.
    Warning: The behavior of DataFrame.sum with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).
    New in version 2.0.0.
skipna : bool, default True
    Exclude NA/null values when computing the result.
numeric_only : bool, default False
    Include only float, int, boolean columns. Not implemented for Series.
min_count : int, default 0
    The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
**kwargs
    Additional keyword arguments to be passed to the function.

Returns:
scalar
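The interaction between skipna and min_count is a common source of surprises, so here is a minimal sketch on made-up data. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
total_skipna = s.sum()               # NaN is ignored
total_strict = s.sum(skipna=False)   # NaN propagates to the result

# With no valid values, the default min_count=0 yields 0.0
all_nan = pd.Series([np.nan, np.nan])
default_sum = all_nan.sum()
guarded_sum = all_nan.sum(min_count=1)  # requires at least one valid value

print(total_skipna, total_strict, default_sum, guarded_sum)
```

Passing min_count=1 is one way to distinguish "the values sum to zero" from "there were no valid values to sum".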