
File Metadata for data/test-crawler/test-crawler/files/abalone2.csv

Data Details for data/test-crawler/test-crawler/files/abalone2.csv

Last Modified: December 02, 2024 @ 14:10

Size in Bytes: 259,397

File Path: data/test-crawler/test-crawler/files/abalone2.csv

File Source: s3

Quality / Validation Overview:

Locate Data

File Validation

  • Schema File "schemas/abalone_schema.json" assigned to "data/test-crawler/test-crawler/files/abalone2.csv"

Data Validation

  • The cell "nan" in row at position "2" and field "Shell weight" at position "9" does not conform to a constraint: constraint "minimum" is "0"
  • The cell "None" in row at position "2" and field "No Correlation" at position "12" does not conform to a constraint: constraint "required" is "True"
  • The cell "I" in row at position "3" and field "No Correlation" at position "12" does not conform to a constraint: constraint "enum" is "['A', 'B', 'C']"
  • The cell "nan" in row at position "5" and field "Whole weight" at position "6" does not conform to a constraint: constraint "minimum" is "0"
  • The cell "nan" in row at position "5" and field "Shell weight" at position "9" does not conform to a constraint: constraint "minimum" is "0"
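
The messages above name three constraint kinds: "minimum", "required" and "enum". As a rough sketch of how such per-cell checks can work (the helper `violated_constraints` is hypothetical and is not the validator that produced this report), the following reproduces the three kinds of failure listed. It assumes a missing numeric cell is read as NaN and a missing categorical cell as None, as the quoted cell values suggest.

```python
import math


def violated_constraints(value, constraints):
    """Return the names of the constraints that `value` fails (hypothetical helper)."""
    violated = []
    if value is None:
        # A missing cell can only fail "required"; the other checks are skipped.
        if constraints.get("required"):
            violated.append("required")
        return violated
    # NaN compares False against every number, so `nan >= 0` fails "minimum".
    if "minimum" in constraints and not (value >= constraints["minimum"]):
        violated.append("minimum")
    if "enum" in constraints and value not in constraints["enum"]:
        violated.append("enum")
    return violated


# Constraints as quoted in the messages above.
shell_weight = {"minimum": 0}
no_correlation = {"required": True, "enum": ["A", "B", "C"]}

print(violated_constraints(math.nan, shell_weight))  # row 2, "Shell weight"
print(violated_constraints(None, no_correlation))    # row 2, "No Correlation"
print(violated_constraints("I", no_correlation))     # row 3, "No Correlation"
```

This reading would explain why missing numeric cells show up as "minimum" violations rather than as "required" violations.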

Data Quality

  • Column "No Correlation" is missing > 95% of values
  • Column "Is_Large" has only one value
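
A minimal sketch of the two quality rules reported above, assuming a column is a plain list with None for missing cells. The function name `quality_flags`, the 0.95 default and the toy columns are illustrative assumptions, not the tool's own code or data.

```python
def quality_flags(values, missing_threshold=0.95):
    """Flag a column that is mostly missing or holds a single distinct value."""
    flags = []
    present = [v for v in values if v is not None]
    if values and (len(values) - len(present)) / len(values) > missing_threshold:
        flags.append("mostly missing")
    if len(set(present)) == 1:
        flags.append("only one value")
    return flags


# Toy columns shaped like the two flagged fields (not the real data).
mostly_empty = [None] * 99 + ["B"] + [None] * 99 + ["I"]
constant = [False] * 10

print(quality_flags(mostly_empty))
print(quality_flags(constant))
```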

File Overview for data/test-crawler/test-crawler/files/abalone2.csv

Missingness

The missingness plot shows the proportion of missing values for each field in the file. Fields with a high proportion of missing values may be less useful for analysis. This plot may additionally help identify patterns in missingness across fields.
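
The proportions behind such a plot can be recomputed from the "Missing" counts listed in the Field Overview. The sketch below uses the three fields that report missing values and the 4177 rows reported for Obs; the variable names are illustrative.

```python
# Missing counts per field, copied from the Field Overview in this report.
total_rows = 4177
missing_counts = {
    "Whole weight": 663,
    "Viscera weight": 1000,
    "Shell weight": 848,
}

# Proportion of missing cells per field, rounded to four decimals.
missing_fraction = {
    field: round(count / total_rows, 4)
    for field, count in missing_counts.items()
}

for field, fraction in missing_fraction.items():
    print(f"{field}: {fraction}")
```

The results match the "Percent Missing" entries for these fields, which the report expresses as fractions between 0 and 1 rather than as percentages.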

Correlations

[Plots: five "Field Correlation" panels; images not included in this text]

Field Overview for data/test-crawler/test-crawler/files/abalone2.csv

Obs

Data Type: integer

Count: 4177

Mean: 2088

Standard Deviation: 1206

Minimum: 0

25th Percentile: 1044

Median: 2088

75th Percentile: 3132

Maximum: 4176

Missing: 0

Percent Missing: 0

Unique: 4177

Percent Unique: 1

Highest Precision: 4

Average Precision: 3.734

Lowest Precision: 1
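
The Obs statistics are consistent with a zero-based row index running from 0 to 4176. Under that assumption (the report does not state it), the sketch below recomputes the summary values and reads "precision" as the number of digits in each value, an interpretation inferred from the numbers rather than taken from the tool's documentation.

```python
import statistics

# Assumption: Obs is simply the row index 0, 1, ..., 4176.
obs = list(range(4177))

mean = statistics.mean(obs)
std = statistics.stdev(obs)  # sample standard deviation
q25, median, q75 = statistics.quantiles(obs, n=4, method="inclusive")

# Assumed meaning of "precision": number of digits in the value.
digits = [len(str(v)) for v in obs]
highest, lowest = max(digits), min(digits)
average = sum(digits) / len(digits)

print(mean, round(std), q25, median, q75)
print(highest, round(average, 3), lowest)
```

Under these assumptions the output agrees with the values listed for Obs, including the average precision of 3.734.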

Sex

Data Type: text

Most Frequent Characters:
e: 5486 n: 4025 a: 2836 l: 2835 M: 1528
Most Frequent Numbers: No values available
Most Frequent Punctuation: No values available
Most Frequent Words:
Male: 1528 SexUnknown: 1341 Female: 1307 Indeterminate: 1

Average Word Length: 6.6

Standard Deviation Word Length: 2.5

Average Sentence Length: 6.6

Standard Deviation Sentence Length: 2.5

Count: 4177

Unique: 4

Percent Unique: 0.00096

Missing: 0

Percent Missing: 0
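
The character and word-length figures for Sex follow from the word frequencies alone. The sketch below rebuilds them from the four reported counts, assuming every cell contains exactly one of these words.

```python
from collections import Counter

# Word frequencies as listed under "Most Frequent Words".
word_counts = {"Male": 1528, "SexUnknown": 1341, "Female": 1307, "Indeterminate": 1}

# Each character occurs once per occurrence of its position in a word.
char_counts = Counter()
for word, n in word_counts.items():
    for ch in word:
        char_counts[ch] += n

total_words = sum(word_counts.values())
avg_word_length = sum(len(w) * n for w, n in word_counts.items()) / total_words

print(char_counts.most_common(5))
print(total_words, round(avg_word_length, 1))
```

The five most frequent characters and the average word length agree with the report, and the word counts sum to 4177, the number of rows.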

Length

Data Type: integer

Count: 4177

Mean: 0.524

Standard Deviation: 0.12

Minimum: 0

25th Percentile: 0.45

Median: 0.545

75th Percentile: 0.615

Maximum: 0

Missing: 0

Percent Missing: 0

Unique: 134

Percent Unique: 0.0321

Highest Precision: 3

Average Precision: 2.41

Lowest Precision: 1

Diameter

Data Type: integer

Count: 4177

Mean: 0.408

Standard Deviation: 0.0992

Minimum: 0

25th Percentile: 0.35

Median: 0.425

75th Percentile: 0.48

Maximum: 0

Missing: 0

Percent Missing: 0

Unique: 111

Percent Unique: 0.0266

Highest Precision: 3

Average Precision: 2.4

Lowest Precision: 1

Height

Data Type: integer

Count: 4177

Mean: 0.14

Standard Deviation: 0.0418

Minimum: 0

25th Percentile: 0.115

Median: 0.14

75th Percentile: 0.165

Maximum: 1

Missing: 0

Percent Missing: 0

Unique: 51

Percent Unique: 0.0122

Highest Precision: 3

Average Precision: 2.44

Lowest Precision: 1

Whole weight

Data Type: integer

Count: 4177

Mean: 0.8302

Standard Deviation: 0.4915

Minimum: 0

25th Percentile: 0.445

Median: 0.8023

75th Percentile: 1.153

Maximum: 2

Missing: 663

Percent Missing: 0.1587

Unique: 2218

Percent Unique: 0.6309

Highest Precision: 4

Average Precision: 2.886

Lowest Precision: 0

Shucked weight

Data Type: integer

Count: 4177

Mean: 0.3594

Standard Deviation: 0.222

Minimum: 0

25th Percentile: 0.186

Median: 0.336

75th Percentile: 0.502

Maximum: 1

Missing: 0

Percent Missing: 0

Unique: 1515

Percent Unique: 0.3627

Highest Precision: 4

Average Precision: 3.429

Lowest Precision: 1

Viscera weight

Data Type: integer

Count: 4177

Mean: 0.1851

Standard Deviation: 0.1098

Minimum: 0

25th Percentile: 0.098

Median: 0.1745

75th Percentile: 0.2595

Maximum: 0

Missing: 1000

Percent Missing: 0.2394

Unique: 850

Percent Unique: 0.2672

Highest Precision: 4

Average Precision: 2.624

Lowest Precision: 0

Shell weight

Data Type: integer

Count: 4177

Mean: 0.2401

Standard Deviation: 0.1412

Minimum: 0

25th Percentile: 0.13

Median: 0.235

75th Percentile: 0.33

Maximum: 1

Missing: 848

Percent Missing: 0.203

Unique: 857

Percent Unique: 0.2571

Highest Precision: 4

Average Precision: 2.295

Lowest Precision: 0

Rings

Data Type: integer

Count: 4177

Mean: 9.9

Standard Deviation: 3.2

Minimum: 1

25th Percentile: 8

Median: 9

75th Percentile: 11

Maximum: 29

Missing: 0

Percent Missing: 0

Unique: 28

Percent Unique: 0.0067

Highest Precision: 2

Average Precision: 1.5

Lowest Precision: 1

Some Correlation

Data Type: categorical

Count: 4177

Missing: 0

Percent Missing: 0

Unique: 3

Unique Ratio: 0.00072

Most Common Value: A

Most Common Value Count: 1.8e+03

Most Common Value Ratio: 0.44

Least Common Value: C

Least Common Value Count: 5.5e+02

Least Common Value Ratio: 0.13

No Correlation

Data Type: categorical

Count: 2

Missing: 4.2e+03

Percent Missing: 1

Unique: 2

Unique Ratio: 1

Most Common Value: B

Most Common Value Count: 1

Most Common Value Ratio: 0.5

Least Common Value: B

Least Common Value Count: 1

Least Common Value Ratio: 0.5

Is_Large

Data Type: boolean

Count: 4177

Most Frequent: False

True/False Ratio: 0

Missing: 0

Percent Missing: 0