Split
- exception neer_match_utilities.split.SplitError
- Custom exception for errors in data splitting. 
- neer_match_utilities.split.split_test_train(left, right, matches, test_ratio=0.3, validation_ratio=0.1)
- Splits datasets into training, validation, and testing subsets.
- This function ensures that only observations from left and right that are referenced in the matches DataFrame are included in the split process.
- Parameters:
- left (pd.DataFrame) – The left dataset to split. 
- right (pd.DataFrame) – The right dataset to split. 
- matches (pd.DataFrame) – A DataFrame containing matching pairs between the left and right datasets. It must include columns ‘left’ and ‘right’, referencing indices in left and right. 
- test_ratio (float, optional) – The proportion of the data to be used for testing (default is 0.3). 
- validation_ratio (float, optional) – The proportion of the data to be used for validation (default is 0.1). 
 
- Returns:
- A tuple containing:
  - left_train : pd.DataFrame
  - right_train : pd.DataFrame
  - matches_train : pd.DataFrame
  - left_validation : pd.DataFrame
  - right_validation : pd.DataFrame
  - matches_validation : pd.DataFrame
  - left_test : pd.DataFrame
  - right_test : pd.DataFrame
  - matches_test : pd.DataFrame
- Return type:
- tuple 
- Raises:
- SplitError – If the total counts of split subsets do not match the original dataset size.
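
The behaviour described above can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: `split_matches` and `restrict` are hypothetical helpers, `SplitError` is redefined locally as a stand-in, plain lists and dicts replace the DataFrames, and the `seed` parameter is an addition for reproducibility. The sketch also assumes that both ratios are fractions of the full set of matches, so the training share is `1 - test_ratio - validation_ratio`.

```python
import random


class SplitError(Exception):
    """Local stand-in for the library's SplitError (assumption)."""


def split_matches(matches, test_ratio=0.3, validation_ratio=0.1, seed=0):
    """Partition (left_id, right_id) pairs into train, validation, and test.

    Assumption: both ratios are fractions of all matches, so the training
    share is whatever remains after test and validation are taken out.
    """
    if test_ratio < 0 or validation_ratio < 0 or test_ratio + validation_ratio >= 1:
        raise SplitError("ratios must be non-negative and sum to less than 1")

    # Shuffle a copy so the caller's list is left untouched.
    shuffled = list(matches)
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_test = round(n * test_ratio)
    n_val = round(n * validation_ratio)

    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]

    # Mirror the documented check: subset sizes must add up to the original.
    if len(train) + len(validation) + len(test) != n:
        raise SplitError("split sizes do not add up to the number of matches")
    return train, validation, test


def restrict(records, ids):
    """Keep only the records whose key is referenced by a match."""
    return {key: records[key] for key in ids}


# Toy data: left ids 10 and 11 have no match and are therefore dropped.
left = {i: f"left-{i}" for i in range(12)}
right = {i: f"right-{i}" for i in range(10)}
matches = [(i, i) for i in range(10)]

train, validation, test = split_matches(matches)
left_train = restrict(left, [l for l, _ in train])
right_train = restrict(right, [r for _, r in train])

print(len(train), len(validation), len(test))
```

With the actual function, the nine returned objects would be unpacked in the order listed under Returns, starting with `left_train, right_train, matches_train`.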