gitlab2pandas package
gitlab2pandas.core module
- class gitlab2pandas.core.Core(data_root_dir: str, project: Optional[gitlab.v4.objects.projects.Project] = None, project_namespace: Optional[str] = None, project_name: Optional[str] = None)
Bases: object
Initializes the core object with general information. Initialize it either with a project object or with the project namespace and name. Extractions can only be run with a project object, or after connecting to a server with the project namespace and name.
- Parameters
data_root_dir (str) – An existing top-level directory for data extraction.
project (Project, default=None) – Project object from gitlab.
project_namespace (str, default=None) – Namespace of the project.
project_name (str, default=None) – Name of the project.
- class Features
Bases: object
- BRANCHES = 'Branches'
- COMMITS = 'Commits'
- COMMITS_COMMENTS = 'CommitsComments'
- COMMITS_DIFFS = 'CommitsDiffs'
- COMMITS_REFS = 'CommitsRefs'
- COMMITS_STATUSES = 'CommitStatuses'
- EVENTS = 'Events'
- ISSUES = 'Issues'
- ISSUES_AWARD_EMOJIS = 'IssuesAwardEmojis'
- ISSUES_CLOSED_BY_MR = 'IssuesClosedByMR'
- ISSUES_LINKS = 'IssuesLinks'
- ISSUES_NOTES = 'IssuesNotes'
- ISSUES_NOTES_AWARD_EMOJIS = 'IssuesNotesAwardEmojis'
- ISSUES_RELATED_MR = 'IssuesRelatedMR'
- ISSUES_RESOURCELABELEVENTS = 'IssuesResourcelabelevents'
- ISSUES_RESOURCEMILESTONESEVENTS = 'IssuesResourcemilestonesevents'
- ISSUES_RESOURCESTATEEVENTS = 'IssuesResourcestateevents'
- ISSUE_BOARDS = 'IssueBoards'
- ISSUE_BOARDS_LISTS = 'IssueBoardsLists'
- JOBS = 'Jobs'
- LABELS = 'Labels'
- MERGE_REQUESTS = 'MergeRequests'
- MERGE_REQUESTS_AWARD_EMOJIS = 'MRsAwardEmojis'
- MERGE_REQUESTS_CHANGES = 'MRsChanges'
- MERGE_REQUESTS_COMMITS = 'MRsCommits'
- MERGE_REQUESTS_DIFFS = 'MRsDiffs'
- MERGE_REQUESTS_NOTES = 'MRsNotes'
- MERGE_REQUESTS_NOTES_AWARD_EMOJIS = 'MRsNotesAwardEmojis'
- MERGE_REQUESTS_RESOURCELABELEVENTS = 'MRsResourcelabelevents'
- MERGE_REQUESTS_RESOURCEMILESTONESEVENTS = 'MRsResourcemilestonesevents'
- MERGE_REQUESTS_RESOURCESTATEEVENTS = 'MRsResourcestateevents'
- MILESTONES = 'Milestones'
- PIPELINES = 'Pipelines'
- PIPELINES_BRIDGES = 'PipelinesBridges'
- PIPELINES_REPORT = 'PipelinesReport'
- PIPELINE_SCHEDULES = 'PipelineSchedules'
- PROJECTS = 'Projects'
- RELEASES = 'Releases'
- RUNNERS = 'Runners'
- RUNNERS_JOBS = 'RunnersJobs'
- SNIPPETS = 'Snippets'
- TRIGGERS = 'Triggers'
- USERS = 'Users'
- WIKIS = 'Wikis'
- connect(server_url: str, private_token: Optional[str] = None, oauth_token: Optional[str] = None, job_token: Optional[str] = None) -> None
Gets the project object from GitLab using the project namespace and name. Without a token, only public projects can be accessed (read-only). Extraction is possible once the connection is established.
- Parameters
server_url (str) – URL of the GitLab server.
private_token (str, default=None) – Private token or personal token for authentication.
oauth_token (str, default=None) – OAuth token for authentication.
job_token (str, default=None) – Job token for authentication (to be used in CI).
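As a rough illustration of how the three optional tokens relate, the sketch below collects the keyword arguments for a `connect` call from whichever token is available. The helper `build_connect_kwargs` and the example URL are assumptions made for illustration and are not part of gitlab2pandas; only the parameter names come from the signature above.

```python
from typing import Dict, Optional


def build_connect_kwargs(
    server_url: str,
    private_token: Optional[str] = None,
    oauth_token: Optional[str] = None,
    job_token: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: collect the arguments for a connect() call.

    Only tokens that were actually provided are kept, so a call without any
    token corresponds to anonymous access (public projects, read-only).
    """
    kwargs = {"server_url": server_url}
    tokens = {
        "private_token": private_token,
        "oauth_token": oauth_token,
        "job_token": job_token,
    }
    for name, value in tokens.items():
        if value:
            kwargs[name] = value
    return kwargs


# Placeholder URL and token, not real credentials.
anonymous = build_connect_kwargs("https://gitlab.example.com")
with_token = build_connect_kwargs("https://gitlab.example.com", private_token="<your-token>")
```

The resulting dictionary could then be unpacked into the call, for example `core.connect(**with_token)`, where `core` stands for an already constructed object.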
- convert_to_excel(excel_filename, features: Optional[list] = None) -> None
Converts features to an Excel file. If no features are passed, all features are converted.
- Parameters
excel_filename (str) – Name for the file.
features (list, default=None) – Features to convert. If no features are passed, then all features will be converted.
- get_pandas_data_frame(filename: str) -> Optional[pandas.core.frame.DataFrame]
Gets a pandas DataFrame from the project directory. The project metadata is accessed from the top-level directory.
- Parameters
filename (str) – Name of the file to import.
- Returns
DataFrame – The DataFrame stored in the existing file.
None – Returned if the file does not exist.
- get_pandas_data_frame_path(filename: str) -> Optional[pathlib.Path]
Gets the path of a pandas DataFrame file in the project directory. The project metadata is accessed from the top-level directory.
- Parameters
filename (str) – Name of the feature whose file path is requested.
- Returns
Path – Path to the feature's file.
None – Returned if the file does not exist.
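The `Path`-or-`None` contract described above can be sketched with the standard library alone. The function `find_feature_file`, the flat directory layout, and the `.csv` extension are assumptions for illustration; the actual on-disk layout used by gitlab2pandas is not described in this reference.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_feature_file(project_dir: Path, filename: str, extension: str = ".csv") -> Optional[Path]:
    """Hypothetical lookup: return the file path if it exists, otherwise None."""
    candidate = project_dir / f"{filename}{extension}"
    return candidate if candidate.is_file() else None


with tempfile.TemporaryDirectory() as tmp:
    project_dir = Path(tmp)
    # Create one example feature file so that one lookup succeeds.
    (project_dir / "Issues.csv").write_text("id,title\n1,Example\n")

    found = find_feature_file(project_dir, "Issues")
    found_name = found.name if found else None
    missing = find_feature_file(project_dir, "Wikis")
```

Because both getters may return `None`, callers would typically check the result before reading from it.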
- save_as_pandas(filename: str, data: pandas.core.frame.DataFrame) -> None
Saves a pandas DataFrame to the project directory. The project metadata is saved in the top-level directory under the given filename as a pandas file.
- Parameters
filename (str) – Name for the file.
data (pd.DataFrame) – DataFrame to be saved.
- set_input_type(input_file_type: str) -> bool
Sets the input file type and checks whether it is supported by gitlab2pandas. The input file type is needed for the update feature.
- Parameters
input_file_type (str) – File extension of the desired input type.
- Returns
Whether the input file type was changed.
- Return type
bool
- set_output_type(output_file_type: str) -> bool
Sets the output file type and checks whether it is supported by gitlab2pandas. The output file type is needed for the automatic DataFrame storage of the extractions.
- Parameters
output_file_type (str) – File extension of the desired output type.
- Returns
Whether the output file type was changed.
- Return type
bool
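One possible reading of the "set and check" behaviour of `set_input_type` and `set_output_type` is sketched below. The class `FileTypeSetting`, its default, and the set of supported extensions are all assumptions; this reference does not list which file types gitlab2pandas supports, nor whether re-setting the current type counts as a change.

```python
from typing import FrozenSet


class FileTypeSetting:
    """Hypothetical stand-in for the input/output file type handling."""

    # Assumed set of supported extensions; the real list is not given here.
    SUPPORTED: FrozenSet[str] = frozenset({".csv", ".pkl"})

    def __init__(self, output_file_type: str = ".csv") -> None:
        self.output_file_type = output_file_type

    def set_output_type(self, output_file_type: str) -> bool:
        """Store the new type only if it is supported; report whether it was stored."""
        if output_file_type not in self.SUPPORTED:
            return False
        self.output_file_type = output_file_type
        return True


settings = FileTypeSetting()
accepted = settings.set_output_type(".pkl")   # supported in this sketch
rejected = settings.set_output_type(".docx")  # unsupported, setting stays unchanged
```

Checking the returned `bool` lets a caller notice an unsupported type before starting a long extraction.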
gitlab2pandas.extractions module
- class gitlab2pandas.extractions.Extractions(data_root_dir: str, project=None, project_namespace=None, project_name=None, extract_parallel=False)
Bases: gitlab2pandas.core.Core
Initializes the extractions object with general information. Initialize it either with a project object or with the project namespace and name. Extractions can only be run with a project object, or after connecting to a server with the project namespace and name.
- Parameters
data_root_dir (str) – An existing top-level directory for data extraction.
project (Project, default=None) – Project object from gitlab.
project_namespace (str, default=None) – Namespace of the project.
project_name (str, default=None) – Name of the project.
extract_parallel (bool, default=False) – Whether to extract the data in parallel. Parallel extraction might fail on some GitLab servers because of server settings.
- EXTRACTIONS_WITHOUT_UPDATE = ['Branches', 'IssueBoards', 'Labels', 'Milestones', 'Projects', 'Releases', 'Snippets', 'Users', 'Wikis', 'Triggers']
- extract_commits() -> None
Extracts commits and their sub-features from GitLab. Update checking is supported.
- extract_issue_boards() -> None
Extracts issue boards from GitLab. Update checking is not supported.
- extract_issues() -> None
Extracts issues and their sub-features from GitLab. Update checking is supported.
- extract_jobs() -> None
Extracts jobs from GitLab. Update checking is supported. When updating, jobs are extracted as part of the pipeline extraction.
- extract_merge_requests() -> None
Extracts merge requests and their sub-features from GitLab. Update checking is supported.
- extract_pipeline_schedules() -> None
Extracts pipeline schedules from GitLab. Update checking is not supported.
- extract_pipelines() -> None
Extracts pipelines and their sub-features from GitLab. Update checking is supported. When updating, jobs are extracted as well.
- extract_project() -> None
Extracts general project information from GitLab. Update checking is not supported.
- extract_triggers() -> None
Extracts pipeline triggers from GitLab. Update checking is not supported.
- pass_white_black_list(feature) -> bool
Checks if a feature passes the white- and blacklist.
- Parameters
feature (str) – Feature to be checked.
- Returns
True if the feature can be extracted. False if the feature should be ignored.
- Return type
bool
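The whitelist and blacklist check can be pictured as a small predicate. The sketch below is not the library's implementation: the function name `passes_lists` is hypothetical, and the rule that the blacklist wins when a feature appears in both lists is an assumption. The treatment of an empty whitelist as "everything allowed" follows the parameter description of `start`.

```python
from typing import Iterable


def passes_lists(feature: str, whitelist: Iterable[str] = (), blacklist: Iterable[str] = ()) -> bool:
    """Hypothetical filter: True if the feature should be extracted."""
    allowed = set(whitelist)
    blocked = set(blacklist)
    if feature in blocked:
        # Assumption: a blacklisted feature is always ignored.
        return False
    # An empty whitelist means every non-blacklisted feature passes.
    return not allowed or feature in allowed


features = ["Issues", "Commits", "Wikis"]
without_wikis = [f for f in features if passes_lists(f, blacklist=["Wikis"])]
only_issues = [f for f in features if passes_lists(f, whitelist=["Issues"])]
```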
- start(feature_blacklist: list = [], feature_whitelist: list = [], update: bool = True) -> None
Starts an extraction with a blacklist or whitelist for features. The extraction can start from the last commit date or cover the entire project.
- Parameters
feature_blacklist (list, default=[]) – Features which will be ignored.
feature_whitelist (list, default=[]) – Features which will be extracted. If it is empty, all features that are not in the blacklist are extracted.
update (bool, default=True) – Extract only new items since the last extraction.
gitlab2pandas.gitlab2pandas module
- class gitlab2pandas.gitlab2pandas.GitLab2Pandas(data_root_dir: str, project: Optional[gitlab.v4.objects.projects.Project] = None, project_namespace: Optional[str] = None, project_name: Optional[str] = None)
Bases: gitlab2pandas.core.Core
- extract_data(extract_parallel: bool = False, feature_blacklist: list = [], feature_whitelist: list = [], update: bool = True) -> None
Extracts GitLab data based on the feature blacklist or whitelist. Parallel extraction might fail on some GitLab servers because of server settings.
- Parameters
extract_parallel (bool, default=False) – Extract the data in parallel.
feature_blacklist (list, default=[]) – Features which will be ignored.
feature_whitelist (list, default=[]) – Features which will be extracted. If it is empty, all features that are not in the blacklist are extracted.
update (bool, default=True) – Extract only new items since the last extraction.
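Putting the pieces together, a typical workflow might look like the sketch below. It relies only on the signatures documented on this page; the wrapper `run_extraction`, the placeholder values, and the assumption that the whitelist takes the string values of `Features` (such as 'Issues') are illustrative and have not been verified against the library.

```python
from typing import List, Optional

# Feature names taken from the Features constants listed above.
WANTED_FEATURES = ["Issues", "MergeRequests"]


def run_extraction(
    data_root_dir: str,
    project_namespace: str,
    project_name: str,
    server_url: str,
    private_token: Optional[str] = None,
    feature_whitelist: Optional[List[str]] = None,
) -> None:
    """Hypothetical wrapper: connect to a server and extract selected features."""
    # Imported lazily so the sketch can be loaded without the package installed.
    from gitlab2pandas.gitlab2pandas import GitLab2Pandas

    g2p = GitLab2Pandas(
        data_root_dir,
        project_namespace=project_namespace,
        project_name=project_name,
    )
    # Without a token, only public projects are readable.
    g2p.connect(server_url, private_token=private_token)
    # update=True extracts only items that are new since the last extraction.
    g2p.extract_data(feature_whitelist=feature_whitelist or [], update=True)
```

A call could then look like `run_extraction("data", "my-group", "my-project", "https://gitlab.example.com", feature_whitelist=WANTED_FEATURES)`, where every argument is a placeholder and the `data` directory must already exist.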
gitlab2pandas.processing module
- class gitlab2pandas.processing.Processing(data_root_dir: str, project: Optional[gitlab.v4.objects.projects.Project] = None, project_namespace: Optional[str] = None, project_name: Optional[str] = None)
Bases: gitlab2pandas.core.Core