gitlab2pandas package

gitlab2pandas.core module

class gitlab2pandas.core.Core(data_root_dir: str, project: Optional[gitlab.v4.objects.projects.Project] = None, project_namespace: Optional[str] = None, project_name: Optional[str] = None)[source]

Bases: object

Initializes the core object with general information. Decide whether to initialize with a project object or with the project namespace and name. Extractions can only be done with a project object or after connecting to a server with the project namespace and name.

Parameters
  • data_root_dir (str) – An existing top-level directory for data extraction.

  • project (Project, default=None) – Project object from gitlab.

  • project_namespace (str, default=None) – Namespace of the project.

  • project_name (str, default=None) – Name of the project.

class Features[source]

Bases: object

BRANCHES = 'Branches'
COMMITS = 'Commits'
COMMITS_COMMENTS = 'CommitsComments'
COMMITS_DIFFS = 'CommitsDiffs'
COMMITS_REFS = 'CommitsRefs'
COMMITS_STATUSES = 'CommitStatuses'
EVENTS = 'Events'
ISSUES = 'Issues'
ISSUES_AWARD_EMOJIS = 'IssuesAwardEmojis'
ISSUES_CLOSED_BY_MR = 'IssuesClosedByMR'
ISSUES_NOTES = 'IssuesNotes'
ISSUES_NOTES_AWARD_EMOJIS = 'IssuesNotesAwardEmojis'
ISSUES_RESOURCELABELEVENTS = 'IssuesResourcelabelevents'
ISSUES_RESOURCEMILESTONESEVENTS = 'IssuesResourcemilestonesevents'
ISSUES_RESOURCESTATEEVENTS = 'IssuesResourcestateevents'
ISSUE_BOARDS = 'IssueBoards'
ISSUE_BOARDS_LISTS = 'IssueBoardsLists'
JOBS = 'Jobs'
LABELS = 'Labels'
MERGE_REQUESTS = 'MergeRequests'
MERGE_REQUESTS_AWARD_EMOJIS = 'MRsAwardEmojis'
MERGE_REQUESTS_CHANGES = 'MRsChanges'
MERGE_REQUESTS_COMMITS = 'MRsCommits'
MERGE_REQUESTS_DIFFS = 'MRsDiffs'
MERGE_REQUESTS_NOTES = 'MRsNotes'
MERGE_REQUESTS_NOTES_AWARD_EMOJIS = 'MRsNotesAwardEmojis'
MERGE_REQUESTS_RESOURCELABELEVENTS = 'MRsResourcelabelevents'
MERGE_REQUESTS_RESOURCEMILESTONESEVENTS = 'MRsResourcemilestonesevents'
MERGE_REQUESTS_RESOURCESTATEEVENTS = 'MRsResourcestateevents'
MILESTONES = 'Milestones'
PIPELINES = 'Pipelines'
PIPELINES_BRIDGES = 'PipelinesBridges'
PIPELINES_REPORT = 'PipelinesReport'
PIPELINE_SCHEDULES = 'PipelineSchedules'
PROJECTS = 'Projects'
RELEASES = 'Releases'
RUNNERS = 'Runners'
RUNNERS_JOBS = 'RunnersJobs'
SNIPPETS = 'Snippets'
TRIGGERS = 'Triggers'
USERS = 'Users'
WIKIS = 'Wikis'
classmethod to_list() list[source]

Returns a list of strings with all Features.

Returns

A list of strings with all Features.

Return type

list
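A constants class with a to_list() classmethod can be sketched as follows. This is a minimal illustration, not the package's implementation: the class name FeatureNames and the attribute-scanning approach are assumptions, and only three of the constants listed above are reproduced.

```python
class FeatureNames:
    """Hypothetical stand-in for a constants class such as Core.Features."""

    BRANCHES = "Branches"
    COMMITS = "Commits"
    ISSUES = "Issues"

    @classmethod
    def to_list(cls) -> list:
        # Collect every string attribute whose name is upper-case,
        # in the order the attributes were defined on the class.
        return [
            value
            for name, value in vars(cls).items()
            if name.isupper() and isinstance(value, str)
        ]


print(FeatureNames.to_list())
```

A list like this is convenient for building a whitelist or blacklist without typing the feature strings by hand.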

class FileTypes[source]

Bases: object

JSON = '.json'
PANDAS = '.p'
connect(server_url: str, private_token: Optional[str] = None, oauth_token: Optional[str] = None, job_token: Optional[str] = None) None[source]

Gets the project object from GitLab using the project namespace and name. Without a token, only public projects can be accessed (read-only). Extraction is possible once a connection has been made.

Parameters
  • server_url (str) – Url to the GitLab server.

  • private_token (str, default=None) – Private token or personal token for authentication.

  • oauth_token (str, default=None) – OAuth token for authentication.

  • job_token (str, default=None) – Job token for authentication (to be used in CI).

convert_to_excel(excel_filename, features: Optional[list] = None) None[source]

Converts features to an Excel file. If no features are passed, then all features will be converted.

Parameters
  • excel_filename (str) – Name for the file.

  • features (list, default=None) – Features to convert. If no features are passed, then all features will be converted.

get_pandas_data_frame(filename: str) Optional[pandas.core.frame.DataFrame][source]

Gets a pandas DataFrame from the project directory. The project metadata will be accessed from the top-level directory.

Parameters

filename (str) – Name of the file to import.

Returns

  • DataFrame – Return a DataFrame of the existing file.

  • None – Returned when the file does not exist.

get_pandas_data_frame_path(filename: str) Optional[pathlib.Path][source]

Gets the path of a pandas DataFrame file in the project directory. The project metadata will be accessed from the top-level directory.

Parameters

filename (str) – Name of the feature to get the file path.

Returns

  • Path – Returns the path of the feature file.

  • None – Returned when the file does not exist.
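The "Path or None" contract described above can be illustrated with pathlib. This is a sketch under stated assumptions: the helper find_data_file, the flat directory layout, and the idea that the file name is the feature name plus the '.p' suffix are all hypothetical and are not taken from the package.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_data_file(project_dir: Path, filename: str, suffix: str = ".p") -> Optional[Path]:
    """Return the path of an existing data file, or None if it is missing."""
    # Assumption: files are stored as "<filename><suffix>" directly in project_dir.
    candidate = project_dir / f"{filename}{suffix}"
    return candidate if candidate.is_file() else None


with tempfile.TemporaryDirectory() as tmp:
    project_dir = Path(tmp)
    # Create an empty placeholder file so that one lookup succeeds.
    (project_dir / "Issues.p").write_bytes(b"")
    found = find_data_file(project_dir, "Issues")
    missing = find_data_file(project_dir, "Wikis")
    found_name = found.name if found is not None else None
```

Callers of a method with this return type should check for None before using the result, since a feature that was never extracted has no file.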

save_as_pandas(filename: str, data: pandas.core.frame.DataFrame) None[source]

Saves a pandas DataFrame to the project directory. The project metadata is saved in the top-level directory, under the given filename, as a pandas file.

Parameters
  • filename (str) – Name for the file.

  • data (pd.DataFrame) – DataFrame to be saved.

set_input_type(input_file_type: str) bool[source]

Sets the input file type and checks whether the file type is supported by gitlab2pandas. The input file type is needed for the update feature.

Parameters

input_file_type (str) – File ending of the desired input type.

Returns

True if the input file type was changed, otherwise False.

Return type

bool

set_output_type(output_file_type: str) bool[source]

Sets the output file type and checks whether the file type is supported by gitlab2pandas. The output file type is needed for the automatic DataFrame storage of the extractions.

Parameters

output_file_type (str) – File ending of the desired output type.

Returns

True if the output file type was changed, otherwise False.

Return type

bool
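The check-then-set behaviour of set_input_type and set_output_type might look like the sketch below. The class TypeSettings, the default of '.p', and the choice to return True whenever a supported type is accepted are assumptions made for illustration; only the two suffix values come from FileTypes above.

```python
# Suffixes documented for FileTypes.JSON and FileTypes.PANDAS.
SUPPORTED_FILE_TYPES = {".json", ".p"}


class TypeSettings:
    """Hypothetical holder for the configured output file type."""

    def __init__(self) -> None:
        # Assumption: pandas files are the default output type.
        self.output_type = ".p"

    def set_output_type(self, output_file_type: str) -> bool:
        # Reject unsupported file endings and leave the current setting untouched.
        if output_file_type not in SUPPORTED_FILE_TYPES:
            return False
        self.output_type = output_file_type
        return True
```

Returning a bool instead of raising lets the caller decide how to react to an unsupported file ending.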

gitlab2pandas.extractions module

class gitlab2pandas.extractions.Extractions(data_root_dir: str, project=None, project_namespace=None, project_name=None, extract_parallel=False)[source]

Bases: gitlab2pandas.core.Core

Initializes the extractions object with general information. Decide whether to initialize with a project object or with the project namespace and name. Extractions can only be done with a project object or after connecting to a server with the project namespace and name.

Parameters
  • data_root_dir (str) – An existing top-level directory for data extraction.

  • project (Project, default=None) – Project object from gitlab.

  • project_namespace (str, default=None) – Namespace of the project.

  • project_name (str, default=None) – Name of the project.

  • extract_parallel (bool, default=False) – Whether to extract in parallel. Parallel extraction might fail on some GitLab servers because of server settings.

EXTRACTIONS_WITHOUT_UPDATE = ['Branches', 'IssueBoards', 'Labels', 'Milestones', 'Projects', 'Releases', 'Snippets', 'Users', 'Wikis', 'Triggers']
extract_branches() None[source]

Extracts branches from GitLab. Check for update does not work.

extract_commits() None[source]

Extracts commits and their sub-features from GitLab. Check for update works.

extract_events() None[source]

Extracts events from GitLab. Check for update works.

extract_issue_boards() None[source]

Extracts issue boards from GitLab. Check for update does not work.

extract_issues() None[source]

Extracts issues and their sub-features from GitLab. Check for update works.

extract_jobs() None[source]

Extracts jobs from GitLab. Check for update works. If updated, then jobs will be extracted as part of the pipelines extraction.

extract_labels() None[source]

Extracts labels from GitLab. Check for update does not work.

extract_merge_requests() None[source]

Extracts merge requests and their sub-features from GitLab. Check for update works.

extract_milestones() None[source]

Extracts milestones from GitLab. Check for update does not work.

extract_pipeline_schedules() None[source]

Extracts pipeline schedules for pipelines from GitLab. Check for update does not work.

extract_pipelines() None[source]

Extracts pipelines and their sub-features from GitLab. Check for update works. If updated, then it will extract jobs, too.

extract_project() None[source]

Extracts general project information from GitLab. Check for update does not work.

extract_releases() None[source]

Extracts releases from GitLab. Check for update does not work.

extract_snippets() None[source]

Extracts snippets from GitLab. Check for update does not work.

extract_triggers() None[source]

Extracts triggers for pipelines from GitLab. Check for update does not work.

extract_users() None[source]

Extracts users from GitLab. Check for update does not work.

extract_wikis() None[source]

Extracts wiki pages from GitLab. Check for update does not work.

pass_white_black_list(feature) bool[source]

Checks if a feature passes the white- and blacklist.

Parameters

feature (str) – Feature to be checked.

Returns

True if the feature can be extracted. False if the feature should be ignored.

Return type

bool
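The whitelist and blacklist check can be sketched as a small pure function. This is an illustration only: the function name passes_lists is hypothetical, and the rule that the blacklist takes precedence when a feature appears in both lists is an assumption, since the documentation does not state it. The empty-whitelist rule follows the description of start().

```python
def passes_lists(feature: str, whitelist=(), blacklist=()) -> bool:
    """Return True if the feature should be extracted."""
    # Assumption: a blacklisted feature is always ignored.
    if feature in blacklist:
        return False
    # A non-empty whitelist restricts extraction to its entries.
    if whitelist:
        return feature in whitelist
    # An empty whitelist means every feature that is not blacklisted passes.
    return True
```

For example, passes_lists("Issues", whitelist=["Issues"]) returns True, while passes_lists("Wikis", whitelist=["Issues"]) returns False.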

start(feature_blacklist: list = [], feature_whitelist: list = [], update: bool = True) None[source]

Starts an extraction with a blacklist or whitelist for features. The extraction can start from the last commit date or cover the entire project.

Parameters
  • feature_blacklist (list, default=[]) – Features which will be ignored.

  • feature_whitelist (list, default=[]) – Features which will be extracted. If it is empty, then all features that are not in the blacklist are extracted.

  • update (bool, default=True) – Extract only new items since the last extraction.

gitlab2pandas.gitlab2pandas module

class gitlab2pandas.gitlab2pandas.GitLab2Pandas(data_root_dir: str, project: Optional[gitlab.v4.objects.projects.Project] = None, project_namespace: Optional[str] = None, project_name: Optional[str] = None)[source]

Bases: gitlab2pandas.core.Core

extract_data(extract_parallel: bool = False, feature_blacklist: list = [], feature_whitelist: list = [], update: bool = True) None[source]

Extracts GitLab data based on the feature blacklist or whitelist. Parallel extraction might fail on some GitLab servers because of server settings.

Parameters
  • extract_parallel (bool, default=False) – Extracting the data parallel.

  • feature_blacklist (list, default=[]) – Features which will be ignored.

  • feature_whitelist (list, default=[]) – Features which will be extracted. If it is empty, then all features that are not in the blacklist are extracted.

  • update (bool, default=True) – Extract only new items since the last extraction.
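Putting the documented calls together, an end-to-end run might look like the sketch below. It uses only the constructor, connect(), extract_data() and get_pandas_data_frame() signatures shown in this reference. The wrapper function extract_issues_frame and its parameter names are hypothetical, and passing the feature name as the filename is an assumption. The import is placed inside the function so that the sketch can be loaded without the package installed.

```python
def extract_issues_frame(data_root_dir, server_url, namespace, name, private_token=None):
    """Extract issues for one project and return them as a DataFrame (or None)."""
    # Imported lazily; requires the gitlab2pandas package at call time.
    from gitlab2pandas.gitlab2pandas import GitLab2Pandas

    g2p = GitLab2Pandas(data_root_dir, project_namespace=namespace, project_name=name)
    # Without a token, only public projects can be read.
    g2p.connect(server_url, private_token=private_token)

    features = GitLab2Pandas.Features
    # Restrict the run to issues and their notes; update=True is the default.
    g2p.extract_data(feature_whitelist=[features.ISSUES, features.ISSUES_NOTES])

    # Assumption: the stored file is addressed by the feature name.
    return g2p.get_pandas_data_frame(features.ISSUES)
```

Because get_pandas_data_frame is documented to return None when the file does not exist, the caller should check the result before using it.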

gitlab2pandas.processing module

class gitlab2pandas.processing.Processing(data_root_dir: str, project: Optional[gitlab.v4.objects.projects.Project] = None, project_namespace: Optional[str] = None, project_name: Optional[str] = None)[source]

Bases: gitlab2pandas.core.Core

replace_user_id()[source]

Replaces user_ids with a pseudonym generated by human-id. There might be some user_ids in commits that are not connected to a User.
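The idea of pseudonymizing user ids can be sketched with the standard library. This is not the package's implementation: a salted SHA-256 digest stands in for the human-id generator, rows are plain dictionaries rather than DataFrames, and leaving ids without a matching user unchanged is an assumption. The names pseudonym_for and replace_user_ids are hypothetical.

```python
import hashlib


def pseudonym_for(user_id, salt: str = "example-salt") -> str:
    """Derive a stable pseudonym from a user id (stand-in for human-id)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return f"user-{digest[:8]}"


def replace_user_ids(rows, known_user_ids):
    """Return copies of the rows with known user ids replaced by pseudonyms."""
    mapping = {uid: pseudonym_for(uid) for uid in known_user_ids}
    replaced = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        # Assumption: ids that belong to no known user are left as they are.
        new_row["user_id"] = mapping.get(row.get("user_id"), row.get("user_id"))
        replaced.append(new_row)
    return replaced
```

Using one mapping for all tables keeps the same pseudonym for a user across features, which is what makes the replaced data still joinable.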